Research Engineer/Research Scientist - Personal AGI, North Stars
OpenAI · San Francisco
About the Team

The Personal AGI team seeks to empower all of humanity to benefit from frontier intelligence in whatever way they choose. We are responsible for training models to deploy to millions of users globally via ChatGPT, the API, and future products. We aim to evolve ChatGPT from a chatbot to an infinitely capable and personalized superassistant supporting human flourishing.

We work on defining, measuring, and improving capabilities across the training stack. Our focus areas include but are not limited to model behavior, personalization, safety, factuality, instruction following, personality, interactivity, multilingual fluency, world interaction, and bringing agents to everyone. We chart the course for what to strive towards. We partner closely with research and product teams across the company, ensuring that our models are safe, efficient, and reliable.

About the Role

You’ll work as a Research Engineer / Scientist on the North Stars team within the broader Personal AGI research org. You will work on bringing the next generation of AI-enabled experiences to all of humanity by closing the capability overhang between power users and the average consumer, including areas like tool use, feature discovery, connectors, and instruction following. You will think deeply about the current bottlenecks in model behavior and translate these insights into robust evals, training data, reward signals, and model and harness improvements.

We're looking for individuals with strong ML engineering skills and research experience who are passionate about creative, product-driven research.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

- Own and pursue a research agenda to improve model capability and performance.
- Collaborate closely with the other research and product teams, allowing customers to optimize their own models.
- Build robust evaluations