Research, Post-Training

Cognition · San Francisco

On-site

RLHF · Go · Deep learning · Python · PyTorch · TensorFlow · Eval harnesses · LLM-as-judge · Observability · GPU · TensorRT · Latency · Throughput

WE ARE AN APPLIED AI LAB BUILDING END-TO-END SOFTWARE AGENTS.

We're the makers of Devin, the first AI software engineer, and Windsurf, the AI-native IDE. Together, they represent our vision for collaborative AI teammates that enable engineers to focus on more interesting problems and empower teams to strive for more ambitious goals.

Our team is small and talent-dense. Among our founding team, we have world-class competitive programmers, former founders, and leaders from companies at the cutting edge of AI including Scale AI, Palantir, Cursor, Waymo, Tesla, Lunchclub, Modal, Google DeepMind, and Nuro.

Building Devin is just the first step; our hardest challenges still lie ahead. If you’re excited to solve some of the world’s biggest problems and build AI that can reason on real-world tasks, apply to join us.

ROLE MISSION

Post-training is the critical bridge between raw model capability and a system that is actually useful, safe, and effective in the real world. You will shape how our agents learn by iterating on training recipes, evaluations, and alignment methods that directly determine what Devin and our future systems can do. This role blends deep research and hands-on engineering. We don't distinguish between the two.

WHAT YOU'LL ACCOMPLISH

- Post-Training Recipe Development: Iterate on the full stack of datasets, training stages, and hyperparameters that determine model behavior. Measure how choices compound across evals and production performance, not just isolated benchmarks.
- Evaluation Design and Integrity: Build evals that actually capture what matters. The loop never ends: define, optimize, realize the gaps, and rebuild. You'll be responsible for making numbers go up and making sure the numbers mean something.
- Deep Understanding: When training produces results that don't make sense, you dig until you understand why. The goal isn't just to fix it; it's to carry that understanding forward to the next problem.
- Alignment and Agent Behavio
