
Software Engineer, RL Data

Anthropic · London, UK; San Francisco, CA | New York City, NY

On-site
MCP · Kubernetes · Docker · Python · TypeScript · Anthropic API · Embeddings · Reranking · Tool use · Orchestration · Eval harnesses · LLM-as-judge · Observability · AWS · GCP · Azure · CI/CD · GPU · PyTorch · Deep learning · RLHF · DPO

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

This is a senior, foundational role on a new team: you'll make architecture decisions the rest of the team builds on, and help shape what we build first. The work is hands-on and varied. Some weeks you'll be deep in pipeline or infrastructure engineering; others you'll be tuning prompts until the output is good, or sitting with a research team that depends on your systems and shipping the fixes they need. We're looking for experienced engineers who own outcomes end-to-end, down to reading transcripts, supporting users, and wrangling vendors.

Anthropic's RL Data team builds the systems that produce high-quality reinforcement learning data for Claude: data collection pipelines, human feedback tooling, the execution environments RL tasks run in, and the quality assurance that keeps training data trustworthy at scale. Our goal is to make Claude great at real work, especially the work that matters most, like AI safety research and beneficial deployments of AI. (To be upfront: this is dual-use work; it advances general capabilities too.)

Key responsibilities

Own significant parts of our stack end-to-end, from technical architecture through the unglamorous operational work that makes it succeed.
Build data collection pipelines, read the transcripts they produce, and iterate on prompts, evals, and graders until the output is good.
Develop and improve QA frameworks to catch reward hacking and ensure environment quality.
Build interfaces that make collecting human data fast and painless for the people providing it.
Harden execution environments (sandboxing, snapshotting, tool coverage) so tasks hold up at training scale.
Emb
