Engineering Manager, Agent Orchestration

Decagon · San Francisco

$280k–430k/yr · On-site · Manager
Tool use · Orchestration · Observability · Latency · Eval harnesses · LLM-as-judge · Throughput · GPU · Python · TypeScript · Kubernetes · Docker · CI/CD · AWS · GCP · Azure · Multi-agent · MCP

About Decagon

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences. Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future where customer experiences are being redefined from support tickets and hold music to faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values (Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle) shape how we work and grow as a team.

About the Team

The Agent Orchestration team builds the runtime and model orchestration layer that powers Decagon’s agents in production. This is the orchestration layer that turns workflows, tools, and guardrails into a reliable, low-latency, and delightful experience for end users.

At the core of this work is the agent harness: the routing, execution logic, tool orchestration, and control-plane systems that determine how an agent behaves in a live conversation. The team owns the full execution lifecycle of each conversation, from selecting workflows and orchestrating multiple models (e.g., router/planner/supervisor patterns), to coordinating tool calls, enforcing safety constraints, and communicating back to the user.

The team operates across both real-time systems (e.g., voice interactions with strict latency requirements) and longer-horizon execution (supporting more complex reasoning and workflows). Our research shows that an agent’s task execution reliability increasingly depends on th
