Principal Engineer, AI Inference Reliability
Cerebras · US and Canada Offices
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds; over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation. Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership https://openai.com/index/cerebras-partnership/ with Cerebras, to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference. About the role We’re looking for a hands-on Reliability Tech Lead (IC) to own the mission of making Cerebras Inference the most reliable AI service in the world. You will drive reliability strategy and execution across our inference stack, from client SDKs and public-cloud multi-region deployments to wafer-scale systems in specialized data centers. In this role, you will define SLOs and incident-response frameworks, design and implement reliability mechanisms at scale, and partner across hundreds of engineers to ensure our service meets world-class reliability standards. If you are passionate about building and operating massive-scale, low-latency, high-reliability distributed systems, we want to hear from you. Responsibilities: - Define and drive reliability strategy: establish SLOs and ensure alignment across engineering. - Design and implement reliability mechanisms: build and evolve systems for fault detection, graceful degradation, failover, throttling, and recovery across multiple regions and data centers. - Lead large-scale incident management: own postmortems, root-cause analysis, and prevention loops for reliability-related incidents. - Architect for reliability and observability: influence system desig