
Member of Technical Staff (Software Engineer)

Cerebras · Headquarters/Sunnyvale Office

On-site Staff
Orchestration, GPU, Latency, Kubernetes, Docker, Python, TypeScript, JavaScript, C++, OpenAI API, Throughput

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership (https://openai.com/index/cerebras-partnership/) with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

About The Role

We are seeking a Software Engineer to develop and maintain high-performance, low-latency inference infrastructure. This role focuses on deploying and optimizing scalable inference services, collaborating with cross-functional teams, and ensuring reliable, production-ready machine learning infrastructure.

Responsibilities

- Implement infrastructure to support a high-performance, low-latency inference service.
- Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.
- Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs.
- Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.
- Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.
- Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
- Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements.
- Triage and resolve defects in the service by analyzing log
