
Staff Software Engineer, Inference

CoreWeave · Sunnyvale, CA / Bellevue, WA

On-site · Staff
vLLM · Orchestration · GPU · Triton · TensorRT · Latency · Throughput · Kubernetes · Ray · Python · Go · C++ · Docker · Quantization · Deep learning · NLP · Computer vision · CI/CD

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

What You’ll Do:

Inference Platform Team

The Inference team builds and operates CoreWeave’s Kubernetes-native inference platform, powering low-latency, high-throughput AI workloads at massive scale. The team is responsible for request routing, scheduling, GPU resource management, and system-wide optimizations that drive performance, efficiency, and reliability across real-time inference systems.

About the role:

As a Staff Software Engineer (IC5) on the Inference team, you will act as a technical leader driving architecture, performance, and reliability across multiple services and teams. Your day-to-day work will involve leading cross-team design initiatives, optimizing inference performance (latency, throughput, and GPU utilization), and improving system reliability at scale. You will work deeply in distributed systems and Kubernetes-based infrastructure, focusing on areas such as scheduling, batching, and memory optimization. This role requires hands-on technical leadership and the ability to influence engineering direction across the organization.

Who You Are:

- 8–12+ years of experience building and operating large-scale distributed systems or cloud platforms
- Proven experience leading cross-team technical initiatives that impact multiple services or organizations
- Strong programming skills in Go, Python, or C++
- Deep expertise in Kubernetes at production scale, including orchestration, scheduling, and service design
- Strong understanding of distribut
