Software Engineer, Model Performance Systems

Baseten · San Francisco

$160k–200k/yr · Hybrid

Orchestration · Observability · GPU · Quantization · Latency · Throughput · PyTorch · Deep learning · CI/CD · Python · C++ · Docker · Kubernetes · TensorRT · TensorFlow · Computer vision · NLP · Embeddings · RAG · Reranking · MCP · LangChain · LlamaIndex · DSPy · vLLM · Hugging Face · OpenAI API · Anthropic API · Pinecone · Weaviate · pgvector · Qdrant · Milvus · Triton · Ray · Kubeflow · SageMaker · Vertex AI · Azure ML · AWS · Azure · GCP · TypeScript · JavaScript · Go · Rust · Tool use · Multi-agent · Eval harnesses · LLM-as-judge · LangSmith · Arize · LoRA · PEFT · RLHF · DPO · Distillation

ABOUT BASETEN

Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $1.5B Series F (https://www.baseten.co/blog/announcing-our-series-f/), led by Altimeter Capital, Conviction Partners, and Spark Capital. Join us and help build the platform engineers turn to to ship AI products.

THE OPPORTUNITY

We are looking for early-career Software Engineers to join our team. This is a specialized role sitting at the intersection of high-performance computing (HPC) and Large Language Model (LLM) engineering. You will be responsible for building the automated "speedometer and diagnostic" suite for our next-generation AI infrastructure. In this role, you won't just be using models; you will be tearing them apart to see how they run on the metal. You will build tools that measure GPU FLOPS, stress-test InfiniBand clusters, and define the benchmarks that ensure our systems are production-ready.

RESPONSIBILITIES

- Performance Benchmarking: Run and automate standard LLM quality benchmarks (GSM8K, MMLU) alongside custom performance suites for specific workloads (e.g., long-context window, KV cache reuse).
- Infrastructure Validation: Create automated acceptance tests for new GPU clusters across x86 and ARM systems, measuring GPU memory bandwidth, networking throughput, and multi-node networking performance.
- Model Dev Experience: Develop and maintain internal GPU-enabled development environments (similar to GitHub Codespaces). You will ensure the team has seamless, high-performance "dev machines" optimized for model experimentation.
- Tool Development: Build and contribute to tools such as InferenceMAX and genai-bench to automate model evaluatio
