Machine Learning Infrastructure Engineer, Model Inference

Abridge · SF Office

$221k–260k/yr · Hybrid

vLLM · Orchestration · GPU · Triton · Latency · Throughput · PyTorch · TensorFlow · Kubernetes · Docker · TensorRT · Tool use

ABOUT ABRIDGE

Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients.

Our enterprise-grade technology transforms patient-clinician conversations into structured clinical notes in real time, with deep EMR integrations. Powered by Linked Evidence and our purpose-built, auditable AI, we are the only company that maps AI-generated summaries to ground truth, helping providers quickly trust and verify the output. As pioneers in generative AI for healthcare, we are setting the industry standards for the responsible deployment of AI across health systems.

We are a growing team of practicing MDs, AI scientists, PhDs, creatives, technologists, and engineers working together to empower people and make care make more sense. We have offices located in the Mission District in San Francisco, the SoHo neighborhood of New York, and East Liberty in Pittsburgh.

THE ROLE

As an ML Infrastructure Engineer, Model Inference at Abridge, you'll play a pivotal role in building and optimizing the core inference infrastructure that powers our machine learning models. Your work will be instrumental in enhancing the scalability, efficiency, and performance of our AI-driven solutions. You will work with our Infrastructure and Research teams to build, deploy, optimize, and orchestrate our AI models.

What You'll Do

- Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training.
- Develop, optimize, and maintain ML model serving infrastructure, ensuring high performance and low latency.
- Collaborate with ML and product teams to scale backend infrastructure for AI-driven products, focusing on model deployment, throughput optimization, and compute efficiency.
- Optimize compute-heavy workflows and enhance GPU utilization for ML wo
