Member of Technical Staff, Performance Optimization
Fireworks AI · San Mateo
THE ROLE:
We're looking for a Software Engineer focused on Performance Optimization to help push the boundaries of speed and efficiency across our AI infrastructure. In this role, you'll take ownership of optimizing performance at every layer of the stack, from low-level GPU kernels to large-scale distributed systems. A key focus will be maximizing the performance of our most demanding workloads, including large language models (LLMs), vision-language models (VLMs), and next-generation video models.

You’ll work closely with teams across research, infrastructure, and systems to identify performance bottlenecks, implement cutting-edge optimizations, and scale our AI systems to meet the demands of real-world production use cases. Your work will directly impact the speed, scalability, and cost-effectiveness of some of the most advanced generative AI models in the world.

KEY RESPONSIBILITIES:
- Optimize system and GPU performance for high-throughput AI workloads across training and inference
- Analyze and improve latency, throughput, memory usage, and compute efficiency
- Profile system performance to detect and resolve GPU- and kernel-level bottlenecks
- Implement low-level optimizations using CUDA, Triton, and other performance tooling
- Drive improvements in execution speed and resource utilization for large-scale model workloads (LLMs, VLMs, and video models)
- Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency
- Improve support for mixed precision, quantization, and model graph optimization
- Build and maintain performance benchmarking and monitoring infrastructure
- Scale inference and training systems across multi-GPU, multi-node environments
- Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes

MINIMUM QUALIFICATIONS:
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
- 5+