Multimodal LLM Researcher (MLLM)

Pika · Palo Alto HQ

$185k–400k/yr · On-site

Orchestration · Distillation · PyTorch · TensorFlow · Deep learning · Python · Computer vision · NLP · GPU · Quantization · Latency · Throughput · Multi-agent

ABOUT THE ROLE

At Pika, we are pioneering next-generation creative infrastructure built around real-time, multimodal generation and intelligent, agentic platforms. We are seeking accomplished Multimodal LLM Researchers (LLM, VLM, and Audio LM) to drive forward our mission to make agentic real-time generative technology accessible, dynamic, and transformative for millions of creators.

As a core member of our research team, you will be integral to designing and building foundational technologies, developing novel approaches for large multimodal language models (LLMs/VLMs/Audio LMs), and orchestrating intelligent agentic systems that power scalable, interactive multimedia experiences. You will collaborate closely with engineering and product teams, shaping the future of real-time creative platforms.

WHAT YOU’LL DO

- Lead and contribute to research efforts focused on real-time, multimodal generation (including text, image, video, and audio synthesis), as well as orchestration of agentic platform infrastructure
- Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences
- Focus on real-time aspects of model inference and synthesis across modalities
- Work on diffusion model distillation and/or develop diffusion-based world models for multimodal applications
- Train and finetune autoregressive and diffusion models in LLM, VLM, or Audio LM contexts with a focus on real-time performance
- Curate specific datasets, especially for video, audio, cross-modal, and sensory-rich data
- Collaborate with cross-functional teams to bring research advancements into production-ready technologies
- Publish work in top-tier conferences and journals; communicate research results internally and externally
- Stay at the cutting edge of real-time multimodal generative AI and agentic orchestration

WHAT WE’RE LOOKING FOR

- 5+ years of relevant experienc
