Machine Learning Engineer, Global Public Sector

Scale AI · Doha, Qatar; London, UK

On-site

RAG, Multi-agent, GPU, Latency, Python, Go, Embeddings, Reranking, Tool use, Orchestration, Eval harnesses, LLM-as-judge, Deep learning, NLP, Computer vision

Scale’s mission is to develop reliable AI systems for the world's most important decisions. Our core work consists of:

- Creating custom AI applications that will impact millions of citizens
- Generating high-quality training data for national LLMs
- Providing upskilling and advisory services to spread the impact of AI

Scale is hiring ML Research Engineers to bridge the gap between emerging AI capabilities and mission-critical, real-world impact. In our Global Public Sector (GPS) division, we don’t just implement tools; we conduct applied research to solve the unique challenges of sovereign AI.

Your role is to move beyond off-the-shelf implementations. You will lead research into Agent Design, Reliability, and AI Safety, developing novel system architectures that power high-stakes government applications. You will be the bridge between a research paper and a production-ready system that functions at scale.

The Mission

- Applied Agent Research: Leading the design of reliable, multi-step agentic systems and long-horizon reasoning frameworks that can solve complex problems for national security and public policy.
- Systemic Evaluation & Red-Teaming: Developing rigorous benchmarks and evaluation protocols to ensure AI systems are safe, unbiased, and performant in high-stakes, non-commercial environments.
- Model Optimisation & Selection: Conducting deep-dive research into model performance (both open-weight and closed) to identify the best tools for niche domains, optimising them through context engineering, RAG, and other inference-time techniques.

What You Will Do

- Architect Agentic Systems: Design and build agent architectures: the harnesses, tool-use protocols, and logic flows that allow LLMs to function as reliable, autonomous agents in complex workflows.
- Drive Reliability & Safety: Research and implement robust evaluation frameworks. This includes red-teaming for sovereign AI requirements and developing strategies to mitigate hallucinations in regulated data environments.
