Software Engineer - Site Reliability Engineering

Zoox

Foster City, CA
Full-time, permanent
Hybrid

Job Description

Zoox is seeking a Site Reliability Engineer to help ensure the availability, performance, and resilience of the services that power the development and operation of our autonomous vehicles. In this role, you will own the full lifecycle of our services, from designing fault-tolerant, maintainable systems to deploying, operating, and continuously improving them in production. As a robotics company, Zoox embraces automation at every layer of its infrastructure, and you will help drive that ethos forward. You will work hands-on with systems that process massive volumes of data and support compute-intensive pipelines running on both CPUs and GPUs.

Architect and optimize scalable systems: You will design, implement, and continuously improve highly reliable infrastructure, directly impacting the success and safety of Zoox's autonomous vehicle platform.

Build proactive monitoring solutions: You will develop advanced monitoring, alerting, and reporting tools so that potential issues are identified and resolved before they affect production.

Collaborate across engineering: You will partner closely with software engineering teams to elevate our system architecture, streamline deployment processes, and drive automation initiatives.

Lead incident resolution: You will conduct thorough root cause analyses on production issues and rapidly deploy