About the Role
Shopify is the commerce platform that powers millions of merchants worldwide. Behind the product experience are ML systems that drive recommendations, search, and personalization at massive scale.
We build and maintain the operational backbone behind these systems: deployment pipelines, evaluation frameworks, data preprocessing, and the monitoring that keeps models fresh and reliable in production. Our models serve hundreds of millions of buyers, and the pipelines we build directly impact how quickly and safely we can improve merchant outcomes.
The Role
You will own the operational lifecycle of our ML systems: deployment pipelines, evaluation frameworks, data pipelines, and the monitoring and reliability layer that keeps everything running in production. You'll ensure that models go from training to production safely, that changes are evaluated rigorously, and that the data feeding our models is fresh and correct.
This role is the connective tissue between research and production. You'll build the systems that let engineers ship model improvements with confidence and speed, while maintaining the reliability standards required to serve hundreds of millions of buyers - including during peak events like Black Friday/Cyber Monday.
The role carries real technical authority. You'll set the standards for how models get deployed and evaluated, mentor engineers on operational best practices, and drive alignment on reliability and pipeline strategy across the team. You'll influence technical direction beyond your immediate team and raise the engineering bar through hiring and technical reviews.
What You'll Do
Deployment & Rollout
Own the model deployment pipeline end to end: export, validation, canary rollout, rollback, and A/B test integration
Build and maintain CI/CD for ML: automated testing, model validation gates, and progressive delivery
Ensure safe, repeatable deployments with clear rollback paths and minimal manual intervention
Evaluation & Experimentation
Build automated offline evaluation pipelines against production baselines
Extend our experimentation framework so ML Engineers can launch and evaluate model changes with minimal friction
Maintain evaluation datasets and ensure data freshness and correctness
Integrate offline metrics with online A/B testing to close the feedback loop
Data Pipelines
Own data preprocessing for training: interaction histories, feature stores, and embedding tables
Manage workflow orchestration (Airflow or equivalent) for scheduled retraining and data refresh - you trigger and monitor training runs, while the infrastructure side of the team owns the underlying GPU compute layer
Ensure data quality, lineage tracking, and pipeline idempotency
Own data correctness and freshness; partner with infrastructure engineers on data loading throughput and efficiency
Monitoring & Reliability
Build monitoring and alerting across training jobs, serving endpoints, and data pipelines
Define and maintain SLOs for model freshness, serving latency, and training throughput
Participate in incident response and drive post-mortems for ML system failures
Identify and eliminate toil through automation
Technical Leadership
Drive cross-team technical strategy for ML operations - identify systemic reliability risks and pipeline bottlenecks before they become incidents
Mentor and up-level engineers on the team through pairing, design reviews, and setting operational standards
Contribute to hiring: screen candidates, conduct technical interviews, and calibrate the engineering bar
Write technical proposals and RFCs that shape operational direction across the organization
What We're Looking For
Required
7+ years in software engineering, with 5+ years focused on MLOps, data engineering, or production ML systems
Strong experience with ML deployment pipelines: model export, validation, canary releases, and rollback strategies
Experience with workflow orchestration for ML (Airflow, Dagster, Prefect, or similar)
Solid Python fundamentals; comfortable working with PyTorch model artifacts and training configurations
Production monitoring experience: you've built or operated alerting, dashboards, and SLO frameworks for ML systems
Experience with data pipelines at scale: batch processing, feature engineering, and data quality validation
Working proficiency with Kubernetes: able to debug pod failures, understand resource scheduling, and navigate GPU workloads
Demonstrated technical leadership: you've driven operational strategy, written technical proposals, and influenced engineering direction beyond your immediate team
Track record of mentoring engineers and raising the reliability bar on a team
Preferred
Experience with large-scale data warehouses (BigQuery or equivalent) for offline evaluation and metrics
Hands-on experience with experiment tracking and A/B testing frameworks
Experience operating recommendation or retrieval systems at scale
Familiarity with model compression workflows in production (quantization, pruning, distillation)
Experience with cloud-native ML orchestration (SkyPilot, Ray, or similar)
How We Work
You'll pair directly with ML Engineers. Understanding their models well enough to build the right operational workflows is part of the job.
We prefer automation over runbooks. If a process can be scripted, it should be.
On-call is shared. When you're on rotation, your scope is pipeline failures, data freshness alerts, deployment rollbacks, and evaluation integrity - you own it end to end.
You'll dig into Airflow DAG failures, data drift alerts, and deployment validation issues. This is a deeply operational role with high production stakes.
Research and production are the same codebase. You'll see your operational decisions reflected in real model quality and real merchant outcomes.
Shopify operates on high trust and low process. You'll have real ownership and the autonomy to make decisions, not just execute tickets.
What Success Looks Like
In 3 months: You've onboarded to deployment and evaluation pipelines, shipped at least one meaningful improvement to deployment safety or developer experience, and can independently debug issues across the operational stack.
In 6 months: You own a major subsystem (deployment pipeline, evaluation framework, or data pipelines). ML Engineers are shipping model changes faster or more safely because of improvements you've made.
In 12 months: You've shaped the operational roadmap for ML systems and influenced engineering direction beyond the team. Deployments are faster and safer, evaluation is more rigorous, and the team trusts the pipelines you've built. Other engineers across the organization come to you for guidance on ML operational best practices. You've made the team stronger through hiring and mentorship.