MLOps Without Burnout
Every MLOps tutorial seems to assume you have a Kubernetes cluster, a team of 5 DevOps engineers, and a cloud budget that would make a startup founder weep. Here's the reality for most of us: a VPS with 4 vCPUs and 8GB RAM, Docker Compose, and the determination to make it work.
I run my MLOps stack in Garut, Indonesia, which is not exactly AWS us-east-1. Here's what I've learned about building reliable ML infrastructure without burning out (or burning cash).
The Minimal Viable MLOps Stack
Here's what you actually need, and what you can skip:
Must-Have
- Experiment tracking (MLflow). You WILL forget which hyperparameters produced which results. MLflow is free, runs anywhere, and takes 20 minutes to set up.
- Containerization (Docker). "It works on my machine" is the enemy of production ML. Docker Compose handles your entire stack in one docker-compose.yml.
- Monitoring (Grafana). If you can't see prediction drift, you don't know your model is degrading. Grafana dashboards take an hour to set up and pay back immediately.
- Model versioning. MLflow Model Registry handles this. Every model in production should have a version, a stage (staging/production/archived), and a clear lineage back to the training run.
Can Skip (For Now)
- Kubernetes: Docker Compose is sufficient for single-node deployments
- Feature stores (Feast, Tecton): a PostgreSQL table with versioned feature sets works fine initially
- CI/CD pipelines for ML: manual retraining with the MLflow API is fine when you're the only engineer
- Data versioning (DVC): Git LFS or even dated data folders work for small-to-medium datasets
- Model serving frameworks (TorchServe, Triton): a FastAPI wrapper around your model is simpler and more debuggable
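As an illustration of the "table with versioned feature sets" idea, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere; the table layout, column names, and the helpers `write_features` and `read_features` are hypothetical, not part of my actual stack.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: one row per (entity, feature-set version, feature name).
SCHEMA = """
CREATE TABLE IF NOT EXISTS features (
    entity_id   TEXT NOT NULL,
    feature_set TEXT NOT NULL,  -- version label such as 'v1'
    name        TEXT NOT NULL,
    value       REAL NOT NULL,
    PRIMARY KEY (entity_id, feature_set, name)
)
"""


def write_features(conn: sqlite3.Connection, entity_id: str,
                   feature_set: str, values: Dict[str, float]) -> None:
    """Store one entity's features under a given feature-set version."""
    # INSERT OR REPLACE is SQLite syntax; on PostgreSQL the equivalent
    # would be INSERT ... ON CONFLICT (...) DO UPDATE.
    conn.executemany(
        "INSERT OR REPLACE INTO features VALUES (?, ?, ?, ?)",
        [(entity_id, feature_set, name, value) for name, value in values.items()],
    )
    conn.commit()


def read_features(conn: sqlite3.Connection, entity_id: str,
                  feature_set: str) -> Dict[str, float]:
    """Read back exactly the features that a given version defined."""
    rows = conn.execute(
        "SELECT name, value FROM features WHERE entity_id = ? AND feature_set = ?",
        (entity_id, feature_set),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
write_features(conn, "user-1", "v1", {"age": 31.0})
write_features(conn, "user-1", "v2", {"age": 31.0, "orders_30d": 4.0})
```

Because the version label is part of the primary key, a model trained on `v1` keeps reading `v1` features even after `v2` adds new columns of data.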
My docker-compose.yml
# The stack that runs on a $20/month VPS
version: '3.8'
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.12.0
    ports: ["5000:5000"]
    command: mlflow server --host 0.0.0.0 --backend-store-uri postgresql://...
    depends_on: [postgres]  # the backend store lives in the postgres service
    restart: unless-stopped
  grafana:
    image: grafana/grafana:10.4.0
    ports: ["3000:3000"]
    volumes: ["./grafana/dashboards:/etc/grafana/provisioning/dashboards"]
    restart: unless-stopped
  prediction-api:
    build: ./api
    ports: ["8000:8000"]
    depends_on: [mlflow]
    restart: unless-stopped
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: mlops
      POSTGRES_USER: mlflow
      # The postgres image will not initialize without a password; set it in .env
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes: ["pgdata:/var/lib/postgresql/data"]
    restart: unless-stopped
volumes:
  pgdata:
Principles for Solo MLOps
1. Prefer Boring Technology
PostgreSQL over specialized time-series DBs. FastAPI over TensorFlow Serving. Docker Compose over Kubernetes. Every technology choice should be defensible with: "I can debug this at 2 AM when it breaks." If you can't, it's too complex.
2. Monitoring Is Non-Negotiable
The minimum viable monitoring:
- Prediction distribution over time: a histogram that updates daily. If the shape changes, your data changed.
- Feature drift (PSI): Population Stability Index for the top 10 features. Alert if PSI > 0.2.
- Model latency p50/p95/p99: because a slow model is a broken model.
- Error rate: prediction failures, timeouts, NaN outputs.
All of this fits in a single Grafana dashboard. Set it up once, glance at it daily.
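The PSI check from the list above can be sketched in plain Python. Treat this as a minimal sketch rather than a reference implementation: it assumes 10 equal-width bins built from the baseline sample and a small epsilon to avoid log(0), and the function name `psi` and its parameters are illustrative. Only the 0.2 alert threshold comes from the list.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


PSI_ALERT_THRESHOLD = 0.2  # the alert level named in the list above


def has_drifted(expected: Sequence[float], actual: Sequence[float]) -> bool:
    return psi(expected, actual) > PSI_ALERT_THRESHOLD
```

Running `has_drifted(training_values, last_week_values)` once a day per feature and writing the PSI value to a table Grafana can read is enough to drive the alert. Both variable names there are placeholders for your own data.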
3. Automate Retraining Triggers, Not Schedules
Retraining on a fixed schedule (every Monday!) is wasteful and often misses the moment when retraining is actually needed. Instead, trigger retraining when drift metrics cross thresholds. That way retraining happens when it matters, not when the calendar says so.
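A threshold-based trigger can be very small. The sketch below assumes you already compute a PSI value per metric somewhere else; `drift_reasons`, `maybe_retrain`, and the `retrain` callback are hypothetical names, and the 0.2 default simply reuses the alert threshold from the monitoring section.

```python
from typing import Callable, Dict, List, Optional

DEFAULT_PSI_LIMIT = 0.2  # same threshold as the PSI alert


def drift_reasons(psi_by_metric: Dict[str, float],
                  limits: Optional[Dict[str, float]] = None) -> List[str]:
    """Return one human-readable reason per metric that crossed its limit."""
    limits = limits or {}
    reasons = []
    for name in sorted(psi_by_metric):
        value = psi_by_metric[name]
        limit = limits.get(name, DEFAULT_PSI_LIMIT)
        if value > limit:
            reasons.append(f"{name}: PSI {value:.2f} > {limit:.2f}")
    return reasons


def maybe_retrain(psi_by_metric: Dict[str, float],
                  retrain: Callable[[List[str]], None]) -> bool:
    """Call `retrain` only when at least one drift metric crossed its limit."""
    reasons = drift_reasons(psi_by_metric)
    if reasons:
        # Passing the reasons along makes it easy to log why a run started.
        retrain(reasons)
    return bool(reasons)
```

A daily cron job that calls `maybe_retrain` still runs on a schedule, but it only checks on a schedule. The expensive training run starts only when the data says it should.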
4. Document Architecture Decisions
When you're a solo engineer, there's no one to ask "why did we choose XGBoost over LightGBM?" Six months from now, you'll be that person. Write it down. A simple DECISIONS.md in your repo saves future-you hours of archaeology.
What I'd Add With More Resources
- A proper feature store when feature engineering becomes the bottleneck
- Shadow deployment for zero-risk model updates
- A/B testing infrastructure for model comparison in production
- Automated hyperparameter tuning (Optuna) integrated with MLflow
But none of these are blockers. You can ship reliable ML systems today with the stack described above. I do it every day from Garut, on a VPS that costs less than a dinner out.
The Bottom Line
MLOps doesn't require a PhD in infrastructure. Start with experiment tracking and monitoring. Add complexity only when the current setup actually hurts. Most ML projects don't fail because of inadequate infrastructure; they fail because nobody knows if the model is still working.