Indonesian Stock MLOps Pipeline

End-to-end ML pipeline with experiment tracking, monitoring, and automated retraining for Indonesian stock prediction.

🔒 NDA · Simplified Demo
MLflow · Grafana · Docker · Scikit-learn · Pandas · Time Series · Python · 2023–2024

The Problem

The Indonesia Stock Exchange (IDX) presents unique challenges for ML models: high volatility, market-microstructure effects, and regulatory calendar impacts that models trained on Western markets don't account for. A client needed a prediction pipeline generating BUY/SELL signals for 45 IDXBLUE blue-chip stocks, and it had to be not just accurate but also monitorable, retrainable, and auditable.

The key requirement: the pipeline had to run without manual intervention, covering automated data ingestion, training, evaluation, and deployment with full observability. The public repo is a simplified demonstration of the architecture; the production version, which served live trading decisions, is covered by an NDA.

The Architecture

graph LR
    A[IDX Data Sources] --> B[Data Ingestion]
    B --> C[Feature Engineering]
    C --> D[MLflow Tracking]
    D --> E{Model Registry}
    E -->|Staging| F[Docker Deploy]
    E -->|Production| G[Prediction API]
    F --> H[Grafana Monitoring]
    G --> H
    H -->|Drift Detected| I[Automated Retrain]
    I --> D
    style D fill:#6c5ce7,stroke:#7c6df0,color:#fff
    style H fill:#00d2ff,stroke:#00b8e6,color:#0a0a0f
    style E fill:#f7c948,stroke:#e0b830,color:#0a0a0f

Why It's Hard

  • Financial time series are non-stationary: Market regimes change. A model that worked last quarter may fail this quarter. The pipeline needed drift detection and automated retraining triggers.
  • MLOps on a budget: No AWS SageMaker, no Databricks. Everything runs on a modest VPS with Docker Compose, so MLflow, Grafana, and the model-serving infrastructure had to coexist efficiently.
  • Feature engineering at scale: Indonesian market data required custom features, including sector rotation signals, foreign vs domestic flow ratios, and Islamic calendar adjustments (Ramadan effects on trading volume).
  • Auditability: Financial predictions need to be explainable. Every model version, every feature set, every prediction had to be traceable.
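Two of the custom features named above, the foreign vs domestic flow ratio and the Ramadan adjustment, can be sketched in a few lines. This is a dependency-free illustration, not the production feature set: the `DailyBar` schema, the exact ratio definition, and the idea of passing Ramadan windows in from the caller are all assumptions made for the example. The same logic maps directly onto Pandas column operations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyBar:
    """One trading day for one ticker (hypothetical schema)."""
    day: date
    foreign_buy: float    # value bought by foreign investors
    foreign_sell: float   # value sold by foreign investors
    domestic_buy: float   # value bought by domestic investors
    domestic_sell: float  # value sold by domestic investors


def net_foreign_flow_ratio(bar: DailyBar) -> float:
    """Net foreign flow as a share of total traded value, in [-1, 1]."""
    total = (bar.foreign_buy + bar.foreign_sell
             + bar.domestic_buy + bar.domestic_sell)
    if total == 0:
        return 0.0  # no trading that day: treat as neutral
    return (bar.foreign_buy - bar.foreign_sell) / total


def in_ramadan(day: date, windows: list[tuple[date, date]]) -> bool:
    """True if `day` falls inside any (start, end) window, inclusive."""
    return any(start <= day <= end for start, end in windows)


def build_features(bar: DailyBar,
                   ramadan_windows: list[tuple[date, date]]) -> dict[str, float]:
    """Assemble the two illustrative features for one bar."""
    return {
        "net_foreign_flow_ratio": net_foreign_flow_ratio(bar),
        "is_ramadan": 1.0 if in_ramadan(bar.day, ramadan_windows) else 0.0,
    }


# Approximate window, for illustration only; real dates should come
# from an authoritative calendar source.
EXAMPLE_WINDOWS = [(date(2024, 3, 12), date(2024, 4, 9))]
```

Calling `build_features(bar, EXAMPLE_WINDOWS)` returns a plain dict per bar. Keeping the calendar as an input rather than hard-coding it means the same function can be used unchanged in training and serving.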

Technical Stack

  • Scikit-learn: primary ML framework for stock movement prediction models
  • MLflow: experiment tracking, model registry, and deployment management
  • Grafana: real-time dashboards for prediction accuracy, data drift, feature distributions, and system health
  • Docker + Docker Compose: containerized pipeline with reproducible environments
  • Pandas: data ingestion, feature engineering, and preprocessing
  • Custom drift detection: statistical tests on feature distributions triggering automated retrain workflows
  • FastAPI: lightweight prediction serving
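The drift check in the list above is described only as "statistical tests on feature distributions". As one plausible instance, here is a minimal sketch using a two-sample Kolmogorov-Smirnov test written with the standard library. The choice of test, the function names, and the 0.05 significance level are assumptions for illustration; the critical value uses the standard large-sample approximation, so it is rough for small windows.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The supremum of |F_ref - F_cur| is reached at one of the sample points.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def drift_detected(reference: Sequence[float],
                   current: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    # c(alpha) = sqrt(-ln(alpha / 2) / 2), about 1.358 for alpha = 0.05
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

In a setup like the one described, `reference` would be a feature's values from the training window and `current` its values from recent serving traffic, and a `True` result would be the signal that triggers the retrain workflow. Running the test per feature raises the chance of false alarms, so a stricter `alpha` or a minimum number of drifting features may be worth considering.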

What I'd Do Differently

  • Add a feature store. Feast or a simple Redis-based store would have prevented feature inconsistencies between training and serving.
  • Use DVC for data versioning. Tracking which dataset produced which model is critical. Git alone isn't enough.
  • Implement shadow deployment. Running new models in shadow mode before promoting them would reduce deployment risk.
  • Explore gradient boosting. XGBoost or LightGBM could potentially capture non-linear patterns better than the Scikit-learn models used here, which makes them worth benchmarking in a future iteration.
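The shadow deployment idea from the list above can be sketched as a thin router: the production model's answer is always the one served, while the candidate runs on the same input and its output is only logged. The `ShadowRouter` class and the callable-based model interface are hypothetical, chosen to keep the example self-contained; this is not how the original pipeline was built.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A model here is any callable mapping a feature vector to "BUY" or "SELL".
Model = Callable[[Sequence[float]], str]


class ShadowRouter:
    """Serve the production model; run the candidate silently and log both."""

    def __init__(self, production: Model, candidate: Model) -> None:
        self.production = production
        self.candidate = candidate
        # Each entry is (served signal, shadow signal or None on failure).
        self.log: List[Tuple[str, Optional[str]]] = []

    def predict(self, features: Sequence[float]) -> str:
        served = self.production(features)
        try:
            shadow: Optional[str] = self.candidate(features)
        except Exception:
            # A failing candidate must never affect what is served.
            shadow = None
        self.log.append((served, shadow))
        return served

    def agreement_rate(self) -> float:
        """Share of logged requests where candidate and production agree."""
        scored = [(s, c) for s, c in self.log if c is not None]
        if not scored:
            return 0.0
        return sum(s == c for s, c in scored) / len(scored)
```

The agreement rate alone does not say which model is better; it only shows how often a promotion would change the served signal. A promotion decision would also compare both logged signals against realized price moves.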

Key Takeaways

MLOps isn't about fancy tools; it's about reproducibility and trust. When your predictions affect financial decisions, you need to know exactly which model version made which prediction, with which data, and why. That's the bar.
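One way to meet that bar is to write an audit record for every prediction that ties the signal to a model version and a fingerprint of the exact inputs. The sketch below is an assumption about what such a record could look like, not the production schema: the field names, the `"BBCA"` ticker, the version string, and the feature names in the usage note are illustrative.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def fingerprint(features: dict) -> str:
    """Stable SHA-256 hash of a feature dict (key order does not matter)."""
    payload = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PredictionRecord:
    """Everything needed to trace one prediction back to its inputs."""
    ticker: str
    signal: str          # "BUY" or "SELL"
    model_version: str   # e.g. the version assigned by the model registry
    feature_hash: str    # fingerprint of the exact feature values used
    predicted_at: str    # ISO 8601 timestamp in UTC


def make_record(ticker: str, signal: str, model_version: str,
                features: dict,
                now: Optional[datetime] = None) -> PredictionRecord:
    """Build an audit record; `now` can be injected for reproducible tests."""
    ts = now or datetime.now(timezone.utc)
    return PredictionRecord(
        ticker=ticker,
        signal=signal,
        model_version=model_version,
        feature_hash=fingerprint(features),
        predicted_at=ts.isoformat(),
    )


def to_log_line(record: PredictionRecord) -> str:
    """Serialize a record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)
```

For example, `to_log_line(make_record("BBCA", "BUY", "7", {"rsi": 55.0}))` yields a single JSON line that can be appended to a log file. The hash identifies the inputs without storing them; answering "with which data" in full also requires keeping the feature values themselves somewhere retrievable by that hash.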