Production-grade MLOps system for multi-provider weather forecast ingestion, verification, and ensemble prediction, all running on $0/month free tiers.
Every hour, forecasts are ingested from Open-Meteo, MET Norway, OpenWeather, Visual Crossing, and US NWS, all of which are free or free-tier APIs.
Ground-truth observations from Meteostat are joined against every forecast. MAE, RMSE, and MAPE are computed continuously per source, variable, and horizon.
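As an illustration of the verification step, the sketch below computes MAE, RMSE, and MAPE per (source, variable, horizon) group from already-joined forecast/observation pairs. It is a minimal stdlib-only sketch, not the project's code: the function name `verification_metrics` and the field names `source`, `variable`, `horizon_h`, `forecast`, and `observed` are assumptions made for the example, and skipping zero observations in MAPE is one possible convention.

```python
import math
from collections import defaultdict


def verification_metrics(pairs):
    """Group joined forecast/observation pairs and compute error metrics.

    `pairs` is an iterable of dicts with the (hypothetical) keys
    source, variable, horizon_h, forecast, observed.
    Returns {(source, variable, horizon_h): {"mae", "rmse", "mape", "n"}}.
    """
    groups = defaultdict(list)
    for p in pairs:
        key = (p["source"], p["variable"], p["horizon_h"])
        groups[key].append((p["forecast"], p["observed"]))

    results = {}
    for key, values in groups.items():
        errors = [f - o for f, o in values]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        # MAPE is undefined where the observation is zero (common for
        # precipitation), so those pairs are skipped in this sketch.
        pct = [abs(f - o) / abs(o) for f, o in values if o != 0]
        mape = 100.0 * sum(pct) / len(pct) if pct else None
        results[key] = {"mae": mae, "rmse": rmse, "mape": mape, "n": len(errors)}
    return results
```

Returning `None` for MAPE when every observation in a group is zero keeps the metric from silently reporting a misleading value.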
A LightGBM ensemble is trained daily on all vendor forecasts, lagged observations, and calendar features. A challenger model is promoted to champion only when it improves on the current champion by more than 2%.
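The promotion rule can be sketched as a small gate function. This is a hedged illustration: it assumes the 2% threshold is a relative improvement and that it must hold for both RMSE and MAE, and the name `should_promote` and the dict layout are hypothetical rather than taken from the repository.

```python
def should_promote(champion, challenger, min_improvement=0.02):
    """Return True if the challenger should replace the champion.

    Both arguments are dicts with "rmse" and "mae" (assumed layout).
    The challenger must beat the champion by more than `min_improvement`
    (relative) on every metric. With no champion yet, promote by default.
    """
    if champion is None:
        return True
    for metric in ("rmse", "mae"):
        baseline = champion[metric]
        if baseline <= 0:
            # A champion with zero error cannot be improved upon.
            return False
        relative_gain = (baseline - challenger[metric]) / baseline
        if relative_gain <= min_improvement:
            return False
    return True
```

Requiring a margin rather than any improvement guards against swapping models on differences that are within evaluation noise.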
FastAPI serves real-time predictions. A Gradio dashboard visualizes verification metrics, leaderboards, and drift detection.
Every component runs on free tiers: Neon Serverless Postgres, DagsHub MLflow, GitHub Actions, Hugging Face Spaces, and Deta Space.
Pre-configured for 9 South African cities (Johannesburg, Cape Town, Durban, Pretoria, Gqeberha, Bloemfontein, Polokwane, Mbombela, East London).
| Schedule | Job |
|---|---|
| Hourly :17 | 5 APIs ingested |
| Hourly :27 | Ensemble inference |
| Hourly :37 | Leaderboard + volume |
| Hourly :47 | Obs + error compute |
| Daily 03:17 | Model retraining |
| Daily 03:07 | Data retention |

| Variable | Horizon (h) | Best Source | RMSE | MAE |
|---|---|---|---|---|
| Waiting for data... | | | | |

| Source | Variable | Horizon (h) | RMSE | MAE |
|---|---|---|---|---|
| Waiting for data... | | | | |

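The leaderboard picks the best source per variable and horizon by mean RMSE. The project lists Postgres SQL for this step; the Python sketch below only illustrates the selection rule. The function name `build_leaderboard` and the row keys are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def build_leaderboard(metric_rows):
    """Pick the source with the lowest mean RMSE per (variable, horizon).

    `metric_rows` is an iterable of dicts with the (hypothetical) keys
    source, variable, horizon_h, rmse, mae.
    Returns {(variable, horizon_h): {"best_source", "rmse", "mae"}}.
    """
    per_source = defaultdict(lambda: {"rmse": [], "mae": []})
    for row in metric_rows:
        key = (row["variable"], row["horizon_h"], row["source"])
        per_source[key]["rmse"].append(row["rmse"])
        per_source[key]["mae"].append(row["mae"])

    best = {}
    for (variable, horizon_h, source), vals in per_source.items():
        candidate = {
            "best_source": source,
            "rmse": mean(vals["rmse"]),
            "mae": mean(vals["mae"]),
        }
        slot = (variable, horizon_h)
        # Keep the candidate only if it beats the current best mean RMSE.
        if slot not in best or candidate["rmse"] < best[slot]["rmse"]:
            best[slot] = candidate
    return best
```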
| Capability | Implementation | Free Tier Used |
|---|---|---|
| Forecast Ingestion | 5 vendor-specific ETL modules with unit normalization | Open-Meteo, MET Norway, OpenWeather, Visual Crossing, NWS |
| Observation Ingestion | Meteostat Python library with 7-day lookback | Meteostat (CC BY-NC) |
| Data Warehouse | Neon Serverless Postgres with indexed pruning | Neon (0.5 GB, 100 compute hrs) |
| Feature Engineering | Vendor pivot + lagged observations (1h,3h,6h) + calendar | Pandas + SQLAlchemy |
| Model Training | Per-variable/horizon LightGBM + LinearRegression with weekly CV | GitHub Actions (2000 min/month) |
| Experiment Tracking | MLflow on DagsHub with model registry | DagsHub (free for public repos) |
| Model Promotion | Champion-challenger with >2% RMSE & MAE threshold | Postgres + MLflow API |
| Inference | Batch prediction streaming (50K-row batches) | GitHub Actions |
| Verification | Forecast-observation join with MAE/RMSE/MAPE compute | Neon + Pandas |
| Leaderboard | Best source per variable/horizon by mean RMSE | Postgres SQL |
| Serving API | FastAPI with /health, /sources, /metrics, /predict | Deta Space |
| Dashboard | Gradio with 4 tabs (Verification, Leaderboard, Our vs Best, Drift) | Hugging Face Spaces |
| Orchestration | 6 GitHub Actions workflows, staggered cron | GitHub Actions |
| Data Management | Automated pruning with configurable retention TTL | Postgres batched DELETE |
| Caching | File-based HTTP cache with configurable TTL + exponential backoff | Local filesystem |
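To make the Caching row concrete, here is a minimal sketch of a file-based cache with a TTL combined with exponential backoff on failed fetches. It is an assumption-laden illustration, not the repository's implementation: `cached_fetch`, its parameters, and the cache file layout are all hypothetical, and the HTTP call itself is injected as a `fetch` callable so the sketch stays stdlib-only.

```python
import hashlib
import json
import time
from pathlib import Path


def cached_fetch(url, fetch, cache_dir, ttl_seconds=3600, max_attempts=4,
                 base_delay=1.0, sleep=time.sleep, now=time.time):
    """Return the JSON-serializable payload for `url`.

    A fresh cache file (younger than `ttl_seconds`) is returned directly.
    Otherwise `fetch(url)` is called up to `max_attempts` times (>= 1),
    waiting base_delay * 2**attempt seconds between failed attempts.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json"
    path = cache_dir / name

    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if now() - entry["stored_at"] < ttl_seconds:
            return entry["payload"]

    for attempt in range(max_attempts):
        try:
            payload = fetch(url)
            break
        except Exception:  # real code should catch narrower error types
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

    entry = {"stored_at": now(), "payload": payload}
    path.write_text(json.dumps(entry), encoding="utf-8")
    return payload
```

Injecting `sleep` and `now` keeps the function testable without real waiting; in production use the defaults apply.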
Predict wind speed and temperature for solar/wind farm output optimization. Multi-horizon forecasts enable day-ahead energy trading.
Precipitation and temperature forecasts for irrigation scheduling, frost warnings, and harvest timing across South African farming regions.
Wind and precipitation forecasts for route planning, delivery scheduling, and warehouse operations across multiple cities.
Multi-source forecast consensus for early warning systems. When providers disagree, the ensemble model provides calibrated uncertainty.
Historical forecast accuracy data per provider, variable, and horizon for parametric insurance product design and risk modeling.
Reference implementation of production MLOps patterns: feature engineering, model registry, champion-challenger, data drift monitoring.
# 1. Clone
git clone https://github.com/JepStar990/weather-mlops-forecasts.git
cd weather-mlops-forecasts
# 2. Set up environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your DATABASE_URL and API keys
# 3. Bootstrap database
psql "$DATABASE_URL" -f src/db/schema.sql
python scripts/seed_locations.py
# 4. Smoke test
python src/etl/ingest_open_meteo.py
# 5. Set up GitHub Secrets for automation:
# - DATABASE_URL (Neon connection string)
# - DAGSHUB_USERNAME, DAGSHUB_TOKEN (MLflow tracking)
# - OPENWEATHER_API_KEY, VISUAL_CROSSING_API_KEY (optional)
# - MET_NO_USER_AGENT, NWS_USER_AGENT
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check; returns `{"status":"ok"}` |
| `/sources` | GET | Per-source error metrics (RMSE, MAE, MAPE) over last 7 days |
| `/metrics` | GET | Leaderboard: best source per variable and horizon |
| `/predict` | POST | Get ensemble predictions; body: `{"lat":-26.2,"lon":28.0,"variables":["temp_2m"],"horizons":[1,3,6]}` |
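A client call to `/predict` might look like the following sketch, which uses only the Python standard library. The request body mirrors the example in the table; the helper names `build_predict_request` and `predict` and the base URL are hypothetical, and the response is returned as parsed JSON without assuming its shape, since the response schema is not documented here.

```python
import json
import urllib.request


def build_predict_request(base_url, lat, lon, variables, horizons):
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps({
        "lat": lat,
        "lon": lon,
        "variables": list(variables),
        "horizons": list(horizons),
    }).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, lat, lon, variables, horizons, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(base_url, lat, lon, variables, horizons)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("https://your-deployment.example", -26.2, 28.0, ["temp_2m"], [1, 3, 6])` would request 1, 3, and 6 hour temperature predictions near Johannesburg, with the placeholder URL replaced by your own deployment.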