Weather MLOps

Production-grade MLOps system for multi-provider weather forecast ingestion, verification, and ensemble prediction, all running on $0/month free tiers.

5 Forecast APIs · Ensemble ML · Neon Postgres · MLflow + DagsHub · FastAPI · Gradio Dashboard · GitHub Actions

What This Project Does

A complete, automated weather intelligence pipeline

🌤️

Multi-Source Ingestion

Every hour, forecasts are ingested from Open-Meteo, MET Norway, OpenWeather, Visual Crossing, and the US NWS, all of which are free or free-tier APIs.

📊

Forecast Verification

Ground-truth observations from Meteostat are joined against every forecast. MAE, RMSE, and MAPE are computed hourly for each source, variable, and horizon.
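As a rough illustration of the verification step, the sketch below groups matched forecast/observation pairs by source, variable, and horizon and computes the three metrics. It uses only the standard library; the field names (`source`, `variable`, `horizon_h`, `forecast`, `observed`) and the choice to skip zero observations when computing MAPE are assumptions made for this example, not the project's actual schema or code.

```python
import math
from collections import defaultdict


def verification_metrics(pairs):
    """Compute MAE, RMSE, and MAPE per (source, variable, horizon_h) group.

    `pairs` is an iterable of dicts holding one matched forecast/observation
    each. The key names are hypothetical, chosen for this sketch.
    """
    groups = defaultdict(list)
    for p in pairs:
        key = (p["source"], p["variable"], p["horizon_h"])
        groups[key].append((p["forecast"], p["observed"]))

    results = {}
    for key, rows in groups.items():
        errors = [f - o for f, o in rows]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        # MAPE is undefined when the observation is zero, so those rows are
        # left out of the percentage error (an assumption of this sketch).
        pct = [abs((f - o) / o) for f, o in rows if o != 0]
        mape = 100 * sum(pct) / len(pct) if pct else None
        results[key] = {"mae": mae, "rmse": rmse, "mape": mape, "n": len(rows)}
    return results


sample = [
    {"source": "open_meteo", "variable": "temp_2m", "horizon_h": 1,
     "forecast": 21.0, "observed": 20.0},
    {"source": "open_meteo", "variable": "temp_2m", "horizon_h": 1,
     "forecast": 19.0, "observed": 20.0},
]
metrics = verification_metrics(sample)
```

MAPE deserves care for variables such as precipitation, where observed values are often zero; MAE and RMSE remain well defined in that case.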

🤖

Ensemble Model

A LightGBM ensemble is trained daily on all vendor forecasts, lagged observations, and calendar features. A challenger model is promoted to champion only when it improves both RMSE and MAE by more than 2%.
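The promotion rule can be pictured as a small comparison function. This is a minimal sketch, assuming the 2% threshold is a relative improvement that must hold for both RMSE and MAE, and that a model is promoted automatically when no champion exists yet. The function name and metric dictionary layout are hypothetical.

```python
def should_promote(champion, challenger, min_improvement=0.02):
    """Decide whether the challenger should replace the champion.

    Both arguments are dicts with "rmse" and "mae" keys (hypothetical layout).
    The challenger must beat the champion by MORE than `min_improvement`
    (relative) on both metrics.
    """
    if champion is None:
        # Assumption: with no existing champion, the first model is promoted.
        return True
    for metric in ("rmse", "mae"):
        baseline = champion[metric]
        if baseline <= 0:
            # A zero-error champion cannot be improved on; also avoids
            # dividing by zero.
            return False
        improvement = (baseline - challenger[metric]) / baseline
        if improvement <= min_improvement:
            return False
    return True


champion = {"rmse": 2.0, "mae": 1.5}
better = {"rmse": 1.9, "mae": 1.45}     # 5% and about 3.3% better
marginal = {"rmse": 1.9, "mae": 1.48}   # MAE only about 1.3% better
```

Requiring both metrics to clear the threshold guards against promoting a model that trades a lower RMSE for a worse typical error, or the reverse.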

🚀

Live Serving

FastAPI serves real-time predictions. A Gradio dashboard visualizes verification metrics, leaderboards, and drift detection.

💸

Zero Budget

Every component runs on free tiers: Neon Serverless Postgres, DagsHub MLflow, GitHub Actions, Hugging Face Spaces, and Deta Space.

🇿🇦

South Africa First

Pre-configured for 9 South African cities (Johannesburg, Cape Town, Durban, Pretoria, Gqeberha, Bloemfontein, Polokwane, Mbombela, East London).

System Architecture

End-to-end data flow from ingestion to serving

graph TB
    subgraph "Data Sources"
        OM[Open-Meteo<br/>free, no key]
        MET[MET Norway<br/>User-Agent only]
        OW[OpenWeather<br/>1K calls/day free]
        VC[Visual Crossing<br/>1K records/day free]
        NWS[weather.gov<br/>US only, free]
        MS[Meteostat<br/>observations, CC BY-NC]
    end
    subgraph "Ingestion (GitHub Actions, hourly)"
        ETL[ETL Pipeline<br/>at :17 UTC]
        OBS[Obs Pipeline<br/>at :47 UTC]
    end
    subgraph "Storage (Neon Serverless Postgres)"
        FCT[(forecasts)]
        OBT[(observations)]
        ERR[(errors)]
        MDL[(models)]
    end
    subgraph "ML Pipeline (GitHub Actions, daily)"
        FEAT[Feature Engineering<br/>vendor pivots + lags + calendar]
        TRAIN[Train Ensemble<br/>LightGBM + Linear]
        PROMO["Promote Champion<br/>>2% RMSE & MAE"]
    end
    subgraph "Verification (hourly)"
        VERIFY[Compute Errors<br/>MAE / RMSE / MAPE]
        LB[Leaderboard<br/>best source per var+horizon]
    end
    subgraph "Serving"
        API[FastAPI<br/>Deta Space]
        DASH[Gradio Dashboard<br/>Hugging Face Spaces]
    end
    subgraph "Experiment Tracking"
        MLFLOW[DagsHub MLflow<br/>model registry + metrics]
    end
    OM --> ETL
    MET --> ETL
    OW --> ETL
    VC --> ETL
    NWS --> ETL
    MS --> OBS
    ETL --> FCT
    OBS --> OBT
    FCT --> VERIFY
    OBT --> VERIFY
    VERIFY --> ERR
    ERR --> LB
    FCT --> FEAT
    OBT --> FEAT
    FEAT --> TRAIN
    TRAIN --> MLFLOW
    TRAIN --> MDL
    MDL --> PROMO
    FCT --> API
    API --> DASH
    style FCT fill:#dbeafe,stroke:#1e40af
    style OBT fill:#dbeafe,stroke:#1e40af
    style ERR fill:#fef3c7,stroke:#f59e0b
    style TRAIN fill:#dbeafe,stroke:#1e40af
    style PROMO fill:#fef3c7,stroke:#f59e0b

Pipeline Schedule

All times UTC, staggered off the hour to avoid GitHub Actions queue congestion

1. ETL Forecasts: hourly at :17 (5 APIs ingested)
2. Predict: hourly at :27 (ensemble inference)
3. Monitor: hourly at :37 (leaderboard + volume)
4. Verify: hourly at :47 (observations + error compute)
5. Train + Promote: daily at 03:17 (model retraining)
6. Prune: daily at 03:07 (data retention)
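The schedule above maps directly onto cron expressions. The sketch below writes them out and shows which jobs fire at a given UTC time; the job names are hypothetical labels for this example, not the repository's workflow file names. Note that pruning at 03:07 runs ten minutes before training at 03:17, even though it is listed last.

```python
# Cron fields: minute hour day-of-month month day-of-week (all times UTC).
# Job names are illustrative labels, not the actual workflow names.
SCHEDULE = {
    "etl_forecasts": "17 * * * *",
    "predict": "27 * * * *",
    "monitor": "37 * * * *",
    "verify": "47 * * * *",
    "prune": "7 3 * * *",
    "train_promote": "17 3 * * *",
}


def jobs_due(hour, minute):
    """Return the jobs whose cron minute/hour fields match the given UTC time.

    Only the minute and hour fields are inspected, which is enough here
    because every expression uses "*" for the remaining fields.
    """
    due = []
    for name, cron in SCHEDULE.items():
        cron_minute, cron_hour = cron.split()[:2]
        if int(cron_minute) != minute:
            continue
        if cron_hour == "*" or int(cron_hour) == hour:
            due.append(name)
    return due
```

At 03:17 both the hourly ETL job and the daily training job are due, so the two workflows overlap once per day.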

Project Stats

Designed for scale within free-tier constraints

Forecast Providers: 5
Weather Variables: 3
Forecast Horizons (1h–72h): 7
South African Cities: 9
GitHub Actions Workflows: 6
Rows ingested / day: ~30K
Monthly infrastructure cost: $0
Model algorithms (LightGBM + Linear): 2

Live System Data

Real-time metrics from Neon Postgres, updated hourly via GitHub Actions

Forecast Rows: --
Observation Rows: --
Error Rows: --
Trained Models: --

Leaderboard: Best Source per Variable & Horizon

| Variable | Horizon (h) | Best Source | RMSE | MAE |
| --- | --- | --- | --- | --- |
| Waiting for data... | | | | |
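For readers wondering how a leaderboard like this could be derived, here is a minimal standard-library sketch that picks, for each variable and horizon, the source with the lowest mean RMSE. The project describes this step as Postgres SQL; the Python below only illustrates the selection logic, and the row field names are assumptions.

```python
from collections import defaultdict


def build_leaderboard(error_rows):
    """Return {(variable, horizon_h): (best_source, mean_rmse)}.

    `error_rows` is an iterable of dicts with hypothetical keys
    "source", "variable", "horizon_h", and "rmse".
    """
    # Accumulate [sum of RMSE, row count] per (variable, horizon, source).
    sums = defaultdict(lambda: [0.0, 0])
    for row in error_rows:
        key = (row["variable"], row["horizon_h"], row["source"])
        sums[key][0] += row["rmse"]
        sums[key][1] += 1

    best = {}
    for (variable, horizon, source), (total, count) in sums.items():
        mean_rmse = total / count
        current = best.get((variable, horizon))
        # Keep the source with the lowest mean RMSE for this slot.
        if current is None or mean_rmse < current[1]:
            best[(variable, horizon)] = (source, mean_rmse)
    return best
```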

Top Error Metrics (Last 7 Days)

| Source | Variable | Horizon (h) | RMSE | MAE |
| --- | --- | --- | --- | --- |
| Waiting for data... | | | | |

Key Features

Production-grade MLOps patterns on a zero-cost stack

| Capability | Implementation | Free Tier Used |
| --- | --- | --- |
| Forecast Ingestion | 5 vendor-specific ETL modules with unit normalization | Open-Meteo, MET Norway, OpenWeather, Visual Crossing, NWS |
| Observation Ingestion | Meteostat Python library with 7-day lookback | Meteostat (CC BY-NC) |
| Data Warehouse | Neon Serverless Postgres with indexed pruning | Neon (0.5 GB, 100 compute hrs) |
| Feature Engineering | Vendor pivot + lagged observations (1h, 3h, 6h) + calendar | Pandas + SQLAlchemy |
| Model Training | Per-variable/horizon LightGBM + LinearRegression with weekly CV | GitHub Actions (2000 min/month) |
| Experiment Tracking | MLflow on DagsHub with model registry | DagsHub (free for public repos) |
| Model Promotion | Champion-challenger with >2% RMSE & MAE threshold | Postgres + MLflow API |
| Inference | Batch prediction streaming (50K-row batches) | GitHub Actions |
| Verification | Forecast-observation join with MAE/RMSE/MAPE compute | Neon + Pandas |
| Leaderboard | Best source per variable/horizon by mean RMSE | Postgres SQL |
| Serving API | FastAPI with /health, /sources, /metrics, /predict | Deta Space |
| Dashboard | Gradio with 4 tabs (Verification, Leaderboard, Our vs Best, Drift) | Hugging Face Spaces |
| Orchestration | 6 GitHub Actions workflows, staggered cron | GitHub Actions |
| Data Management | Automated pruning with configurable retention TTL | Postgres batched DELETE |
| Caching | File-based HTTP cache with configurable TTL + exponential backoff | Local filesystem |
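To make the feature-engineering entry concrete, the sketch below assembles one feature row from vendor forecasts, lagged observations (1h, 3h, 6h), and calendar fields. The project lists Pandas and SQLAlchemy for this step; this version uses only the standard library to stay self-contained. Two assumptions are worth flagging: lags are taken relative to the forecast issue time (so that no observation from the future leaks into training), and all column names are invented for the example.

```python
from datetime import datetime, timedelta, timezone

LAGS_HOURS = (1, 3, 6)


def build_feature_row(issue_time, horizon_h, vendor_forecasts, observations):
    """Build one feature dict for a single variable and location.

    issue_time:        timezone-aware datetime when the prediction is made
    horizon_h:         forecast horizon in hours
    vendor_forecasts:  {source_name: forecast value for the valid time}
    observations:      {hourly datetime: observed value}
    All names and the feature layout are hypothetical.
    """
    valid_time = issue_time + timedelta(hours=horizon_h)
    row = {"horizon_h": horizon_h}

    # Vendor pivot: one column per forecast source.
    for source, value in vendor_forecasts.items():
        row[f"fc_{source}"] = value

    # Lagged observations relative to the issue time; missing hours stay None.
    for lag in LAGS_HOURS:
        row[f"obs_lag_{lag}h"] = observations.get(issue_time - timedelta(hours=lag))

    # Calendar features describe the time being predicted.
    row["hour"] = valid_time.hour
    row["day_of_week"] = valid_time.weekday()  # Monday = 0
    row["month"] = valid_time.month
    return row


issue = datetime(2024, 1, 15, 12, tzinfo=timezone.utc)
obs = {
    issue - timedelta(hours=1): 24.0,
    issue - timedelta(hours=3): 22.5,
}
features = build_feature_row(issue, 6, {"open_meteo": 25.1}, obs)
```

Gradient-boosted trees such as LightGBM can handle missing lag values natively, whereas a linear model would need them imputed or the row dropped.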

Use Cases

Real-world applications for this system

Renewable Energy Forecasting

Predict wind speed and temperature for solar/wind farm output optimization. Multi-horizon forecasts enable day-ahead energy trading.

Agricultural Planning

Precipitation and temperature forecasts for irrigation scheduling, frost warnings, and harvest timing across South African farming regions.

Supply Chain Logistics

Wind and precipitation forecasts for route planning, delivery scheduling, and warehouse operations across multiple cities.

Disaster Preparedness

Multi-source forecast consensus for early warning systems. When providers disagree, the ensemble model provides calibrated uncertainty.

Insurance Underwriting

Historical forecast accuracy data per provider, variable, and horizon for parametric insurance product design and risk modeling.

Research & Education

Reference implementation of production MLOps patterns: feature engineering, model registry, champion-challenger, data drift monitoring.

Quick Start

Get running in under 10 minutes on free tiers only

# 1. Clone
git clone https://github.com/JepStar990/weather-mlops-forecasts.git
cd weather-mlops-forecasts

# 2. Set up environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your DATABASE_URL and API keys

# 3. Bootstrap database (DATABASE_URL must be exported in your shell; psql does not read .env)
psql "$DATABASE_URL" -f src/db/schema.sql
python scripts/seed_locations.py

# 4. Smoke test
python src/etl/ingest_open_meteo.py

# 5. Set up GitHub Secrets for automation:
#    - DATABASE_URL (Neon connection string)
#    - DAGSHUB_USERNAME, DAGSHUB_TOKEN (MLflow tracking)
#    - OPENWEATHER_API_KEY, VISUAL_CROSSING_API_KEY (optional)
#    - MET_NO_USER_AGENT, NWS_USER_AGENT

API Reference

FastAPI endpoints served from Deta Space

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check; returns {"status":"ok"} |
| /sources | GET | Per-source error metrics (RMSE, MAE, MAPE) over last 7 days |
| /metrics | GET | Leaderboard: best source per variable and horizon |
| /predict | POST | Get ensemble predictions; body: {"lat":-26.2,"lon":28.0,"variables":["temp_2m"],"horizons":[1,3,6]} |
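A client call to /predict might look like the following sketch, which uses only the standard library. The base URL is a placeholder (the deployed address is not given here), the helper names are invented for this example, and the response schema is not documented above, so the reply is simply returned as parsed JSON.

```python
import json
import urllib.request

# Placeholder; substitute the address of your own deployment.
BASE_URL = "https://example.invalid"


def build_predict_request(lat, lon, variables, horizons, base_url=BASE_URL):
    """Build a POST request for /predict using the documented body fields."""
    payload = {
        "lat": lat,
        "lon": lon,
        "variables": list(variables),
        "horizons": list(horizons),
    }
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Johannesburg, temperature at the 1h, 3h, and 6h horizons.
request = build_predict_request(-26.2, 28.0, ["temp_2m"], [1, 3, 6])
# result = send(request)  # uncomment once BASE_URL points at a live API
```

Separating request construction from sending makes the payload easy to inspect or unit-test without network access.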

Technology Stack

Every component runs on a free tier

Python 3.11: Core language
Neon: Serverless Postgres
DagsHub: MLflow tracking
LightGBM: Ensemble model
FastAPI: Prediction API
Gradio: Verification dashboard