Which of the following descriptions best represent the overarching design of your forecasting model?
- Post-processing of numerical weather prediction (NWP) data.
- Machine learning-based weather prediction.
- Statistical model focused on generating quintile probabilities.
- Hybrid model that integrates physical simulations with machine learning or statistical techniques.
- An empirical model that utilises historical weather patterns.
- Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
Our system is a post-processing pipeline rather than a learned forecast model, so it does not generate its own initial conditions. It consumes NWP forecasts that have already been initialised upstream.
If any, what data does your model rely on for real-time forecasting purposes?
For each Thursday 00 UTC cycle we need three live feeds:
NOAA CFSv2 real-time time_grib_{01..04}/ daily GRIB2 files for tmp2m, prmsl, prate (primary S2S anchor, free, no credentials).
ECMWF Open Data extended forecast: used for calibration, never as the S2S forecast itself.
AI-WQ official FTP assets (observations, 20-year quintile climatology, 1.5° land-sea mask) for local verification and submission template wrapping.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
ERA5 reanalysis (1.5°, 6-hourly, 2002–) from a locally hosted zarr store — used to build weekly-mean / weekly-sum ground truth and the 20-year rolling quintile climatology.
NOAA CFSv2 and ECMWF extended operational forecasts (4 members, 1.5° after regridding): the forecast side of any bias / calibration fit.
AI-WQ official training data and quintile boundaries — authoritative references for the submission schema and evaluation.
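As an illustration of how weekly ground truth and quintile boundaries can be derived from 6-hourly reanalysis fields, here is a minimal numpy sketch. The function names, the `how` argument and the synthetic arrays are assumptions made for this example; it does not reproduce the actual zarr-backed code or the rolling-window selection.

```python
import numpy as np

def weekly_aggregate(six_hourly, how="mean", steps_per_day=4, days=7):
    """Collapse a (time, lat, lon) 6-hourly array into consecutive 7-day blocks.

    how="mean" suits temperature and pressure, how="sum" suits precipitation.
    Trailing steps that do not fill a whole week are dropped.
    """
    steps = steps_per_day * days
    n_weeks = six_hourly.shape[0] // steps
    blocks = six_hourly[: n_weeks * steps].reshape(
        n_weeks, steps, *six_hourly.shape[1:]
    )
    return blocks.sum(axis=1) if how == "sum" else blocks.mean(axis=1)

def quintile_edges(weekly_by_year):
    """Return the 20/40/60/80th percentiles over the year axis.

    weekly_by_year: (n_years, lat, lon) -> edges: (4, lat, lon)
    """
    return np.quantile(weekly_by_year, [0.2, 0.4, 0.6, 0.8], axis=0)

# Synthetic stand-ins for ERA5 data (hypothetical shapes).
rng = np.random.default_rng(0)
demo = rng.normal(size=(56, 3, 4))        # two weeks of 6-hourly fields
weeks = weekly_aggregate(demo)            # shape (2, 3, 4)
clim = rng.normal(size=(20, 3, 4))        # same target week in 20 years
edges = quintile_edges(clim)              # shape (4, 3, 4)
```

In practice the 20 samples per gridpoint would be the same calendar week drawn from each of the 20 preceding years.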
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
The system is a staged post-processing ladder layered on NWPs:
B0 — flat 0.2 climatology baseline (sanity rung; locally verified to produce RPSS = 0).
B1 — raw ensemble → quintile-frequency probabilities against the 20-year rolling climatology.
B2 — gridpoint / day-of-year / lead mean-bias correction of the ensemble mean, from hindcast or rolling recent errors vs ERA5.
B3 — trend-aware quintile boundaries: 20-year linear trend per gridpoint shifts climatology toward the most recent 5 years (addresses cold-bias under warming).
B4 — probabilistic calibration via EMOS / isotonic regression, tuned so predicted quintile frequency matches observed quintile frequency in a held-out window.
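To make rungs B1 and B2 concrete, the sketch below counts ensemble members per quintile bin and applies a mean-bias shift before binning. It is a simplified reading of the ladder: the function names are hypothetical, and the bias term is a single field rather than one indexed by day-of-year and lead.

```python
import numpy as np

def quintile_probs(members, edges):
    """B1-style probabilities: fraction of members falling in each quintile.

    members: (n_members, lat, lon); edges: (4, lat, lon), ascending.
    Returns an array of shape (5, lat, lon).
    """
    # Bin index = number of edges a member exceeds (0..4).
    bins = (members[:, None, ...] > edges[None, ...]).sum(axis=1)
    return np.stack([(bins == k).mean(axis=0) for k in range(5)])

def bias_correct(members, hindcast_mean, truth_mean):
    """B2-style correction: remove the mean forecast-minus-truth error.

    Shifting every member by the same amount shifts the ensemble mean
    by that amount and leaves the spread unchanged.
    """
    return members - (hindcast_mean - truth_mean)

# One gridpoint, four members, hand-picked values for illustration.
edges = np.array([-1.0, -0.3, 0.3, 1.0]).reshape(4, 1, 1)
members = np.array([-2.0, 0.0, 0.5, 2.0]).reshape(4, 1, 1)

raw = quintile_probs(members, edges)
corrected = quintile_probs(bias_correct(members, 0.5, 0.0), edges)
```

With only four members, each probability is a multiple of 0.25, which is why a later smoothing or calibration step matters.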
All rungs share the same pre-processing and post-processing (quintile binning, clip and renormalise, offline format check, submission template wrapping). The framework is implemented in Python (xarray / zarr / numpy / cfgrib / pandas), driven by YAML experiment configs with deterministic reruns.
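The shared "clip and renormalise" and offline format-check steps could look roughly like the following. This is a sketch under assumptions: the `eps` floor, the tolerance and the error messages are invented for the example, and the real check against the submission template is not shown.

```python
import numpy as np

def clip_and_renormalise(probs, eps=1e-6):
    """Clip probabilities into [eps, 1] and rescale so the 5 bins sum to 1.

    probs: (5, lat, lon). A positive eps guarantees a non-zero sum.
    """
    p = np.clip(probs, eps, 1.0)
    return p / p.sum(axis=0, keepdims=True)

def check_submission(probs, expected_shape, tol=1e-6):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if probs.shape != expected_shape:
        errors.append(f"shape {probs.shape} != expected {expected_shape}")
    if np.isnan(probs).any():
        errors.append("NaN values present")
    else:
        if (probs < 0).any() or (probs > 1).any():
            errors.append("probabilities outside [0, 1]")
        if not np.allclose(probs.sum(axis=0), 1.0, atol=tol):
            errors.append("quintile probabilities do not sum to 1")
    return errors

# A slightly invalid field, e.g. after a calibration step overshoots.
raw = np.broadcast_to(
    np.array([0.5, 0.3, 0.25, -0.05, 0.0]).reshape(5, 1, 1), (5, 2, 3)
).copy()
fixed = clip_and_renormalise(raw)
```

Running the check on `raw` reports the negative value, while `fixed` passes.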
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No publications or public presentations related to this specific model at this time.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Yes. We built a local verification loop that:
replays the full pipeline over historical Thursdays and computes RPSS against ERA5 weekly truth (cosine-latitude weighted; land-sea mask applied for tas and pr);
uses the official AI-WQ formulation (cumulative sum over quintiles, reference = flat 1/5 climatology), cross-checked to produce RPSS = 0 for the climatology baseline within 1 × 10⁻⁶ (scripts/preflight.py);
enforces a three-gate submission check — offline shape / sum / NaN validation, local RPSS > 0 against flat, and manual review of a global RPSS map and time series — before any official submission.
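A minimal sketch of such an RPSS check is shown below, assuming the common form RPS = Σ (cumulative forecast − cumulative observation)² over the five quintiles and RPSS = 1 − RPS_forecast / RPS_reference with cosine-latitude weights. The function names and the toy grid are hypothetical, and the official scoring code may differ in detail.

```python
import numpy as np

def rps(probs, obs_bin):
    """Ranked probability score per gridpoint.

    probs: (5, lat, lon); obs_bin: (lat, lon) integers in 0..4.
    """
    onehot = np.stack([obs_bin == k for k in range(5)]).astype(float)
    diff = np.cumsum(probs, axis=0) - np.cumsum(onehot, axis=0)
    return (diff ** 2).sum(axis=0)

def rpss(probs, obs_bin, lat_deg, mask=None):
    """Area-weighted RPSS against a flat 1/5 climatology reference."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones(obs_bin.shape)
    if mask is not None:
        w = w * mask                      # e.g. 1 over land, 0 over sea
    flat = np.full(probs.shape, 0.2)
    rps_fc = (w * rps(probs, obs_bin)).sum() / w.sum()
    rps_ref = (w * rps(flat, obs_bin)).sum() / w.sum()
    return 1.0 - rps_fc / rps_ref

# Toy grid: 5 latitudes x 8 longitudes with random observed quintiles.
rng = np.random.default_rng(1)
lat = np.linspace(90.0, -90.0, 5)
obs = rng.integers(0, 5, size=(5, 8))

flat_score = rpss(np.full((5, 5, 8), 0.2), obs, lat)          # preflight case
perfect = np.stack([obs == k for k in range(5)]).astype(float)
perfect_score = rpss(perfect, obs, lat)
```

The flat forecast scoring zero is the property the preflight relies on; a perfect forecast scores one, and an overconfident wrong forecast scores below zero.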
Did you face any challenges during model development, and how did you address them?
ECMWF Open Data extended-range is capped at 360 h, so it cannot cover the 18–32-day target window; we pivoted the S2S anchor to NOAA CFSv2 on AWS S3 after empirical verification.
Floating-point drift (~10⁻¹⁴) between regridded CFSv2 coords, ERA5 coords, and the canonical 1.5° grid silently collapsed xarray broadcasts into inner joins (shapes like 121 → 40). Resolved by snapping every grid product to a single canonical LAT / LON immediately after load.
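The snapping step can be illustrated with plain numpy. In this sketch the canonical latitude axis, the tolerance and the size of the artificial perturbation (10⁻¹³, chosen so it is representable near 90°) are assumptions for the demonstration.

```python
import numpy as np

# Canonical 1.5-degree latitude axis (121 points, north to south).
CANON_LAT = np.linspace(90.0, -90.0, 121)

def snap_to_canonical(coord, canonical, tol=1e-6):
    """Replace each coordinate by its nearest canonical value.

    Raises if any value is further than tol from the canonical grid,
    so a genuinely different grid is never snapped silently.
    """
    idx = np.abs(coord[:, None] - canonical[None, :]).argmin(axis=1)
    snapped = canonical[idx]
    if np.abs(snapped - coord).max() > tol:
        raise ValueError("coordinate is not on the canonical grid")
    return snapped

# Simulate drift on every third latitude.
drifted = CANON_LAT.copy()
drifted[::3] += 1e-13

# Exact matching (what an inner join on coordinates does) loses points.
n_exact = len(np.intersect1d(drifted, CANON_LAT))

snapped = snap_to_canonical(drifted, CANON_LAT)
```

With xarray, the snapped values would then be written back onto the dataset (for instance with `assign_coords`) before any arithmetic between products.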
The CFSv2 T126 Gaussian grid reaches only |lat| ≈ 89.28°, so bilinear interpolation to ±90° produced NaN pole rows; these were filled by nearest-neighbour copy from the innermost latitude.
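One way to implement this pole fill on a (lat, lon) array is sketched below. It reads "innermost latitude" as the nearest row that still holds valid data, and the function name is hypothetical.

```python
import numpy as np

def fill_pole_rows(field):
    """Fill all-NaN rows at either end of the latitude axis.

    field: (lat, lon) with latitude running from one pole to the other.
    Each empty edge row receives a copy of the nearest valid row.
    """
    out = field.copy()
    valid = np.where(~np.isnan(out).all(axis=1))[0]
    if valid.size == 0:
        raise ValueError("no valid latitude rows to copy from")
    first, last = valid[0], valid[-1]
    out[:first] = out[first]          # northern edge
    out[last + 1:] = out[last]        # southern edge
    return out

# Toy field with NaN rows at both poles after regridding.
grid = np.arange(20, dtype=float).reshape(5, 4)
grid[0] = np.nan
grid[-1] = np.nan
filled = fill_pole_rows(grid)
```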
The initial B1 (raw 4-member ensemble → quintile frequency) is highly overconfident: in a single-initialisation smoke test, ~25 % of gridpoints have all four members in one bin, giving RPSS ≈ −0.5. This motivated the Laplace-smoothing and multi-model branches in the baseline ladder.
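The effect of Laplace (additive) smoothing on a four-member count can be seen in a few lines. The pseudo-count `alpha = 1` is an assumption for the example, not the value used in the pipeline.

```python
import numpy as np

def laplace_smooth(counts, alpha=1.0):
    """Additive smoothing of member counts per quintile.

    counts: (5, ...) member counts; returns probabilities that sum to 1
    and are never exactly 0 or 1 when alpha > 0.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=0, keepdims=True) + alpha * counts.shape[0]
    return (counts + alpha) / total

# All four members fall in the lowest quintile.
counts = np.array([4, 0, 0, 0, 0])
raw = counts / counts.sum()            # [1, 0, 0, 0, 0]
smoothed = laplace_smooth(counts)      # [5/9, 1/9, 1/9, 1/9, 1/9]
```

The smoothed forecast still favours the lowest quintile but no longer assigns zero probability to the others, which limits the penalty when the observation lands elsewhere.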
Are there any limitations to your current model that you aim to address in future iterations?
Ensemble size is still small (4 CFSv2 members); we plan to fuse with GEFS, NCEP S2S, JMA, and UKMO outputs where licences allow, to reach 20+ effective members.
No learned calibration yet: the EMOS / isotonic stage (B4) is designed but not yet fitted to a long hindcast.
tas / mslp / pr are currently calibrated independently; a Gaussian-copula or conditional correction across variables is on the roadmap.
No explicit MJO / ENSO regime conditioning in the bias correction; planned as an ablation once B2 is stable.
A foundation-model branch is deliberately deferred; it will only be revisited if the post-processing ladder saturates before leaderboard targets are met.
Are there any other AI/ML model components or innovations that you wish to highlight?
Deterministic, configuration-driven post-processing framework with a transparent baseline ladder (B0 → B4) so every probabilistic gain is attributable to a specific correction (bias, trend, or calibration), not a black box.
Built-in preflight: the climatology baseline is verified to produce RPSS ≡ 0 before any real submission, catching scoring-pipeline bugs up front.
Trend-aware quintile boundaries explicitly address climatological drift that stationary 20-year quintiles miss — relevant under continued global warming at S2S scales.
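One possible reading of the trend-aware boundaries is sketched below: fit a least-squares linear trend per gridpoint over the climatology years, then shift the stationary quintile edges by the trend-implied change between the centre of the full period and the centre of the most recent five years. The function name and this particular shifting rule are assumptions for illustration.

```python
import numpy as np

def trend_shifted_edges(clim, years, recent=5):
    """Shift stationary quintile edges along a per-gridpoint linear trend.

    clim: (n_years, lat, lon) weekly values; years: (n_years,).
    Returns (shifted_edges, stationary_edges, slope).
    """
    t = years - years.mean()
    anomalies = clim - clim.mean(axis=0)
    # Least-squares slope per gridpoint, in units per year.
    slope = np.tensordot(t, anomalies, axes=(0, 0)) / (t ** 2).sum()
    stationary = np.quantile(clim, [0.2, 0.4, 0.6, 0.8], axis=0)
    # Years between the period centre and the centre of the recent window.
    dt = years[-recent:].mean() - years.mean()
    return stationary + (slope * dt)[None, ...], stationary, slope

# Synthetic climatology warming by 0.1 units per year.
years = np.arange(2005, 2025)
clim = 0.1 * (years - 2005)[:, None, None] * np.ones((20, 2, 3))
shifted, stationary, slope = trend_shifted_edges(clim, years)
```

In this synthetic case the centres are 7.5 years apart, so every edge moves up by 0.75 units.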
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
This team has chosen to keep its participants anonymous.