
WeatherQuant

Members

This team has chosen to keep its participants anonymous.

Models

Model name

aerosphere

Number of individuals supporting model development:

1-5

Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:

< 8

Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:

< 4

How would you best classify the IT system used for model development or forecast production:

Single node system

Model summary questionnaire for model aerosphere

Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.

Which of the following descriptions best represent the overarching design of your forecasting model?

Post-processing of numerical weather prediction (NWP) data.
Machine learning-based weather prediction.
Statistical model focused on generating quintile probabilities.
Hybrid model that integrates physical simulations with machine learning or statistical techniques.
An empirical model that utilises historical weather patterns.
Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.

What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)

Our system is a post-processing pipeline rather than a learned forecast model, so it does not generate its own initial conditions. It consumes forecasts that the upstream NWP systems have already initialised.

If any, what data does your model rely on for real-time forecasting purposes?

For each Thursday 00 UTC cycle we need three live feeds:
NOAA CFSv2 real-time time_grib_{01..04}/ daily GRIB2 files for tmp2m, prmsl and prate (primary S2S anchor; free, no credentials).
ECMWF Open Data extended forecast: used for calibration, never as the S2S forecast itself.
AI-WQ official FTP assets (observations, 20-year quintile climatology, 1.5° land-sea mask) for local verification and submission template wrapping.

What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)

ERA5 reanalysis (1.5°, 6-hourly, 2002–) from a locally hosted zarr store: used to build weekly-mean / weekly-sum ground truth and the 20-year rolling quintile climatology.
NOAA CFSv2 and ECMWF extended operational forecasts (4 members, 1.5° after regridding): the forecast side of any bias / calibration fit.
AI-WQ official training data and quintile boundaries: authoritative references for the submission schema and evaluation.

Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)

The system is a staged post-processing ladder layered on NWP output:
B0: flat 0.2 climatology baseline (sanity rung; locally verified to produce RPSS = 0).
B1: raw ensemble → quintile-frequency probabilities against the 20-year rolling climatology.
B2: gridpoint / day-of-year / lead mean-bias correction of the ensemble mean, from hindcast or rolling recent errors vs ERA5.
B3: trend-aware quintile boundaries; a 20-year linear trend per gridpoint shifts the climatology toward the most recent 5 years (addresses cold bias under warming).
B4: probabilistic calibration via EMOS / isotonic regression, tuned so predicted quintile frequency matches observed quintile frequency in a held-out window.
All rungs share the same pre-processing and post-processing (quintile binning, clip and renormalise, offline format check, submission template wrapping). The framework is implemented in Python (xarray / zarr / numpy / cfgrib / pandas), driven by YAML experiment configs with deterministic reruns.
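As an illustration of the B1 rung and the shared "clip and renormalise" step, here is a minimal sketch for a single gridpoint, written in plain Python rather than the xarray stack the team names. The function name `quintile_probs`, the `floor` parameter and the rule that a value equal to a boundary falls in the upper bin are assumptions made for this example, not details from the team's code.

```python
from bisect import bisect_right

def quintile_probs(members, edges, floor=0.0):
    """Fraction of ensemble members in each of the five quintile bins.

    members: forecast values for one gridpoint (e.g. four weekly means)
    edges:   the four climatological quintile boundaries, ascending
    floor:   hypothetical minimum probability applied before renormalising
    """
    counts = [0] * 5
    for value in members:
        # bisect_right returns 0..4, i.e. the index of the bin holding value
        counts[bisect_right(edges, value)] += 1
    probs = [c / len(members) for c in counts]
    # clip and renormalise so the five probabilities sum to 1
    clipped = [max(p, floor) for p in probs]
    total = sum(clipped)
    return [p / total for p in clipped]

edges = [1.0, 2.0, 3.0, 4.0]                      # toy quintile boundaries
spread = quintile_probs([0.5, 2.5, 2.7, 4.2], edges)
sharp = quintile_probs([3.5, 3.6, 3.7, 3.8], edges, floor=0.01)
print(spread)   # members spread over three bins
print(sharp)    # all four members in one bin, softened by the floor
```

With only four members the raw frequencies can only take the values 0, 0.25, 0.5, 0.75 or 1, so a forecast with every member in one bin assigns zero probability to the other four unless some floor or smoothing is applied.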

Have you published or presented any work related to this forecasting model? If yes, could you share references or links?

No publications or public presentations related to this specific model at this time.

Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?

Yes. We built a local verification loop that:
replays the full pipeline over historical Thursdays and computes RPSS against ERA5 weekly truth (cosine-latitude weighted; land-sea mask applied for tas and pr);
uses the official AI-WQ formulation (cumulative sum over quintiles, reference = flat 1/5 climatology), cross-checked to produce RPSS = 0 for the climatology baseline within 1 × 10⁻⁶ (scripts/preflight.py);
enforces a three-gate submission check before any official submission: offline shape / sum / NaN validation, local RPSS > 0 against flat, and manual review of a global RPSS map and time series.
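The scoring described above can be sketched as follows. This is a simplified stand-in and not the official AI-WQ scorer or the team's scripts/preflight.py: it assumes RPSS is formed as one minus the ratio of cos(lat)-weighted RPS sums, it omits the land-sea mask, and the function names are hypothetical.

```python
import math
from itertools import accumulate

def rps(probs, obs_bin):
    """Ranked probability score: squared distance between the cumulative
    forecast and the cumulative (step-function) observation."""
    return sum((f - (1.0 if k >= obs_bin else 0.0)) ** 2
               for k, f in enumerate(accumulate(probs)))

def rpss(forecasts, obs_bins, lats_deg):
    """cos(lat)-weighted skill score against a flat 1/5 climatology."""
    flat = [0.2] * 5
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    num = sum(w * rps(f, o) for w, f, o in zip(weights, forecasts, obs_bins))
    den = sum(w * rps(flat, o) for w, o in zip(weights, obs_bins))
    return 1.0 - num / den

lats = [0.0, 45.0, 60.0]          # toy gridpoints
obs = [0, 2, 4]                   # observed quintile index per gridpoint

def one_hot(k):
    return [1.0 if i == k else 0.0 for i in range(5)]

preflight = rpss([[0.2] * 5 for _ in obs], obs, lats)   # climatology vs itself
perfect = rpss([one_hot(o) for o in obs], obs, lats)
wrong = rpss([one_hot(4 - o) for o in obs], obs, lats)
print(preflight, perfect, wrong)
```

The first value is the preflight check from the text: a flat forecast scored against the flat reference must give zero skill, so any other result points to a bug in the scoring path rather than in the forecast.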

Did you face any challenges during model development, and how did you address them?

ECMWF Open Data extended-range is capped at 360 h, so it cannot cover the 18–32-day target window; we pivoted the S2S anchor to NOAA CFSv2 on AWS S3 after empirical verification.
Floating-point drift (~10⁻¹⁴) between regridded CFSv2 coords, ERA5 coords, and the canonical 1.5° grid silently collapsed xarray broadcasts into inner joins (shapes like 121 → 40). Resolved by snapping every grid product to a single canonical LAT / LON immediately after load.
The CFSv2 T126 Gaussian grid only reaches max |lat| ≈ 89.28°, so bilinear interpolation to ±90° produced NaN pole rows; these were filled by nearest-neighbour copy from the innermost latitude.
The initial B1 (raw 4-member ensemble → quintile frequency) is highly overconfident: in a single-init smoke test, ~25 % of gridpoints have all four members in one bin, giving RPSS ≈ −0.5. This motivated the Laplace-smoothing and multi-model branches in the baseline ladder.
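The coordinate-drift problem and the snapping fix can be reproduced without xarray. In the sketch below, exact set intersection stands in for the inner join that xarray applies by default when aligning arrays for arithmetic; the helper names `canonical_axis` and `snap` and the tolerance value are assumptions for illustration.

```python
def canonical_axis(start, stop, step):
    """Build one authoritative coordinate axis, e.g. the 1.5-degree latitudes."""
    n = round((stop - start) / step) + 1
    return [start + i * step for i in range(n)]

def snap(coords, canonical, tol=1e-6):
    """Replace each coordinate by the nearest canonical value.

    Raises if a coordinate is farther than tol from every canonical point,
    so a genuinely different grid is not silently relabelled.
    """
    snapped = []
    for c in coords:
        nearest = min(canonical, key=lambda g: abs(g - c))
        if abs(nearest - c) > tol:
            raise ValueError(f"coordinate {c} is not on the canonical grid")
        snapped.append(nearest)
    return snapped

LAT = canonical_axis(-90.0, 90.0, 1.5)        # 121 latitudes
drifted = [v + 1e-14 for v in LAT]            # same grid with tiny float drift

# Exact matching keeps only bit-identical labels, like an inner join.
common_before = set(LAT) & set(drifted)
common_after = set(LAT) & set(snap(drifted, LAT))
print(len(LAT), len(common_before), len(common_after))
```

Snapping once, immediately after load, means every later operation compares bit-identical labels, which is cheaper and less error-prone than passing a tolerance to each individual join.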

Are there any limitations to your current model that you aim to address in future iterations?

Ensemble size is still small (4 CFSv2 members); we plan to fuse with GEFS, NCEP S2S, JMA, and UKMO outputs where licence allows, to reach 20+ effective members.
No learned calibration yet: the EMOS / isotonic stage (B4) is designed but not yet fit to a long hindcast.
tas / mslp / pr are currently calibrated independently; a Gaussian-copula or conditional correction across variables is on the roadmap.
No explicit MJO / ENSO regime conditioning in the bias correction; planned as an ablation once B2 is stable.
A foundation-model branch is deliberately deferred; it will only be revisited if the post-processing ladder saturates before leaderboard targets are met.

Are there any other AI/ML model components or innovations that you wish to highlight?

Deterministic, configuration-driven post-processing framework with a transparent baseline ladder (B0 → B4), so every probabilistic gain is attributable to a specific correction (bias, trend, or calibration), not a black box.
Built-in preflight: the climatology baseline is verified to produce RPSS ≡ 0 before any real submission, catching scoring-pipeline bugs up front.
Trend-aware quintile boundaries explicitly address climatological drift that stationary 20-year quintiles miss, which is relevant under continued global warming at S2S scales.
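One way to read the trend-aware boundaries is sketched below for a single gridpoint. The text does not give the exact recipe, so this version is an assumption: fit a least-squares linear trend to 20 yearly values, move every year along that trend to the midpoint of the last five years, then take quintile boundaries of the shifted sample. The function name and the noise-free toy series are illustrative only.

```python
from statistics import mean, quantiles

def trend_shifted_edges(values, recent=5):
    """Quintile boundaries after re-centring a yearly series on recent years.

    values: one value per year for a gridpoint, oldest first
    recent: number of final years that define the reference epoch (assumed)
    """
    years = list(range(len(values)))
    t_bar, y_bar = mean(years), mean(values)
    # ordinary least-squares slope of value against year index
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(years, values))
             / sum((t - t_bar) ** 2 for t in years))
    t_ref = mean(years[-recent:])
    # slide each year along the fitted trend to the reference epoch
    shifted = [y + slope * (t_ref - t) for t, y in zip(years, values)]
    return quantiles(shifted, n=5)            # four cut points

# Noise-free toy series warming by 0.1 per year, so the effect is easy to see.
warming = [0.1 * t for t in range(20)]
stationary = quantiles(warming, n=5)
shifted = trend_shifted_edges(warming)
print(stationary)
print(shifted)
```

In this toy case every stationary boundary sits below the shifted ones, so a forecast of recent-like conditions would be binned into the top quintiles far too often against the stationary climatology.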

Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.

This team has chosen to keep its participants anonymous.

Model name

aerosphereHybrid

Number of individuals supporting model development:

1-5

Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:

< 8

Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:

< 4

How would you best classify the IT system used for model development or forecast production:

Single node system

Model summary questionnaire for model aerosphereHybrid

Which of the following descriptions best represent the overarching design of your forecasting model?

Post-processing of numerical weather prediction (NWP) data.
Machine learning-based weather prediction.
Statistical model focused on generating quintile probabilities.
Hybrid model that integrates physical simulations with machine learning or statistical techniques.
Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.

What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)

NOAA CFSv2 4-member operational ensemble (00 UTC Thursday cycle, fetched from noaa-cfs-pds AWS bucket), regridded from native ~1° to 1.5° competition grid. Past-4-week ERA5/ERA5T weekly observations are used as auxiliary context for the ML residual head.

If any, what data does your model rely on for real-time forecasting purposes?

NOAA CFSv2 operational ensemble (S3, daily refresh)
AIWQ official 20-year quintile climatology (FTP, target Monday)
AIWQ weekly observations / ERA5T weekly (FTP) for trailing-bias pool and residual context
NOAA Niño 3.4 monthly anomaly index (PSL, optional)

What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)

ERA5 1959-2021 weekly aggregates (tas / mslp / pr) for residual context
AIWQ 20-yr rolling quintile climatology for label binning
CFSv2 reforecast/operational hindcast 2018-2025 for forecast input
AIWQ training-data archive 2022-2024 + AIWQ operational obs 2024-10+ for newer ground truth

Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)

Hybrid: final_logits = log(p_BaselineA) + ε × residual_head(ctx). Baseline-A prior = trailing-K bias-corrected CFSv2 ens-mean binned into quintiles with Laplace smoothing (per-var K=1 / α∈{2, 3, 4} for tas/mslp/pr). Residual head = small UNet (~0.11M params, GroupNorm + GELU, circular lon padding). Per-variable learnable ε vector scaled by 0.25× at inference to prevent overfitting. Trained via weighted CE with cos(lat) weighting, ε L2 penalty 0.01, dropout 0.3, mam_proxy validation.
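The combination rule final_logits = log(p_BaselineA) + ε × residual_head(ctx) can be illustrated at one gridpoint with scalars in place of the UNet. In this sketch the residual vector is a hand-written stand-in for the network output, the Laplace prior is built from a one-hot count of the binned ensemble mean (our reading of the description, not confirmed by the source), and all function names are hypothetical.

```python
import math

def laplace_prior(counts, alpha):
    """Additive (Laplace) smoothing of bin counts into probabilities."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def softmax(logits):
    top = max(logits)                     # subtract the max for stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hybrid_probs(prior, residual, eps, scale=0.25):
    """log-prior plus a scaled residual, mapped back to probabilities."""
    logits = [math.log(p) + scale * eps * r for p, r in zip(prior, residual)]
    return softmax(logits)

# Ensemble mean fell in the middle quintile; alpha = 2 as quoted for one variable.
prior = laplace_prior([0, 0, 1, 0, 0], alpha=2.0)
residual = [1.0, 0.0, 0.0, 0.0, -1.0]     # stand-in for residual_head(ctx)

unchanged = hybrid_probs(prior, residual, eps=0.0)
damped = hybrid_probs(prior, residual, eps=1.0, scale=0.25)
full = hybrid_probs(prior, residual, eps=1.0, scale=1.0)
print(prior, damped, full, sep="\n")
```

Because the residual is added in log space, ε = 0 returns the prior exactly and the smoothed prior never contains a zero, so the logarithm is always defined.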

Have you published or presented any work related to this forecasting model? If yes, could you share references or links?

None at this time.

Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?

3-fold leave-year-out (LYO) cross-validation: baseline, LYO-2020 and LYO-2021 checkpoints were trained, then scored with the official RPSS (cos-lat weighting + land-sea mask) on held-out 2019/2020/2021 MAM and on 2026 MAM. This confirmed that the residual generalises in El Niño-like regimes (+0.026 on 2019/2020) but is regime-dependent.

Did you face any challenges during model development, and how did you address them?

5-Thursday small-sample variance (±0.03 RPSS per Thursday) → mitigated via multi-year LYO
Val/test ENSO regime mismatch causing ε overshoot → mitigated by inference-time ε × 0.25 scaling (calibrated on 3-year OOS)
Bias correction with stale fetch of CFSv2 cycles → fixed via daily idempotent refresh script
Foundation pretrain attempts on ERA5 alone failed in regime-edge year 2026 → dropped that path

Are there any limitations to your current model that you aim to address in future iterations?

Residual is only mildly used (ε × 0.25); could increase if OOS confidence improves
CFSv2 4-member ensemble is small → diffusion-based ensemble generation planned
ENSO regime dependency unmodelled → state-conditional ε is a candidate
Past-4-week ERA5 context is short → expand to 52 weeks with real long-history ctx

Are there any other AI/ML model components or innovations that you wish to highlight?

Custom per-variable Laplace α (Round 6c), physically motivated (pr is noisier and needs more smoothing)
Per-variable learnable ε with backward-compat shim
Idempotent daily-refresh pipeline auto-overwrites FTP submission as bias pool advances

This team has chosen to keep its participants anonymous.

Model name

aerosphereHercules

Number of individuals supporting model development:

1-5

Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:

< 8

Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:

< 4

How would you best classify the IT system used for model development or forecast production:

Single node system

Model summary questionnaire for model aerosphereHercules

Which of the following descriptions best represent the overarching design of your forecasting model?

Post-processing of numerical weather prediction (NWP) data.
Machine learning-based weather prediction.
Statistical model focused on generating quintile probabilities.
Hybrid model that integrates physical simulations with machine learning or statistical techniques.
Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.

What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)

Same as Model 2 (aerosphereHybrid): NOAA CFSv2 00 UTC Thursday cycle, regridded to 1.5° competition grid, with past-4-week ERA5/ERA5T weekly obs as residual-head context.

If any, what data does your model rely on for real-time forecasting purposes?

Identical to Model 2 (CFSv2 operational + AIWQ quintile climatology + AIWQ obs / ERA5T + Niño 3.4).

What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)

Identical to Model 2 — ERA5 1959-2021, AIWQ quintile clim 1980-2021, CFSv2 hindcast 2018-2025, AIWQ training-data 2022-2024 + ops obs 2024-10+.

Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)

Same hybrid backbone as Model 2: log-prior + ε × residual UNet on CFSv2 + ERA5 context. Difference: ε is applied at full learned magnitude (1.0×, no inference-time damping). This is a higher-variance complement to Model 2 — wins more in El Niño-like regimes, loses more in La Niña-like regimes. Designed for portfolio diversification across the team's 3-model quota.

Have you published or presented any work related to this forecasting model? If yes, could you share references or links?

None at this time.

Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?

Same 3-fold LYO as Model 2 (baseline / LYO-2020 / LYO-2021 ckpts). Cross-year ε-scan confirms expected behaviour: residual at full ε adds +0.026 RPSS on 2019/2020 MAM (El Niño-like) but subtracts 0.01-0.02 on 2021 MAM (strong La Niña). Submitted as the deliberate high-risk/high-reward complement.

Did you face any challenges during model development, and how did you address them?

Same residual ENSO-regime dependency as Model 2, but embraced here rather than damped
Risk of negative score in La Niña years acknowledged; offset by Model 1 (safe baseline) + Model 2 (mild hybrid) in the team portfolio
Plan X self-supervised foundation-model variants (UNet + Swin Transformer) attempted as Hercules upgrade, but failed gate due to σ collapse + ERA5 self-supervised lacking real forecast skill at S2S lead

Are there any limitations to your current model that you aim to address in future iterations?

High variance is a feature here, but means single-year scoring can be very negative if regime is hostile
A learned ENSO-state gate on ε (so the model damps itself in La Niña) is the natural next step
Plan X v4 (CFSv2-conditional probabilistic regressor with CRPS loss + heterogeneous σ + Swin Transformer backbone) is coded and queued; will replace Hercules once it passes the LYO gate of ≥ +0.045 official RPSS
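For readers unfamiliar with a CRPS loss under heterogeneous σ, the sketch below uses the standard closed form for a Gaussian predictive distribution. The source does not say which distribution Plan X v4 predicts, so the Gaussian choice is an assumption, as are the function name and the toy numbers.

```python
import math

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast against observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# Heterogeneous sigma: each gridpoint carries its own predicted spread.
mus = [0.0, 1.0, -0.5]
sigmas = [1.0, 0.5, 2.0]
obs = [0.3, 1.2, 1.0]
loss = sum(crps_gaussian(m, s, y) for m, s, y in zip(mus, sigmas, obs)) / len(obs)
print(loss)
```

As σ shrinks toward zero this score approaches the absolute error |y − μ|, so a collapsing spread is only rewarded where the mean is accurate.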

Are there any other AI/ML model components or innovations that you wish to highlight?

Same UNet/ε vector architecture as Model 2 but submitted at unattenuated ε as deliberate portfolio diversifier: turns one trained model into two complementary submissions through inference-time ε scaling
ε-scan diagnostic: scale learned ε at inference by {0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0} to characterise the magnitude/direction trade-off per variable per year
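The ε-scan can be sketched as a loop over the quoted scale factors, rescoring the same prior and residual each time. Everything below is a toy: the flat prior, the hand-written residual, the use of mean RPS as the score and the names `hybrid_probs`, `rps` and `eps_scan` are assumptions, chosen only to show why one residual helps in one regime and hurts in another.

```python
import math
from itertools import accumulate

SCALES = (0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0)   # factors quoted in the text

def hybrid_probs(prior, residual, eps, scale):
    logits = [math.log(p) + scale * eps * r for p, r in zip(prior, residual)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rps(probs, obs_bin):
    return sum((f - (1.0 if k >= obs_bin else 0.0)) ** 2
               for k, f in enumerate(accumulate(probs)))

def eps_scan(samples, eps):
    """samples: list of (prior, residual, obs_bin); returns {scale: mean RPS}."""
    scan = {}
    for scale in SCALES:
        scores = [rps(hybrid_probs(p, r, eps, scale), o) for p, r, o in samples]
        scan[scale] = sum(scores) / len(scores)
    return scan

flat = [0.2] * 5
residual = [1.0, 0.0, 0.0, 0.0, 0.0]              # pushes mass to the lowest bin

helpful = eps_scan([(flat, residual, 0)], eps=1.0)   # observation agrees
hostile = eps_scan([(flat, residual, 4)], eps=1.0)   # observation disagrees
for scale in SCALES:
    print(scale, round(helpful[scale], 4), round(hostile[scale], 4))
```

Lower RPS is better, so the score improves with larger scale in the first case and degrades in the second. A scan of this kind lets the damped and undamped submissions be read as two points on the same curve.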

This team has chosen to keep its participants anonymous.

Submitted forecast data in previous period(s)

Please note: Submitted forecast data is only publicly available once the evaluation of a full competitive period has been completed. See the competition's full detailed schedule with submitted data publication dates for each period here.

Access forecasts data
