Which of the following descriptions best represent the overarching design of your forecasting model?
- Post-processing of numerical weather prediction (NWP) data.
- Machine learning-based weather prediction.
- Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
For the BaseModel, which serves as a baseline post-processing system for raw S2S forecasts, the initialization relies entirely on operational numerical weather prediction (NWP) outputs and basic machine learning correction. The initialization techniques are as follows:
Data Sources
S2S Prediction Products: Raw forecasts from multiple operational centers, primarily ECMWF, as well as supplementary data from CMA and NCEP, obtained from the official [S2S Project Database](https://s2s.ecmwf.int/).
Observational Reference: CMARA (China Meteorological Administration Reanalysis) data, used to compute historical forecast errors for bias correction.
Processing of Initial Conditions
1. Temporal and Spatial Alignment:
All S2S forecast fields (temperature, precipitation, sea level pressure) were regridded to the common 1.5° × 1.5° global grid and aligned to the target forecast windows (Days 19–25 and Days 26–32) as required by the AI Weather Quest.
2. Bias Estimation:
Historical forecast errors (S2S minus CMARA) over the period 2015–2024 were computed for each lead time, variable, and grid point.
3. Model Initialization via Simple ML:
A linear regression (or quantile mapping for precipitation) model was trained at each grid point to map raw S2S forecasts to observed climatology using CMARA as truth.
The regression coefficients (or correction functions) derived from the historical period serve as the initial parameters of the BaseModel.
No deep learning or complex neural architectures are used—only interpretable, lightweight statistical methods suitable for operational post-processing.
4. Real-Time Initialization:
At forecast time, the BaseModel takes the latest S2S output as input and applies the pre-trained correction functions directly, without any dynamic state or iterative initialization.
In summary, the BaseModel is initialized purely from historical S2S–reanalysis error statistics, using simple, static machine learning techniques (linear models and quantile mapping) to correct systematic biases in the raw numerical forecasts.
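The bias-estimation and correction steps above can be sketched as follows (a minimal illustration with synthetic data; the function names and array shapes are hypothetical, not the team's actual code):

```python
import numpy as np

def fit_mean_bias(s2s_hist, cmara_hist):
    """Estimate a per-grid-point mean bias from historical pairs.

    s2s_hist, cmara_hist: arrays of shape (n_forecasts, n_lat, n_lon)
    holding S2S forecasts and matching CMARA reanalysis fields.
    Returns an (n_lat, n_lon) bias field (S2S minus CMARA).
    """
    return np.nanmean(s2s_hist - cmara_hist, axis=0)

def apply_bias_correction(s2s_new, mean_bias):
    """Subtract the pre-computed bias from a new raw forecast."""
    return s2s_new - mean_bias

# Toy example on a 2x2 grid with a constant +1.5 K warm bias.
rng = np.random.default_rng(0)
truth = rng.normal(280.0, 5.0, size=(50, 2, 2))   # stand-in for CMARA
forecasts = truth + 1.5                            # biased S2S forecasts
bias = fit_mean_bias(forecasts, truth)             # recovers ~1.5 everywhere
corrected = apply_bias_correction(forecasts[0], bias)
```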
If any, what data does your model rely on for real-time forecasting purposes?
For real-time forecasting, the BaseModel relies exclusively on the following data sources:
1. Operational S2S Forecast Outputs
Primary Source: Real-time Sub-seasonal to Seasonal (S2S) prediction products from ECMWF (European Centre for Medium-Range Weather Forecasts).
Supplementary Sources: S2S forecasts from CMA (China Meteorological Administration) and NCEP (National Centers for Environmental Prediction), used to support multi-model consistency checks or ensemble averaging where applicable.
These S2S outputs provide the raw predictions for the three target variables:
2-meter temperature
Total precipitation
Sea level pressure (SLP)
at the required 1.5° × 1.5° global grid resolution and for the target lead windows (Days 19–25 and Days 26–32).
2. Pre-computed Correction Parameters
The BaseModel does not require real-time observational data during forecasting.
Instead, it uses static correction functions (e.g., linear regression coefficients or quantile mapping tables) that were pre-trained offline using historical data (2015–2024) by comparing S2S forecasts against the CMARA reanalysis dataset.
These parameters are stored and applied directly to incoming S2S forecasts at inference time.
Summary
In real-time operation, the BaseModel only ingests the latest operational S2S forecast files (from ECMWF, and optionally CMA/NCEP) and applies pre-learned statistical corrections. It does not use satellite data, surface observations, or real-time reanalysis during forecasting—ensuring low latency, reproducibility, and compliance with the AI Weather Quest’s data usage policy.
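As an illustrative sketch of this inference-time flow (the file layout and variable names are hypothetical), the only inputs touched at forecast time are the latest raw field and the static parameter file:

```python
import numpy as np
import tempfile, os

# Offline: store pre-trained correction parameters (hypothetical file layout).
params_path = os.path.join(tempfile.mkdtemp(), "basemodel_params.npz")
np.savez(params_path, t2m_bias=np.full((121, 240), 0.8))  # 1.5-degree global grid

# Real time: read the latest raw S2S field and the static parameter file only.
raw_t2m = np.full((121, 240), 285.0)          # stand-in for the latest ECMWF output
params = np.load(params_path)
corrected_t2m = raw_t2m - params["t2m_bias"]  # no observations used at inference
```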
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
For the BaseModel, the following types of datasets were used during model training:
1. Numerical Weather Prediction (NWP) Outputs
S2S Forecast Products: Historical forecasts from operational sub-seasonal prediction systems, primarily:
ECMWF S2S (European Centre for Medium-Range Weather Forecasts)
CMA S2S (China Meteorological Administration)
NCEP S2S (National Centers for Environmental Prediction)
These provided the input features (raw model forecasts) for the three target variables:
2-meter temperature
Total precipitation
Sea level pressure (SLP)
Temporal coverage: 2015–2024, with forecasts initialized twice weekly (typically on Mondays and Thursdays).
2. Reanalysis Data (Used as Ground Truth)
CMARA (China Meteorological Administration Reanalysis):
Served as the observational reference or "truth" dataset for training.
Provided high-quality, spatially complete, and temporally consistent historical atmospheric states.
Used to compute forecast errors (S2S – CMARA) for bias estimation and correction model training.
CMARA was chosen for its compatibility with CMA’s modeling framework and its reliability over the training period.
3. Data Processing for Training
All datasets were regridded to a common 1.5° × 1.5° global grid to match the AI Weather Quest submission format.
Only Days 19–32 of each S2S forecast were retained for training, aligning with the competition’s target lead windows.
For precipitation, quantile mapping was trained using cumulative distribution functions derived from S2S and CMARA; for temperature and SLP, linear regression or mean bias correction was applied.
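The linear-regression branch of this training step might look like the following sketch for a single grid point and lead day (synthetic data; the actual implementation may use scikit-learn rather than NumPy's `polyfit`):

```python
import numpy as np

# Hypothetical paired training samples at one grid point / lead day:
rng = np.random.default_rng(1)
truth = rng.normal(288.0, 4.0, size=200)               # CMARA "truth" (K)
raw = 1.1 * truth - 27.0 + rng.normal(0.0, 0.5, 200)   # biased S2S forecast

# Fit truth ~ a * raw + b by least squares, then correct the raw forecasts.
a, b = np.polyfit(raw, truth, deg=1)
corrected = a * raw + b
```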
Not Used
Satellite data: Not directly ingested or used in training.
In-situ observational datasets (e.g., station data): Not used, as CMARA already assimilates such data and provides a consistent gridded product.
Other reanalyses (e.g., ERA5, MERRA-2): Not used; CMARA was the sole reference to maintain consistency with the development team’s operational environment.
In summary, the BaseModel was trained using historical NWP (S2S) outputs as predictors and CMARA reanalysis data as the target, employing simple statistical machine learning techniques to learn systematic forecast corrections.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Below is an overview of the BaseModel architecture: a lightweight, interpretable post-processing system designed to correct raw S2S forecasts using simple machine learning techniques.
Overview of BaseModel Architecture
The BaseModel is not a deep learning model. Instead, it is a statistical post-processing framework that applies grid-point-wise bias correction to operational S2S forecasts. Its design prioritizes simplicity, robustness, reproducibility, and low computational cost, making it suitable as a baseline for sub-seasonal forecasting.
Key Design Features
1. Per-Grid-Point Correction:
Each 1.5° × 1.5° grid cell is treated independently—no spatial or temporal coupling in the correction logic.
2. Variable-Specific Methods:
Different correction strategies are applied based on variable characteristics:
Temperature & Sea Level Pressure (SLP): Linear bias correction (additive or multiplicative).
Precipitation: Non-parametric quantile mapping to preserve distributional properties and handle non-Gaussian errors.
3. Static, Pre-Trained Parameters:
All correction parameters are estimated offline using historical data (2015–2024) and remain fixed during real-time forecasting.
4. Multi-Model Input Support:
While primarily tuned on ECMWF S2S, the framework can ingest CMA or NCEP outputs using the same correction logic (with source-specific parameters if needed).
Specific Algorithms Used
Linear Regression / Mean Bias Adjustment:
For temperature and SLP, the correction takes the form:
$$
y_{\text{corrected}} = y_{\text{raw}} - \overline{\text{bias}}
$$

where $\overline{\text{bias}} = \text{mean}(y_{\text{S2S}} - y_{\text{CMARA}})$ over the training period for each lead day and grid point.
Quantile Mapping (QM):
For precipitation:
Empirical cumulative distribution functions (CDFs) are built from historical S2S forecasts and CMARA observations.
For a new forecast value \(x\), its percentile in the S2S CDF is found, then mapped to the corresponding value in the CMARA CDF.
Ensures corrected precipitation respects observed climatology (e.g., frequency of dry days, extreme tails).
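A minimal sketch of empirical quantile mapping as described (toy one-dimensional samples; in practice the tables would be stored per grid point and lead day):

```python
import numpy as np

def fit_qm(forecast_hist, obs_hist, n=99):
    """Matched quantile tables from historical forecast and observed samples."""
    p = np.linspace(0.01, 0.99, n)
    return np.quantile(forecast_hist, p), np.quantile(obs_hist, p)

def apply_qm(x, fcst_q, obs_q):
    """Map a new forecast value to the observed distribution: locate its
    position in the forecast CDF, then read off the observed CDF."""
    return np.interp(x, fcst_q, obs_q)

# Toy example: forecasts systematically double the observed amounts.
obs = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0])
fcst = 2.0 * obs
fq, oq = fit_qm(fcst, obs)
corrected = apply_qm(10.0, fq, oq)   # halves the forecast back toward obs
```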
No Neural Networks or Complex ML:
The BaseModel uses no deep learning, no ensembles, and no trainable parameters at inference time.
Pre-Processing Steps
1. Data Harmonization:
All S2S forecasts and CMARA data regridded to 1.5° global grid.
Temporal alignment to standard forecast initialization days (e.g., Monday/Thursday).
2. Training Data Construction:
Paired datasets created: (S2S forecast on Day d, CMARA observation on Day d), for d = 19 to 32.
Separate models trained for each lead day (or grouped by week: Days 19–25, 26–32).
3. Parameter Estimation:
Bias terms and quantile mapping tables computed offline and stored as lookup files.
Post-Processing Steps
1. Deterministic Output Generation:
Corrected values for temperature, precipitation, and SLP are output directly.
2. Quintile Probability Estimation (for competition submission):
For each variable, historical CMARA climatology (2015–2024) is used to define quintile thresholds.
The corrected deterministic forecast is converted into probabilities by assuming a parametric (e.g., Gaussian for temperature) or empirical distribution centered on the corrected value.
Alternatively, ensemble-like spread is approximated using historical error variance to assign probabilities to each quintile bin.
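One way to implement the Gaussian variant of this quintile conversion is sketched below (the threshold values and spread are illustrative, not the operational parameters):

```python
import numpy as np
from math import erf, sqrt

def quintile_probs(forecast, thresholds, sigma):
    """Probabilities of each climatological quintile bin under a Gaussian
    centred on the corrected forecast, with spread sigma taken from the
    historical error variance.

    thresholds: the four quintile boundaries from the CMARA climatology.
    Returns five probabilities summing to 1.
    """
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF
    cdf = [phi((t - forecast) / sigma) for t in thresholds]
    edges = [0.0] + cdf + [1.0]
    return np.diff(edges)

# Toy example: a forecast sitting exactly on the climatological median,
# with thresholds at the standard-normal quintile boundaries.
probs = quintile_probs(0.0, thresholds=[-0.84, -0.25, 0.25, 0.84], sigma=1.0)
```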
3. Format Compliance:
Outputs are formatted to match the AI Weather Quest submission template (NetCDF, 1.5° grid, five quintile probabilities per variable).
Frameworks and Tools
Programming Language: Python
Core Libraries: NumPy, xarray, SciPy, scikit-learn (for basic statistics)
No deep learning frameworks (e.g., PyTorch/TensorFlow) are used.
Summary
The BaseModel is a transparent, physics-agnostic, statistical post-processor that enhances raw S2S forecasts through historical bias correction. While simple, it provides a strong and reliable baseline by leveraging a decade of forecast-error statistics, and it complies with real-time operational constraints. It served as the foundation upon which more advanced models (like NewMet) were developed and benchmarked.
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No, this work has not yet been published or presented.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Yes, before submitting forecasts to the AI Weather Quest, we rigorously validated the BaseModel against an independent observational reference dataset to ensure its reliability and skill.
Validation Dataset
We used the CMARA (China Meteorological Administration Reanalysis) as the ground truth for validation.
Although CMARA also supplied the training targets, the validation period was held out from training, so CMARA functioned as an independent, high-quality, and temporally consistent benchmark for historical evaluation. It is not used in real-time forecasting.
Validation Approach
1. Out-of-Sample Temporal Holdout
The BaseModel was trained on S2S–CMARA pairs from 2020–2023.
All validation metrics were computed on a held-out test period (2024) to simulate real-time forecasting conditions and avoid overfitting.
2. Metrics Used
We evaluated both deterministic and probabilistic performance using standard sub-seasonal verification metrics:
Deterministic:
Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)
Anomaly Correlation Coefficient (ACC) against climatology
Probabilistic (for quintile forecasts):
Brier Score (BS) and Brier Skill Score (BSS) relative to climatological probabilities
Ranked Probability Skill Score (RPSS)
Reliability diagrams to assess calibration
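The Brier-score metrics above can be sketched as follows (toy probabilities and outcomes; 0.2 is the climatological probability of any single quintile):

```python
import numpy as np

def brier_score(p, o):
    """Mean squared error of probability forecasts p against binary outcomes o."""
    return np.mean((np.asarray(p) - np.asarray(o)) ** 2)

def brier_skill_score(p, o, p_clim=0.2):
    """Skill relative to a constant climatological quintile probability."""
    bs = brier_score(p, o)
    bs_ref = brier_score(np.full(len(o), p_clim), o)
    return 1.0 - bs / bs_ref

# Toy check: sharp, mostly correct forecasts beat climatology.
o = np.array([1, 0, 0, 1, 0])
p = np.array([0.9, 0.1, 0.1, 0.8, 0.2])
bss = brier_skill_score(p, o)   # positive: better than climatology
```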
3. Spatial and Temporal Scope
Validation covered the global domain at 1.5° resolution.
Focused specifically on the competition’s target lead windows: Days 19–25 and Days 26–32.
Separate evaluations were conducted for each variable: 2-meter temperature, total precipitation, and sea level pressure.
4. Baseline Comparison
The BaseModel’s performance was compared against:
Raw ECMWF S2S forecasts (uncorrected)
Climatology (CMARA 2020–2024 mean)
Results confirmed that the BaseModel consistently reduced systematic biases and improved BSS/RPSS, especially for temperature and SLP.
5. Calibration Check
For probabilistic outputs, we verified that the quintile probabilities were well-calibrated—i.e., when the model predicted a 20% chance of being in the highest quintile, observations fell in that quintile approximately 20% of the time.
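A calibration check of this kind can be sketched as a reliability table (toy, perfectly calibrated data are used here purely for illustration):

```python
import numpy as np

def reliability_table(probs, outcomes, bins=5):
    """Group forecasts by predicted probability and compare against the
    observed event frequency in each group (the basis of a reliability
    diagram). Returns (mean forecast prob, observed freq, count) per bin."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    edges = np.linspace(0.0, 1.0, bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo)
        if m.any():
            rows.append((probs[m].mean(), outcomes[m].mean(), int(m.sum())))
    return rows

# Toy data: events occur at exactly the forecast rate in each group.
probs = np.repeat([0.1, 0.3, 0.5], [10, 10, 10])
outcomes = np.concatenate([np.r_[np.ones(1), np.zeros(9)],
                           np.r_[np.ones(3), np.zeros(7)],
                           np.r_[np.ones(5), np.zeros(5)]])
table = reliability_table(probs, outcomes)
```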
Outcome
The validation confirmed that the BaseModel provides statistically significant improvements over raw S2S inputs in terms of bias, accuracy, and probabilistic reliability. Only after passing these validation checks did we proceed to submit real-time forecasts to the AI Weather Quest platform.
Did you face any challenges during model development, and how did you address them?
Yes, during the development of the BaseModel, we encountered several practical challenges. Although the model uses simple statistical methods, ensuring robustness, generalizability, and compliance with competition requirements required careful problem-solving. Below are the key challenges and how we addressed them:
1. Non-Stationarity of S2S Forecast Biases
Challenge:
Systematic errors in S2S forecasts (e.g., from ECMWF) can drift over time due to model upgrades, changes in data assimilation, or resolution improvements. A static bias correction trained on older data may become less effective for recent forecasts.
Solution:
Limited the training period to the most recent four years (2020–2023) to better represent current S2S system behavior.
Monitored bias stability across years during validation and confirmed that mean errors remained relatively consistent for the target lead windows.
Avoided using data prior to 2020 to minimize the impact of outdated model versions.
2. Precipitation’s Non-Gaussian and Sparse Nature
Challenge:
Precipitation is highly skewed, with many zero (dry) days and occasional extreme events. Simple linear bias correction fails to capture this distribution, leading to unrealistic corrected values (e.g., negative rainfall).
Solution:
Replaced linear correction with empirical quantile mapping (QM) specifically for precipitation.
Used separate CDFs for wet and dry frequencies to preserve the climatological probability of no rain.
Applied smoothing to empirical CDFs to avoid overfitting to rare extremes in the limited historical sample.
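The dry-day handling described above might be sketched as follows (toy samples; the CDF smoothing step is omitted for brevity):

```python
import numpy as np

def qm_with_dry_days(x, fcst_hist, obs_hist):
    """Quantile-map a precipitation value while preserving the observed
    dry-day frequency: values whose forecast-CDF percentile falls below
    the observed dry fraction map to zero; the rest follow the observed
    wet-day intensity distribution."""
    fcst_hist = np.sort(fcst_hist)
    p = np.searchsorted(fcst_hist, x, side="right") / len(fcst_hist)
    dry_frac = np.mean(obs_hist == 0.0)
    if p <= dry_frac:
        return 0.0
    return float(np.quantile(obs_hist, p))

# Toy example: observations are dry 50% of the time; the model drizzles.
obs = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 4.0, 8.0])
fcst = np.array([0.2, 0.3, 0.4, 0.5, 1.5, 2.5, 5.0, 9.0])
light = qm_with_dry_days(0.3, fcst, obs)   # low percentile maps to dry
heavy = qm_with_dry_days(9.0, fcst, obs)   # top percentile keeps intensity
```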
3. Defining Reliable Quintile Probabilities from a Deterministic Model
Challenge:
The BaseModel produces only a single deterministic forecast per lead time, but the AI Weather Quest requires probabilistic quintile forecasts (i.e., probabilities for each of five climatological bins).
Solution:
Derived quintile thresholds from the CMARA 2015–2024 climatology (fixed reference).
Approximated forecast uncertainty by assuming a Gaussian distribution centered on the corrected deterministic forecast, with variance estimated from historical S2S error spread.
For precipitation, used a mixed discrete–continuous distribution to account for dry-day probability and wet-day intensity.
Validated resulting probabilities using reliability diagrams to ensure they were not overconfident.
4. Handling Multi-Source S2S Inputs Consistently
Challenge:
While ECMWF is the primary input, we also tested CMA and NCEP S2S data. Each system has different bias structures, making a unified correction approach difficult.
Solution:
Trained source-specific correction parameters (separate bias terms and QM tables for ECMWF, CMA, and NCEP).
In real-time operations, selected the correction set matching the input source.
For ensemble-like blending, applied corrections individually before averaging.
5. Ensuring Global Applicability
Challenge:
Bias characteristics vary significantly by region (e.g., tropics vs. mid-latitudes, land vs. ocean), and a global average correction would degrade local performance.
Solution:
Applied grid-point-wise correction—each 1.5° × 1.5° location has its own bias parameters.
This preserves regional specificity without requiring complex spatial modeling.
6. Computational Simplicity vs. Skill Trade-off
Challenge:
We aimed to keep the model simple, but overly simplistic methods (e.g., global mean bias removal) showed limited skill improvement.
Solution:
Struck a balance by using per-grid-point, per-lead-day, per-variable corrections—still simple, but sufficiently adaptive.
Avoided any iterative or machine-learned components that would complicate reproducibility or real-time deployment.
By addressing these challenges with pragmatic, transparent, and data-driven adjustments, we ensured that the BaseModel delivers consistent, bias-corrected, and probabilistically meaningful forecasts—fulfilling its role as a reliable baseline for the AI Weather Quest competition.
Are there any limitations to your current model that you aim to address in future iterations?
Yes, the BaseModel, while effective as a simple and interpretable baseline, has several inherent limitations due to its statistical nature and design constraints. We recognize these shortcomings and aim to address them in future iterations:
1. Lack of Spatiotemporal Context
Limitation:
The BaseModel corrects each grid point and lead day independently, ignoring spatial coherence (e.g., weather systems spanning regions) and temporal evolution (e.g., persistence of anomalies).
Future Improvement:
Introduce spatiotemporal smoothing or lightweight models (e.g., Gaussian process regression) that borrow strength across neighboring grids and lead times.
Explore low-rank representations of error fields to capture large-scale bias patterns.
2. Static Correction Parameters
Limitation:
Bias correction parameters are fixed after training and cannot adapt to sudden changes in NWP system behavior (e.g., model upgrades) or evolving climate conditions.
Future Improvement:
Implement online learning or rolling-window retraining to update correction parameters periodically using the latest forecast–observation pairs.
Monitor real-time performance metrics to trigger parameter recalibration when drift is detected.
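A rolling-window update of this kind could be sketched as follows (hypothetical future extension; the current BaseModel uses static parameters):

```python
import numpy as np
from collections import deque

class RollingBias:
    """Sketch of a rolling-window bias estimate: keep only the most recent
    N forecast-observation error pairs and re-derive the correction on
    demand, so parameter drift after NWP upgrades fades out of the window."""
    def __init__(self, window=104):        # e.g. ~1 year of twice-weekly starts
        self.errors = deque(maxlen=window)

    def update(self, forecast, observation):
        self.errors.append(forecast - observation)

    def bias(self):
        return float(np.mean(self.errors)) if self.errors else 0.0

rb = RollingBias(window=3)
for f, o in [(10.0, 9.0), (11.0, 9.5), (12.0, 10.0), (13.0, 10.0)]:
    rb.update(f, o)
# With a window of 3, only the last three errors (1.5, 2.0, 3.0) remain.
```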
3. Oversimplified Uncertainty Representation
Limitation:
Probabilistic forecasts are derived by assuming parametric error distributions around a single deterministic prediction, which may not reflect true forecast uncertainty—especially for extreme events or multimodal outcomes.
Future Improvement:
Move toward ensemble-based post-processing (e.g., using raw S2S ensemble members if available) to better sample uncertainty.
Adopt non-parametric probability estimation (e.g., kernel density estimation) based on historical analogs.
4. Limited Use of Predictive Covariates
Limitation:
The BaseModel only uses the target variable’s own forecast as input, ignoring potentially useful predictors like sea surface temperature, soil moisture, or stratospheric signals that influence sub-seasonal predictability.
Future Improvement:
Incorporate external boundary condition data (e.g., from reanalysis or satellite products) as additional features in a more advanced statistical or machine learning framework.
5. Inability to Capture Nonlinear Error Structures
Limitation:
Linear bias correction and quantile mapping assume stationary, monotonic relationships between forecasts and observations, which may break down under complex error regimes (e.g., conditional biases during El Niño).
Future Improvement:
Replace with nonlinear but still interpretable models, such as generalized additive models (GAMs) or gradient-boosted decision trees, trained on stratified climate states.
6. Dependence on High-Quality Reanalysis for Training
Limitation:
Performance relies heavily on CMARA as ground truth; any errors or inhomogeneities in the reanalysis propagate into the correction model.
Future Improvement:
Cross-validate against multiple reanalyses (e.g., ERA5, JRA-55) or blended observational datasets to improve robustness.
Develop reanalysis-agnostic calibration methods where possible.
While the BaseModel fulfills its purpose as a transparent and operational-ready baseline, these limitations highlight clear pathways toward more adaptive, physically informed, and skillful post-processing systems in future work.
Are there any other AI/ML model components or innovations that you wish to highlight?
For the BaseModel, it is important to clarify that it does not contain advanced AI or machine learning components—by design. Its purpose is to serve as a transparent, reproducible, and operationally feasible statistical baseline, not as an innovative AI system.
That said, we would like to highlight a few pragmatic methodological choices that reflect thoughtful application of classical ML/statistical principles in the sub-seasonal forecasting context:
1. Variable-Adaptive Post-Processing Strategy
While simple, the BaseModel employs different correction techniques tailored to each variable’s statistical properties:
Additive bias correction for near-Gaussian variables (temperature, SLP).
Non-parametric quantile mapping for skewed, non-negative variables (precipitation).
This demonstrates a foundational ML principle: match the algorithm to the data distribution—even without deep learning.
2. Climatology-Aligned Probabilistic Conversion
To meet the AI Weather Quest’s requirement for quintile probability forecasts, we developed a lightweight method to convert a single deterministic output into calibrated probabilities:
Quintile thresholds are fixed using long-term CMARA climatology (2015–2024).
Forecast uncertainty is approximated using historical error variance, enabling probabilistic interpretation without ensembles.
This approach bridges deterministic correction and probabilistic forecasting using only basic statistics—a practical innovation for operational settings with limited resources.
3. Strict Separation of Training and Real-Time Data Flow
The BaseModel enforces a clean pipeline:
All parameters are pre-computed offline using historical S2S–reanalysis pairs.
No real-time data assimilation or online learning occurs during forecasting.
This ensures full reproducibility and compliance with competition rules—highlighting that robust operational systems often prioritize reliability over complexity.
4. Multi-Source Readiness
Although primarily tuned on ECMWF, the BaseModel’s modular design allows plug-and-play use of other S2S sources (e.g., CMA, NCEP) by simply swapping in source-specific correction tables—demonstrating flexibility within a minimal framework.
Final Note
We intentionally avoided deep learning, neural networks, or complex ensembles in the BaseModel to establish a clear performance floor and ensure interpretability. The true AI/ML innovations in our work are reserved for our advanced model (NewMet, based on Transformer architectures and adaptive fusion), while the BaseModel stands as a testament to the enduring value of well-applied classical statistics in weather prediction.
Thus, while there are no cutting-edge AI components in the BaseModel, its design embodies principled, robust, and competition-compliant statistical post-processing—a necessary foundation for any trustworthy forecasting system.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
The development of this model was a collaborative effort by a team based in Hubei, China. The contributors and their specific roles are as follows:
Wang Qinglong: Led the overall model architecture design and was primarily responsible for model implementation and integration.
Cheng Qin: Conducted forecast evaluation and validation, including skill scoring against observational datasets and performance benchmarking.
Zhou Ting: Managed data preprocessing, including harmonization, quality control, and formatting of multi-source S2S and reanalysis data.
Hu Yiyang and Feng Biao: Developed and maintained the core codebase, implemented the Transformer-based deep learning components, and ensured software robustness and reproducibility.
Ouyang Wei, Yang Wei, Yao Man, and Lu Yi: Contributed to model design discussions, assisted in data preprocessing workflows, and supported experimentation and analysis during the development phase.
This team collectively designed, built, validated, and deployed the NewMet model for the ECMWF AI Weather Quest competition.