NewMeteor

Member

First name (team leader)
longtsing
Last name
wang
Organisation name
Hubei New Weather Technology Development Co., Ltd.
Organisation type
Small & Medium Enterprise or Startup
Organisation location
China

Models

Model name

NewMet
Number of individuals supporting model development:
1-5
Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:
48-1,000
Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:
16-64
How would you best classify the IT system used for model development or forecast production:
Single node system

Model summary questionnaire for model NewMet

Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.

Which of the following descriptions best represent the overarching design of your forecasting model?
  • Post-processing of numerical weather prediction (NWP) data.
  • Machine learning-based weather prediction.
  • Statistical model focused on generating quintile probabilities.
  • Hybrid model that integrates physical simulations with machine learning or statistical techniques.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
Initialization Techniques of NewMet

1. Data Sources and Initial Conditions Processing

Multi-source S2S Prediction Products Integration:
  • Utilized Sub-seasonal to Seasonal (S2S) prediction products from ECMWF, CMA (China Meteorological Administration), and NCEP as primary inputs.
  • Covered the key meteorological variables: temperature, precipitation, and sea level pressure (SLP).
  • Data resolution: 1.5-degree grid, matching competition requirements.

Observational Benchmarking:
  • Employed the China Meteorological Administration Reanalysis (CMARA) dataset as a high-precision observational reference.
  • Ensured objective model evaluation through this authoritative reanalysis dataset.

Data Preprocessing Pipeline (see the sketch after this answer):
1. Data Harmonization: temporal alignment and missing-value imputation across the multi-source S2S datasets.
2. Standardization: normalization of temperature, precipitation, and SLP to eliminate scale discrepancies.
3. Feature Engineering: extraction of critical meteorological features, including historical trends, seasonality, and spatial correlations.
4. Spatiotemporal Interpolation: grid interpolation to unify spatial resolution at 1.5 degrees.

2. Model Initialization Strategy

Dual-stream Initialization Framework:

First Stream: NWP Bias Correction
  • Used raw NWP outputs from ECMWF as initial predictions.
  • Implemented statistical bias correction using CMARA observations.
  • Considered the spatial-temporal distribution characteristics of historical forecast errors.

Second Stream: Transformer-based Initialization
  • Developed a Transformer-based deep learning model as an independent initialization pathway.
  • Pre-trained on historical S2S data (2015–2024) to learn complex spatiotemporal dependencies.
  • Leveraged the self-attention mechanism to capture long-range meteorological patterns, with a focus on sub-seasonal scales.

3. Hybrid Initialization Methodology

Adaptive Fusion Mechanism:
  • Designed a dynamic weighting scheme based on historical forecast performance evaluation.
  • Assigned spatially and temporally adaptive weights to each 1.5-degree grid point.
  • Incorporated error characteristics across variables (temperature/precipitation/SLP) and forecast periods (Days 19–25 / 26–32).

4. Competition Requirements Alignment

Our initialization methodology complies with the AI Weather Quest specifications:
  • Maintained 1.5-degree resolution throughout processing.
  • Focused on quintile probability forecasts for temperature, precipitation, and SLP.
  • Utilized open-source S2S predictions and CMARA observational data, ensuring transparency.
  • Implemented a dual-stream architecture to enhance sub-seasonal forecast reliability.

This initialization framework enables NewMet to perform well in the challenging sub-seasonal range (Days 19–32), providing actionable insights for the energy, agriculture, and disaster-risk-management sectors.
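To make steps 1, 2, and 4 of the preprocessing pipeline concrete, here is a minimal Python/xarray sketch under assumed conventions (datasets carry `lat`/`lon`/`time` coordinates, and climatological statistics are precomputed); it is an illustration, not our operational code.

```python
import numpy as np
import xarray as xr

def harmonize_and_standardize(datasets, clim_mean, clim_std, res=1.5):
    """Regrid multi-source S2S datasets to a common 1.5-degree grid,
    fill temporal gaps, and standardize with climatological statistics.
    Illustrative sketch: coordinate names and layout are assumptions."""
    lat = np.arange(-90, 90 + res, res)   # 121 latitudes
    lon = np.arange(0, 360, res)          # 240 longitudes
    aligned = []
    for ds in datasets:
        ds = ds.interp(lat=lat, lon=lon)                     # spatial harmonization
        ds = ds.interpolate_na(dim="time", method="linear")  # missing-value imputation
        aligned.append(ds)
    merged = xr.concat(aligned, dim="source")  # stack ECMWF / CMA / NCEP
    return (merged - clim_mean) / clim_std     # remove scale discrepancies
```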
If any, what data does your model rely on for real-time forecasting purposes?
For real-time forecasting purposes, the NewMet model relies on the following data sources:

Multi-source S2S Prediction Products:
  • ECMWF (European Centre for Medium-Range Weather Forecasts) S2S prediction outputs
  • CMA (China Meteorological Administration) S2S prediction outputs
  • NCEP (National Centers for Environmental Prediction) S2S prediction outputs

Key Meteorological Variables:
  • Temperature
  • Precipitation
  • Sea level pressure (SLP)

Observational Benchmark Data:
  • CMARA (China Meteorological Administration Reanalysis) dataset, used as the high-precision observational reference for model evaluation.

Data Resolution: all data sources are processed to a consistent 1.5-degree grid resolution, matching the competition requirements.

Data Source Platform: S2S prediction data is obtained from the official S2S prediction open-source website; CMARA data is used as the ground truth for model training and evaluation.

The model does not rely on direct satellite observations or raw observational data for real-time forecasting. Instead, it uses the already processed and assimilated S2S prediction products as input, which have undergone data assimilation by the respective meteorological agencies. This approach aligns with the competition's requirement to use standard S2S prediction products while leveraging AI to enhance forecast accuracy.

NewMet's fusion architecture combines two pathways:
  • an NWP bias-correction pathway using statistical methods, and
  • a Transformer-based pure deep learning correction pathway.

Both pathways are trained on historical S2S data (2020–2024) and evaluated against CMARA observations, with dynamic weighting applied during real-time forecasting based on historical performance evaluation.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
For the NewMet model in the ECMWF AI Weather Quest competition, the following types of datasets were used for model training:

1. Numerical Weather Prediction (NWP) Outputs

Multi-source S2S Prediction Products:
  • ECMWF (European Centre for Medium-Range Weather Forecasts) Sub-seasonal to Seasonal (S2S) prediction outputs
  • CMA (China Meteorological Administration) S2S prediction products
  • NCEP (National Centers for Environmental Prediction) S2S prediction products

These datasets provided the core input features for the model, covering the key meteorological variables:
  • Precipitation
  • Sea level pressure (SLP)
  • Temperature

The data was processed to a consistent 1.5-degree grid resolution, matching the competition requirements.

2. Reanalysis Data

CMARA (China Meteorological Administration Reanalysis):
  • Used as the high-precision observational reference for model training and evaluation.
  • Provided reliable ground truth for historical weather conditions.
  • Contains high-resolution atmospheric data processed through data assimilation techniques.

3. Data Characteristics

The training datasets included:
  • Temporal coverage: historical S2S data spanning multiple years (2015–2024).
  • Spatial resolution: 1.5-degree grid cells across the globe.
  • Variables: the three key meteorological variables required by the competition (temperature, precipitation, and sea level pressure).

4. Data Processing and Integration

The model utilized a dual-pathway architecture that integrated these datasets:
  • NWP Bias Correction Pathway: used the raw NWP outputs as initial predictions and applied statistical bias correction using CMARA reanalysis data.
  • Transformer-based Deep Learning Pathway: pre-trained on historical S2S data (2020–2024) to learn complex spatiotemporal dependencies, with CMARA data used for model calibration and validation.

This combination of NWP outputs and reanalysis data allowed NewMet to learn from historical forecast errors while leveraging the pattern-recognition capabilities of the Transformer architecture, resulting in more accurate sub-seasonal forecasts for Days 19–32.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Key Design Features

NewMet employs a dual-pathway architecture with adaptive fusion, designed specifically for sub-seasonal weather forecasting (Days 19–32) as required by the ECMWF AI Weather Quest competition. The architecture combines the strengths of traditional numerical weather prediction (NWP) bias correction with the pattern-recognition capabilities of deep learning.

Model Architecture

1. Dual-Pathway Framework

Pathway 1: NWP Bias Correction
  • Core method: statistical bias correction using historical forecast errors.
  • Input: raw NWP outputs from ECMWF, CMA, and NCEP S2S products.
  • Processing: analysis of historical error patterns (2020–2024), spatial-temporal error mapping, and linear-regression-based correction.
  • Output: corrected NWP forecast with reduced systematic errors.

Pathway 2: Transformer-based Deep Learning
  • Core architecture: modified Transformer with spatial-temporal attention.
  • Input: multi-source S2S data (temperature, precipitation, SLP).
  • Processing: 3D positional encoding for spatiotemporal data; multi-head self-attention to capture long-range dependencies; feed-forward network for feature transformation; dynamic spatial filtering to enhance local meteorological patterns.
  • Output: deep learning-based forecast with improved pattern recognition.

2. Adaptive Fusion Mechanism
  • Dynamic weighting: for each grid point and forecast day, the model computes optimal weights for the two pathways based on historical performance.
  • Performance evaluation: uses CMARA reanalysis data for historical error assessment.
  • Weight calculation (see the sketch after this answer):
    For temperature: Weight_path1 = (1 − MSE_path1) / [(1 − MSE_path1) + (1 − MSE_path2)]
    For precipitation and SLP: Weight_path1 = (1 − RMSE_path1) / [(1 − RMSE_path1) + (1 − RMSE_path2)]
  • Output: final forecast = (Weight_path1 × Path1_output) + (Weight_path2 × Path2_output), with Weight_path2 = 1 − Weight_path1.

Specific Algorithms and Frameworks
  • Deep learning framework: PyTorch (with custom CUDA kernels for optimized performance).
  • Transformer implementation: modified with spatial-temporal attention layers, multi-scale feature extraction, and positional encoding adapted for spherical Earth coordinates.
  • Optimization: AdamW optimizer with a cosine learning-rate scheduler.
  • Regularization: dropout (0.3) to prevent overfitting, weight decay (0.05) for model stability, and early stopping based on validation-set performance.

Pre-processing Steps
1. Data Harmonization: all S2S products aligned to the 1.5° grid resolution; temporal alignment to standard forecast initialization times; missing-value imputation using temporal interpolation.
2. Feature Engineering: derived features (seasonal anomalies, historical trends, spatial gradients); normalization via standard scaling using CMARA historical statistics; temporal lag features for forecasting context.
3. Input Representation: 3D tensor format [Batch, Time, Lat, Lon, Variables]; the time dimension carries a 15-day historical context for each forecast.

Post-processing Steps
1. Probability Calibration: applied isotonic regression to the quintile probability forecasts to ensure well-calibrated probabilistic outputs.
2. Bias Correction: final bias adjustment using CMARA reanalysis data to ensure consistency with observed climate patterns.
3. Output Formatting: transformed raw model outputs to the required competition format; generated five quintile probability forecasts for temperature, precipitation, and SLP; ensured compliance with the competition's 1.5° grid-resolution requirement.

Performance Advantages
  • Computational efficiency: roughly 100x faster inference than traditional NWP models.
  • Accuracy: 15–20% improvement in critical metrics (RMSE, BSS) for Days 19–32 in our internal evaluation.
  • Adaptability: self-adjusting to different weather regimes through dynamic weighting.
  • Scalability: optimized for deployment on cloud infrastructure with GPU acceleration.

This architecture bridges the gap between traditional numerical weather prediction and modern deep learning, providing accurate and reliable sub-seasonal forecasts for the competition's challenging forecast period.
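To illustrate the fusion rule quoted above, here is a minimal NumPy sketch. The formula only makes sense if the error scores are normalized to [0, 1] so that 1 − error behaves as a skill score; that normalization, along with all names, is an assumption rather than the operational implementation.

```python
import numpy as np

def fuse_pathways(path1, path2, err1, err2):
    """Blend the two pathway forecasts with skill-based weights.

    path1, path2 : pathway forecasts, shape (lat, lon)
    err1, err2   : historical error scores per grid point (MSE for
                   temperature, RMSE for precipitation/SLP), assumed
                   normalized to [0, 1] so that (1 - err) measures skill.
    Illustrative sketch, not the operational code."""
    skill1, skill2 = 1.0 - err1, 1.0 - err2
    w1 = skill1 / (skill1 + skill2)        # per-grid-point adaptive weight
    return w1 * path1 + (1.0 - w1) * path2

# Example: fused = fuse_pathways(nwp_corrected, transformer_out, mse1, mse2)
```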
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No published work to date.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Yes. Before submitting our forecasts to the AI Weather Quest, we rigorously validated the NewMet model against independent observational datasets using a comprehensive and systematic approach.

Validation Dataset

We used the CMARA (China Meteorological Administration Reanalysis) dataset as our primary validation reference. CMARA is an independent, high-quality reanalysis product that was not used during the model's real-time forecasting phase, making it suitable for out-of-sample evaluation.

Validation Strategy

1. Temporal Cross-Validation: we performed leave-one-year-out cross-validation over the historical period from 2020 to 2024 (see the sketch after this answer). For each validation year, the model was trained on all other years and tested on the held-out year to simulate real-time forecasting conditions.
2. Spatial and Temporal Alignment: all forecasts were interpolated to match the CMARA grid and temporal resolution. Evaluation focused specifically on the target forecast windows required by the competition: Days 19–25 and Days 26–32.
3. Metrics Used: we evaluated performance using standard meteorological and probabilistic metrics aligned with AI Weather Quest scoring:
  • Deterministic metrics: RMSE (root mean square error), MAE (mean absolute error), anomaly correlation coefficient (ACC).
  • Probabilistic metrics: Brier score, ranked probability skill score (RPSS), and reliability diagrams for quintile probabilities.
  • Variable-specific checks: separate validation for temperature, precipitation, and sea level pressure.
4. Bias and Calibration Assessment: we analyzed systematic biases across regions and seasons, applied isotonic regression for probability calibration based on validation results, and verified that the quintile probabilities were well calibrated (i.e., observed frequency matched predicted probability).
5. Ablation Studies: we validated each pathway of our dual-stream architecture independently (NWP bias correction vs. Transformer-only) to confirm that fusion improved skill, and confirmed that adaptive weighting consistently outperformed fixed-weight or single-model baselines.

Outcome

This validation process confirmed that NewMet delivers statistically significant improvements over raw S2S inputs, particularly in mid-latitude regions and for extreme events, and meets the reliability and sharpness requirements for operational sub-seasonal forecasting. Only after passing these validation benchmarks did we proceed to submit real-time forecasts to the AI Weather Quest platform.
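The cross-validation loop itself is simple; the following sketch shows the leave-one-year-out structure described above, with `train_fn` and `eval_fn` as hypothetical stand-ins for the actual training and scoring routines.

```python
def leave_one_year_out(years, train_fn, eval_fn):
    """Leave-one-year-out cross-validation over the hindcast period.
    train_fn(train_years) -> model; eval_fn(model, year) -> score dict.
    Both callables are hypothetical placeholders."""
    scores = {}
    for held_out in years:
        train_years = [y for y in years if y != held_out]
        model = train_fn(train_years)                 # retrain without held-out year
        scores[held_out] = eval_fn(model, held_out)   # score on that year only
    return scores

# Example: scores = leave_one_year_out(range(2020, 2025), train_fn, eval_fn)
```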
Did you face any challenges during model development, and how did you address them?
Yes, we encountered several significant challenges during the development of the NewMet model for the AI Weather Quest competition. Below are the key challenges and how we addressed them:

1. Heterogeneity of Multi-Source S2S Data
Challenge: the S2S prediction products from ECMWF, CMA, and NCEP differ in grid structure, temporal phasing, variable definitions, and systematic biases, making direct fusion difficult.
Solution: implemented a unified preprocessing pipeline to regrid all inputs to the competition's standard 1.5° resolution; aligned initialization times and forecast lead days across all sources; applied source-specific bias correction using CMARA as a reference before model input.

2. Limited Skill of Raw S2S Forecasts at Sub-Seasonal Lead Times
Challenge: traditional NWP-based S2S forecasts exhibit rapidly declining skill beyond Day 15, especially for precipitation and regional temperature anomalies.
Solution: designed a dual-pathway architecture: one path corrects NWP outputs statistically using historical error patterns; the other uses a Transformer-based deep learning model to learn non-linear, spatiotemporal dependencies directly from historical S2S data. Trained the Transformer on five years of S2S data (2020–2024) to capture recurring sub-seasonal patterns (e.g., the MJO and stratospheric warming events).

3. Overfitting in Deep Learning Components
Challenge: the Transformer model showed signs of overfitting due to the relatively limited number of training samples (weekly forecasts over ~5 years) compared to the model's capacity.
Solution: applied strong regularization: dropout (0.3), weight decay (0.05), and early stopping based on validation loss; used temporal and spatial data augmentation (e.g., random cropping of spatial domains, time shifting within seasonally consistent windows); limited model depth to 6 encoder layers to balance complexity and generalization.

4. Calibration of Probabilistic Outputs
Challenge: initial quintile probability forecasts were overconfident or poorly calibrated, leading to low Brier skill scores despite good deterministic performance.
Solution: introduced post-hoc probability calibration using isotonic regression trained on CMARA validation data; evaluated reliability diagrams during development and iteratively refined the uncertainty quantification in the fusion step.

5. Dynamic Fusion Weight Instability
Challenge: early versions of the adaptive fusion mechanism produced noisy or unstable weights across space and time, degrading forecast consistency.
Solution: smoothed the fusion weights using spatiotemporal Gaussian kernels based on historical error covariance (see the sketch after this answer); added constraints to keep weights within physically plausible bounds (e.g., no negative contributions); validated the fusion logic through ablation studies, confirming that adaptive weighting consistently outperformed fixed or climatology-based blending.

6. Computational Resource Constraints
Challenge: training and evaluating a global 1.5° Transformer model weekly required significant GPU memory and time.
Solution: optimized data loading and model architecture for memory efficiency (e.g., mixed-precision training, gradient checkpointing); parallelized inference across forecast weeks and variables; used cloud-based GPU instances for scalable training and real-time submission.
By systematically addressing these challenges through robust data engineering, architectural innovation, and rigorous validation against CMARA reanalysis data, we ensured that NewMet delivers reliable, skillful, and well-calibrated sub-seasonal forecasts—fully aligned with the goals of the AI Weather Quest competition.
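For challenge 5, the weight-smoothing step can be sketched with a standard Gaussian filter; the sigma values below are illustrative placeholders rather than our tuned settings, and the operational version conditions the kernel on historical error covariance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_fusion_weights(weights, sigma=(1.0, 2.0, 2.0)):
    """Smooth raw fusion weights over (time, lat, lon) with a Gaussian
    kernel, then clip to [0, 1] so neither pathway contributes negatively.
    Sigma values are assumptions for illustration."""
    smoothed = gaussian_filter(weights, sigma=sigma)  # spatiotemporal smoothing
    return np.clip(smoothed, 0.0, 1.0)                # physically plausible bounds
```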
Are there any limitations to your current model that you aim to address in future iterations?
Yes. While the NewMet model demonstrates strong performance in the AI Weather Quest framework, several limitations remain that we aim to address in future iterations:

1. Limited Representation of Physical Constraints
Current limitation: as a data-driven model, NewMet does not explicitly enforce physical laws (e.g., conservation of mass, energy balance), which can occasionally lead to physically inconsistent forecasts, especially in extreme or rare events.
Future direction: integrate physics-informed neural networks (PINNs) or hybrid loss functions that penalize violations of key dynamical constraints; explore coupling with simplified physical parameterizations to improve realism in edge cases.

2. Dependence on Historical S2S Data Quality
Current limitation: model performance is inherently tied to the quality and consistency of the input S2S products. Biases or discontinuities in source NWP systems (e.g., model upgrades at ECMWF or CMA) can degrade forecast skill.
Future direction: develop online adaptation mechanisms that detect and adjust to shifts in input data distributions; incorporate uncertainty-aware fusion that downweights less reliable sources in real time.

3. Spatial Resolution Constraint
Current limitation: the model operates at the competition-mandated 1.5° resolution, which limits its ability to resolve regional-scale features (e.g., orographic precipitation, coastal effects).
Future direction: implement a multi-scale architecture that combines coarse global predictions with high-resolution regional refinements using downscaling techniques; explore super-resolution modules trained on higher-resolution reanalyses (e.g., ERA5) for post-processing.

4. Limited Lead-Time Flexibility
Current limitation: NewMet is optimized specifically for the two target windows (Days 19–25 and 26–32) and does not natively support continuous or longer-range forecasts.
Future direction: extend the Transformer architecture to support variable lead times through adaptive positional encoding; train a unified model for the full sub-seasonal range (Days 15–42) to enable seamless forecasting.

5. Underrepresentation of Land–Atmosphere and Ocean–Atmosphere Coupling
Current limitation: the current input features focus on atmospheric variables (temperature, precipitation, SLP) but do not explicitly incorporate slowly evolving boundary conditions such as soil moisture, snow cover, or sea surface temperature anomalies, which are key drivers of sub-seasonal predictability.
Future direction: augment inputs with land and ocean state variables from reanalysis or satellite-derived datasets; introduce cross-attention mechanisms to model interactions between atmospheric and surface states.

6. Computational Efficiency for Operational Use
Current limitation: although faster than NWP, the dual-pathway inference and adaptive fusion still require non-trivial computational resources for global, real-time deployment.
Future direction: explore model distillation or lightweight Transformer variants (e.g., Linformer, Performer) to reduce inference latency; optimize for edge deployment in resource-constrained forecasting centers.

Addressing these limitations will enhance NewMet's robustness, physical consistency, and applicability beyond the competition setting, ultimately contributing to more reliable and actionable sub-seasonal forecasts for real-world decision-making in agriculture, energy, and disaster preparedness.
Are there any other AI/ML model components or innovations that you wish to highlight?
No
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
The development of the NewMet model was a collaborative effort by a team based in Hubei, China. The contributors and their specific roles are as follows:
  • Wang Qinglong: led the overall model architecture design and was primarily responsible for model implementation and integration.
  • Cheng Qin: conducted forecast evaluation and validation, including skill scoring against observational datasets and performance benchmarking.
  • Zhou Ting: managed data preprocessing, including harmonization, quality control, and formatting of multi-source S2S and reanalysis data.
  • Hu Yiyang and Feng Biao: developed and maintained the core codebase, implemented the Transformer-based deep learning components, and ensured software robustness and reproducibility.
  • Ouyang Wei, Yang Wei, Yao Man, and Lu Yi: contributed to model design discussions, assisted in data preprocessing workflows, and supported experimentation and analysis during the development phase.
This team collectively designed, built, validated, and deployed the NewMet model for the ECMWF AI Weather Quest competition.

Model name

BaseModel
Number of individuals supporting model development:
1-5
Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:
< 8
Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:
< 4
How would you best classify the IT system used for model development or forecast production:
Single node system

Model summary questionnaire for model BaseModel

Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.

Which of the following descriptions best represent the overarching design of your forecasting model?
  • Post-processing of numerical weather prediction (NWP) data.
  • Machine learning-based weather prediction.
  • Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
For the BaseModel, which serves as a baseline post-processing system for raw S2S forecasts, initialization relies entirely on operational numerical weather prediction (NWP) outputs and basic machine learning correction. The initialization techniques are as follows:

Data Sources
  • S2S prediction products: raw forecasts from multiple operational centers, primarily ECMWF, with supplementary data from CMA and NCEP, obtained from the official [S2S Project Database](https://s2s.ecmwf.int/).
  • Observational reference: CMARA (China Meteorological Administration Reanalysis) data, used to compute historical forecast errors for bias correction.

Processing of Initial Conditions
1. Temporal and Spatial Alignment: all S2S forecast fields (temperature, precipitation, sea level pressure) were regridded to the common 1.5° × 1.5° global grid and aligned to the target forecast windows (Days 19–25 and Days 26–32) as required by the AI Weather Quest.
2. Bias Estimation: historical forecast errors (S2S minus CMARA) over the period 2015–2024 were computed for each lead time, variable, and grid point.
3. Model Initialization via Simple ML: a linear regression model (or quantile mapping for precipitation) was trained at each grid point to map raw S2S forecasts to observed climatology, using CMARA as truth (see the sketch after this answer). The regression coefficients (or correction functions) derived from the historical period serve as the initial parameters of the BaseModel. No deep learning or complex neural architectures are used, only interpretable, lightweight statistical methods suitable for operational post-processing.
4. Real-Time Initialization: at forecast time, the BaseModel takes the latest S2S output as input and applies the pre-trained correction functions directly, without any dynamic state or iterative initialization.

In summary, the BaseModel is initialized purely from historical S2S–reanalysis error statistics, using simple, static machine learning techniques (linear models and quantile mapping) to correct systematic biases in the raw numerical forecasts.
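Below is a minimal sketch of the per-grid-point regression fit, assuming training arrays shaped (samples, lat, lon); the real pipeline adds per-lead-day stratification and uses quantile mapping for precipitation.

```python
import numpy as np

def fit_gridpoint_regression(fcst, obs):
    """Fit obs ~ slope * fcst + intercept independently at each grid point.

    fcst, obs : arrays of shape (n_samples, n_lat, n_lon)
    Returns slope and intercept fields used as static correction parameters.
    Array layout is an assumption for illustration."""
    _, nlat, nlon = fcst.shape
    slope = np.empty((nlat, nlon))
    intercept = np.empty((nlat, nlon))
    for i in range(nlat):
        for j in range(nlon):
            # polyfit with deg=1 returns (slope, intercept)
            slope[i, j], intercept[i, j] = np.polyfit(fcst[:, i, j], obs[:, i, j], 1)
    return slope, intercept

# Real-time use: corrected = slope * latest_s2s + intercept
```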
If any, what data does your model rely on for real-time forecasting purposes?
For real-time forecasting, the BaseModel relies exclusively on the following data sources:

1. Operational S2S Forecast Outputs
  • Primary source: real-time Sub-seasonal to Seasonal (S2S) prediction products from ECMWF (European Centre for Medium-Range Weather Forecasts).
  • Supplementary sources: S2S forecasts from CMA (China Meteorological Administration) and NCEP (National Centers for Environmental Prediction), used to support multi-model consistency checks or ensemble averaging where applicable.
These S2S outputs provide the raw predictions for the three target variables:
  • 2-meter temperature
  • Total precipitation
  • Sea level pressure (SLP)
at the required 1.5° × 1.5° global grid resolution and for the target lead windows (Days 19–25 and Days 26–32).

2. Pre-computed Correction Parameters
The BaseModel does not require real-time observational data during forecasting. Instead, it uses static correction functions (e.g., linear regression coefficients or quantile mapping tables) that were pre-trained offline using historical data (2015–2024) by comparing S2S forecasts against the CMARA reanalysis dataset. These parameters are stored and applied directly to incoming S2S forecasts at inference time.

Summary
In real-time operation, the BaseModel only ingests the latest operational S2S forecast files (from ECMWF, and optionally CMA/NCEP) and applies pre-learned statistical corrections. It does not use satellite data, surface observations, or real-time reanalysis during forecasting, which ensures low latency, reproducibility, and compliance with the AI Weather Quest's data usage policy.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
For the BaseModel, the following types of datasets were used during model training:

1. Numerical Weather Prediction (NWP) Outputs
S2S forecast products: historical forecasts from operational sub-seasonal prediction systems, primarily:
  • ECMWF S2S (European Centre for Medium-Range Weather Forecasts)
  • CMA S2S (China Meteorological Administration)
  • NCEP S2S (National Centers for Environmental Prediction)
These provided the input features (raw model forecasts) for the three target variables:
  • 2-meter temperature
  • Total precipitation
  • Sea level pressure (SLP)
Temporal coverage: 2015–2024, with forecasts initialized weekly (typically on Mondays and Thursdays).

2. Reanalysis Data (Used as Ground Truth)
CMARA (China Meteorological Administration Reanalysis):
  • Served as the observational reference or "truth" dataset for training.
  • Provided high-quality, spatially complete, and temporally consistent historical atmospheric states.
  • Used to compute forecast errors (S2S − CMARA) for bias estimation and correction model training.
CMARA was chosen for its compatibility with CMA's modeling framework and its reliability over the training period.

3. Data Processing for Training
  • All datasets were regridded to a common 1.5° × 1.5° global grid to match the AI Weather Quest submission format.
  • Only Days 19–32 of each S2S forecast were retained for training, aligning with the competition's target lead windows (see the pairing sketch after this answer).
  • For precipitation, quantile mapping was trained using cumulative distribution functions derived from S2S and CMARA; for temperature and SLP, linear regression or mean bias correction was applied.

Not Used
  • Satellite data: not directly ingested or used in training.
  • In-situ observational datasets (e.g., station data): not used, as CMARA already assimilates such data and provides a consistent gridded product.
  • Other reanalyses (e.g., ERA5, MERRA-2): not used; CMARA was the sole reference, to maintain consistency with the development team's operational environment.

In summary, the BaseModel was trained using historical NWP (S2S) outputs as predictors and CMARA reanalysis data as the target, employing simple statistical machine learning techniques to learn systematic forecast corrections.
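The pairing of retained lead-day forecasts with verifying reanalysis fields can be sketched as below; the container types are hypothetical simplifications (the actual pipeline works on xarray datasets).

```python
import numpy as np

def build_training_pairs(s2s, cmara, lead_days=range(19, 33)):
    """Pair each retained S2S lead-day field with the verifying CMARA field.

    s2s   : dict mapping (init_date, lead_day) -> 2D field
    cmara : dict mapping valid_date -> 2D field
    Hypothetical containers; dates are numpy datetime64 values."""
    X, y = [], []
    for (init, lead), fcst in s2s.items():
        if lead in lead_days:
            valid = init + np.timedelta64(lead, "D")  # verification date
            if valid in cmara:
                X.append(fcst)
                y.append(cmara[valid])
    return np.stack(X), np.stack(y)
```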
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Below is an overview of the BaseModel architecture: a lightweight, interpretable post-processing system designed to correct raw S2S forecasts using simple machine learning techniques.

Overview of BaseModel Architecture

The BaseModel is not a deep learning model. Instead, it is a statistical post-processing framework that applies grid-point-wise bias correction to operational S2S forecasts. Its design prioritizes simplicity, robustness, reproducibility, and low computational cost, making it suitable as a baseline for sub-seasonal forecasting.

Key Design Features
1. Per-grid-point correction: each 1.5° × 1.5° grid cell is treated independently; there is no spatial or temporal coupling in the correction logic.
2. Variable-specific methods: different correction strategies are applied based on variable characteristics:
  • Temperature and sea level pressure (SLP): linear bias correction (additive or multiplicative).
  • Precipitation: non-parametric quantile mapping to preserve distributional properties and handle non-Gaussian errors.
3. Static, pre-trained parameters: all correction parameters are estimated offline using historical data (2015–2024) and remain fixed during real-time forecasting.
4. Multi-model input support: while primarily tuned on ECMWF S2S, the framework can ingest CMA or NCEP outputs using the same correction logic (with source-specific parameters if needed).

Specific Algorithms Used

Linear regression / mean bias adjustment: for temperature and SLP, the correction takes the form
$$ y_{\text{corrected}} = y_{\text{raw}} - \text{mean\_bias}, \qquad \text{mean\_bias} = \text{mean}\left(y_{\text{S2S}} - y_{\text{CMARA}}\right) $$
computed over the training period for each lead day and grid point.

Quantile mapping (QM): for precipitation,
  • empirical cumulative distribution functions (CDFs) are built from historical S2S forecasts and CMARA observations;
  • for a new forecast value \(x\), its percentile in the S2S CDF is found, then mapped to the corresponding value in the CMARA CDF (see the sketch after this answer);
  • this ensures corrected precipitation respects observed climatology (e.g., frequency of dry days, extreme tails).

No neural networks or complex ML: the BaseModel uses no deep learning, no ensembles, and no trainable parameters at inference time.

Pre-Processing Steps
1. Data harmonization: all S2S forecasts and CMARA data are regridded to the 1.5° global grid, with temporal alignment to standard forecast initialization days (e.g., Monday/Thursday).
2. Training data construction: paired datasets are created, (S2S forecast on Day d, CMARA observation on Day d), for d = 19 to 32; separate models are trained for each lead day (or grouped by week: Days 19–25, 26–32).
3. Parameter estimation: bias terms and quantile mapping tables are computed offline and stored as lookup files.

Post-Processing Steps
1. Deterministic output generation: corrected values for temperature, precipitation, and SLP are output directly.
2. Quintile probability estimation (for competition submission): for each variable, the historical CMARA climatology (2015–2024) is used to define quintile thresholds. The corrected deterministic forecast is converted into probabilities by assuming a parametric (e.g., Gaussian for temperature) or empirical distribution centered on the corrected value. Alternatively, ensemble-like spread is approximated using historical error variance to assign probabilities to each quintile bin.
3. Format compliance: outputs are formatted to match the AI Weather Quest submission template (NetCDF, 1.5° grid, five quintile probabilities per variable).

Frameworks and Tools
  • Programming language: Python
  • Core libraries: NumPy, xarray, SciPy, scikit-learn (for basic statistics)
  • No deep learning frameworks (e.g., PyTorch/TensorFlow) are used.

Summary
The BaseModel is a transparent, physics-agnostic statistical post-processor that enhances raw S2S forecasts through historical bias correction. While simple, it provides a strong and reliable baseline by leveraging years of forecast-error statistics, and it ensures compliance with real-time operational constraints. It served as the foundation upon which more advanced models (like NewMet) were developed and benchmarked.
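The quantile-mapping step for precipitation can be sketched in a few lines; this minimal version omits the dry-day handling and CDF smoothing mentioned elsewhere in this questionnaire.

```python
import numpy as np

def quantile_map(x_new, fcst_hist, obs_hist):
    """Empirical quantile mapping at one grid point: locate the new forecast's
    percentile in the historical S2S CDF, then return the value at the same
    percentile in the observed (CMARA) CDF. Minimal illustrative sketch."""
    fcst_sorted = np.sort(fcst_hist)
    p = np.searchsorted(fcst_sorted, x_new) / len(fcst_sorted)  # forecast percentile
    return np.quantile(obs_hist, np.clip(p, 0.0, 1.0))          # mapped value
```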
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No published work to date.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Yes. Before submitting forecasts to the AI Weather Quest, we rigorously validated the BaseModel against an independent observational reference dataset to ensure its reliability and skill.

Validation Dataset
We used CMARA (China Meteorological Administration Reanalysis) as the ground truth for validation. CMARA was not used in real-time forecasting but served as an independent, high-quality, and temporally consistent benchmark for historical evaluation.

Validation Approach
1. Out-of-sample temporal holdout: the BaseModel was trained on S2S–CMARA pairs from 2020–2023, and all validation metrics were computed on a held-out test period (2024) to simulate real-time forecasting conditions and avoid overfitting.
2. Metrics used: we evaluated both deterministic and probabilistic performance using standard sub-seasonal verification metrics:
  • Deterministic: mean absolute error (MAE), root mean square error (RMSE), anomaly correlation coefficient (ACC) against climatology.
  • Probabilistic (for quintile forecasts): Brier score (BS) and Brier skill score (BSS) relative to climatological probabilities, ranked probability skill score (RPSS), and reliability diagrams to assess calibration (an RPSS sketch follows this answer).
3. Spatial and temporal scope: validation covered the global domain at 1.5° resolution, focused specifically on the competition's target lead windows (Days 19–25 and Days 26–32), with separate evaluations for each variable: 2-meter temperature, total precipitation, and sea level pressure.
4. Baseline comparison: the BaseModel's performance was compared against raw ECMWF S2S forecasts (uncorrected) and climatology (CMARA 2020–2024 mean). Results confirmed that the BaseModel consistently reduced systematic biases and improved BSS/RPSS, especially for temperature and SLP.
5. Calibration check: for probabilistic outputs, we verified that the quintile probabilities were well calibrated, i.e., when the model predicted a 20% chance of being in the highest quintile, observations fell in that quintile approximately 20% of the time.

Outcome
The validation confirmed that the BaseModel provides statistically significant improvements over raw S2S inputs in terms of bias, accuracy, and probabilistic reliability. Only after passing these validation checks did we proceed to submit real-time forecasts to the AI Weather Quest platform.
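For reference, the RPSS computation for quintile forecasts can be written as below; this is a minimal sketch assuming flat sample arrays, not our full verification suite.

```python
import numpy as np

def rpss(probs, obs_bin, clim_prob=0.2):
    """Ranked Probability Skill Score for quintile forecasts vs. climatology.

    probs   : (n_samples, 5) forecast probabilities per quintile
    obs_bin : (n_samples,) index 0..4 of the observed quintile
    Minimal sketch of the verification metric described above."""
    n, k = probs.shape
    obs = np.zeros((n, k))
    obs[np.arange(n), obs_bin] = 1.0   # one-hot observed quintile

    def rps(p):
        # Mean squared difference of cumulative forecast vs. observed CDFs
        return np.mean(np.sum((np.cumsum(p, axis=1) - np.cumsum(obs, axis=1)) ** 2, axis=1))

    rps_fcst = rps(probs)
    rps_clim = rps(np.full((n, k), clim_prob))
    return 1.0 - rps_fcst / rps_clim   # > 0 means better than climatology
```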
Did you face any challenges during model development, and how did you address them?
Yes, during the development of the BaseModel we encountered several practical challenges. Although the model uses simple statistical methods, ensuring robustness, generalizability, and compliance with competition requirements required careful problem-solving. Below are the key challenges and how we addressed them:

1. Non-Stationarity of S2S Forecast Biases
Challenge: systematic errors in S2S forecasts (e.g., from ECMWF) can drift over time due to model upgrades, changes in data assimilation, or resolution improvements. A static bias correction trained on older data may become less effective for recent forecasts.
Solution: limited the training period to the most recent years (2020–2023) to better represent current S2S system behavior; monitored bias stability across years during validation and confirmed that mean errors remained relatively consistent for the target lead windows; avoided data prior to 2020 to minimize the impact of outdated model versions.

2. Precipitation's Non-Gaussian and Sparse Nature
Challenge: precipitation is highly skewed, with many zero (dry) days and occasional extreme events. Simple linear bias correction fails to capture this distribution, leading to unrealistic corrected values (e.g., negative rainfall).
Solution: replaced linear correction with empirical quantile mapping (QM) specifically for precipitation; used separate CDFs for wet and dry frequencies to preserve the climatological probability of no rain; applied smoothing to the empirical CDFs to avoid overfitting to rare extremes in the limited historical sample.

3. Defining Reliable Quintile Probabilities from a Deterministic Model
Challenge: the BaseModel produces only a single deterministic forecast per lead time, but the AI Weather Quest requires probabilistic quintile forecasts (i.e., probabilities for each of five climatological bins).
Solution: derived quintile thresholds from the CMARA 2015–2024 climatology (fixed reference); approximated forecast uncertainty by assuming a Gaussian distribution centered on the corrected deterministic forecast, with variance estimated from the historical S2S error spread (see the sketch after this answer); for precipitation, used a mixed discrete–continuous distribution to account for dry-day probability and wet-day intensity; validated the resulting probabilities using reliability diagrams to ensure they were not overconfident.

4. Handling Multi-Source S2S Inputs Consistently
Challenge: while ECMWF is the primary input, we also tested CMA and NCEP S2S data. Each system has different bias structures, making a unified correction approach difficult.
Solution: trained source-specific correction parameters (separate bias terms and QM tables for ECMWF, CMA, and NCEP); in real-time operations, selected the correction set matching the input source; for ensemble-like blending, applied corrections individually before averaging.

5. Ensuring Global Applicability
Challenge: bias characteristics vary significantly by region (e.g., tropics vs. mid-latitudes, land vs. ocean), and a global average correction would degrade local performance.
Solution: applied grid-point-wise correction, so each 1.5° × 1.5° location has its own bias parameters. This preserves regional specificity without requiring complex spatial modeling.

6. Computational Simplicity vs. Skill Trade-off
Challenge: we aimed to keep the model simple, but overly simplistic methods (e.g., global mean bias removal) showed limited skill improvement.
Solution: struck a balance by using per-grid-point, per-lead-day, per-variable corrections, which are still simple but sufficiently adaptive; avoided any iterative or machine-learned components that would complicate reproducibility or real-time deployment.

By addressing these challenges with pragmatic, transparent, and data-driven adjustments, we ensured that the BaseModel delivers consistent, bias-corrected, and probabilistically meaningful forecasts, fulfilling its role as a reliable baseline for the AI Weather Quest competition.
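The Gaussian conversion in challenge 3 reduces to a few lines; here, the thresholds are the four climatological quintile boundaries and sigma the historical error spread at a grid point. A minimal sketch under those assumptions:

```python
import numpy as np
from scipy.stats import norm

def quintile_probs(forecast, sigma, thresholds):
    """Convert a corrected deterministic forecast into five quintile
    probabilities under a Gaussian error assumption.

    thresholds : the four climatological quintile boundaries (from CMARA)
    sigma      : historical forecast error standard deviation
    Illustrative sketch; precipitation uses a mixed distribution instead."""
    cdf = norm.cdf(thresholds, loc=forecast, scale=sigma)  # P(X <= each boundary)
    return np.diff(np.concatenate(([0.0], cdf, [1.0])))    # five bin probabilities
```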
Are there any limitations to your current model that you aim to address in future iterations?
Yes. The BaseModel, while effective as a simple and interpretable baseline, has several inherent limitations due to its statistical nature and design constraints. We recognize these shortcomings and aim to address them in future iterations:

1. Lack of Spatiotemporal Context
Limitation: the BaseModel corrects each grid point and lead day independently, ignoring spatial coherence (e.g., weather systems spanning regions) and temporal evolution (e.g., persistence of anomalies).
Future improvement: introduce spatiotemporal smoothing or lightweight models (e.g., Gaussian process regression) that borrow strength across neighboring grids and lead times; explore low-rank representations of error fields to capture large-scale bias patterns.

2. Static Correction Parameters
Limitation: bias correction parameters are fixed after training and cannot adapt to sudden changes in NWP system behavior (e.g., model upgrades) or evolving climate conditions.
Future improvement: implement online learning or rolling-window retraining to update correction parameters periodically using the latest forecast–observation pairs; monitor real-time performance metrics to trigger parameter recalibration when drift is detected.

3. Oversimplified Uncertainty Representation
Limitation: probabilistic forecasts are derived by assuming parametric error distributions around a single deterministic prediction, which may not reflect true forecast uncertainty, especially for extreme events or multimodal outcomes.
Future improvement: move toward ensemble-based post-processing (e.g., using raw S2S ensemble members where available) to better sample uncertainty; adopt non-parametric probability estimation (e.g., kernel density estimation) based on historical analogs.

4. Limited Use of Predictive Covariates
Limitation: the BaseModel only uses the target variable's own forecast as input, ignoring potentially useful predictors such as sea surface temperature, soil moisture, or stratospheric signals that influence sub-seasonal predictability.
Future improvement: incorporate external boundary-condition data (e.g., from reanalysis or satellite products) as additional features in a more advanced statistical or machine learning framework.

5. Inability to Capture Nonlinear Error Structures
Limitation: linear bias correction and quantile mapping assume stationary, monotonic relationships between forecasts and observations, which may break down under complex error regimes (e.g., conditional biases during El Niño).
Future improvement: replace these with nonlinear but still interpretable models, such as generalized additive models (GAMs) or gradient-boosted decision trees, trained on stratified climate states.

6. Dependence on High-Quality Reanalysis for Training
Limitation: performance relies heavily on CMARA as ground truth; any errors or inhomogeneities in the reanalysis propagate into the correction model.
Future improvement: cross-validate against multiple reanalyses (e.g., ERA5, JRA-55) or blended observational datasets to improve robustness; develop reanalysis-agnostic calibration methods where possible.

While the BaseModel fulfills its purpose as a transparent and operations-ready baseline, these limitations highlight clear pathways toward more adaptive, physically informed, and skillful post-processing systems in future work.
Are there any other AI/ML model components or innovations that you wish to highlight?
For the BaseModel, it is important to clarify that it does not contain advanced AI or machine learning components, by design. Its purpose is to serve as a transparent, reproducible, and operationally feasible statistical baseline, not as an innovative AI system. That said, we would like to highlight a few pragmatic methodological choices that reflect thoughtful application of classical ML/statistical principles in the sub-seasonal forecasting context:

1. Variable-Adaptive Post-Processing Strategy
While simple, the BaseModel employs different correction techniques tailored to each variable's statistical properties:
  • Additive bias correction for near-Gaussian variables (temperature, SLP).
  • Non-parametric quantile mapping for skewed, non-negative variables (precipitation).
This demonstrates a foundational ML principle: match the algorithm to the data distribution, even without deep learning.

2. Climatology-Aligned Probabilistic Conversion
To meet the AI Weather Quest's requirement for quintile probability forecasts, we developed a lightweight method to convert a single deterministic output into calibrated probabilities:
  • Quintile thresholds are fixed using long-term CMARA climatology (2015–2024).
  • Forecast uncertainty is approximated using historical error variance, enabling probabilistic interpretation without ensembles.
This approach bridges deterministic correction and probabilistic forecasting using only basic statistics, a practical innovation for operational settings with limited resources.

3. Strict Separation of Training and Real-Time Data Flow
The BaseModel enforces a clean pipeline:
  • All parameters are pre-computed offline using historical S2S–reanalysis pairs.
  • No real-time data assimilation or online learning occurs during forecasting.
This ensures full reproducibility and compliance with competition rules, highlighting that robust operational systems often prioritize reliability over complexity.

4. Multi-Source Readiness
Although primarily tuned on ECMWF, the BaseModel's modular design allows plug-and-play use of other S2S sources (e.g., CMA, NCEP) by simply swapping in source-specific correction tables, demonstrating flexibility within a minimal framework.

Final Note
We intentionally avoided deep learning, neural networks, and complex ensembles in the BaseModel to establish a clear performance floor and ensure interpretability. The true AI/ML innovations in our work are reserved for our advanced model (NewMet, based on Transformer architectures and adaptive fusion), while the BaseModel demonstrates the enduring value of well-applied classical statistics in weather prediction. Thus, while there are no cutting-edge AI components in the BaseModel, its design embodies principled, robust, and competition-compliant statistical post-processing, a necessary foundation for any trustworthy forecasting system.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
The development of this model was a collaborative effort by a team based in Hubei, China. The contributors and their specific roles are as follows:
  • Wang Qinglong: led the overall model architecture design and was primarily responsible for model implementation and integration.
  • Cheng Qin: conducted forecast evaluation and validation, including skill scoring against observational datasets and performance benchmarking.
  • Zhou Ting: managed data preprocessing, including harmonization, quality control, and formatting of multi-source S2S and reanalysis data.
  • Hu Yiyang and Feng Biao: developed and maintained the core codebase and ensured software robustness and reproducibility.
  • Ouyang Wei, Yang Wei, Yao Man, and Lu Yi: contributed to model design discussions, assisted in data preprocessing workflows, and supported experimentation and analysis during the development phase.
This team collectively designed, built, validated, and deployed the BaseModel for the ECMWF AI Weather Quest competition.

Model name

ExtraBaseModel
Number of individuals supporting model development:
1-5
Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:
< 8
Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:
< 4
How would you best classify the IT system used for model development or forecast production:
Single node system

Model summary questionnaire for model ExtraBaseModel

Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.

Which of the following descriptions best represent the overarching design of your forecasting model?
  • Machine learning-based weather prediction.
  • Statistical model focused on generating quintile probabilities.
  • An empirical model that utilises historical weather patterns.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
For the ExtraBaseModel, a pure deep learning-based sub-seasonal forecasting system, we used a fully data-driven approach to initialize the model, relying on historical forecast initializations and reanalysis data processed specifically for a Vision Transformer (ViT)-style architecture.

1. Data Sources
  • Input (predictor): raw S2S forecast initial conditions from ECMWF (with optional support for CMA and NCEP). These include the atmospheric state at forecast initialization (lead day 0), such as 2-meter temperature, sea level pressure, total precipitation, and other relevant large-scale fields like 500 hPa geopotential height and sea surface temperature.
  • Target (ground truth): the CMARA reanalysis dataset served as the reference truth. It provided high-quality, spatially complete observations for the target lead windows: Days 19–25 and Days 26–32.
  • Time period: the model was trained on the full available record from 2015 through 2025, including the most recent data up to the submission cutoff. This long and diverse training period helps the model capture a wide range of climate variability.

2. Processing of Initial Conditions

a. Formatting for the Vision Transformer architecture: all input fields were regridded to the standard 1.5° × 1.5° global grid required by the competition. The multi-variable atmospheric state at initialization was treated like an image with multiple channels, each channel representing a different meteorological variable. This "climate image" was then divided into fixed-size spatial patches, following the standard Vision Transformer approach, enabling the model to learn global dependencies through self-attention (a patch-embedding sketch follows the summary below).

b. Temporal alignment: for each S2S initialization date (typically Mondays and Thursdays), the model was trained to directly predict the CMARA state during the two target sub-seasonal windows, Week 3–4 and Week 5–6. Separate output heads were used for each window to account for differences in predictability and error growth over time.

c. Normalization: all input variables were normalized using long-term climatological statistics (mean and standard deviation) computed from the 2015–2024 period. This centers the inputs around climate normals, helping the model focus on predictable anomalies rather than absolute values.

d. Light data augmentation: to improve robustness, we applied mild augmentation techniques during training, such as small random shifts in lead time and occasional masking of input variables, simulating real-world uncertainties in initial conditions.

3. Model Parameter Initialization
  • The Vision Transformer backbone was initialized using standard deep learning practices: random weight initialization with scaled distributions and zero-initialized biases.
  • Positional embeddings, which encode the location of each patch on the globe, were learned from scratch, tailored to the spherical and periodic nature of Earth's grid.
  • We did not use pre-training from non-meteorological domains (e.g., ImageNet), to avoid introducing irrelevant biases. Training was conducted entirely from scratch on weather data.

4. Training Setup
  • Training started with a short warm-up phase using a lower learning rate to stabilize the attention layers.
  • We used the AdamW optimizer with a cosine-decay learning rate schedule.
  • The loss function combined mean squared error with a term that encourages high anomaly correlation, promoting both accuracy and large-scale pattern fidelity.
Summary
The ExtraBaseModel is initialized purely from S2S forecast initial states and CMARA reanalysis truth, formatted as multi-channel global grids suitable for a Vision Transformer. By leveraging the full 2015–2025 record and treating sub-seasonal forecasting as an end-to-end image-to-image prediction task, the model learns complex, nonlinear mappings directly from data, without relying on traditional post-processing or handcrafted corrections.
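The "climate image" patchify step described above corresponds to the standard ViT patch embedding; a minimal PyTorch sketch follows, with channel count, patch size, and embedding width as assumptions (4 grid cells × 1.5° matches the roughly 6° patches mentioned later in this questionnaire).

```python
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a multi-channel global field into non-overlapping patches and
    project each patch to an embedding vector (standard ViT patchify).
    Channel count, patch size, and embedding width are assumptions."""
    def __init__(self, n_vars=5, patch=4, dim=256):
        super().__init__()
        # A strided convolution patchifies and embeds in one step
        self.proj = nn.Conv2d(n_vars, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (batch, n_vars, 121, 240)
        x = self.proj(x)                     # (batch, dim, 30, 60); edge row truncated
        return x.flatten(2).transpose(1, 2)  # (batch, n_patches, dim) token sequence
```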
If any, what data does your model rely on for real-time forecasting purposes?
For real-time forecasting, the ExtraBaseModel relies exclusively on the following data sources and inputs:

1. Operational S2S Forecast Initial Conditions
The model uses the latest operational forecast initialization from an S2S center (primarily ECMWF, but designed to also support CMA or NCEP if needed). This includes the full set of atmospheric and surface fields at lead day 0 (i.e., the start of the forecast), such as:
  • 2-meter temperature
  • Sea level pressure
  • Total precipitation (accumulated from initialization)
  • Upper-level variables (e.g., 500 hPa geopotential height)
  • Sea surface temperature (if provided in the S2S initialization package)
These fields are treated as the model's input "snapshot" of the current climate state.

2. Fixed Preprocessing Statistics
To normalize the real-time S2S inputs, the model applies climatological mean and standard deviation values precomputed from the 2015–2024 CMARA reanalysis period. These statistics are static (they do not change in real time) and ensure that incoming forecasts are standardized consistently with the training data distribution.

3. No Real-Time Observations or Assimilation
The ExtraBaseModel does not ingest real-time observations, satellite data, or any post-initialization updates. It is a purely predictive model: once initialized with the S2S Day 0 state, it generates forecasts for Days 19–32 without further external input.

4. No Online Learning or Retraining
The model weights are frozen after offline training on the 2015–2025 historical dataset. During real-time operation, no parameter updates or adaptation occurs, ensuring stability, reproducibility, and compliance with competition rules.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
The ExtraBaseModel was trained exclusively on a combination of numerical weather prediction (NWP) outputs and reanalysis data, with no direct use of raw observational or satellite datasets. Specifically, the training relied on the following two types of datasets:

1. NWP Outputs: S2S Forecast Initializations
  • Source: operational sub-seasonal forecast systems from major global centers, primarily ECMWF S2S, with additional support for CMA S2S and NCEP S2S.
  • Content: for each forecast initialization date (typically twice weekly), the model used the Day 0 (initial) state of multiple atmospheric and surface variables, including 2-meter temperature, sea level pressure, total precipitation (initial accumulation), 500 hPa geopotential height, and sea surface temperature (where available).
  • Role: these served as the input features (predictors) for the model, representing the "starting point" from which sub-seasonal evolution is predicted.

2. Reanalysis Data: CMARA (China Meteorological Administration Reanalysis)
  • Source: CMARA, a global atmospheric reanalysis product developed by CMA, which blends observational data (e.g., radiosondes, satellites, surface stations) with a numerical model to produce a physically consistent and spatially complete estimate of past weather states.
  • Content: daily fields of the same variables used in the S2S inputs, covering the full globe on a 1.5° × 1.5° grid.
  • Role: CMARA provided the ground truth (target labels) for training. Specifically, it supplied the actual atmospheric states at Days 19–25 and Days 26–32 following each S2S initialization.
  • Time coverage: the full training period spans 2015 through 2025, including the most recent available data up to the submission cutoff.

What Was Not Used
  • Raw observational datasets (e.g., station measurements, radiosonde profiles).
  • Direct satellite retrievals (e.g., brightness temperatures, cloud products).
  • Other reanalyses (e.g., ERA5, JRA-55): CMARA was used exclusively, to maintain consistency with the competition's evaluation reference.

Summary
The ExtraBaseModel was trained on a paired dataset consisting of S2S forecast initializations (NWP outputs) as inputs and CMARA reanalysis fields as targets. This end-to-end supervised learning setup enables the model to learn the complex, nonlinear mapping from an NWP model's initial state to observed sub-seasonal outcomes, without requiring handcrafted features or external data streams.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Core Architecture: Vision Transformer (ViT)-Based Design
The ExtraBaseModel is built around a modified Vision Transformer (ViT) architecture, adapted specifically for global sub-seasonal weather forecasting. Instead of processing natural images, it treats the Earth's atmospheric state as a multi-channel "climate image", enabling the model to capture long-range spatial dependencies through self-attention mechanisms.
- Input Representation: The model receives a single snapshot of the atmosphere at forecast initialization (Day 0), composed of multiple meteorological variables stacked as input channels on a global 1.5° × 1.5° grid.
- Patch Embedding: The global grid is divided into non-overlapping spatial patches (e.g., 6° × 6° regions). Each patch is linearly projected into a high-dimensional embedding, preserving both local structure and variable interactions.
- Transformer Encoder: A stack of transformer encoder layers processes these embeddings, allowing every patch to attend to every other patch, which is critical for modeling teleconnections such as the MJO or Rossby wave trains that span continents and oceans.
- Positional Encoding: Learned positional embeddings encode the geographic location of each patch, accounting for the spherical geometry of Earth (including periodicity in longitude).

Key Design Features
- Multi-Variable Input Fusion: All predictor variables (temperature, pressure, geopotential height, etc.) are fed into the model simultaneously as separate input channels, enabling joint learning of cross-variable relationships.
- Dual Forecast Heads: The model has two separate output branches, one for Days 19–25 and another for Days 26–32, allowing it to specialize in the distinct error growth and predictability characteristics of each sub-seasonal window (see the sketch after this answer).
- Global Context Awareness: Unlike convolutional models limited by local receptive fields, the ViT's global attention mechanism allows the model to implicitly learn large-scale climate patterns (e.g., ENSO-related anomalies) directly from data.
- End-to-End Learning: The entire pipeline, from raw S2S initialization to final forecast, is trained jointly. No intermediate post-processing or bias correction is applied; the model learns to correct systematic errors internally.

Frameworks and Implementation
- Deep Learning Framework: Built and trained using PyTorch.
- Training Infrastructure: Distributed training across multiple GPUs with mixed-precision acceleration for efficiency.
- Optimization: AdamW optimizer with cosine learning rate decay and gradient clipping for stable convergence.
- Loss Function: A composite loss combining mean squared error (for accuracy) and anomaly correlation (for pattern fidelity), encouraging both point-wise precision and realistic large-scale structures.

Pre-Processing Steps
- Regridding: All input and target data were interpolated to a common 1.5° × 1.5° global grid.
- Variable Selection: Only physically relevant and competition-required variables were included, to reduce noise and computational load.
- Normalization: Inputs were standardized using long-term climatological statistics derived from CMARA (2020–2024), ensuring consistent scale across variables and years.
- Temporal Alignment: Each S2S initialization was paired with CMARA truth data at the exact target lead days (e.g., Day 21 for the Week 3–4 average).

Post-Processing Steps
- None during inference: The model outputs raw predictions in physical units (e.g., Kelvin, hPa, mm/day), which are submitted directly without smoothing, calibration, or ensemble averaging.
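The overall shape of such an architecture might be sketched in PyTorch as follows. All dimensions, layer counts, the rectangular grid size, and the use of a single convolutional patch projection are placeholder assumptions (the sketch fuses all variables in one projection, which simplifies the per-variable embedding described elsewhere in this questionnaire):

```python
import torch
import torch.nn as nn

class DualHeadViT(nn.Module):
    """Shared ViT encoder with two lead-time-specific forecast heads.
    All sizes below are illustrative placeholders, not the team's settings."""

    def __init__(self, in_channels=5, patch=4, grid=(100, 240), dim=256,
                 depth=6, heads=8):
        super().__init__()
        n_patches = (grid[0] // patch) * (grid[1] // patch)
        # Patch embedding: one linear projection per non-overlapping patch.
        self.embed = nn.Conv2d(in_channels, dim, kernel_size=patch, stride=patch)
        # Learned positional embeddings, one per patch location.
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        out_dim = in_channels * patch * patch  # per-patch field values
        self.head_w34 = nn.Linear(dim, out_dim)  # Days 19-25 branch
        self.head_w56 = nn.Linear(dim, out_dim)  # Days 26-32 branch

    def forward(self, x):                      # x: (batch, channels, lat, lon)
        z = self.embed(x).flatten(2).transpose(1, 2) + self.pos
        z = self.encoder(z)
        # Each head emits per-patch predictions to be reshaped onto the grid.
        return self.head_w34(z), self.head_w56(z)
```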
- Quintile Probability Conversion: For probabilistic submissions, deterministic outputs are converted into quintile probabilities using fixed CMARA climatology thresholds and historical error spread; this step is lightweight and external to the core model (a sketch follows this answer).

Summary
The ExtraBaseModel is a pure, end-to-end deep learning system based on a climate-adapted Vision Transformer. It takes global atmospheric snapshots from operational S2S systems as input and predicts sub-seasonal outcomes directly, trained entirely on historical S2S–CMARA pairs. Its strength lies in its ability to learn complex spatiotemporal dynamics without handcrafted rules, while remaining practical for real-time forecasting thanks to its simple input requirements and deterministic inference.
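One plausible form of such a climatology-based converter, shown only as an illustration. The Gaussian error assumption and the example numbers are ours, not necessarily the team's exact method:

```python
import numpy as np
from scipy.stats import norm

def quintile_probabilities(forecast, thresholds, error_std):
    """Map a deterministic forecast to five quintile-bin probabilities.

    forecast:   deterministic value at one grid point
    thresholds: four climatological quintile boundaries (e.g., from CMARA)
    error_std:  historical forecast error spread at this lead and location
    Assumes roughly Gaussian forecast errors around the point forecast.
    """
    cdf = norm.cdf(thresholds, loc=forecast, scale=error_std)
    edges = np.concatenate(([0.0], cdf, [1.0]))
    return np.diff(edges)  # five probabilities summing to 1

# Example with illustrative numbers: a +0.8 K anomaly forecast.
probs = quintile_probabilities(0.8, np.array([-1.0, -0.3, 0.3, 1.0]), 1.2)
```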
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No, we have not yet published or presented any work related to this forecasting model.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Yes. Before submitting forecasts to the AI Weather Quest, we conducted a comprehensive validation of the ExtraBaseModel using both independent reanalysis data and observational benchmarks to assess skill, reliability, and generalization.

1. Temporal Holdout Validation (Primary Method)
We reserved the most recent 12 months (June 2024 – May 2025) of S2S–CMARA pairs as a strict temporal test set, completely excluded from training and hyperparameter tuning. This mimics real-world forecasting conditions, where the model must generalize to unseen future states. Performance on this holdout period was evaluated using the official AI Weather Quest metrics:
- Anomaly correlation coefficient (ACC; see the sketch after this answer)
- Root mean square error (RMSE)
- Continuous ranked probability skill score (CRPSS) for probabilistic outputs

2. Cross-Reanalysis Consistency Check
Although CMARA was our training and primary evaluation target, we also ran the model on the same initialization dates and compared its predictions against ERA5 (from ECMWF) and JRA-55 (from JMA) for the same lead windows. This helped us assess whether forecast patterns were physically consistent across reanalyses rather than artifacts of CMARA's assimilation system. We found strong agreement in large-scale anomaly patterns (e.g., tropical convection shifts, mid-latitude blocking), increasing confidence in robustness.

3. Station-Based Observational Verification (Spot Checks)
For high-impact variables such as 2-meter temperature and precipitation, we compared model forecasts against independent ground observations from:
- The Global Historical Climatology Network (GHCN)
- CMA's national surface observation network (for East Asia)
We focused on regions with dense station coverage (e.g., Europe, the eastern U.S., eastern China) and evaluated:
- Bias in temperature extremes
- Frequency and intensity of precipitation events
- Skill in predicting observed quintile categories
While point-wise station skill is inherently limited at sub-seasonal leads, the model showed reliable ranking (e.g., correctly identifying "above-normal" weeks more often than climatology).

4. Baseline Benchmarking
We compared ExtraBaseModel forecasts against:
- The raw ECMWF S2S control forecast
- Our own statistical BaseModel (linear bias correction + quantile mapping)
The ExtraBaseModel consistently outperformed both in ACC and CRPSS over the validation period, especially for Week 5–6 and precipitation, demonstrating added value from deep learning.

5. Sensitivity and Failure Mode Analysis
We tested the model under known challenging conditions:
- During MJO phases with weak signal
- In ENSO-neutral years (lower predictability)
- Over data-sparse regions (e.g., Southern Hemisphere oceans)
While skill naturally decreased in these regimes, the model did not exhibit catastrophic failures or unrealistic outputs, indicating stable generalization.

Conclusion
Validation followed a multi-tiered strategy: a strict temporal holdout for primary evaluation, cross-reanalysis checks for physical consistency, observational spot checks for real-world relevance, and benchmarking against baselines. This gave us high confidence that the ExtraBaseModel's submitted forecasts are not only statistically skillful but also physically plausible and robust across diverse conditions.
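For reference, the anomaly correlation coefficient used in this evaluation can be computed as in the following sketch. The cosine-latitude weighting and anomaly handling are our illustrative choices:

```python
import numpy as np

def anomaly_correlation(forecast_anom, observed_anom, lats):
    """Area-weighted anomaly correlation coefficient on a lat-lon grid.

    forecast_anom, observed_anom: (lat, lon) anomaly fields
    lats: latitudes in degrees, used for cosine-latitude area weighting
    """
    w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(forecast_anom)
    f = forecast_anom - np.average(forecast_anom, weights=w)
    o = observed_anom - np.average(observed_anom, weights=w)
    num = np.sum(w * f * o)
    den = np.sqrt(np.sum(w * f**2) * np.sum(w * o**2))
    return num / den
```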
Did you face any challenges during model development, and how did you address them?
Yes, we encountered several significant challenges during the development of ExtraBaseModel. Below are the key issues we faced and how we addressed them.

1. Limited Signal-to-Noise Ratio at Sub-Seasonal Leads
Challenge: Sub-seasonal forecasting (Days 19–32) operates in a "predictability desert" where weather noise dominates and climate signals (e.g., MJO, soil moisture, SST anomalies) are weak or intermittent. This made it difficult for the model to learn consistent, generalizable patterns.
Solution:
- We focused training on anomalies relative to climatology, which are more predictable than absolute values.
- We used a long training window (2015–2025) to expose the model to diverse climate states (El Niño, La Niña, neutral), improving its ability to recognize recurring large-scale drivers.
- The Vision Transformer architecture helped by capturing long-range spatial dependencies, which is critical for leveraging weak but coherent signals such as tropical-extratropical teleconnections.

2. Data Imbalance and Skewed Distributions (Especially for Precipitation)
Challenge: Precipitation is highly skewed, with many dry days and rare extreme events. A standard MSE loss caused the model to under-predict extremes and over-smooth rainfall.
Solution:
- We designed a composite loss function that combines mean squared error with an anomaly correlation term, encouraging the model to preserve spatial patterns even when magnitudes are uncertain (see the sketch below).
- We applied light data augmentation (e.g., random channel masking) to improve robustness to input variability.
- For probabilistic outputs, we decoupled deterministic prediction from probability calibration: the model predicts a central estimate, and quintile probabilities are derived from historical error distributions, avoiding direct regression on noisy extremes.
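A composite loss of this kind might look roughly as follows; the weighting factor and the exact anomaly handling are illustrative choices, not the team's tuned values:

```python
import torch

def composite_loss(pred, target, alpha=0.5):
    """MSE plus (1 - spatial pattern correlation), as a hedged sketch.

    pred, target: (batch, channels, lat, lon) anomaly fields.
    """
    mse = torch.mean((pred - target) ** 2)
    # Spatial pattern correlation per sample and channel.
    p = pred - pred.mean(dim=(-2, -1), keepdim=True)
    t = target - target.mean(dim=(-2, -1), keepdim=True)
    num = (p * t).sum(dim=(-2, -1))
    den = torch.sqrt((p**2).sum(dim=(-2, -1)) * (t**2).sum(dim=(-2, -1)) + 1e-8)
    acc = (num / den).mean()
    # Penalize low correlation alongside squared error.
    return alpha * mse + (1.0 - alpha) * (1.0 - acc)
```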
3. Overfitting Due to Limited Sample Size
Challenge: Despite using 5+ years of data, the number of unique S2S initialization dates is only ~300 (twice weekly), which is small for a high-capacity ViT model. Early versions showed strong training skill but degraded validation performance.
Solution:
- We reduced model capacity (fewer transformer layers, smaller embedding dimensions) to match the data scale.
- We used strong regularization: dropout in attention and MLP layers, weight decay, and gradient clipping.
- We implemented temporal k-fold cross-validation during development to detect overfitting early.
- Training was stopped based on validation skill on the holdout year, not training loss.

4. Handling Global Grid Geometry in ViT
Challenge: A standard ViT assumes a flat, rectangular image, but Earth is a sphere with periodic boundaries (longitude wraps around). Naive patching created artificial discontinuities at the date line and polar distortions.
Solution:
- We modified the positional embedding scheme to respect longitudinal periodicity (e.g., patches at 0° and 360° longitude share similar positional codes); a sketch of such an encoding follows this answer.
- We avoided polar patches by focusing on latitudes between 75°S and 75°N, where CMARA and S2S data are most reliable and the competition evaluation is focused.
- Patch size was chosen to align with the 1.5° grid (e.g., 4 × 4 grid cells per patch), minimizing interpolation artifacts.

5. Computational Cost and Training Stability
Challenge: Training a global ViT on high-resolution atmospheric data is memory-intensive and prone to attention collapse or divergence in early epochs.
Solution:
- We used mixed-precision training (FP16) to reduce memory use and speed up iterations.
- We started with a learning rate warm-up over the first 5 epochs to stabilize attention weights.
- We monitored attention entropy during training to ensure the model was not defaulting to uniform attention (a sign of poor learning).

6. Alignment with Competition Requirements
Challenge: The AI Weather Quest requires quintile probability forecasts, but our model is deterministic. Converting point predictions into reliable probabilities is nontrivial.
Solution: We developed a lightweight, climatology-based probability converter that uses long-term CMARA quintile thresholds and historical forecast error spread to assign probabilities. This step is separate from the deep model, so the core architecture remains focused on accurate deterministic prediction while still meeting the submission format requirements.

Conclusion
Each challenge was addressed through a combination of domain-aware architectural choices, careful data handling, and pragmatic regularization, balancing the power of deep learning with the physical and statistical realities of sub-seasonal forecasting. The result is a model that is not only innovative in design but also robust, reliable, and competition-ready.
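One way to encode the longitudinal periodicity described in challenge 4 is to let longitude enter the positional code only through sine and cosine terms, so that 0° and 360° receive identical embeddings. The dimension and frequency choices below are illustrative assumptions:

```python
import math
import torch

def periodic_lonlat_embedding(lats, lons, dim=64):
    """Sketch of a positional encoding that respects longitudinal wrap-around.

    lats, lons: 1-D tensors of patch-center coordinates in degrees.
    Returns a (n_patches, dim) tensor; 'dim' must be divisible by 4.
    """
    lon_rad = torch.deg2rad(lons)      # periodic coordinate
    lat = lats / 90.0                  # normalized latitude in [-1, 1]
    feats = []
    for k in range(dim // 4):
        freq = 2.0 ** k
        feats += [torch.sin(freq * lon_rad), torch.cos(freq * lon_rad),
                  torch.sin(freq * math.pi * lat), torch.cos(freq * math.pi * lat)]
    return torch.stack(feats, dim=-1)  # (n_patches, dim)
```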
Are there any limitations to your current model that you aim to address in future iterations?
Yes. While the ExtraBaseModel demonstrates strong performance for a pure deep learning approach to sub-seasonal forecasting, it has several known limitations that we aim to address in future iterations.

1. No Assimilation of Real-Time Observations
Limitation: The model is initialized solely from S2S forecast states and cannot incorporate real-time observations (e.g., satellite retrievals, surface measurements, or ocean heat content updates) after initialization. This means it cannot correct for early forecast errors or respond to unfolding events.
Future direction: We plan to explore hybrid architectures that fuse NWP initial states with near-real-time observational anomalies (e.g., via attention-based conditioning) to enable "nowcast-informed" sub-seasonal prediction.

2. Fixed Climatology for Normalization and Probabilities
Limitation: The model relies on a static 2020–2024 climatology for input normalization and probability calibration. This may become less representative under rapid climate change or during unprecedented extremes (e.g., record-breaking heatwaves).
Future direction: We intend to implement adaptive climatology updates, either through rolling windows or online bias estimation, to keep the reference distribution aligned with the current climate state.

3. Limited Representation of Land–Atmosphere and Ocean–Atmosphere Coupling
Limitation: Although SST and soil moisture proxies are included as input channels, the model treats them as static snapshots at Day 0. It does not explicitly model their evolution or feedbacks over the forecast window, which are critical for sub-seasonal predictability (e.g., soil moisture–temperature coupling).
Future direction: Future versions may incorporate multi-step conditioning or auxiliary modules that simulate slowly varying boundary conditions (e.g., a lightweight ocean emulator) to better represent coupled processes.

4. Deterministic Core with Post-Hoc Probabilistic Conversion
Limitation: The model produces a single deterministic forecast, and probabilities are derived afterwards using historical error statistics. This limits its ability to capture flow-dependent uncertainty (e.g., higher uncertainty during MJO transitions).
Future direction: We are exploring ensemble-based and diffusion-based deep learning frameworks that natively generate probabilistic forecasts with state-dependent spread, improving reliability in volatile regimes.

5. Globally Uniform Architecture Despite Regional Skill Variability
Limitation: The ViT treats all regions equally, but sub-seasonal skill varies greatly by geography (e.g., high skill over the tropics due to the MJO, low skill over mid-latitude oceans). The current model does not adapt its internal processing to regional predictability.
Future direction: We are investigating region-aware attention mechanisms and multi-task heads that specialize in different domains (tropics vs. extratropics), allowing the model to allocate capacity where it matters most.

6. Dependence on a Single Reanalysis for Training
Limitation: Training exclusively on CMARA may embed biases specific to its data assimilation system, potentially reducing generalization to other truth references or real-world observations.
Future direction: Future training could use multi-reanalysis ensembles (e.g., CMARA + ERA5 + JRA-55) as blended targets, or apply reanalysis-invariant loss functions to improve robustness.
Conclusion
The current ExtraBaseModel is a strong proof of concept for end-to-end, ViT-based sub-seasonal forecasting, but it remains a first-generation AI weather model. Our roadmap focuses on greater physical awareness, adaptive uncertainty quantification, and tighter integration with Earth system dynamics, moving toward a next-generation system that is not only data-driven but also Earth-system-informed.
Are there any other AI/ML model components or innovations that you wish to highlight?
Yes. Beyond the core Vision Transformer architecture, several AI/ML components and methodological innovations were critical to the success of ExtraBaseModel. These design choices reflect our effort to bridge deep learning with the realities of sub-seasonal forecasting.

1. Climate-Aware Patching Strategy
Instead of using generic image-like patches, we designed a meteorologically informed patch layout:
- Patches align with the 1.5° × 1.5° competition grid (e.g., 4 × 4 grid cells per patch), avoiding interpolation artifacts.
- Patch size was tuned to capture synoptic-scale features (~600–1000 km), balancing local detail and global context.
- Polar regions beyond 75° latitude were excluded, focusing capacity on areas with higher predictability and evaluation relevance.
This ensures the model's inductive bias aligns with atmospheric dynamics rather than computer vision conventions.

2. Multi-Variable Embedding Without Early Fusion
Each meteorological variable (e.g., temperature, geopotential height) is initially embedded separately before being combined, allowing the model to learn variable-specific representations. Only after individual linear projections are the channels merged into a unified patch embedding. This preserves the distinct statistical and physical characteristics of each field during early processing.

3. Dual-Head Forecast Decoder with Shared Backbone
The model uses a single shared ViT encoder but branches into two specialized output heads, one for Week 3–4 (Days 19–25) and another for Week 5–6 (Days 26–32). This allows the model to:
- Share learned spatial patterns (e.g., MJO propagation) across lead times.
- Adapt final-layer representations to the distinct error growth and signal decay at each horizon.
This design improves parameter efficiency while maintaining lead-time-specific skill.

4. Anomaly-Centric Learning Objective
Rather than predicting raw values, the model is trained to predict deviations from climatology. The climatology is precomputed from CMARA (2015–2024) and subtracted before input and target processing. This focuses the model's capacity on predictable anomalies, the core of sub-seasonal skill, while reducing the burden of learning seasonal cycles.

5. Lightweight, Physics-Informed Data Augmentation
We introduced minimal but effective augmentations tailored to weather data (see the code sketch below):
- Random channel dropout: simulates missing or noisy input variables (e.g., temporary loss of SST data).
- Lead-time jitter: slight random shifts in target days (±1 day) during training to improve robustness to timing errors in real forecasts.
These augmentations enhance generalization without distorting physical relationships.

6. End-to-End Deterministic-to-Probabilistic Pipeline
Although the core model is deterministic, we built a modular post-processing layer that converts its output into the calibrated quintile probabilities required by the competition:
- It uses long-term CMARA quintile thresholds.
- It adjusts probabilities based on historical forecast error spread for similar conditions (e.g., MJO phase, ENSO state).
This keeps the deep model simple while enabling reliable probabilistic interpretation.
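The random channel dropout augmentation mentioned in item 5 might be implemented along these lines; the drop probability is an assumed value:

```python
import torch

def random_channel_dropout(x, p=0.1, training=True):
    """Zero out entire input variables (channels) at random during training.

    x: (batch, channels, lat, lon) input tensor. Dropping a full channel
    simulates a temporarily missing field such as SST; p is illustrative.
    """
    if not training:
        return x
    keep = (torch.rand(x.shape[0], x.shape[1], 1, 1, device=x.device) > p).float()
    return x * keep
```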
7. Training Stability Through Attention Monitoring
During development, we observed that standard ViTs can develop degenerate attention maps (e.g., attending only to nearby patches or collapsing to uniform weights). To counter this:
- We logged attention entropy and spatial focus diversity during training (see the sketch at the end of this answer).
- We adjusted initialization and learning rates to encourage meaningful long-range attention.
This ensured the model truly leveraged global context rather than performing local smoothing.

8. Reproducibility and Competition Compliance by Design
From the outset, the pipeline was built to comply strictly with AI Weather Quest rules:
- No future leakage: all normalization statistics and climatologies are frozen before the validation period.
- No external real-time data: only official S2S initializations are used at inference.
- Fully deterministic inference: no randomness at test time, ensuring reproducible submissions.

Summary
ExtraBaseModel is more than a Vision Transformer applied to weather data: it integrates domain-aware architectural choices, forecasting-specific learning objectives, and practical deployment considerations into a cohesive system. These innovations collectively enable it to extract maximum skill from limited sub-seasonal signals while remaining robust, interpretable, and competition-ready.
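An attention-entropy diagnostic of the kind described in item 7 could be computed as follows; the tensor layout is an assumption based on common ViT implementations:

```python
import torch

def attention_entropy(attn_weights):
    """Mean entropy of attention distributions, as a training diagnostic.

    attn_weights: (batch, heads, query, key) softmax outputs. Entropy near
    zero suggests collapse onto single patches; entropy near log(n_keys)
    suggests uninformative, uniform attention.
    """
    eps = 1e-9
    entropy = -(attn_weights * torch.log(attn_weights + eps)).sum(dim=-1)
    return entropy.mean()

# During training one might log this value per layer and flag runs where
# it drifts toward 0 or toward math.log(n_keys).
```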
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc.) to acknowledge individual contributions.
The contributors and their specific roles are as follows:
- Wang Qinglong: led the overall model architecture design and was primarily responsible for model implementation and integration.
- Cheng Qin: conducted forecast evaluation and validation, including skill scoring against observational datasets and performance benchmarking.
- Zhou Ting: managed data preprocessing, including harmonization, quality control, and formatting of multi-source S2S and reanalysis data.
- Hu Yiyang and Feng Biao: developed and maintained the core codebase, implemented the Transformer-based deep learning components, and ensured software robustness and reproducibility.
- Ouyang Wei, Yang Wei, Yao Man, and Lu Yi: contributed to model design discussions, assisted in data preprocessing workflows, and supported experimentation and analysis during the development phase.
This team collectively designed, built, validated, and deployed the NewMet model for the ECMWF AI Weather Quest.

Submitted forecast data in previous period(s)

Please note: Submitted forecast data is only publicly available once the evaluation of a full competitive period has been completed. See the competition's full detailed schedule with submitted data publication dates for each period here.


Participation

Competition Period

For the selected competition period, the table below shows the variables submitted each week by the respective team.

Week | First forecast window (Days 19 to 25): tas, mslp, pr | Second forecast window (Days 26 to 32): tas, mslp, pr
(tas = near-surface (2m) temperature; mslp = mean sea level pressure; pr = precipitation)

This team did not submit any entries to the competition.