Which of the following descriptions best represent the overarching design of your forecasting model?
- Machine learning-based weather prediction.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
The model was trained using data from 2020 to 2024, while inference and prediction were performed using data from 2000 to 2024. First, the original ERA5 daily and monthly datasets were interpolated to a 1.5° grid. Weekly data were then generated by averaging the daily data. Based on the prediction date (the Monday three weeks ahead), the input and output datasets were organized into pairs.
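The daily-to-weekly averaging step can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: it assumes fields are held as (time, lat, lon) arrays, and `daily_to_weekly` is a hypothetical helper (ERA5 regridded to 1.5° gives a 121 × 240 global grid).

```python
import numpy as np

def daily_to_weekly(daily, days_per_week=7):
    """Average daily fields into weekly means.

    daily: array of shape (n_days, n_lat, n_lon); trailing days that
    do not fill a complete week are dropped.
    """
    n_weeks = daily.shape[0] // days_per_week
    trimmed = daily[: n_weeks * days_per_week]
    return trimmed.reshape(n_weeks, days_per_week, *daily.shape[1:]).mean(axis=1)

# 150 days of a single variable on the assumed 1.5-degree grid
daily = np.random.rand(150, 121, 240)
weekly = daily_to_weekly(daily)
print(weekly.shape)  # (21, 121, 240)
```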
The input data include:
• Monthly forecast data,
• Weekly data of predictive variables for the previous 20 weeks,
• Weekly upper-level data for the previous 10 weeks, and
• Elevation data.
The output data consist of daily predictions for seven specific days: Monday, Wednesday, Friday, and Sunday of the third week ahead, and Tuesday, Thursday, and Saturday of the fourth week ahead.
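The seven target days fall on alternating days starting from the Monday of the third week ahead, so they can be generated with simple date arithmetic. This is an illustrative sketch, assuming the Monday of the third week ahead is given; `target_days` is a hypothetical helper, not part of the author's code.

```python
from datetime import date, timedelta

def target_days(week3_monday: date) -> list:
    """Seven alternating target days: Mon/Wed/Fri/Sun of the third
    week ahead and Tue/Thu/Sat of the fourth week ahead."""
    return [week3_monday + timedelta(days=2 * k) for k in range(7)]

days = target_days(date(2025, 1, 20))  # 2025-01-20 is a Monday
print([d.strftime("%a %Y-%m-%d") for d in days])
# Mon, Wed, Fri, Sun of week 3, then Tue, Thu, Sat of week 4
```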
The monthly forecast model was trained using monthly data from 1940 to 2024. It takes the previous 15 months of historical data and other surface variables as input and predicts the monthly mean two months ahead. The monthly forecast data used as input correspond to the predicted month of the target date.
If any, what data does your model rely on for real-time forecasting purposes?
TAS:
The input includes 20 consecutive weeks of historical daily 2-meter temperature (tas) and 10 consecutive weeks of historical daily geopotential height at 200 hPa, 300 hPa, and 500 hPa.
The monthly forecast model (same as for MSLP) requires the previous 15 months of historical data for U10, V10, SST, TAS, and MSLP.
MSLP:
The input includes 20 consecutive weeks of historical daily mean sea level pressure (mslp) and 10 consecutive weeks of historical daily geopotential height at 500 hPa and 850 hPa, specific humidity at 700 hPa, divergence and potential vorticity at 900 hPa.
The monthly forecast model (same as for TAS) requires the previous 15 months of historical data for U10, V10, SST, TAS, and MSLP.
TP:
The input includes 20 consecutive weeks of historical daily total precipitation (tp) and 10 consecutive weeks of historical daily geopotential height at 200 hPa, 300 hPa, and 500 hPa, specific humidity at 700 hPa, and cloud cover at 800 hPa.
The monthly forecast model requires the previous 15 months of historical data for U10, V10, SST, T2M, MSLP, and TP.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
All surface and pressure-level meteorological variables were obtained from ERA5, while the elevation data were obtained from ETOPO Global Relief Model.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
The overall prediction framework consists of two sequential stages. First, more than one year of surface variables is used to predict the corresponding monthly values. Next, using the upper-level variables from the previous three-plus months, the monthly predictions, and elevation data, the model forecasts daily values for seven specific days: Monday, Wednesday, Friday, and Sunday of the third week ahead, and Tuesday, Thursday, and Saturday of the fourth week ahead. The daily predictions for the third and fourth weeks are then converted into quintile categories using historical climatological quintile thresholds, and the mean of these categories is computed for submission. Separate models are trained for each variable: near-surface air temperature (tas), mean sea-level pressure (mslp), and total precipitation (tp).
Monthly Prediction Model:
• Input Data: Surface variables from the previous 15 months.
• Architecture: A hybrid of Convolutional Neural Network (CNN) and Temporal Convolutional Network (TCN).
• Output: Two-month-ahead single-variable predictions for tas, mslp, or tp.
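The temporal half of the CNN+TCN hybrid is built on causal dilated convolutions. The sketch below shows only that core operation in plain numpy, as an assumed illustration of how a TCN layer sees the past; it is not the author's implementation.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: the output at time t depends
    only on x[t], x[t - d], x[t - 2d], ...  x: (T,), w: (k,) kernel."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so output stays causal
    return np.array(
        [sum(w[j] * xp[t + pad - j * dilation] for j in range(k)) for t in range(len(x))]
    )

x = np.arange(6, dtype=float)  # [0, 1, 2, 3, 4, 5]
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)
print(y)  # each output is x[t] + x[t-2]: [0, 1, 2, 4, 6, 8]
```

Stacking such layers with increasing dilation lets the receptive field cover many months of history with few parameters, which is the usual motivation for a TCN over a plain RNN.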
Subseasonal Daily Prediction Model:
• Input Data: Weekly data of predictive variables for the previous 20 weeks, weekly upper-level variables for the previous 10 weeks, monthly predictions, and elevation data.
• Architectures for Near-Surface Air Temperature (tas): Six deep learning architectures were explored. All showed good convergence during training, with the Normalized Anomaly Correlation (NAC) steadily increasing and ultimately exceeding 0.75:
1. Convolutional Neural Network (CNN)
2. CNN combined with Gated Recurrent Unit (CNN + GRU Hybrid Model)
3. Transformer-CNN Hybrid Model
4. Residual blocks combined with the spatial attention mechanism from the Convolutional Block Attention Module (CBAM)
5. Residual Squeeze-and-Excitation network combined with CNN (Residual SE + CNN)
6. Residual U-Net structure
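The Squeeze-and-Excitation module used in architecture 5 can be sketched compactly. This is a generic numpy illustration of the standard SE mechanism (channel-wise pooling, a two-layer bottleneck, sigmoid gating), with all shapes and weights assumed for demonstration:

```python
import numpy as np

def squeeze_excite(feat, w1, w2):
    """Squeeze-and-Excitation gating on a (C, H, W) feature map:
    global-average-pool each channel, pass the result through a
    two-layer bottleneck (ReLU then sigmoid), and rescale channels."""
    s = feat.mean(axis=(1, 2))              # squeeze: (C,)
    h = np.maximum(w1 @ s, 0.0)             # excitation, hidden layer
    g = 1.0 / (1.0 + np.exp(-(w2 @ h)))     # per-channel gates in (0, 1)
    return feat * g[:, None, None]          # reweight channels

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))       # 8 channels on a 4x4 grid
w1 = rng.standard_normal((2, 8))            # reduction ratio 4
w2 = rng.standard_normal((8, 2))
out = squeeze_excite(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```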
• Architectures for Sea-Level Pressure (mslp): Three architectures demonstrated strong convergence during training, with overall ACC steadily increasing and ultimately exceeding 0.7:
1. Residual U-Net
2. Shallow encoder–decoder Residual Network (ResNet) with CBAM-style spatial attention integrated in each residual block to enhance spatial feature extraction, and a 1×1 convolution decoder to output multivariable predictions
3. Transformer-CNN Hybrid Model, which first maps the multi-channel input to feature embeddings via convolution, then models global spatial dependencies using a Transformer encoder, and finally generates spatial predictions for the target variable via convolutional decoding
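The CBAM-style spatial attention referenced above can likewise be sketched in a few lines. This is a minimal numpy rendering of the standard mechanism (channel-wise mean/max pooling, a small convolution, sigmoid gating); kernel size and weights are assumed, and it is not the author's exact module:

```python
import numpy as np

def spatial_attention(feat, kernel):
    """CBAM-style spatial attention on a (C, H, W) feature map:
    pool over channels (mean and max), convolve the 2-channel map
    with a small kernel ('same' padding), apply a sigmoid, and gate
    every channel by the resulting (H, W) attention map."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    _, H, W = pooled.shape
    k = kernel.shape[-1]                     # kernel: (2, k, k)
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    attn = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    gate = 1.0 / (1.0 + np.exp(-attn))       # (H, W), values in (0, 1)
    return feat * gate[None, :, :]

rng = np.random.default_rng(1)
feat = rng.standard_normal((4, 5, 5))
out = spatial_attention(feat, rng.standard_normal((2, 3, 3)) * 0.1)
print(out.shape)  # (4, 5, 5)
```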
• Architectures for Total Precipitation (tp): Three architectures demonstrated strong convergence during training, with overall ACC steadily increasing and ultimately exceeding 0.7:
1. Residual U-Net
2. SE Multi-Head ResU-Net, which integrates residual blocks with Squeeze-and-Excitation (SE) modules to enhance feature representation, employs a multi-layer encoder–decoder to extract multi-scale spatial features, and uses multi-head convolutional outputs to generate high-resolution spatial predictions for each target variable
3. Spatial Attention ResNet, which separately encodes different groups of meteorological variables via residual blocks with spatial attention, then fuses these features, and uses independent decoders to produce high-resolution spatial predictions for each target variable
• Model Output: Daily predictions for seven specific days: Monday, Wednesday, Friday, and Sunday of the third week ahead, and Tuesday, Thursday, and Saturday of the fourth week ahead.
Post-Processing: For each model, the daily predictions for the third and fourth weeks are converted into quintile categories using historical climatological quintile thresholds. The mean of these categories is taken as that model's quintile prediction. Finally, the quintile predictions from all models are averaged to generate the final submission. The historical climatological quintile thresholds are computed separately for each model.
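The quintile conversion can be sketched for a single grid point. This is an assumed illustration, not the author's code: it takes the 20/40/60/80th percentiles of a toy climatology as the five-category thresholds and averages the resulting categories.

```python
import numpy as np

def to_quintile(value, climatology):
    """Map a predicted value to a quintile category 1..5 using the
    20/40/60/80th percentiles of the historical climatology."""
    edges = np.percentile(climatology, [20, 40, 60, 80])
    return int(np.searchsorted(edges, value, side="right")) + 1

clim = np.arange(100.0)          # toy climatology, values 0..99
preds = [5.0, 35.0, 50.0, 95.0]  # daily predictions for one grid point
cats = [to_quintile(v, clim) for v in preds]
print(cats, round(np.mean(cats), 2))  # [1, 2, 3, 5] 2.75
```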
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
I gave an oral presentation titled “A Machine Learning Model for Subseasonal Prediction: Forecasting Global Surface Temperature, Pressure, and Precipitation” at the Symposium on Applied Mathematics, Artificial Intelligence Methods, and the Complexity of the Earth System (December 26–28, 2025, Beijing; https://csiam.org.cn/1005/202511/2592.html). The symposium was conducted in Chinese, and the associated conference materials are not publicly available online.
Looking forward to the day this work can be published.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
No.
Did you face any challenges during model development, and how did you address them?
(1) One challenge I encountered was the apparent randomness in model performance. For instance, when I retrained a previously high-performing temperature-prediction architecture with an additional year of data, slightly adjusted evaluation metrics, and some minor optimizations, the resulting model unexpectedly performed worse or became unusable. The underlying cause remains unclear. While this did not affect my submissions, the instability temporarily reduced my confidence in further model optimization.
(2) Another challenge was the limited computational capacity of my laptop. Currently, this does not constrain competition submissions or forecast quality. If I wanted to rigorously assess the impact of computational resources on performance, I would consider purchasing a dedicated GPU rather than relying on cloud computing, budget permitting, as local training allows for more efficient experimentation. Nonetheless, I believe that even with my current setup, there remains substantial potential to further improve forecast accuracy.
Are there any limitations to your current model that you aim to address in future iterations?
On the one hand, the selection of input variables was mainly based on my understanding of meteorological processes and was made rather arbitrarily, without conducting any practical analysis or validation. On the other hand, the model architecture was chosen by testing only a few structures: I kept those that could produce forecasts and discarded those that could not, without exploring a sufficiently wide range of architectures. As a result, I still do not clearly understand what role a specific architecture plays in forecasting. Although both issues are relatively easy to start addressing, they require extensive experiments and cannot be resolved in the short term.
Are there any other AI/ML model components or innovations that you wish to highlight?
(1) The success of the monthly forecasting model marked the beginning of all subsequent possibilities.
(2) During the DJF submission period, the improvements I made step by step included adding new sub-models for mslp and tp, and optimizing the method for calculating quintile probabilities.
(3) The SON submission questionnaire contained some mistakes. First, the model was actually trained on data from 2020–2024, not 2000–2024 as stated. (I would of course prefer more training data if computational resources allowed, but I had forgotten that feeding in 25 years of data crashed my system; I later experimented with only five years of data and, after finishing the experiment, completely overlooked the fact that I had used only five.) Second, the metric on which tas performed well was NAC, not ACC. Because RMSE values carry many digits and are inconvenient to read, I wanted a metric bounded within 1 for easier evaluation; I thought I had been calculating ACC, but in fact it was NAC.
(4) After submitting forecasts for a long time, I realized that some of my approaches to data downloading and folder-structure organization were unnecessarily time-consuming and inefficient. I honestly felt like crying at my own stupidity. /(ㄒoㄒ)/~~
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
This team has chosen to keep its participants anonymous.