Which of the following descriptions best represent the overarching design of your forecasting model?
- Post-processing of numerical weather prediction (NWP) data.
- Statistical model focused on generating quintile probabilities.
- Hybrid model that integrates physical simulations with machine learning or statistical techniques.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
Our approach post-processes output from a coupled atmosphere-land model, ClimaCoupler.jl. The coupler requires an initial state and time-varying forcings. The land and atmosphere components are initialized directly from the ERA5 state, and a hydrostatic vertical pressure profile is derived from the near-surface atmospheric pressure. Sea surface temperature and sea ice concentration are persisted from the ERA5 initial condition throughout the forecast period.
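To illustrate the pressure-profile step, the sketch below integrates the hydrostatic relation dp/dz = -g p / (R_d T) upward from a near-surface pressure. It is a minimal sketch that assumes dry air, a known temperature profile on height levels, and trapezoidal integration. The function name hydrostatic_pressure and the constants are illustrative and are not taken from ClimaAtmos.jl, which may differ in details (for example, by using virtual temperature).

```python
import numpy as np

G = 9.80665   # gravitational acceleration [m s^-2]
R_D = 287.05  # gas constant for dry air [J kg^-1 K^-1]

def hydrostatic_pressure(z, temperature, p_surface):
    """Integrate dp/dz = -g p / (R_d T) upward from the surface (hypothetical helper).

    z           : heights above the surface [m], increasing, with z[0] == 0
    temperature : temperature at each height [K], same shape as z
    p_surface   : near-surface pressure [Pa]
    """
    z = np.asarray(z, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    # d(ln p)/dz = -g / (R_d T); integrate the positive magnitude and negate below
    integrand = G / (R_D * temperature)
    # Cumulative trapezoidal integral from the surface to each level
    increments = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z)
    log_ratio = np.concatenate(([0.0], np.cumsum(increments)))
    return p_surface * np.exp(-log_ratio)
```

For a constant temperature the integrand is constant, so the trapezoidal rule is exact and the result matches the isothermal solution p = p_s exp(-z/H) with scale height H = R_d T / g.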
If any, what data does your model rely on for real-time forecasting purposes?
Most recently available 3D ERA5 state for land and atmosphere. ERA5 sea surface temperature and sea ice concentration are prescribed.
Initialization variables:
Atmosphere Model (ClimaAtmos.jl): Temperature (3D), Specific Humidity (3D), U/V winds (3D), Near-surface pressure (2D)
Land Model (ClimaLand.jl): Soil Temperature (3D), Volumetric fraction of water (3D), Skin temperature (2D), Snow water equivalent (2D), Temperature of snow layer (2D)
Prescribed Land Fields: Leaf Area Index (2D, full integrated land model); Albedo (2D, bucket model)
Auxiliary/Prescribed Fields: Sea surface temperature (2D), Sea ice concentration (2D), Surface elevation (2D)
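Persisting sea surface temperature and sea ice concentration amounts to holding the initial 2D fields fixed at every lead time. A minimal numpy sketch follows. It assumes the fields are arrays on the model grid and that output is daily over the 4-week forecast (both assumptions); persist_boundary_field and N_LEAD_DAYS are hypothetical names.

```python
import numpy as np

N_LEAD_DAYS = 28  # assumed: daily steps over a 4-week forecast

def persist_boundary_field(initial_field, n_lead_steps):
    """Repeat a 2D boundary field unchanged for each forecast lead step.

    Returns a writable array of shape (n_lead_steps, *initial_field.shape).
    """
    field = np.asarray(initial_field)
    # broadcast_to returns a read-only view; copy so the result can be modified
    return np.broadcast_to(field, (n_lead_steps,) + field.shape).copy()
```

A climatological reference would replace the repeated initial field with a field that varies by day of year.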
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
Coupled model output and ERA5 reanalysis (2018 to present)
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Coupled Atmosphere-Land Model:
The coupled model (ClimaCoupler.jl), initialized from ERA5 as detailed above, is run for 4 weeks.
ML Post-processing:
We use a data-driven correction framework that maps simulated ClimaCoupler forecast fields to ERA5. A pointwise machine learning model is trained to adjust the coupled model's dynamical forecasts using analogs from comparable periods in the historical record (2018 onward). The model takes spatial, temporal, and dynamical variables from the coupled model as input and produces both a mean correction and an initial uncertainty estimate. Ensemble forecasts are generated by sampling from this predictive distribution. To address underdispersion in the ensemble, we add an empirically derived term that captures nonlocal contributions to uncertainty; this additive term is estimated from historical forecast residuals and combined with the local error estimates.
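The sampling and dispersion-correction steps can be sketched as follows. This minimal illustration rests on several assumptions the text does not state. It assumes the predictive distribution is Gaussian and that local and nonlocal variances add. The nonlocal term is a single scalar, estimated as the excess of the mean squared historical residual over the mean local variance. Quintile probabilities are the fraction of members in each climatological quintile. The pointwise ML model itself is not shown; its mean correction and local variance are taken as inputs. All function names are hypothetical.

```python
import numpy as np

def nonlocal_variance(historical_residuals, local_variances):
    """Hypothetical additive variance term from historical residuals.

    Mean squared residual not already explained by the local variance
    estimates, floored at zero.
    """
    excess = np.mean(historical_residuals**2) - np.mean(local_variances)
    return max(float(excess), 0.0)

def sample_ensemble(raw_forecast, mean_correction, local_variance,
                    extra_variance, n_members, rng):
    """Draw members from N(raw + correction, local variance + extra variance)."""
    mean = np.asarray(raw_forecast) + np.asarray(mean_correction)
    std = np.sqrt(np.asarray(local_variance) + extra_variance)
    return rng.normal(loc=mean, scale=std, size=(n_members,) + mean.shape)

def quintile_probabilities(ensemble, climatology):
    """Fraction of members in each climatological quintile, per grid cell.

    ensemble    : (n_members, n_cells)
    climatology : (n_samples, n_cells) historical values defining quintile edges
    Returns an array of shape (5, n_cells) whose columns sum to 1.
    """
    edges = np.quantile(climatology, [0.2, 0.4, 0.6, 0.8], axis=0)  # (4, n_cells)
    # Category 0..4 = number of quintile edges each member exceeds
    category = (ensemble[None, :, :] > edges[:, None, :]).sum(axis=0)
    return np.stack([(category == k).mean(axis=0) for k in range(5)])
```

Adding the nonlocal term widens every member's spread by the same amount. This is one simple way to counter underdispersion; a spatially varying term would be a natural refinement.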
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
Results specific to subseasonal forecasting with CliMA will be presented at AMS 2026.
Recent papers highlighting components of the CliMA model:
Yatunin, D., Byrne, S., Kawczynski, C., Kandala, S., Bozzola, G., Sridhar, A., Shen, Z., Jaruga, A., Sloan, J., He, J., Huang, D.Z., Barra, V., Knoth, O., Ullrich, P., Schneider, T., 2025: The CliMA atmosphere dynamical core: Concepts, numerics, and scaling. Journal of Advances in Modeling Earth Systems, submitted.
Deck, K., Braghiere, R. K., Renchon, A. A., Sloan, J., Bozzola, G., Speer, E., Mackay, B., Reddy, T., Phan, K., Gagne-Landmann, A. L., Yatunin, D., Charbonneau, A., Efrat-Henrici, N., Bach, E., Ma, S., Gentine, P., Frankenberg, C., Bloom, A., Wang, Y., Longo, M., Schneider, T., 2025: ClimaLand: A land surface model for advancing climate modeling with machine learning and data-driven parameterizations. Journal of Advances in Modeling Earth Systems, submitted.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Training uses years from 2018 onward, with 2024 held out as the validation year, and hyperparameters are tuned to optimize performance in that year. For uncertainty quantification, we assess reliability by examining the fraction of grid cells in which the verifying value falls within predefined bins of the predicted cumulative probability (0-0.2, 0.2-0.8, 0.8-1.0). A well-calibrated model should yield approximately 20%, 60%, and 20% of grid cells in the lower, middle, and upper bins, respectively. The model was subsequently tuned to improve this alignment.
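One way to compute this reliability check treats it as a probability integral transform (PIT) diagnostic, which is an assumption about the exact procedure. At each grid cell, the fraction of ensemble members below the verifying value serves as its predicted cumulative probability. The grid cells are then counted in each bin. The function names and the (members, grid cells) array layout are illustrative.

```python
import numpy as np

def pit_values(ensemble, observed):
    """Predicted cumulative probability of the verifying value at each grid cell.

    ensemble : (n_members, n_cells); observed : (n_cells,)
    """
    return (ensemble < observed[None, :]).mean(axis=0)

def reliability_fractions(pit, edges=(0.0, 0.2, 0.8, 1.0)):
    """Fraction of grid cells whose PIT value falls in each bin.

    np.histogram bins are half-open except the last, which includes 1.0.
    """
    counts, _ = np.histogram(pit, bins=edges)
    return counts / pit.size
```

An underdispersed (overconfident) ensemble pushes verifying values into the tails and inflates the two outer bins above 20%. An overdispersed ensemble inflates the middle bin above 60%.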
Did you face any challenges during model development, and how did you address them?
1) Biases in raw model simulations: addressed through core model development, bug fixes, and coupling improvements.
2) Overconfident forecasts/overfitting: addressed by adding noise and adjusting hyperparameters.
Are there any limitations to your current model that you aim to address in future iterations?
1) A single deterministic model forecast is mapped to quantiles, so uncertainty arising from initial conditions and forcings is not properly captured.
2) Because the post-processing model is local (pointwise), it cannot learn nonlocal, larger-scale features and regimes.
3) Currently, sea ice concentration and sea surface temperature are persisted. We would like to move to a climatological reference or couple full-complexity sea ice and ocean models.
Are there any other AI/ML model components or innovations that you wish to highlight?
Ongoing work employs ensemble-based data assimilation for parameter estimation, which will allow simultaneous calibration and fine-tuning of the physics parameters in the numerical model and the parameters of the ML post-processor.
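The text does not name a specific algorithm. Ensemble Kalman inversion (EKI) is one common ensemble-based approach to parameter estimation, and the sketch below shows a single EKI update in its perturbed-observation form. This is a generic numpy illustration, not the team's implementation. In practice the forward map would be a costly model run, and the parameter vector could hold both physics and post-processor parameters. The function name eki_update is hypothetical.

```python
import numpy as np

def eki_update(thetas, forward, y, noise_cov, rng):
    """One ensemble Kalman inversion step (perturbed-observation form).

    thetas    : (n_ens, n_params) parameter ensemble
    forward   : callable mapping a parameter vector to predicted observations
    y         : (n_obs,) observations
    noise_cov : (n_obs, n_obs) observational noise covariance (Gamma)
    """
    g = np.array([forward(t) for t in thetas])        # (n_ens, n_obs)
    n_ens = thetas.shape[0]
    dtheta = thetas - thetas.mean(axis=0)
    dg = g - g.mean(axis=0)
    c_tg = dtheta.T @ dg / (n_ens - 1)                # parameter-output cross-covariance
    c_gg = dg.T @ dg / (n_ens - 1)                    # output covariance
    # Perturb the observations independently for each member
    y_pert = y + rng.multivariate_normal(np.zeros(y.size), noise_cov, size=n_ens)
    innovations = y_pert - g                          # (n_ens, n_obs)
    # Solve (C_gg + Gamma) x = innovation for all members at once
    weights = np.linalg.solve(c_gg + noise_cov, innovations.T)  # (n_obs, n_ens)
    return thetas + (c_tg @ weights).T
```

Iterating the update on a toy linear model (forward = A @ theta) moves the ensemble mean toward the parameters that reproduce the observations, and the ensemble spread shrinks as it does so. Only forward-model evaluations are required, with no gradients. This is why ensemble methods suit calibrating a full numerical model.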
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
This team has chosen to keep its participants anonymous.