Which of the following descriptions best represent the overarching design of your forecasting model?
- Post-processing of numerical weather prediction (NWP) data.
- Statistical model focused on generating quintile probabilities.
- Hybrid model that integrates physical simulations with machine learning or statistical techniques.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
Our approach relies on post-processing a coupled atmosphere-land model, ClimaCoupler.jl. The coupler requires specifying an initial state and time-varying forcings. The land and atmosphere components are initialized directly from the ERA5 state. A hydrostatic vertical pressure profile is derived from the near-surface atmospheric pressure. Sea surface temperature and sea ice concentration are persisted from the ERA5 initial condition through the forecast period.
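As an illustration of how a hydrostatic pressure profile can be built upward from a near-surface pressure, the following is a minimal sketch using the hypsometric equation. It assumes dry air (no virtual-temperature correction) and a layer-mean temperature taken as the average of the two bounding levels; the function name, constants, and level layout are illustrative assumptions and are not taken from ClimaAtmos.jl.

```python
import math

G = 9.80665    # gravitational acceleration, m s^-2 (assumed constant with height)
R_D = 287.05   # specific gas constant of dry air, J kg^-1 K^-1


def hydrostatic_pressure(p_surface, heights, temperatures):
    """Integrate dp/dz = -p * g / (R_d * T) upward, one layer at a time.

    p_surface:    near-surface pressure (Pa) at heights[0]
    heights:      level heights (m), strictly increasing
    temperatures: temperature (K) at each level
    Returns a list of pressures (Pa), one per level.
    """
    if len(heights) != len(temperatures):
        raise ValueError("heights and temperatures must have the same length")
    pressures = [p_surface]
    for k in range(1, len(heights)):
        dz = heights[k] - heights[k - 1]
        if dz <= 0:
            raise ValueError("heights must be strictly increasing")
        # Layer-mean temperature: simple average of the bounding levels.
        t_mean = 0.5 * (temperatures[k - 1] + temperatures[k])
        # Hypsometric equation applied across the layer.
        pressures.append(pressures[-1] * math.exp(-G * dz / (R_D * t_mean)))
    return pressures
```

In this sketch the only pressure input is the 2D near-surface field; the 3D temperature supplies the layer thickness relation, which is why both appear among the initialization variables.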
If any, what data does your model rely on for real-time forecasting purposes?
The most recently available 3D ERA5 state for land and atmosphere. ERA5 sea surface temperature and sea ice concentration are persisted from the initial state.
Initialization variables:
Atmosphere Model (ClimaAtmos.jl): Temperature (3D), Specific Humidity (3D), U/V winds (3D), Near-surface pressure (2D)
Land Model (ClimaLand.jl): Soil Temperature (3D), Volumetric fraction of water (3D), Skin temperature (2D), Snow water equivalent (2D), Temperature of snow layer (2D)
Prescribed Land Fields: Leaf Area Index [integrated land model]; Albedo [bucket model] (2D)
Auxiliary/Prescribed Fields: Sea surface temperature (2D), Sea ice concentration (2D), Sea ice temperature (3D), Surface elevation (2D)
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
Coupled model output and ERA5 weekly statistics
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
Coupled Atmosphere-Land Model:
The coupled model (ClimaCoupler.jl), initialized from ERA5 as detailed above, is run for 40 days. Initial calibration work has been done to optimize several coupler parameters against ERA5 weekly statistics with variants of Ensemble Kalman Inversion.
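For readers unfamiliar with Ensemble Kalman Inversion, the following is a minimal sketch of one update step in its simplest form: a single scalar parameter, a single scalar observation, and no perturbation of the observation. The forward model, the observed value, and the noise variance are hypothetical placeholders; this is not the calibration code or the specific variant used with the coupled model.

```python
def eki_step(ensemble, forward, y, noise_var):
    """One Ensemble Kalman Inversion update for a scalar parameter.

    ensemble:  list of parameter values (one per ensemble member)
    forward:   callable mapping a parameter value to a simulated statistic
    y:         observed statistic the simulation should match
    noise_var: observation-noise variance (must be positive)
    """
    n = len(ensemble)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    outputs = [forward(theta) for theta in ensemble]
    theta_mean = sum(ensemble) / n
    g_mean = sum(outputs) / n
    # Sample cross-covariance between parameters and model outputs.
    cov_tg = sum((t - theta_mean) * (g - g_mean)
                 for t, g in zip(ensemble, outputs)) / (n - 1)
    # Sample variance of the model outputs.
    var_g = sum((g - g_mean) ** 2 for g in outputs) / (n - 1)
    gain = cov_tg / (var_g + noise_var)
    # Move every member toward values whose output matches the observation.
    return [t + gain * (y - g) for t, g in zip(ensemble, outputs)]


def calibrate(ensemble, forward, y, noise_var, iterations):
    """Apply the update repeatedly and return the final ensemble."""
    for _ in range(iterations):
        ensemble = eki_step(ensemble, forward, y, noise_var)
    return ensemble
```

The update only needs forward-model evaluations, not gradients, which is what makes this family of methods practical for expensive simulations.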
ML Post-processing:
We use a data-driven correction framework that maps simulated ClimaCoupler forecast fields to weekly aggregated ERA5 fields. A machine learning model is trained to adjust the dynamical forecasts from the coupled model using analogs from comparable periods in the historical record. The correction model takes as input spatial, temporal, and dynamical variables from the coupled model and produces both a mean correction and an uncertainty estimate.
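One way a corrected mean and an uncertainty estimate can be turned into quintile probabilities is sketched below. It assumes the predictive distribution is Gaussian and that the four climatological quintile edges (20th, 40th, 60th, and 80th percentiles) are already known; both are assumptions made for illustration, and the function names are hypothetical rather than part of the actual post-processor.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def quintile_probabilities(mean, std, edges):
    """Probability mass of N(mean, std^2) in each climatological quintile.

    mean:  corrected forecast mean
    std:   forecast uncertainty (standard deviation, must be positive)
    edges: the four climatological quintile boundaries, in increasing order
    Returns five probabilities that sum to 1.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    if len(edges) != 4 or list(edges) != sorted(edges):
        raise ValueError("edges must be four values in increasing order")
    # CDF at each boundary, padded with 0 and 1 for the open-ended bins.
    cdf = [0.0] + [normal_cdf(e, mean, std) for e in edges] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(5)]
```

Under these assumptions, a forecast that matches the climatological distribution yields probabilities of about 0.2 in every bin, while a larger `std` pulls any forecast toward that limit only if the climatology is itself that wide; otherwise it spreads mass toward the outer bins.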
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
Work specific to subseasonal forecasting with CliMA was presented at AMS 2026. https://ams.confex.com/ams/106ANNUAL/meetingapp.cgi/Paper/476713
Recent papers highlighting components of the CliMA model:
Yatunin, D., Byrne, S., Kawczynski, C., Kandala, S., Bozzola, G., Sridhar, A., Shen, Z., Jaruga, A., Sloan, J., He, J., Huang, D.Z., Barra, V., Knoth, O., Ullrich, P., Schneider, T., 2025: The CliMA atmosphere dynamical core: Concepts, numerics, and scaling. Journal of Advances in Modeling Earth Systems, submitted.
Deck, K., Braghiere, R. K., Renchon, A. A., Sloan, J., Bozzola, G., Speer, E., Mackay, B., Reddy, T., Phan, K., Gagne-Landmann, A. L., Yatunin, D., Charbonneau, A., Efrat-Henrici, N., Bach, E., Ma, S., Gentine, P., Frankenberg, C., Bloom, A., Wang, Y., Longo, M., Schneider, T., 2025: ClimaLand: A land surface model for advancing climate modeling with machine learning and data-driven parameterizations. Journal of Advances in Modeling Earth Systems, submitted.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Validation is performed by splitting the data into training, validation, and test years. Hyperparameters/models are adjusted to optimize performance in the validation years.
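The year-based split described above can be sketched as follows. Samples are assigned by initialization year so that no year contributes to more than one subset; the specific years in the usage example are hypothetical, since the actual training, validation, and test years are not listed here.

```python
def split_by_year(init_years, validation_years, test_years):
    """Partition sample indices into train/validation/test by year.

    init_years:       initialization year of each sample, in sample order
    validation_years: years held out for hyperparameter/model selection
    test_years:       years held out for final evaluation
    Every remaining year is used for training.
    """
    validation_years = set(validation_years)
    test_years = set(test_years)
    if validation_years & test_years:
        raise ValueError("validation and test years must not overlap")
    split = {"train": [], "validation": [], "test": []}
    for index, year in enumerate(init_years):
        if year in test_years:
            split["test"].append(index)
        elif year in validation_years:
            split["validation"].append(index)
        else:
            split["train"].append(index)
    return split


# Hypothetical example: six forecasts initialized across four years.
example = split_by_year([2015, 2015, 2016, 2017, 2018, 2018],
                        validation_years=[2017], test_years=[2018])
```

Splitting by whole years rather than by random samples keeps forecasts from the same season out of both the training and evaluation sets, which limits leakage through temporal correlation.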
Did you face any challenges during model development, and how did you address them?
1) Biases in raw model simulations: addressed through core model development, bug fixes, and coupling improvements
2) Overconfident forecasts/overfitting: addressed by modifying the noise, the ML model used for post-processing, and the hyperparameters
Are there any limitations to your current model that you aim to address in future iterations?
1) A single deterministic model forecast is mapped to quantiles, so uncertainty from the initial condition and forcings is not properly captured.
2) Currently, sea ice concentration and sea surface temperature are persisted. We would like to move to a climatological reference or couple full-complexity sea ice and ocean models.
Are there any other AI/ML model components or innovations that you wish to highlight?
Ongoing work employs ensemble-based data assimilation for parameter estimation, which will allow simultaneous calibration/fine-tuning of physics parameters in the numerical model and parameters in the ML post-processor.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
Costa Christopoulos - coupled model setup, core model development, ML-based post-processing, global data assimilation
Jordan Benjamin - coupled model improvements, post-processing, global data assimilation
Ronak Patel - statistical improvements to post-processing, meteorological analysis of coupled model output
Julian Schmitt - coupled model setup, core model development, historical archive
Haakon Ludvig - RPSS validation pipeline, historical archive
Ollie Dunbar - data assimilation, ML-based post-processing, uncertainty quantification
Zhaoyi Shen - core model development (atmosphere), coupling
Katherine Deck - core model development (land), coupling
Nat - software, performance, data assimilation for coupled model
Kevin - software, performance, data assimilation for coupled model