NSFNCAR
Members
First name (team leader)
Kirsten
Last name
Mayer
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Will
Last name
Chapman
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Judith
Last name
Berner
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
David John
Last name
Gagne
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Katie
Last name
Dagon
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
John
Last name
Schreck
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Abby
Last name
Jaye
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Charlie
Last name
Becker
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Sasha
Last name
Glanville
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
First name
Dhamma
Last name
Kimpara
Organisation name
NSF National Center for Atmospheric Research
Organisation type
Research Organisation (Academic, Independent, etc.)
Organisation location
United States of America
Model
Model name
subCESMulator
Number of individuals supporting model development:
6-10
Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:
8-48
Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:
4-16
How would you best classify the IT system used for model development or forecast production:
High-Performance Computing (HPC) Cluster
Model summary questionnaire for model subCESMulator
Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.
Which of the following descriptions best represent the overarching design of your forecasting model?
- Machine learning-based weather prediction.
- Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
We took the 31 real-time GEFS members and perturbed each one 11 times using the S2S CESM ensemble generation method (detailed here: https://doi.org/10.1175/WAF-D-21-0163.1), giving 341 initial conditions (31 x 11). To use GEFS as initial conditions, we first had to convert the fields from pressure levels to model levels. We use surface temperature with a land mask as a proxy for sea surface temperature (SST). The first iteration of our model persists this SST throughout the forecast. The next version of our model will predict SST at each time step.
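The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the team's code: the function names `sst_proxy` and `persist`, the toy grid values, and the choice to mark land points as NaN are all assumptions, and the actual perturbation method from the cited paper is not reproduced here.

```python
import numpy as np

N_GEFS_MEMBERS = 31    # real-time GEFS members (from the text)
N_PERTURBATIONS = 11   # perturbations per member (from the text)

# Label every initial condition by (GEFS member, perturbation index).
initial_condition_ids = [
    (member, pert)
    for member in range(N_GEFS_MEMBERS)
    for pert in range(N_PERTURBATIONS)
]

def sst_proxy(surface_temp, land_mask):
    """Surface temperature over ocean points; NaN where land_mask is True.

    Using NaN for land is an assumption made for this sketch.
    """
    return np.where(land_mask, np.nan, surface_temp)

def persist(field, n_steps):
    """Repeat one 2-D field for every forecast step (persistence)."""
    return np.repeat(field[None, ...], n_steps, axis=0)

# Toy 2x3 grid in kelvin; the values and the mask are made up.
surface_temp = np.array([[300.0, 295.0, 288.0],
                         [285.0, 280.0, 275.0]])
land_mask = np.array([[False, True, False],
                      [False, False, True]])

sst = sst_proxy(surface_temp, land_mask)
sst_forecast = persist(sst, n_steps=4)  # same SST at all 4 steps
```

Here `len(initial_condition_ids)` reproduces the 341 initial conditions quoted in the answer, and `sst_forecast` holds the same masked field at every step, which is what persisting the SST amounts to.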
If any, what data does your model rely on for real-time forecasting purposes?
It relies on the GEFS initial conditions.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
We used Earth system model simulations to train, specifically output from CESM.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
The architecture of the model is the same as CAMulator, which is detailed in this publication: https://arxiv.org/pdf/2504.06007
Because our model is trained on CESM, we calculate the quintiles from CESM climatological values. We also had to regrid the ML model output from the 192x288 grid to the required 121x240 grid.
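One way to turn an ensemble into quintile probabilities against a model climatology is sketched below. It assumes the climatology is available as a stack of samples per grid point and that probabilities are taken as the fraction of ensemble members in each bin; the function names, array shapes, and random data are hypothetical, and the team's actual procedure may differ.

```python
import numpy as np

def quintile_edges(climatology, axis=0):
    """Return the 20/40/60/80th percentiles along the sample axis."""
    return np.quantile(climatology, [0.2, 0.4, 0.6, 0.8], axis=axis)

def quintile_probabilities(ensemble, edges):
    """Fraction of ensemble members falling in each of the 5 quintile bins.

    ensemble: (members, lat, lon); edges: (4, lat, lon).
    """
    # Category 0..4 = number of climatological edges each member exceeds.
    category = (ensemble[:, None] > edges[None]).sum(axis=1)
    return np.stack([(category == k).mean(axis=0) for k in range(5)])

# Synthetic stand-ins: 200 climatology samples and 341 members on a 3x4 grid.
rng = np.random.default_rng(0)
climatology = rng.normal(size=(200, 3, 4))
ensemble = rng.normal(size=(341, 3, 4))

edges = quintile_edges(climatology)
probs = quintile_probabilities(ensemble, edges)  # shape (5, 3, 4)
```

Because the edges come from the model's own climatology, a systematic offset between the model and observations does not by itself shift members into the wrong quintile.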
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
The architecture of the model is the same as CAMulator, which is detailed in this publication: https://arxiv.org/pdf/2504.06007
The first iteration of our model is essentially CAMulator, but with sea ice fraction and CO2 as dynamical forcings. The second iteration of the model is trained on coupled CESM simulations and also includes SST and soil moisture at 10 cm as prognostic variables.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
Not yet, but we are in the process of doing so. We will first evaluate the model by initializing it with the initial conditions used to generate the CESM S2S reforecasts and then compare its skill to the CESM2 S2S hindcasts.
Did you face any challenges during model development, and how did you address them?
For the first iteration, we had to deal with many regridding artifacts.
For the second iteration of the model, we had a standardization issue with soil moisture that was discovered after training. This issue led to artifacts in the predictions. We traced it to the z-score normalization: soil moisture has very small standard deviations in regions that are often covered by ice, which produced very large standardized soil moisture values. To address this, we masked out these points and found that the predictions no longer had artifacts, so we have not retrained or fine-tuned the model.
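The normalization problem can be reproduced with a small sketch. The threshold `min_std`, the function name, the sample values, and the choice to set masked points to zero are assumptions for illustration; the answer above only states that the affected points were masked out.

```python
import numpy as np

def zscore_with_mask(field, mean, std, min_std=1e-6):
    """Z-score a field, masking points whose standard deviation is near zero.

    Returns the standardized field (0.0 at masked points) and the validity mask.
    min_std is a hypothetical threshold chosen for this sketch.
    """
    valid = std > min_std
    safe_std = np.where(valid, std, 1.0)  # avoid dividing by tiny values
    z = np.where(valid, (field - mean) / safe_std, 0.0)
    return z, valid

# Two grid points: ordinary soil, and an ice-covered point with almost no variance.
field = np.array([[0.30, 0.2000001]])
mean = np.array([[0.25, 0.2]])
std = np.array([[0.05, 1e-9]])

naive = (field - mean) / std  # second value blows up to roughly 100
z, valid = zscore_with_mask(field, mean, std)
```

A deviation of only 1e-7 at the ice-covered point becomes a z-score of about 100 in the naive version, which is the kind of outlier that can show up as artifacts downstream.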
Are there any limitations to your current model that you aim to address in future iterations?
Yes, we will incorporate SSTs and soil moisture at 10 cm as prognostic variables.
Are there any other AI/ML model components or innovations that you wish to highlight?
We are training on output from an Earth System model (CESM) rather than an observational or reanalysis dataset. We chose to do this to explore whether training on climate model output could help address the limited observational record.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
Kirsten Mayer - subCESMulator project lead, data preparation, forecast post-processing
Charlie Becker - data preparation, forecast submission, model deployment
Will Chapman - data generation and preparation, CAMulator architecture and training
David John Gagne - CREDIT project lead, inference data preparation, and forecast data post-processing
John Schreck - CAMulator architecture and training, data preparation, model inference, GPU parallelization
Katie Dagon - conceptualization and data standardization
Sasha Glanville - ensemble generation code
Abby Jaye - CESM quintile calculation
Judith Berner - ensemble generation conceptualization
Submitted forecast data in previous period(s)
Please note: Submitted forecast data is only publicly available once the evaluation of a full competitive period has been completed. See the competition's full detailed schedule with submitted data publication dates for each period here.