Which of the following descriptions best represent the overarching design of your forecasting model?
- Machine learning-based weather prediction.
- Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
Initial conditions are sourced from ECMWF Open Data, including surface, soil-moisture, and pressure-level parameters. These are downloaded following the AIFS methodology (https://huggingface.co/ecmwf/aifs-ens-1.0/blob/main/run_AIFS_ENS_v1.ipynb), preprocessed into standardized input states, and stored as pickle (.pkl) files for each of the 50 ensemble members before GPU inference. The initial conditions are downloaded weekly from the Thursday 00Z and previous-day 18Z IFS 0h GRIB2 files. The notebook code was adapted in https://github.com/icpac-igad/ea-aifs/blob/main/ecmwf_opendata_pkl_input_aifsens.py to run on CPU-only machines to save costs, and methods were added to transfer the initial datasets to the GPU machine.
If any, what data does your model rely on for real-time forecasting purposes?
ECMWF Open Data provides real-time initial conditions, including surface parameters, soil moisture, and pressure levels, retrieved weekly at 00Z on Thursdays and 18Z on the previous day. The operational workflow downloads and preprocesses this data for all 50 ensemble members to initialize each forecast run. However, ECMWF Open Data only retains the most recent four days of data, which limits its use as initial conditions for AIFS inference. To overcome this constraint, an alternative approach is being explored using the AWS Open Data Registry, where IFS data are archived for over a year and remain available in near real time. A method based on the Kerchunk and VirtualiZarr Python libraries is being developed to enable seamless access to the GRIB2 files stored there, removing the four-day availability bottleneck.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
The base AIFS ENS v1.0 model was trained by ECMWF on ERA5 reanalysis and ECMWF operational NWP analyses. We use the pre-trained model weights (ecmwf/aifs-ens-1.0, https://huggingface.co/ecmwf/aifs-ens-1.0) for inference without any additional training.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
ECMWF’s AIFS-ENS v1.0 is a machine-learning–based ensemble weather forecasting system. It uses a graph neural network (GNN) encoder-decoder architecture combined with a transformer-based temporal processor.
The model is trained with the Continuous Ranked Probability Score (CRPS) as its loss function. The resulting model is stochastic, capable of producing a set of ensemble members that represent forecast uncertainty.
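For intuition, the standard kernel (energy) form of the CRPS for an ensemble forecast of a scalar observation can be computed as below. This is a generic estimator for illustration, not the exact training loss used by ECMWF.

```python
import numpy as np

def ensemble_crps(members, obs: float) -> float:
    """CRPS of an ensemble forecast for one scalar observation:
        CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    Smaller is better; it is zero only when every member equals obs."""
    members = np.asarray(members, dtype=float)
    accuracy = np.abs(members - obs).mean()          # distance to truth
    spread = np.abs(members[:, None] - members[None, :]).mean()  # internal spread
    return accuracy - 0.5 * spread
```

The two terms make the score proper: the first rewards members close to the observation, while the second rewards an ensemble whose spread honestly reflects its uncertainty.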
In the current version of the model architecture, once the initial-condition (.pkl) files are created, the script https://github.com/icpac-igad/ea-aifs/blob/main/automate_aifs_gpu_pipeline.py runs GPU-based inference on them. A 50-member ensemble inference is executed on an NVIDIA A100 GPU, in this case on Google Cloud Platform via a Coiled (https://coiled.io/) managed Dask notebook environment. Post-processing includes ensemble statistical computation (e.g., mean, spread, and quantiles) to generate probabilistic forecast products using the AI-WQ-package Python library, which are then submitted as part of the weekly forecast product for the AI Weather Quest. The Python scripts https://github.com/icpac-igad/ea-aifs/blob/main/aifs_n320_grib_1p5defg_nc_cli.py, https://github.com/icpac-igad/ea-aifs/blob/main/ensemble_quintile_analysis_cli.py, and https://github.com/icpac-igad/ea-aifs/blob/main/forecast_submission_cli.py are used for this purpose.
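The ensemble statistics named above (mean, spread, quantiles) amount to reductions over the member axis. A minimal sketch, assuming a plain NumPy array rather than the GRIB/NetCDF objects the operational CLI scripts handle:

```python
import numpy as np

def ensemble_summary(fields: np.ndarray) -> dict:
    """Post-processing statistics for an array of shape (members, lat, lon):
    ensemble mean, spread (sample std), and the quintile boundaries
    (20th/40th/60th/80th percentiles) used for probabilistic products."""
    return {
        "mean": fields.mean(axis=0),
        "spread": fields.std(axis=0, ddof=1),
        "quintiles": np.percentile(fields, [20, 40, 60, 80], axis=0),
    }
```

Each output field keeps the spatial grid shape, so the results can be written straight to the submission format grid point by grid point.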
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
No publications yet. The foundational AIFS model is described in Lang et al. (2024). We used ECMWF's pre-trained model from https://huggingface.co/ecmwf/aifs-ens-1.0 and adapted it for operational ensemble forecasting. The complete workflow is published at https://github.com/icpac-igad/ea-aifs.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
At present, validation occurs only through the AI Weather Quest competition library and submission routine. Model outputs are regridded to 1.5° resolution, ensemble statistics are computed, and forecasts are compared against climatology before submission to ensure quality and consistency.
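The regridding step can be illustrated with a simple block average. This is a simplified stand-in for the operational regridding (which starts from the model's N320 grid): it assumes the input is already on a regular lat/lon grid whose resolution evenly divides 1.5°.

```python
import numpy as np

def coarsen_to_1p5(field: np.ndarray, src_res: float = 0.25) -> np.ndarray:
    """Block-average a regular lat/lon field down to 1.5-degree cells.
    Assumes src_res divides 1.5 and the grid dimensions divide evenly."""
    factor = int(round(1.5 / src_res))  # e.g. 6 cells per 1.5° at 0.25°
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))
```

Conservative block averaging preserves the area mean, which matters when the coarsened fields are compared against climatology before submission.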
Did you face any challenges during model development, and how did you address them?
Setting up the anemoi and FlashAttention library requirements in a Docker image linked with the Coiled software to start a GPU cluster was challenging. GPU memory limitations were addressed by using NVIDIA A100 GPUs with optimized memory chunking. Storage constraints were resolved by utilizing scratch disk space. High computational costs (~$40-46 per forecast) were mitigated through efficient CPU/GPU workflow separation and regional resource alignment in europe-west4.
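The environment setup described above can be sketched as a Dockerfile fragment. The base image tag and package pins here are assumptions, not the operational image; the point is only that FlashAttention must be compiled against a CUDA toolchain already present in the image.

```dockerfile
# Illustrative GPU inference image (base image tag and package versions
# are assumptions; the operational image may differ).
FROM nvcr.io/nvidia/pytorch:24.04-py3

# FlashAttention builds CUDA kernels at install time, so the CUDA
# toolchain from the base image must be visible to pip.
RUN pip install --no-build-isolation flash-attn

# anemoi runtime to load and run the AIFS checkpoint, plus the
# Open Data client for initial conditions.
RUN pip install anemoi-inference ecmwf-opendata
```

Building this image once and pointing Coiled's cluster configuration at it avoids re-installing the heavy GPU dependencies on every cluster start.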
Are there any limitations to your current model that you aim to address in future iterations?
Sequential ensemble processing is time-intensive and costly in cloud computing; we plan parallel member execution for a 2-3x speedup. High computational costs also necessitate exploring preemptible GPU instances or GPU-based compute functions.
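The planned parallel member execution could look like the sketch below, where each worker dispatches one member's inference job (to a separate GPU, preemptible instance, or cloud function). The worker body is a placeholder; the real one would load that member's .pkl state and run the model.

```python
from concurrent.futures import ThreadPoolExecutor

def run_member(member: int) -> str:
    """Placeholder for one member's inference. In practice this would
    launch a GPU job (remote instance or cloud function), so lightweight
    threads are enough to fan out and wait on the results."""
    return f"member_{member:02d}_done"

def run_ensemble_parallel(n_members: int = 50, workers: int = 4) -> list:
    """Run ensemble members concurrently instead of sequentially,
    preserving member order in the returned results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_member, range(n_members)))
```

With `workers` matched to the number of available GPUs or concurrent cloud-function slots, wall-clock time drops roughly by that factor, which is where the anticipated 2-3x speedup comes from.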
Are there any other AI/ML model components or innovations that you wish to highlight?
The multi-environment workflow introduces a cost-optimization strategy that separates CPU-based preprocessing from GPU-based inference. Cloud deployment through Coiled-managed notebooks on Google Cloud Platform enables scalable execution, and ongoing exploration of GPU-backed Cloud Run services or compute functions for parallelized ensemble forecasting promises further cost effectiveness.
Furthermore, the data streaming and integration of AWS S3 Open Data for AIFS inference represent a key innovation—extending access beyond ECMWF’s four-day data window to include multi-year initial condition datasets. This advancement bridges the gap toward operational subseasonal-to-seasonal forecasting, enabling more robust experimentation and model validation across longer climatological periods.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
Nishadh Kalladath: DevOps cloud computing infrastructure, workflow and data pipeline programming, model deployment.
Masilin Gudoshava: Climate science expertise, meteorological validation, and scientific methodology guidance.
Anthony Mwanthi: Programming and scientific analysis support.
Eunice Koech: Climate science expertise, meteorological validation.
Jason Kinyua: DevOps cloud computing infrastructure, Software development support and programming.
Alex Ogalo: Software development support and programming.
Hillary Koros: DevOps, workflow and data pipeline programming.
Ahmed Amdihun: Program coordination, resource mobilization, and overall supervision.