CHAIMAEOUAZRI
Members
This team has chosen to keep its participants anonymous.
Model
Model name
ChaimaeModel
Number of individuals supporting model development:
1-5
Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:
< 8
Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:
< 4
How would you best classify the IT system used for model development or forecast production:
Single node system
Model summary questionnaire for model ChaimaeModel
Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.
Which of the following descriptions best represent the overarching design of your forecasting model?
- Machine learning-based weather prediction.
- Statistical model focused on generating quintile probabilities.
- Hybrid model that integrates physical simulations with machine learning or statistical techniques.
- An empirical model that utilises historical weather patterns.
What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)
The model was initialized using ERA5 reanalysis data (2021–2023), aggregated into weekly means/sums, cleaned, and sampled into (t, t+1) pairs to provide consistent initial conditions for CatBoost forecasting.
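The aggregation and pairing steps described above might look roughly like the following. This is a minimal sketch, not the team's code: the function names `weekly_aggregate` and `make_pairs`, the use of `None` for missing values, and the choice to drop incomplete trailing weeks are all assumptions made for illustration.

```python
from statistics import fmean


def weekly_aggregate(daily, how="mean", days=7):
    """Collapse a daily series into consecutive 7-day means or sums.

    Assumption: a week containing any missing day (None) is marked missing,
    and trailing days that do not fill a whole week are dropped.
    """
    weeks = []
    for start in range(0, len(daily) - days + 1, days):
        window = daily[start:start + days]
        if any(v is None for v in window):
            weeks.append(None)
        elif how == "mean":
            weeks.append(fmean(window))
        else:
            weeks.append(sum(window))
    return weeks


def make_pairs(weekly):
    """Pair each week t with week t+1, skipping pairs with a missing value."""
    return [
        (x, y)
        for x, y in zip(weekly, weekly[1:])
        if x is not None and y is not None
    ]


# Hypothetical daily values for one grid cell (three full weeks).
daily_values = [float(d) for d in range(21)]
weekly_means = weekly_aggregate(daily_values, how="mean")
pairs = make_pairs(weekly_means)
```

A mean would typically suit tas and mslp and a sum would suit pr, although the questionnaire does not state which aggregation was applied to which variable.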
If any, what data does your model rely on for real-time forecasting purposes?
Weekly ERA5 reanalysis data for near-surface temperature (tas), mean sea level pressure (mslp), and precipitation (pr), provided by the AI Weather Quest platform, preprocessed as seven-day means or sums, and automatically retrieved each forecast cycle. No external or observational data are used.
What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)
The model was trained exclusively on ERA5 reanalysis datasets (tas, mslp, pr) from 2021–2023, obtained via the AI Weather Quest platform and preprocessed as seven-day means or sums. No observational, satellite, or external NWP datasets were used.
Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)
The model uses a CatBoostClassifier trained on weekly ERA5 reanalysis data (tas, mslp, pr). Input values represent current grid conditions, and targets are next-step quintile classes. After missing values are removed, the model outputs probabilities for the five quintile classes, which are normalized and exported as NetCDF files via the AI Weather Quest API.
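The target construction and the normalization step can be illustrated without the classifier itself. The sketch below is an assumption-laden example rather than the submitted pipeline: the helper names, the use of `statistics.quantiles` to derive quintile edges from a reference sample, and the uniform fallback for all-zero scores are illustrative choices. The CatBoost training step is deliberately left out.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_edges(climatology):
    """Return the four cut points separating five equally likely classes."""
    return quantiles(climatology, n=5)


def quintile_class(value, edges):
    """Map a value to a class index 0..4 given four ascending edges."""
    return bisect_right(edges, value)


def normalize(scores):
    """Rescale non-negative class scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        # Assumed fallback: equal probability for every class.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Hypothetical reference sample and raw class scores for one grid cell.
edges = quintile_edges(list(range(1, 11)))
label = quintile_class(5, edges)
probs = normalize([2.0, 1.0, 1.0, 0.0, 0.0])
```

In a setup like the one described, `quintile_class` would produce the training labels for week t+1, and `normalize` would be applied to the classifier's per-class outputs before export.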
Have you published or presented any work related to this forecasting model? If yes, could you share references or links?
Yes. Related work applying similar AI-based forecasting approaches has been submitted: one paper is under journal review, others are to be presented at upcoming conferences, and research using graph neural networks (GNNs) for spatio-temporal prediction is ongoing.
Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?
The model was validated by verifying the coherence of predicted probabilities and ensuring consistency with ERA5 reference data. Internal checks confirmed that outputs were correctly normalized between 0 and 1, and spatial patterns followed realistic climatological distributions for each variable (tas, mslp, pr). No independent observational datasets were used for external validation.
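An internal coherence check of the kind mentioned above could be sketched as follows. The function name `check_forecast`, the tolerance, and the requirement that the five probabilities sum to 1 are assumptions. The questionnaire only states that outputs were normalized between 0 and 1.

```python
import math


def check_forecast(probs, n_classes=5, tol=1e-6):
    """Return a list of problems found in one grid point's class probabilities."""
    problems = []
    if len(probs) != n_classes:
        problems.append(f"expected {n_classes} classes, got {len(probs)}")
    if any(math.isnan(p) for p in probs):
        # Range and sum checks are meaningless once a NaN is present.
        problems.append("contains NaN")
        return problems
    if any(p < 0.0 or p > 1.0 for p in probs):
        problems.append("probability outside [0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        problems.append("probabilities do not sum to 1")
    return problems


# An empty list means the forecast passed every check.
issues = check_forecast([0.1, 0.2, 0.4, 0.2, 0.1])
```

Running such a check over every grid point before export would catch malformed outputs, but it says nothing about forecast skill, which still requires comparison against reference data.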
Did you face any challenges during model development, and how did you address them?
The main challenge was handling large multidimensional ERA5 datasets and ensuring stable model training within Colab’s limited computational resources. This was addressed by reducing temporal sampling, optimizing memory usage, and simplifying the CatBoost model architecture to maintain performance while enabling consistent weekly submissions.
Are there any limitations to your current model that you aim to address in future iterations?
The current model uses only a single variable’s past values to predict its future quintile probabilities, without incorporating spatial dependencies or multi-variable interactions. Future iterations will integrate additional predictors, temporal sequences, and hybrid architectures combining physical and deep learning components to enhance accuracy and generalization.
Are there any other AI/ML model components or innovations that you wish to highlight?
The model integrates an automated workflow that retrieves ERA5 reanalysis data, trains a CatBoost classifier, generates probabilistic forecasts for tas, mslp, and pr, and submits them through the AI Weather Quest interface. This design ensures full reproducibility and efficient weekly forecasting without external data sources.
Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.
This team has chosen to keep its participants anonymous.
Submitted forecast data in previous period(s)
Please note: Submitted forecast data is only publicly available once the evaluation of a full competitive period has been completed. See the competition's full detailed schedule with submitted data publication dates for each period here.