WIND

Members

First name (team leader)

Andreas

Last name

Fürst

Organisation name

Johannes Kepler University Linz

Organisation type

Research Organisation (Academic, Independent, etc.)

Organisation location

Austria

First name

Michael

Last name

Aich

Organisation name

Technical University of Munich

Organisation type

Research Organisation (Academic, Independent, etc.)

Organisation location

Germany

Model

Model name

WindDiffusion

Number of individuals supporting model development:

1-5

Maximum number of Central Processing Units (CPUs) supporting model development or forecast production:

< 8

Maximum number of Graphics Processing Units (GPUs) supporting model development or forecast production:

< 4

How would you best classify the IT system used for model development or forecast production:

Single node system

Model summary questionnaire for model WindDiffusion

Please note that the list below shows all questionnaires submitted for this model.
They are displayed from the most recent to the earliest, covering each 13-week competition period in which the team competed with this model.

Which of the following descriptions best represent the overarching design of your forecasting model?

Machine learning-based weather prediction.
Ensemble-based model, aggregating multiple predictions to assess uncertainty and variability.

What techniques did you use to initialise your model? (For example: data sources and processing of initial conditions)

# Data source: WIND uses ERA5 reanalysis data from ECMWF at 1.5° resolution (240×121 grid) with 6-hourly temporal resolution, sourced via the WeatherBench-2 dataset.

# Variables: The model uses 70 prognostic variables: 5 surface variables (precipitation, 2m temperature, mean sea level pressure, 10m u-wind, 10m v-wind) and 65 pressure-level variables (temperature, geopotential, specific humidity, u-wind, v-wind across 13 pressure levels).

# Processing of initial conditions:
- Each variable is normalized to zero mean and unit variance using training set statistics.
- Precipitation undergoes a log-transformation before normalization: x = log₁₀(1000x + 1), converting meters to millimeters with a +1 offset for numerical stability at zero-precipitation regions.
- The input is augmented with static features (land-sea mask, soil type, surface geopotential) and dynamic temporal encodings (sine/cosine embeddings of annual and diurnal cycles).
- Spatial coordinates are embedded into 3D Cartesian space as (sin φ, cos φ, cos φ sin λ) to avoid discontinuities at the date line.
- Since the native 240×121 grid does not divide evenly through downsampling stages, inputs are bilinearly interpolated to 240×128 before the network and back to original resolution afterward.

# Forecast initialization: For forecasting, the model is given one or more clean (noise-free) frames as context, and future frames are initialized as pure noise and iteratively denoised.
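The precipitation transform and the temporal encodings described above can be illustrated with a minimal sketch. This is not the team's code: the function names, the 365.25-day year length, and the ordering of the four encoding features are assumptions made for illustration only. Only the formula log₁₀(1000x + 1) and the zero-mean/unit-variance normalization come from the answer above.

```python
import math
from datetime import datetime


def precip_forward(tp_m):
    """Log-transform precipitation given in meters: log10(1000 * x + 1)."""
    return math.log10(1000.0 * tp_m + 1.0)


def precip_inverse(y):
    """Invert the log-transform, returning precipitation in meters."""
    return (10.0 ** y - 1.0) / 1000.0


def standardize(x, mean, std):
    """Normalize to zero mean and unit variance with training-set statistics."""
    return (x - mean) / std


def time_encoding(t):
    """Sine/cosine features for the annual and diurnal cycles.

    The feature order and the 365.25-day year are assumptions.
    """
    day_of_year = t.timetuple().tm_yday - 1            # 0 on 1 January
    year_phase = 2.0 * math.pi * day_of_year / 365.25
    day_phase = 2.0 * math.pi * (t.hour + t.minute / 60.0) / 24.0
    return (
        math.sin(year_phase),
        math.cos(year_phase),
        math.sin(day_phase),
        math.cos(day_phase),
    )


# Usage: zero precipitation maps to exactly zero, and 9 mm maps to about 1.
dry = precip_forward(0.0)
wet = precip_forward(0.009)
features = time_encoding(datetime(2026, 1, 1, 6))
```

The +1 offset keeps the transform finite and equal to zero at dry grid points, which is why the inverse subtracts 1 before converting back to meters.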

If any, what data does your model rely on for real-time forecasting purposes?

The model does not rely on any real-time data; it is initialized from a single frame of ERA5 reanalysis fields.

What types of datasets were used for model training? (For example: observational datasets, reanalysis data, NWP outputs or satellite data)

Only ERA5 reanalysis data at 1.5° resolution

Please provide an overview of your final ML/AI model architecture (For example: key design features, specific algorithms or frameworks used, and any pre- or post-processing steps)

# Architecture: WIND uses a UViT (U-Net Vision Transformer) backbone with approximately 458 million parameters. It has a hierarchical structure across four spatial resolutions with base channel dimension 256, progressing as (256, 512, 1024, 2048). After initial patchification with patch size 2, it uses four residual blocks (ResNet-style) at the two highest spatial resolutions and four Transformer blocks at the two lowest resolutions for global context modeling. Each Transformer block uses 4 attention heads with 1D Rotary Position Embeddings (RoPE) applied across both spatial and temporal dimensions.

# Training framework: The model is trained using diffusion forcing, an unconditional video diffusion approach in which each frame in a temporal sequence is assigned an independent, random noise level. This contrasts with standard video diffusion (shared noise level) and with autoregressive approaches (predicting one step at a time). Critically, the network is not conditioned on the noise levels, forcing it to infer uncertainty directly from the input state. The model processes sequences of T=5 frames (covering 24 hours at 6-hour stride) and learns to jointly denoise the entire sequence. Training uses an MSE loss with area-weighted and channel-weighted terms, the Adam optimizer with a cosine learning rate schedule, bfloat16 mixed precision, and EMA (decay 0.999) for evaluation.

# Forecasting: WIND denoises 4 frames given a clean context frame. This step is then repeated, using the latest denoised frame as the new clean context, until the forecast horizon is reached.

# Pre-processing: Variable normalization to zero mean/unit variance, log-transform for precipitation, spatial interpolation from 240×121 to 240×128, 3D Cartesian coordinate embedding, and sine/cosine temporal encodings.

# Post-processing: Bilinear interpolation back from 240×128 to the native 240×121 grid. No other post-processing is applied.
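The block-wise rollout and the per-frame noise levels described above can be sketched as follows. This is a toy under stated assumptions, not the WIND implementation: `denoise_block` is a hypothetical callable standing in for the full iterative denoising of 4 frames, `toy_denoiser` only labels frames by their lead time in hours, and drawing training noise levels uniformly from [0, 1) is an assumption (the answer above does not state the distribution).

```python
import math
import random

FRAME_HOURS = 6   # temporal stride stated in the text
BLOCK = 4         # frames denoised per pass, given one clean context frame


def training_noise_levels(num_frames=5, rng=random):
    """One independent noise level per frame (uniform sampling is an assumption)."""
    return [rng.random() for _ in range(num_frames)]


def inference_noise_levels(block=BLOCK):
    """Clean context frame (0.0) followed by pure-noise future frames (1.0)."""
    return [0.0] + [1.0] * block


def rollout(denoise_block, context, lead_hours):
    """Chain block-wise denoising passes until `lead_hours` is covered.

    `denoise_block(context, block)` is a hypothetical callable that returns
    `block` denoised future frames given one clean context frame.
    """
    n_frames = math.ceil(lead_hours / FRAME_HOURS)
    frames = []
    while len(frames) < n_frames:
        new_frames = denoise_block(context, BLOCK)
        frames.extend(new_frames)
        context = new_frames[-1]   # latest denoised frame becomes the clean context
    return frames[:n_frames]


def toy_denoiser(context, block):
    """Stand-in for the network: each frame is represented by its lead time in hours."""
    return [context + FRAME_HOURS * (i + 1) for i in range(block)]


# Usage: a 48-hour forecast needs two passes of 4 frames each.
forecast = rollout(toy_denoiser, 0, 48)
```

Reusing only the last frame of each block as context matches the single-frame context described above. Because 4 new frames at a 6-hour stride span 24 hours, each pass extends the forecast by one day.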

Have you published or presented any work related to this forecasting model? If yes, could you share references or links?

The work is currently under review at ICML 2026. It was accepted at FM4Science - ICLR 2026 Workshop and ReALM-GEN 2026 - ICLR 2026 Workshop. https://openreview.net/forum?id=kAfesT0CXj https://openreview.net/forum?id=t6blX5ar4a

Before submitting your forecasts to the AI Weather Quest, did you validate your model against observational or independent datasets? If so, how?

No, we did not.

Did you face any challenges during model development, and how did you address them?

Yes. One of the main challenges in forecasting was comparability between weather models. Models such as GraphCast and GenCast each use different variables and different schemes for weighting them, which makes predictions hard to compare.

Are there any limitations to your current model that you aim to address in future iterations?

We currently train only at 1.5° resolution. Our goal is to become competitive at 0.25° resolution as well, but we lack the compute to do so.

Are there any other AI/ML model components or innovations that you wish to highlight?

The key innovation of WIND is that it is not trained for forecasting or any specific task. It learns a general-purpose generative prior of atmospheric dynamics through self-supervised video reconstruction. All downstream applications — forecasting, downscaling, sparse reconstruction, conservation law enforcement, and counterfactual storylines — are solved as inverse problems at inference time via posterior sampling, with no fine-tuning. This replaces the current paradigm of training separate specialized models for each application with a single unified foundation model.

Who contributed to the development of this model? Please list all individuals who contributed to this model, along with their specific roles (e.g., data preparation, model architecture, model validation, etc) to acknowledge individual contributions.

# Andreas Fürst: Core implementation: data preparation pipeline, training infrastructure, inference code, and validation code across all downstream tasks.

# Michael Aich: Conceptual development of downstream applications (dry air mass conservation, counterfactual storylines), implementation of the validation for counterfactual storylines.

# Florian Sestak: Model architecture design and implementation.

MAM 2026 Period

Participation