Competition Overview

WMO logo WWRP logo WCRP logo S2S logo SDSC logo ECMWF logo

Table of Contents

  1. Announcements
  2. Description
  3. Timeline
  4. Prize
  5. Predictions
  6. Evaluation
  7. Data
  8. Training
  9. Discussions
  10. Leaderboard
  11. Rules
  12. Organizers

Announcements

2021-07-27:

2021-06-29:

2021-06-19:

2021-05-31: Template repository and scorer bot ready

2021-05-27: Town hall dates and EWC compute deadline changes

The organizers slightly adapted the rules:

The organizers invite everyone to join two town hall meetings:

The meetings will include a 15-minute presentation on the competition rules and technical aspects, followed by a 45-minute Q&A discussion.

A first version of the s2s-ai-challenge-template repository is released. Please fork again or rebase.

The deadline to apply for EWC compute access is shifted to 15 June 2021. Please use the competition registration form to explain why you need compute resources. Please note that ECMWF only provides access to EWC, not detailed support on how to set up your environment.

The RPSS formula has changed to incorporate the RPSS with respect to climatology, see evaluation.

2021-05-10: Rules adapted to discourage overfitting

The organizers modified the rules:

The organizers are aware that overfitting is an issue when the ground truth is accessible. A more robust verification would require predicting future states with weekly submissions over a year, which would take much more time, since one year of new observations would first need to accumulate. Therefore, we decided against a real-time competition to shorten the project length and keep momentum high. Over time we will all see which methods genuinely have skill and which overfitted their available data.

Description

The World Meteorological Organization (WMO) is launching an open prize challenge to improve current forecasts of precipitation and temperature from today’s best computational fluid dynamical models 3 to 6 weeks into the future using Artificial Intelligence and/or Machine Learning techniques. The challenge is part of the Subseasonal-to-Seasonal Prediction Project (S2S Project), coordinated by the World Weather Research Programme (WWRP)/World Climate Research Programme (WCRP), in collaboration with the Swiss Data Science Center (SDSC) and the European Centre for Medium-Range Weather Forecasts (ECMWF).

Improved sub-seasonal to seasonal (S2S) forecast skill would benefit multiple user sectors immensely, including water, energy, health, agriculture and disaster risk reduction. The creation of an extensive database of S2S model forecasts has provided a new opportunity to apply the latest developments in machine learning to improve S2S prediction of temperature and total precipitation forecasts up to 6 weeks ahead, with focus on biweekly averaged conditions around the globe.

The competition will be implemented on the Renkulab platform at the Swiss Data Science Center (SDSC), which hosts all code and scripts. The training and verification data will be easily accessible from the European Weather Cloud, and relevant access scripts will be provided to the participants. All code and forecasts of the challenge will be made open access after the end of the competition.

This is the landing page of the competition, presenting static information about the competition. For code examples and how to contribute, please visit the contribution template repository on renkulab.io.

Timeline

Prize

Prizes will be awarded to the top three submissions as evaluated by RPSS and peer-review scores; winning submissions must beat the calibrated ECMWF benchmark and climatology. The ECMWF recalibration has been performed using the tercile boundaries from the model climatology rather than from observations:

The 3rd prize is reserved for the top submission from a developing country, least developed country or small island state as per the UN list (see tables C, F, H, p. 166ff). If such a submission is already among the top 2, the third-ranked submission will get the 3rd prize.

Predictions

The organizers envisage two different approaches for Machine Learning-based predictions, which may be combined. Predict the week 3-4 & 5-6 state based on:

ML-based predictions schematic

For the exact valid_times to predict, see timings. For the data to use for training, see data sources. Comply with the rules.

Evaluation

The objective of the competition is to improve week 3-4 (weeks 3 plus 4) and 5-6 (weeks 5 plus 6) subseasonal global probabilistic 2m temperature and total precipitation tercile forecasts issued in the year 2020 by using Machine Learning/Artificial Intelligence.

The evaluation will be continuously performed by an s2saichallengescorer bot, following the verification notebook. Submissions are evaluated with the Ranked Probability Score (RPS) between the ML-based forecasts and the ground truth CPC temperature and accumulated precipitation observations, based on pre-computed observations-based terciles calculated in renku_datasets_biweekly.ipynb. This RPS is compared to the climatology forecast's RPS via the Ranked Probability Skill Score (RPSS). To be able to win prizes, the ML-based forecasts must beat the re-calibrated real-time 2020 ECMWF and climatology forecasts, see the end of the verification notebook.

RPS is calculated with the open-source package xskillscore over all 2020 forecast_times. For probabilistic forecasts:

xs.rps(observations, probabilistic_forecasts, category_edges=None, input_distributions='p', dim='forecast_time')

See the xskillscore.rps API for details.
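For intuition, the RPS of a single tercile forecast can be written out in plain numpy. This is only an illustrative sketch of the score's definition (the function name and inputs are made up); the scorer itself uses the xskillscore implementation:

```python
import numpy as np

def rps_single(obs_category, forecast_probs):
    """RPS of one tercile forecast: sum of squared differences between the
    cumulative forecast and cumulative observed category distributions."""
    cum_forecast = np.cumsum(forecast_probs)      # e.g. [1/3, 2/3, 1]
    cum_obs = np.cumsum(np.eye(3)[obs_category])  # one-hot obs, then cumulative
    return float(np.sum((cum_forecast - cum_obs) ** 2))

# climatology forecast (1/3 each) when 'above normal' (category 2) is observed:
rps_single(2, [1/3, 1/3, 1/3])  # (1/3)**2 + (2/3)**2 + 0 ≈ 0.556
```

A perfect categorical forecast scores 0; the RPSS below then compares a forecast's RPS against the climatology RPS.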

def RPSS(rps_ML, rps_climatology):
    """Ranked Probability Skill Score with respect to climatology.

    +---------+-------------------------------------------+
    |  Score  | Description                               |
    +---------+-------------------------------------------+
    |    1    | maximum, perfect improvement              |
    +---------+-------------------------------------------+
    | (0, 1)  | positive means ML better than climatology |
    +---------+-------------------------------------------+
    |    0    | equal performance                         |
    +---------+-------------------------------------------+
    | (-∞, 0) | negative means ML worse than climatology  |
    +---------+-------------------------------------------+
    """
    return 1 - rps_ML / rps_climatology

The RPSS relevant for the prizes is first calculated in each grid cell over land globally on a 1.5 degree grid. In grid cells where numerical values are expected but NaNs are provided, the RPSS is penalized with -10. The RPSS values are clipped to the interval [-10, 1]. This gridded RPSS is then averaged over all forecast_times, spatially averaged (weighted by np.cos(np.deg2rad(ds.latitude))) over [90N-60S] land points, and further averaged over both variables and both lead_times. Please note that a dry mask is applied to the total precipitation tp evaluation as in Vigaud et al. 2017, i.e. we exclude grid cells where the observations-based lower tercile edge is below 1 mm/day. Please find the ground truth compared against here.
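The NaN penalty, clipping and cosine-latitude weighting can be sketched with numpy on a synthetic RPSS field (all data here is random and purely illustrative; the land mask and the averaging over variables and lead_times are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
lat = np.arange(90, -90 - 0.1, -1.5)               # 121 latitudes on the 1.5° grid
rpss = rng.normal(0.0, 0.2, size=(lat.size, 240))  # (latitude, longitude)

rpss = np.where(np.isnan(rpss), -10.0, rpss)       # penalize missing values with -10
rpss = np.clip(rpss, -10.0, 1.0)                   # clip to [-10, 1]

keep = lat >= -60.0                                # restrict to [90N-60S]
weights = np.broadcast_to(
    np.cos(np.deg2rad(lat[keep]))[:, None], (int(keep.sum()), 240)
)
score = float(np.average(rpss[keep], weights=weights))
```

In the actual verification this weighted mean is taken over land points only; here the mask is left out for brevity.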

For diagnostics, we will further host leaderboards for the two variables in three regions in November 2021:

Please find more details in the verification notebook.

Submissions

We expect submissions to cover all bi-weekly week 3-4 and week 5-6 forecasts issued in 2020, see timings. We expect one submission netCDF file covering all 53 forecasts issued on Thursdays in 2020. Submissions must be gridded on a global 1.5 degree grid.

Each submission has to be a netCDF file with the following dimension sizes and coordinates:

<xarray.Dataset>
Dimensions:        (category: 3, forecast_time: 53, latitude: 121, lead_time: 2, longitude: 240)
Coordinates:
  * forecast_time  (forecast_time) datetime64[ns] 2020-01-02 ... 2020-12-31
  * latitude       (latitude) float64 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 14 days 28 days
  * longitude      (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
    valid_time     (lead_time, forecast_time) datetime64[ns] 2020-01-16 ... 2...
  * category       (category) object 'below normal' 'near normal' 'above normal'
Data variables:
    t2m            (category, lead_time, forecast_time, latitude, longitude) float32 ...
    tp             (category, lead_time, forecast_time, latitude, longitude) float32 ...
Attributes:
    author:        Aaron Spring
    author_email:  aaron.spring@mpimet.mpg.de
    comment:       created for the s2s-ai-challenge as a template for the web...
    website:       https://s2s-ai-challenge.github.io/#evaluation

This template submission file is available here.


We deal with two fundamentally different variables here: total precipitation is precipitation flux pr accumulated over lead_time until valid_time, and therefore describes a point observation; 2m temperature is averaged over lead_time(valid_time), and therefore describes a time-averaged observation. The submission file data model unifies both approaches: the lead_time values of 14 days (week 3-4) and 28 days (week 5-6) mark the first day of each biweekly aggregate.
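The distinction can be sketched with synthetic daily data (the values are made up): precipitation is accumulated over the 14-day window, while temperature is averaged over it:

```python
import numpy as np

rng = np.random.default_rng(42)
daily_pr = rng.gamma(2.0, 2.0, size=14)            # mm/day over a 14-day window
daily_t2m = 288.0 + rng.normal(0.0, 3.0, size=14)  # K over the same window

tp_biweekly = daily_pr.sum()     # accumulated: a point value at valid_time
t2m_biweekly = daily_t2m.mean()  # averaged over the biweekly window
```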

Submissions have to be committed in git with git lfs in a repository hosted by renkulab.io.

After the competition, the code for training together with the gridded results must be made public, so the organizers and peer reviewers can check adherence to the rules. Please indicate the resources used (number of CPUs/GPUs, memory, platform; see safeguards in examples) in your scripts/notebooks to allow reproducibility, and document them fully to enable easy interpretation of the code. Submissions that cannot be independently reproduced by the organizers after the competition ends cannot win prizes; please see rules.

Data

Timings

The organizers explicitly chose to run this competition on past 2020 data instead of in real time, to enable a much shorter competition period and to keep momentum high. We are aware of the dangers of overfitting (see rules) when the ground truth data is accessible.

Please find here an explicit list of the forecast dates required.

1) Which forecast start dates and target periods (weeks 3-4 & 5-6) are required to be submitted?

Please find a list of the dates when forecasts are issued (forecast_time with CF standard_name forecast_reference_time) and corresponding start and end in valid_time for week 3-4 and week 5-6.

forecast_time  week 3-4 start  week 3-4 end  week 5-6 start  week 5-6 end
2020-01-02 2020-01-16 2020-01-29 2020-01-30 2020-02-12
2020-01-09 2020-01-23 2020-02-05 2020-02-06 2020-02-19
2020-01-16 2020-01-30 2020-02-12 2020-02-13 2020-02-26
2020-01-23 2020-02-06 2020-02-19 2020-02-20 2020-03-04
2020-01-30 2020-02-13 2020-02-26 2020-02-27 2020-03-11
2020-02-06 2020-02-20 2020-03-04 2020-03-05 2020-03-18
2020-02-13 2020-02-27 2020-03-11 2020-03-12 2020-03-25
2020-02-20 2020-03-05 2020-03-18 2020-03-19 2020-04-01
2020-02-27 2020-03-12 2020-03-25 2020-03-26 2020-04-08
2020-03-05 2020-03-19 2020-04-01 2020-04-02 2020-04-15
2020-03-12 2020-03-26 2020-04-08 2020-04-09 2020-04-22
2020-03-19 2020-04-02 2020-04-15 2020-04-16 2020-04-29
2020-03-26 2020-04-09 2020-04-22 2020-04-23 2020-05-06
2020-04-02 2020-04-16 2020-04-29 2020-04-30 2020-05-13
2020-04-09 2020-04-23 2020-05-06 2020-05-07 2020-05-20
2020-04-16 2020-04-30 2020-05-13 2020-05-14 2020-05-27
2020-04-23 2020-05-07 2020-05-20 2020-05-21 2020-06-03
2020-04-30 2020-05-14 2020-05-27 2020-05-28 2020-06-10
2020-05-07 2020-05-21 2020-06-03 2020-06-04 2020-06-17
2020-05-14 2020-05-28 2020-06-10 2020-06-11 2020-06-24
2020-05-21 2020-06-04 2020-06-17 2020-06-18 2020-07-01
2020-05-28 2020-06-11 2020-06-24 2020-06-25 2020-07-08
2020-06-04 2020-06-18 2020-07-01 2020-07-02 2020-07-15
2020-06-11 2020-06-25 2020-07-08 2020-07-09 2020-07-22
2020-06-18 2020-07-02 2020-07-15 2020-07-16 2020-07-29
2020-06-25 2020-07-09 2020-07-22 2020-07-23 2020-08-05
2020-07-02 2020-07-16 2020-07-29 2020-07-30 2020-08-12
2020-07-09 2020-07-23 2020-08-05 2020-08-06 2020-08-19
2020-07-16 2020-07-30 2020-08-12 2020-08-13 2020-08-26
2020-07-23 2020-08-06 2020-08-19 2020-08-20 2020-09-02
2020-07-30 2020-08-13 2020-08-26 2020-08-27 2020-09-09
2020-08-06 2020-08-20 2020-09-02 2020-09-03 2020-09-16
2020-08-13 2020-08-27 2020-09-09 2020-09-10 2020-09-23
2020-08-20 2020-09-03 2020-09-16 2020-09-17 2020-09-30
2020-08-27 2020-09-10 2020-09-23 2020-09-24 2020-10-07
2020-09-03 2020-09-17 2020-09-30 2020-10-01 2020-10-14
2020-09-10 2020-09-24 2020-10-07 2020-10-08 2020-10-21
2020-09-17 2020-10-01 2020-10-14 2020-10-15 2020-10-28
2020-09-24 2020-10-08 2020-10-21 2020-10-22 2020-11-04
2020-10-01 2020-10-15 2020-10-28 2020-10-29 2020-11-11
2020-10-08 2020-10-22 2020-11-04 2020-11-05 2020-11-18
2020-10-15 2020-10-29 2020-11-11 2020-11-12 2020-11-25
2020-10-22 2020-11-05 2020-11-18 2020-11-19 2020-12-02
2020-10-29 2020-11-12 2020-11-25 2020-11-26 2020-12-09
2020-11-05 2020-11-19 2020-12-02 2020-12-03 2020-12-16
2020-11-12 2020-11-26 2020-12-09 2020-12-10 2020-12-23
2020-11-19 2020-12-03 2020-12-16 2020-12-17 2020-12-30
2020-11-26 2020-12-10 2020-12-23 2020-12-24 2021-01-06
2020-12-03 2020-12-17 2020-12-30 2020-12-31 2021-01-13
2020-12-10 2020-12-24 2021-01-06 2021-01-07 2021-01-20
2020-12-17 2020-12-31 2021-01-13 2021-01-14 2021-01-27
2020-12-24 2021-01-07 2021-01-20 2021-01-21 2021-02-03
2020-12-31 2021-01-14 2021-01-27 2021-01-28 2021-02-10
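The 53 Thursday forecast_times and the biweekly valid_times in the table above follow from simple datetime arithmetic; a sketch in numpy (the authoritative list remains the table above):

```python
import numpy as np

# 53 weekly initializations, Thursdays from 2020-01-02 to 2020-12-31
forecast_time = np.datetime64("2020-01-02") + np.arange(53) * np.timedelta64(7, "D")

# lead_time marks the first day of each biweekly aggregate
lead_time = np.array([14, 28], dtype="timedelta64[D]")  # week 3-4, week 5-6

# start of each target window, shape (lead_time, forecast_time)
valid_time = forecast_time[np.newaxis, :] + lead_time[:, np.newaxis]
```

For the first initialization this gives 2020-01-16 (week 3-4 start) and 2020-01-30 (week 5-6 start), matching the first table row.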

2) Which data are allowed to be used to make a specific ML forecast?

Sources

The main datasets for this competition are already available as renku datasets and in climetlab for both variables, temperature and total precipitation. In climetlab, each dataset carries one tag for the Machine Learning community and one for the S2S forecasting community; both tags lead to the same datasets:

| tag in climetlab (ML community) | tag in climetlab (S2S community) | Description | renku dataset(s) |
|---|---|---|---|
| training-input | hindcast-input | deterministic daily lead_time reforecasts/hindcasts, initialized once per week 2000 to 2019 on the dates of 2020 Thursdays, from models ECMWF, ECCC, NCEP | biweekly lead_time: {model}_hindcast-input_2000-2019_biweekly_deterministic.zarr |
| test-input | forecast-input | deterministic daily lead_time real-time forecasts, initialized on Thursdays 2020, from models ECMWF, ECCC, NCEP | biweekly lead_time: {model}_forecast-input_2020_biweekly_deterministic.zarr |
| training-output-reference | hindcast-like-observations | CPC daily observations formatted as 2000-2019 hindcasts with forecast_time and lead_time | biweekly lead_time deterministic: hindcast-like-observations_2000-2019_biweekly_deterministic.zarr; probabilistic in 3 categories: hindcast-like-observations_2000-2019_biweekly_terciled.zarr |
| test-output-reference | forecast-like-observations | CPC daily observations formatted as 2020 forecasts with forecast_time and lead_time | biweekly lead_time: forecast-like-observations_2020_biweekly_deterministic.zarr; binary in 3 categories: forecast-like-observations_2020_biweekly_terciled.nc |
| training-output-benchmark | hindcast-benchmark | ECMWF week 3-4 & 5-6 re-calibrated probabilistic 2000-2019 hindcasts in 3 categories | - |
| test-output-benchmark | forecast-benchmark | ECMWF week 3-4 & 5-6 re-calibrated probabilistic real-time 2020 forecasts in 3 categories | ecmwf_recalibrated_benchmark_2020_biweekly_terciled.nc |
| - | - | observations-based tercile category edges calculated from 2000-2019 | hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc |

Note that the tercile_edges separating observations into the categories "below normal" [0-0.33), "near normal" [0.33-0.67) and "above normal" [0.67-1] depend on longitude (240), latitude (121), lead_time (46 daily or 2 bi-weekly), forecast_time.weekofyear (53) and category_edge (2).
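Turning an observed value into one of the three categories then amounts to a searchsorted against the two tercile edges of the corresponding grid cell, week of year and lead time; a minimal sketch with made-up edge values:

```python
import numpy as np

CATEGORIES = ("below normal", "near normal", "above normal")

def categorize(value, tercile_edges):
    """Map a value to its tercile category given the two edge values."""
    idx = int(np.searchsorted(tercile_edges, value, side="right"))
    return CATEGORIES[idx]

edges = (1.0, 3.0)      # hypothetical lower/upper tercile edges [mm/day]
categorize(0.5, edges)  # 'below normal'
categorize(2.0, edges)  # 'near normal'
categorize(5.0, edges)  # 'above normal'
```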

We encourage the use of subseasonal forecasts from the S2S and SubX projects:

However, any other publicly available data sources (like CMIP, NMME, DCPP etc.) of dates prior to the forecast_time can be used for training-input and forecast-input. Also purely empirical methods like persistence or climatology could be used. The only essential data requirement concerns forecast times and dates, see timings.

Ground truth sources are NOAA CPC temperature and total precipitation from IRIDL:

Examples

Join

Follow the steps in the template renku project.

Training

Where to train?

We are looking for your smart solutions here. Find a quick start template here.

Discussion

Please use the issue tracker in the renkulab s2s-ai-challenge gitlab repository for questions to the organizers, discussions, and bug reports. We have set up a Gitter chat room for informal communication.

FAQ

Answered questions from the issue tracker are regularly transferred to the FAQ.

Leaderboard

RPSS

Submissions have to beat the ECMWF re-calibrated benchmark and climatology while following the rules to qualify for prizes.

The leaderboard will be made public after the submission period ends and submission codes have been made public, i.e. early November 2021.

We will also publish purely diagnostic RPSS subleaderboards, showing RPSS for the two variables (t2m, tp), the two lead_times (weeks 3-4 & 5-6) and three subregions ([90N-30N], (30N-30S), [30S-60S]).
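On the 1.5 degree latitude axis, the three diagnostic subregions can be expressed as boolean masks; a sketch, where the handling of the exact 30-degree boundaries follows the bracket notation above:

```python
import numpy as np

lat = np.arange(90, -90 - 0.1, -1.5)      # 121 latitudes, 90N .. 90S

north   = lat >= 30.0                     # [90N-30N], closed at 30N
tropics = (lat < 30.0) & (lat > -30.0)    # (30N-30S), open interval
south   = (lat <= -30.0) & (lat >= -60.0) # [30S-60S], closed
```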

Peer review

From November 2021 to January 2022, there will be two peer review processes:

Peer review will evaluate:

Open peer review

One goal of this challenge is to foster a conversation about how AI/ML can improve S2S forecasts. Therefore, we will open the floor for discussing and evaluating all submitted methods in an open peer review process. The organizers will create a table of all submissions, and everyone is invited to comment on submissions, as in the EGU’s public interactive discussions. This open peer review will be hosted on renku’s gitlab.

Expert peer review

The organizers decided that the top four submissions will be evaluated by expert peer review. This will include 2-3 reviews by experts from the fields of S2S & AI/ML. Additionally, the organizers will host a public showcase session in January 2022, in which these top four submissions can present their methods in 10 minutes followed by a 15-minute Q&A. The reviewers will give their review grades after an internal discussion moderated by Andrew Robertson and Frederic Vitart acting as editors.

Based on the criteria above, the expert peer reviewers will give a peer review grade. Comments from the open peer review can be taken into account by the expert peer review grades.

Final

The review grades will be ranked. The RPSS leaderboard will also be ranked. The final leaderboard will be determined from the average of both rankings. When two submissions have the same mean ranking, the review ranking counts more. The top three submissions based on the combined RPSS and expert peer-review score will receive prizes.

Rules

Organizers
