Learning skillful medium-range global weather forecasting : NOTES

In this paper, Lam et al. propose GraphCast, an ML-based method trained directly on reanalysis data. It predicts weather variables for the next 10 days at 0.25° resolution globally in under 1 minute (on a Google Cloud TPU v4). GraphCast outperforms the most accurate operational deterministic system on 90% of 1,380 verification targets, and supports better severe-event prediction, including tropical cyclone tracking, atmospheric rivers, and extreme temperatures.


Now in detail

ECMWF's IFS runs for less than an hour, every 6 hours of every day, producing weather forecasts worldwide. This is done using NWP, which involves solving the governing equations of weather on supercomputers. The success of NWP lies in rigorous and ongoing research, and NWP scales to greater accuracy with greater computational resources. The top deterministic operational system in the world is ECMWF's HRES, a configuration of IFS that produces global 10-day forecasts at 0.1° latitude/longitude resolution in around an hour.

Recently, MLWP has helped improve forecasts in regimes where NWP is relatively weak, e.g., subseasonal heat-wave prediction and precipitation nowcasting from radar images, where accurate equations and robust numerical methods are not as available.

GraphCast takes as input the two most recent states of Earth’s weather (the current time and 6 hours earlier) and predicts the next state of the weather 6 hours ahead. A single weather state is represented on a 0.25° latitude-longitude grid (721 x 1440), which corresponds to roughly 28 km by 28 km resolution at the equator.


0.25° latitude-longitude grid comprising a total of 721 x 1440 = 1,038,240 points.

  • Yellow layers: 5 surface variables
  • Blue layers: 6 atmospheric variables

Repeated at 37 pressure levels, this totals to 5 + 6 x 37 = 227 variables per point, resulting in a state representation of 235,680,480 values.
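The arithmetic above can be checked directly (all numbers are from the paper):

```python
# Size of one GraphCast weather state on the 0.25° latitude-longitude grid.
lat_points = 721      # 90°S to 90°N in 0.25° steps
lon_points = 1440     # 360° of longitude in 0.25° steps
grid_points = lat_points * lon_points                # 1,038,240 points

surface_vars = 5      # "yellow layers"
atmospheric_vars = 6  # "blue layers", repeated per pressure level
pressure_levels = 37
vars_per_point = surface_vars + atmospheric_vars * pressure_levels  # 227

state_values = grid_points * vars_per_point          # 235,680,480 values
print(grid_points, vars_per_point, state_values)
```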


GraphCast is based on GNNs in an “encoder-processor-decoder” configuration with a total of 36.7 million parameters.

The encoder uses a single GNN layer to map variables represented as node attributes on the input grid to learned node attributes on an internal “multimesh” representation.

The multimesh is defined by refining a regular icosahedron (12 nodes, 20 faces, 30 edges) iteratively six times, where each refinement divides each triangle into four smaller ones (leading to four times more faces and edges), and reprojecting the nodes onto the sphere. The multimesh contains the 40,962 nodes from the highest-resolution mesh (which is roughly 1/25 the number of latitude-longitude grid points at 0.25°) and the union of all the edges created in the intermediate graphs, forming a flat hierarchy of edges with varying lengths. The processor uses 16 unshared GNN layers to perform learned message-passing on the multimesh.
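These refinement counts can be reproduced with a few lines. The update rules follow from the subdivision scheme itself: each edge gains a midpoint node, each face splits into four, and each triangle contributes 3 edges shared between 2 faces (a sketch, not the paper's code):

```python
# Node/edge/face counts under iterative refinement of an icosahedron.
def refine_counts(levels):
    v, e, f = 12, 30, 20            # regular icosahedron (M0)
    counts = [(v, e, f)]
    for _ in range(levels):
        v = v + e                   # one new node at each edge midpoint
        f = 4 * f                   # each triangle splits into four
        e = 3 * f // 2              # 3 edges per triangle, shared by 2 faces
        counts.append((v, e, f))
    return counts

counts = refine_counts(6)
print(counts[-1])                   # M6: (40962, 122880, 81920)

# The multimesh keeps M6's nodes plus the union of edges from M0..M6;
# summing the per-level edge counts gives 163,830 (undirected) edges.
multimesh_edges = sum(e for _, e, _ in counts)
print(multimesh_edges)
```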

What does unshared mean here? Each of the 16 processor layers has its own parameters; the weights are not tied across layers (i.e., it is not one layer applied 16 times).

The decoder maps the final processor layer’s learned features from the multimesh representation back to the latitude-longitude grid. It uses a single GNN layer and predicts the output as a residual update to the most recent input state.

  • The encoder maps local regions into nodes of the multimesh.
  • The processor updates each multi-mesh node using learned message-passing.
  • The decoder maps the processed multimesh features back onto the grid representation.
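The processor step can be sketched numerically. This is a minimal one-round message-passing update with hypothetical shapes and a stand-in `tanh` in place of the learned MLPs; GraphCast's actual edge/node update functions are learned and residual, but the data flow is the same:

```python
import numpy as np

# One round of message-passing on a small random graph (illustrative only).
rng = np.random.default_rng(0)
num_nodes, num_edges, dim = 8, 16, 4
senders = rng.integers(0, num_nodes, num_edges)    # edge source node indices
receivers = rng.integers(0, num_nodes, num_edges)  # edge target node indices
nodes = rng.normal(size=(num_nodes, dim))
edges = rng.normal(size=(num_edges, dim))

def mlp(x):
    return np.tanh(x)  # stand-in for a learned MLP

# 1) compute a message per edge from its endpoints and edge features,
# 2) aggregate incoming messages at each receiver node,
# 3) update nodes with a residual connection.
messages = mlp(edges + nodes[senders] + nodes[receivers])
aggregated = np.zeros_like(nodes)
np.add.at(aggregated, receivers, messages)
nodes = nodes + mlp(nodes + aggregated)
print(nodes.shape)
```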


The multimesh is derived from icosahedral meshes of increasing resolution, from the base mesh (M0, 12 nodes) to the finest resolution (M6, 40,962 nodes), which has uniform resolution across the globe. It contains the set of nodes from M6 and all the edges from M0 to M6. The learned message-passing over the different meshes’ edges happens simultaneously, so that each node is updated by all of its incoming edges.


During training, Lam et al. used 39 years (1979–2017) of historical data from ERA5. As a training objective, they averaged the MSE between GraphCast’s predicted states over N autoregressive steps and the corresponding ERA5 states, with the error weighted by vertical level.

The value of N was increased incrementally from 1 to 12 (i.e., from 6 hours to 3 days) over the course of training, and the gradient of the loss was computed by backpropagation through time. GraphCast was trained to minimize the training objective using gradient descent, which took roughly 4 weeks on 32 Cloud TPU v4 devices using batch parallelism. Lam et al. evaluated GraphCast on the held-out data from the years 2018 onward.
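The objective can be sketched as follows. The `model`, the toy inputs, and the flat weights are placeholders for illustration; the real loss also weights by latitude and variable, and gradients flow through the whole rollout (backpropagation through time):

```python
import numpy as np

# Sketch of the autoregressive MSE training objective over N 6-hour steps.
def rollout_loss(model, state_prev, state_curr, targets, level_weights):
    """targets: list of N ground-truth states; level_weights: per-level weights."""
    loss, x_prev, x_curr = 0.0, state_prev, state_curr
    for target in targets:                       # N autoregressive steps
        pred = model(x_prev, x_curr)             # predict 6 hours ahead
        loss += np.mean(level_weights * (pred - target) ** 2)
        x_prev, x_curr = x_curr, pred            # feed the prediction back in
    return loss / len(targets)

# Toy usage with an identity "model" that just returns the current state.
model = lambda prev, curr: curr
s = np.ones((4, 3))
loss_value = rollout_loss(model, s, s, [s * 1.0, s * 1.1], np.ones(3))
print(loss_value)
```

Increasing N during training (the curriculum from 1 to 12 steps) exposes the model to its own accumulated errors, which is what improves longer-lead-time skill.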

The regions of the atmosphere in which HRES had better performance than GraphCast were disproportionately localized in the stratosphere. When excluding the 50 hPa level, GraphCast significantly outperforms HRES on 96.9% of the remaining 1280 targets. When excluding levels 50 and 100 hPa, GraphCast significantly outperforms HRES on 99.7% of the remaining 1180 targets.

Lam et al. found that increasing the number of autoregressive steps in the MSE loss improves GraphCast's performance at longer lead times. It also encourages GraphCast to blur its predictions to a degree at longer lead times, which means that its forecasts will lie somewhere between a traditional deterministic forecast and an ensemble mean. HRES’s underlying physical equations, however, do not lead to blurred predictions. Still, blurrier forecasts may not be desirable for some applications.

For tropical cyclone tracking, GraphCast has lower median track error than HRES over the period 2018–2021 (the median was chosen to resist outliers).

Atmospheric rivers are narrow regions of the atmosphere that are responsible for most of the poleward water vapor transport across the mid-latitudes and generate 30 to 65% of annual precipitation on the US West Coast. Their strength can be characterized by the vertically integrated water vapor transport IVT, indicating whether an event will provide beneficial precipitation or be associated with catastrophic damage. IVT can be computed from the nonlinear combination of the horizontal wind speed (U and V) and specific humidity. Lam et al. evaluated GraphCast forecasts over coastal North America and the Eastern Pacific during cold months (October to April), when atmospheric rivers are most frequent.
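The IVT integral can be sketched from pressure-level fields. The formula is the standard column integral IVT = (1/g) |∫ q·(u, v) dp|; the level values below are made-up illustrative numbers, not data from the paper:

```python
import numpy as np

# Vertically integrated water vapor transport (IVT) via trapezoidal integration.
g = 9.81                                        # gravitational acceleration, m/s^2
p = np.array([1000.0, 850.0, 700.0, 500.0]) * 100.0  # pressure levels, Pa
q = np.array([0.010, 0.007, 0.004, 0.001])      # specific humidity, kg/kg
u = np.array([5.0, 10.0, 15.0, 20.0])           # zonal wind, m/s
v = np.array([2.0, 4.0, 6.0, 8.0])              # meridional wind, m/s

def column_integral(field, p_levels):
    # trapezoidal integral of field dp; negated because p decreases with height
    return -np.sum(0.5 * (field[1:] + field[:-1]) * np.diff(p_levels))

ivt_u = column_integral(q * u, p) / g           # zonal transport component
ivt_v = column_integral(q * v, p) / g           # meridional transport component
ivt = np.hypot(ivt_u, ivt_v)                    # magnitude, kg m^-1 s^-1
print(ivt)
```

The nonlinearity noted in the text is visible here: IVT combines q, u, and v multiplicatively and through a vector norm, so it cannot be computed from each variable's forecast error in isolation.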

Extreme heat and cold are characterized by large anomalies with respect to typical climatology, which can be dangerous and disrupt human activities.

GraphCast can be retrained periodically with recent data, which in principle allows it to capture weather patterns that change over time, for example in response to the effects of climate change and long climate oscillations.

With 36.7 million parameters, GraphCast is a relatively small model by modern ML standards; scaling to higher-resolution data, however, poses engineering challenges in fitting it on current hardware. One key limitation of GraphCast is its focus on deterministic forecasts. The other pillar of ECMWF’s IFS, the ensemble forecasting system ENS, is especially important for quantifying the probability of extreme events and for longer lead times, where forecast skill decreases. The nonlinearity of weather dynamics means that uncertainty grows with lead time, which is not well captured by a single deterministic forecast. ENS addresses this by generating multiple stochastic forecasts, which approximate a predictive distribution over future weather; however, generating multiple forecasts is expensive. By contrast, GraphCast’s MSE training objective encourages it to spatially blur its predictions in the presence of uncertainty, which may not be desirable for applications where knowing tail, or joint, probabilities of events is important. Building probabilistic forecasts that model uncertainty more explicitly, along the lines of ensemble forecasts, is a crucial next step.

GraphCast should not be regarded as a replacement for traditional weather forecasting methods, which have been developed for decades, rigorously tested in many real-world contexts, and offer many features we have not yet explored.

Beyond weather forecasting, GraphCast can open new directions for other important geospatiotemporal forecasting problems, including climate and ecology, energy, agriculture, and human and biological activity, as well as other complex dynamical systems.


Dead or Alive

A few simple ideas:

  • Avoiding extreme weather is interesting, if it can be done.

Abbreviations

  • ECMWF: European Centre for Medium-Range Weather Forecasts
  • HRES: ECMWF’s High-Resolution Forecast
  • ERA5: ECMWF’s Reanalysis Archive (fifth generation)
  • MARS: ECMWF’s Meteorological Archival and Retrieval System archive
  • IFS: Integrated Forecasting System
  • NWP: Numerical Weather Prediction
  • MLWP: Machine Learning–based Weather Prediction
  • ACC: Anomaly Correlation Coefficient
  • TIGGE: THORPEX Interactive Grand Global Ensemble
  • IBTrACS: International Best Track Archive for Climate Stewardship
  • IVT: vertically Integrated water Vapor Transport

Other things:

  • Pangu-Weather
  • WeatherBench