
Statistics

Easing the generation and storage of climate data

New tool efficiently emulates climate data in near real-time.

The online stochastic generator developed by KAUST researchers compresses climate data storage and enables near real-time data generation, supporting faster insights into climate change.
 

A new tool created by KAUST researchers helps make climate models more practical to use[1]. The tool, called an online stochastic generator, reduces the space needed to store and analyze climate data while enabling researchers to generate nearly real-time climate data, helping them understand climate change in a timely manner.

An important element in climate modeling is reanalysis data, in which observations are incorporated into a model’s predictions to improve its accuracy. Reanalysis data can be extremely large and expensive to generate, making storage a serious issue. “The storage aspect is becoming a big problem for climate research centers because they run simulations on supercomputers that take weeks or months, and then they have terabytes of data to store somewhere for future use, which has a cost. And they’re reluctant to throw that data away,” says KAUST’s Al-Khawarizmi Distinguished Professor of Statistics Marc Genton, the study’s senior author.

To address this challenge, researchers can use tools called stochastic generators. A stochastic generator represents the climate data in a statistical model, which can be used to recreate statistically similar data. “If you fit a stochastic generator to the data, you only need to store the parameters. You could throw away the data and re-simulate it at any time quickly and cheaply,” explains Genton.

Stochastic generators also enable researchers to regenerate multiple ensembles of climate data from the stored parameters. This can give climate modelers better insight into the uncertainty of the data and help them reach more accurate predictions and conclusions about how the climate works.
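The fit, store, and re-simulate idea can be illustrated with a deliberately tiny sketch. It is not the generator from the study: it assumes a single time series that follows a first-order autoregressive (AR(1)) model, and the names `fit_ar1` and `simulate` are hypothetical. The "original" series is itself synthetic, standing in for a climate record. Only three numbers are kept, and any number of statistically similar ensemble members can be regenerated from them.

```python
import math
import random

def fit_ar1(series):
    """Estimate the mean, lag-1 coefficient, and innovation std of an AR(1) model."""
    mu = sum(series) / len(series)
    dev = [x - mu for x in series]
    # Least-squares estimate of the lag-1 coefficient from consecutive deviations.
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(a * a for a in dev[:-1])
    phi = num / den
    resid = [b - phi * a for a, b in zip(dev[:-1], dev[1:])]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
    return {"mu": mu, "phi": phi, "sigma": sigma}

def simulate(params, n, rng):
    """Draw one series of length n from the fitted model (assumes |phi| < 1)."""
    mu, phi, sigma = params["mu"], params["phi"], params["sigma"]
    # Start from the stationary distribution of the deviations.
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        out.append(mu + x)
        x = phi * x + rng.gauss(0.0, sigma)
    return out

rng = random.Random(0)
# Synthetic stand-in for a long data record (assumed values, not real climate data).
original = simulate({"mu": 8.0, "phi": 0.7, "sigma": 1.5}, 5000, rng)

params = fit_ar1(original)   # the only thing that needs to be stored
ensemble = [simulate(params, len(original), rng) for _ in range(5)]

print(params)
print(len(ensemble), "members of length", len(ensemble[0]))
```

In this toy case, 5,000 values are replaced by three parameters, and the spread across ensemble members gives a rough picture of the uncertainty the article describes.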

However, existing stochastic generators have two shortcomings: they aren’t developed with storage constraints in mind, and they can’t be updated live as new data come in. “Since reanalysis data can come in real-time and span a considerable number of time points, a stochastic generator for reanalysis data must address these two challenges,” explains Dr. Yan Song, the postdoc who led the study.

Together with a collaborator at Lahore University of Management Sciences, Song and Genton have developed an online stochastic generator, one that can incorporate new data as they come in. Their paper describing the new stochastic generator is one of the five finalists for the ADIA Lab Best Paper Award in Climate Data Sciences for “Pioneering Solutions for a Sustainable Future.”

The generator can take data into the model sequentially as blocks, so the model’s parameters can be updated as new data come in. “Our online stochastic generator can emulate near real-time data at high resolution, so it’s suitable for reanalysis data,” says Song. “The performance of our online stochastic generator is comparable to that of a stochastic generator developed using the entire dataset at a single time.”
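A minimal sketch can show why block-wise updating is able to match a fit on the full dataset. It is not the authors' algorithm: it assumes the only parameters are the mean and covariance of two variables (labeled `u` and `v` here as stand-ins for two wind components), and the class name `RunningMoments` is hypothetical. Each block is folded into the running estimates with the standard pairwise merge formula for means and sums of squared deviations, so no earlier block has to be kept in memory.

```python
import random

class RunningMoments:
    """Running mean and covariance of (u, v) pairs, updated one block at a time."""

    def __init__(self):
        self.n = 0
        self.mean = [0.0, 0.0]
        # Sums of products of deviations from the running mean.
        self.m2 = [[0.0, 0.0], [0.0, 0.0]]

    def update(self, block):
        m = len(block)
        if m == 0:
            return
        # Statistics of the incoming block alone.
        bmean = [sum(p[k] for p in block) / m for k in range(2)]
        bm2 = [[sum((p[i] - bmean[i]) * (p[j] - bmean[j]) for p in block)
                for j in range(2)] for i in range(2)]
        # Merge the block statistics into the running statistics.
        total = self.n + m
        delta = [bmean[k] - self.mean[k] for k in range(2)]
        weight = self.n * m / total
        for i in range(2):
            for j in range(2):
                self.m2[i][j] += bm2[i][j] + delta[i] * delta[j] * weight
        self.mean = [self.mean[k] + delta[k] * m / total for k in range(2)]
        self.n = total

    def covariance(self):
        return [[c / (self.n - 1) for c in row] for row in self.m2]

rng = random.Random(1)
# Synthetic, correlated (u, v) pairs; assumed values for illustration only.
data = []
for _ in range(1000):
    u = rng.gauss(5.0, 2.0)
    data.append((u, 0.5 * u + rng.gauss(0.0, 1.0)))

online = RunningMoments()
for start in range(0, len(data), 100):   # data arrive in blocks of 100
    online.update(data[start:start + 100])

batch = RunningMoments()
batch.update(data)                        # everything at once, for comparison

print(online.mean, batch.mean)
```

For these simple statistics the block-wise and all-at-once estimates agree up to floating-point rounding. A full generator has many more parameters, but the same principle is what keeps memory use bounded by the block size rather than by the length of the record.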

Processing the data in blocks also offers a way to reduce the computational load of climate models. For example, when the available computational resources aren’t enough to store and analyze a data set, it can instead be processed as a sequence of blocks. “Because it doesn’t process all the data at once, the model we’ve developed can deal with higher resolutions in both space and time,” says Genton.

The new generator can also handle multiple variables rather than just one. The team used it to analyze two different wind speed components, but it could also be adjusted for other variables. Some variables, such as precipitation, are more complicated to model and would take more work to include in the stochastic generator. The researchers plan to continue developing the generator to handle such variables.

 

Reference
  1. Song, Y., Khalid, Z., & Genton, M.G. Online stochastic generators using Slepian bases for regional bivariate wind speed ensembles from ERA5. arXiv:2410.08945v1, advance online publication (Oct 2024).