Pulling rank on spatial statistics
A technique that uses the power of computing could solve statistical problems cheaper and faster than current methods.
By applying the power of high-performance computing to one of the cornerstones of statistical methods, researchers from KAUST have developed a technique that could analyze large datasets much more cheaply and quickly than current methods.
Spatial datasets can contain topographical, geometric or geographic information, such as environmental, climate or financial data, and comprise measurements taken across many locations and over long periods. The large size and high dimensionality of these datasets pose significant challenges for current statistical methods, which cannot handle the computational burden and substantial cost of analyzing them; both grow rapidly as the size of the dataset increases.
These challenges led Marc Genton and David Keyes from KAUST, in collaboration with George Turkiyyah from the American University of Beirut in Lebanon, to develop a statistical method that exploits the hierarchical low-rank decomposition of covariance matrices to significantly speed up the evaluation of large-scale multivariate normal probabilities.
“Our aim was to be able to evaluate high-dimensional probabilities and do this faster than existing methods such that problems in statistics, which are currently intractable, become feasible,” explains Genton.
The efficient computation of multivariate normal probabilities, for datasets of correlated random variables grouped around a mean value, is important in many statistical applications. However, as the dimensionality of such datasets increases, techniques like Monte Carlo simulation, which relies on repeated random sampling and statistical analysis, must be used, and these can become inaccurate in the tails of the distribution.
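To give a flavor of why this is hard, the sketch below estimates a simple bivariate normal probability by plain Monte Carlo, using NumPy. This is a toy illustration, not the authors' hierarchical method; the correlation value and sample count are arbitrary choices. The closed-form answer for this two-dimensional case lets us check the estimate, a luxury that disappears in high dimensions.

```python
import numpy as np

# Plain Monte Carlo estimate of P(X1 <= 0, X2 <= 0) for a bivariate
# normal with zero mean, unit variances and correlation rho.
rng = np.random.default_rng(seed=0)
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])

n_samples = 200_000
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_samples)
estimate = np.mean(np.all(samples <= 0.0, axis=1))

# In two dimensions the exact orthant probability is known in closed form:
# 1/4 + arcsin(rho) / (2*pi).
exact = 0.25 + np.arcsin(rho) / (2 * np.pi)

# The Monte Carlo error shrinks only as 1/sqrt(n_samples), which is why
# small tail probabilities in high dimensions become expensive to pin down.
```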
By exploiting the hierarchically low-rank nature of covariance matrices, which describe how pairs of random variables are related, the researchers were able to significantly reduce the computational burden, allowing them to tackle problems arising from large spatial datasets.
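The idea of low-rank compression can be illustrated with a global truncated SVD, a much simpler scheme than the hierarchical blockwise decomposition the researchers use; the one-dimensional grid, exponential kernel and truncation rank below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Covariance matrix from an exponential kernel on a 1-D grid of points.
# Covariances built from smooth, distance-decaying kernels like this one
# have rapidly decaying singular values, which is what makes low-rank
# (and hierarchical low-rank) compression effective.
n = 500
points = np.linspace(0.0, 1.0, n)
cov = np.exp(-np.abs(points[:, None] - points[None, :]))

# Truncate the SVD at rank k: storage drops from n*n numbers to roughly
# 2*n*k, and matrix-vector products drop accordingly.
u, s, vt = np.linalg.svd(cov)
k = 20
cov_lowrank = (u[:, :k] * s[:k]) @ vt[:k, :]

# Relative Frobenius-norm error of the rank-k approximation.
rel_error = np.linalg.norm(cov - cov_lowrank) / np.linalg.norm(cov)
```

In the hierarchical setting, this kind of compression is applied to off-diagonal blocks of the matrix rather than globally, which is what brings the quadratic and cubic costs down to log-linear.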
“The novelty of our approach arises from the collaboration between statistics and KAUST’s Extreme Computing Research Center because it allowed us to specifically bring the technology of hierarchical matrices to fundamental problems in statistical research,” says Genton.
The outcome is a practical one. Reducing the storage and arithmetic complexity of matrix-vector operations, as well as of factorization and inversion, from quadratic or even cubic to log-linear makes large classes of computations viable that would otherwise be prohibitively expensive.
“The significance of our findings lies in the fact that the problem we solved is the cornerstone of many methods in statistics and therefore opens up a whole new path of exciting research problems that were out of reach before,” says Genton.
References
Genton, M.G., Keyes, D.E. & Turkiyyah, G. Hierarchical decompositions for the computation of high-dimensional multivariate normal probabilities. Journal of Computational and Graphical Statistics 27, 268–277 (2017).