Applied Mathematics and Computational Sciences
Getting a visual on complex data
A new method for visually presenting complex data distributions provides a much-needed tool for management, analysis and interpretation.
As it becomes easier and cheaper to monitor and record information at ever-finer resolutions in time and space, the data sets used by scientists, engineers and managers become larger and more complex. This complexity renders inadequate the tools traditionally used to visually summarize data. Now a KAUST researcher has helped to devise an analogous approach to data visualization that promises to make even complex, multidimensional data sets more accessible and informative.
“As technology advances, the measurement of complex data that varies over time or space is becoming prevalent in many fields, including medicine, ecology, biology, biometrics, bioinformatics, computer vision and finance,” said Ying Sun, Professor of Applied Mathematics and Computational Science. “There has been much progress in the development of statistical analysis tools to handle complex data; however, much less attention has been given to visualization, which is an integral step in exploratory data analysis.”
A box-and-whisker plot is a conventional visualization that involves calculating five key statistical metrics for a classical one-dimensional data set, such as the heights of a population of people. These metrics are the median, the 25th and 75th percentiles, and the minimum and maximum, which can then be displayed on a chart as a box spanning the 25th to 75th percentiles, marked at the median, with whiskers extending to the maximum and minimum. This enables quick and intuitive interpretation without the need to inspect or analyze the underlying data.
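As an illustration of the five-number summary behind a box-and-whisker plot, here is a minimal Python sketch. The function name `five_number_summary` and the height values are made up for this example, and NumPy's default linear interpolation is only one of several common conventions for computing percentiles.

```python
import numpy as np


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    data = np.asarray(values, dtype=float)
    # np.percentile uses linear interpolation between data points by default.
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    return (float(data.min()), float(q1), float(median), float(q3), float(data.max()))


# Hypothetical heights in centimetres, invented for illustration only.
heights_cm = [150, 160, 165, 170, 175, 180, 190]
summary = five_number_summary(heights_cm)
print(summary)
```

The box would be drawn from the second to the fourth value of `summary`, with a line at the third value and whiskers reaching out to the first and last.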
Sun collaborated with Sebastian Kurtek, a professor from Ohio State University, and other colleagues, to apply this idea to data with a similar distribution but that had been measured over space or time; for example, daily sea-surface temperatures across many different locations. In this case, a continuous line or function of temperature variation can be drawn across all locations for each day, and plotting the data set for an entire year will result in 365 such lines drawn on one graph^{1}.
The team developed a boxplot-like visualization by analyzing important characteristics of each function: the translation or offset in the data, the amplitude and the phase or overall shape of the function.
“Our method involves calculating a new version of the five-metric summary as a function, but we do this separately for the translation, amplitude and phase of the original function, which results in a much more meaningful visualization,” explained Sun (see image). “Our method can identify signals, such as El Niño or La Niña events based on sea-surface temperature, or other patterns, like a heart attack based on echocardiogram traces.”
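To make the idea of treating the components separately more concrete, the sketch below handles only the simplest one, the translation. It assumes, purely for illustration, that a curve's translation is its mean value over the sampling grid, and it uses synthetic curves rather than real temperature data. Separating amplitude from phase requires aligning the curves with one another and is not attempted here, so this is not the published method.

```python
import numpy as np


def split_translation(curves):
    """Split each curve (one per row) into a vertical offset and a centered curve.

    Assumption for this sketch: the offset of a curve is its mean value
    over the sampling grid.
    """
    curves = np.asarray(curves, dtype=float)
    offsets = curves.mean(axis=1)           # one scalar offset per curve
    centered = curves - offsets[:, None]    # what remains after removing the offset
    return offsets, centered


# Synthetic stand-in for 365 daily curves sampled at 50 locations.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
true_offsets = rng.normal(20.0, 2.0, size=365)
curves = true_offsets[:, None] + np.sin(2 * np.pi * grid)[None, :]

offsets, centered = split_translation(curves)

# The offsets are ordinary numbers, so the classical five-number summary applies.
offset_summary = np.percentile(offsets, [0, 25, 50, 75, 100])
print(offset_summary)
```

Once the offsets are removed, the centered curves carry only the shape information, which is the part a full analysis would go on to divide into amplitude and phase.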
References

1. Xie, W., Kurtek, S., Bharath, K., & Sun, Y. A geometric approach to visualization of variability in functional data. Journal of the American Statistical Association advance online publication, 24 October (2016).