Splatter Terrain: An Interactive 3D Visualization Framework for Understanding Dense Scatterplots
Pranab K. Banerjee, (Space Dynamics Laboratory), Pranab.Banerjee@sdl.usu.edu
The information age has brought with it the challenge of managing a data deluge. Today, we are surrounded by more data than we can comprehend, and the gap between our technological capabilities in data acquisition and in information extraction seems to be widening. Sensor hardware development has grown tremendously in recent years, fueled by our desire to understand and investigate the world around us in greater detail, in multiple modalities, and from diverse viewpoints. These developments have benefited defense and national security: newer sensors capture data at higher resolutions, sense more channels, and support higher communication bandwidth for data sharing, providing the data streams needed for real-time, high-fidelity intelligence gathering and situational awareness. However, data does not equate to information. Domain-specific information embedded in high-volume, high-velocity raw data streams is often sparse, and discovering and comprehending it requires careful data mining and analysis. The enormous volume, velocity, variety, and dimensionality of the data produced by modern acquisition systems overwhelm our ability to analyze and comprehend this embedded information, because of both computational limitations and human cognitive constraints.
Effective comprehension of trends, outliers, and correlations in data streams is important for gaining the situational awareness needed for timely decision making. This is particularly true in defense and national security, where extracted intelligence may have a short useful lifespan and real-time or near-real-time comprehension of embedded information is critical for exploiting it optimally. Visualization can play a key role in such data analysis and knowledge discovery, since human visual cognition has evolved into an efficient, massively parallel, and fairly robust pattern discovery and recognition engine, capable of identifying interesting visual features even in the presence of some noise. However, many time-proven visualization tools that are effective for smaller datasets break down when applied to large data volumes. As a result, innovative techniques and algorithms are needed for visualizing large datasets.
A particularly useful and time-proven method for discovering correlations and outliers in data is the scatterplot: a two-dimensional x-y point plot in which each axis represents a parameter of interest and each point corresponds to an (x, y) tuple from a record in the dataset. Visual inspection of these plots can reveal the relationships between the two parameters. For large datasets, however, traditional 2D scatterplots suffer from a serious drawback: visual clutter, as points overlap spatially until individual points become unresolvable. Previously proposed clutter mitigation techniques fall broadly into two categories: (i) data reduction and (ii) spatial reorganization.
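To make the overplotting problem concrete, here is a minimal Python/NumPy sketch (the function `overplot_stats` and the 512 x 512 plot area are illustrative assumptions, not part of the described tool). It rasterizes a scatterplot onto a pixel grid and reports what fraction of the points land on a pixel already occupied by another point, and therefore cannot be seen individually.

```python
import numpy as np


def overplot_stats(x, y, width, height, extent):
    """Rasterize points onto a width x height pixel grid and measure overlap.

    extent = (xmin, xmax, ymin, ymax) is the data range shown on screen;
    np.histogram2d ignores points outside it.
    Returns (points_drawn, pixels_lit, hidden_fraction).
    """
    xmin, xmax, ymin, ymax = extent
    counts, _, _ = np.histogram2d(
        x, y, bins=(width, height), range=[[xmin, xmax], [ymin, ymax]]
    )
    points_drawn = int(counts.sum())
    pixels_lit = int(np.count_nonzero(counts))
    # Every point beyond the first in a pixel is drawn on top of another one.
    hidden_fraction = 1.0 - pixels_lit / points_drawn if points_drawn else 0.0
    return points_drawn, pixels_lit, hidden_fraction


# Usage: one million correlated points on a 512 x 512 plot area.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1_000_000)
y = 0.6 * x + rng.normal(0.0, 0.8, x.size)
print(overplot_stats(x, y, 512, 512, (-5.0, 5.0, -5.0, 5.0)))
```

Whenever there are more points than pixels, as in the usage lines, the hidden fraction is at least 1 - pixels/points regardless of how the points are distributed, and concentrated data pushes it higher still.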
Techniques in the data reduction category are essentially based on sampling or quantization schemes that reduce the number of points to be displayed. Uniform random sampling can produce a low-density representation that preserves the overall trend in the data. However, sampling can introduce artifacts not present in the original data, and relatively small clusters may not be preserved. In addition, when several sampling levels are organized into a multi-resolution representation, crossing resolution boundaries to move between overviews and details can pose perceptual continuity challenges.
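The risk of losing a small cluster under uniform sampling can be quantified. If k of n points are kept uniformly at random without replacement, the probability that none of the m points of a small cluster survives is hypergeometric, C(n - m, k) / C(n, k). The sketch below (hypothetical helper names, NumPy assumed) draws such a sample and evaluates that probability.

```python
import numpy as np


def uniform_sample(points, k, rng):
    """Draw k rows uniformly at random, without replacement."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]


def prob_cluster_lost(n_total, cluster_size, k):
    """P(no point of a cluster survives) when k of n_total points are kept uniformly.

    Hypergeometric: C(n - m, k) / C(n, k) = prod_{i < m} (n - k - i) / (n - i).
    """
    p = 1.0
    for i in range(cluster_size):
        p *= max(n_total - k - i, 0) / (n_total - i)
    return p


# Usage: a 20-point cluster among 100,000 background points, reduced to 100 points.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 1.0, (100_000, 2))
cluster = rng.normal(5.0, 0.05, (20, 2))
points = np.vstack([background, cluster])
sample = uniform_sample(points, 100, rng)
print(len(sample), prob_cluster_lost(len(points), len(cluster), 100))
```

For this setup (m = 20, n = 100,020, k = 100) the probability evaluates to roughly 0.98, so the cluster almost always vanishes from the reduced plot even though the overall trend survives.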
Non-uniform sampling techniques can preserve small clusters in the lower-density representation, but they are undesirable for scatterplots because they alter the underlying statistical properties of the dataset.
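One simple way to see how non-uniform sampling distorts statistics is a density-biased scheme in which a point's inclusion probability is inversely proportional to the number of points in its grid cell, so every occupied cell contributes roughly equally to the sample. This is an illustrative scheme chosen for the sketch, not a specific published method; the function name, the 32-bin grid, and the target sample size are assumptions.

```python
import numpy as np


def density_biased_sample(x, y, bins, target, rng):
    """Hypothetical non-uniform sampler: keep a point with probability ~ 1 / (its bin count).

    Returns (keep_mask, inclusion_probabilities).
    """
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Find each point's bin; histogram2d counts values on the last edge in the last bin.
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, bins - 1)
    inv_density = 1.0 / counts[ix, iy]
    # Scale toward an expected sample size of `target`, capping probabilities at 1.
    p = np.minimum(inv_density * target / inv_density.sum(), 1.0)
    keep = rng.random(x.size) < p
    return keep, p


# Usage: a 20-point cluster among 100,000 background points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 1.0, (100_000, 2)), rng.normal(5.0, 0.05, (20, 2))])
is_cluster = np.arange(len(points)) >= 100_000
keep, p = density_biased_sample(points[:, 0], points[:, 1], 32, 500, rng)
print("cluster share in data:", is_cluster.mean())
print("expected cluster share in sample:", p[is_cluster].sum() / p.sum())
```

The expected share of cluster points in the sample is far larger than their share in the data: the cluster is preserved, but the relative densities that a scatterplot is meant to convey are misrepresented.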
Data filtering schemes have been proposed to reduce clutter by displaying only those subsets of the original data space that satisfy given selection or filtering criteria. For example, visual clustering schemes reduce clutter by aggregating pixels that are similar according to a predefined similarity metric. However, these schemes are not well suited to scatterplots because they alter the underlying data, and correlation information is lost in the process. Techniques based on distorting the visual representation space, such as the ``fisheye view'', are useful for clutter reduction in many information visualization tasks. A fisheye view displays a low-density representation of the overall data space while allowing higher-resolution views of local regions of interest. This approach is not helpful for scatterplots, however, because visual discovery of correlations and clusters between the two dimensions becomes difficult and cognitively taxing unless the entire display has a uniform resolution.
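The resolution trade-off of a fisheye view can be made explicit with the graphical fisheye transform of Sarkar and Brown, applied independently to each axis. The sketch assumes that formulation, g(t) = (d + 1)t / (dt + 1) for a normalized distance t from the focus and distortion factor d; the function names are illustrative.

```python
import numpy as np


def fisheye_1d(v, focus, lo, hi, d):
    """Graphical fisheye distortion along one axis (Sarkar-Brown style), lo < focus < hi.

    Each coordinate's distance from the focus, normalized by the distance from the focus
    to the boundary on the same side, is mapped through g(t) = (d + 1) t / (d t + 1).
    d = 0 leaves the axis unchanged.
    """
    if not lo < focus < hi:
        raise ValueError("focus must lie strictly inside (lo, hi)")
    v = np.asarray(v, dtype=float)
    span = np.where(v >= focus, hi - focus, lo - focus)  # signed distance focus -> boundary
    t = (v - focus) / span                                # in [0, 1] for v in [lo, hi]
    return focus + (d + 1.0) * t / (d * t + 1.0) * span


def local_scale(v, focus, lo, hi, d, eps=1e-6):
    """Forward-difference estimate of screen distance per unit of data distance near v."""
    return (fisheye_1d(v + eps, focus, lo, hi, d) - fisheye_1d(v, focus, lo, hi, d)) / eps


# Usage: a unit axis with the focus in the middle and distortion factor d = 3.
print(fisheye_1d([0.0, 0.25, 0.5, 0.75, 1.0], 0.5, 0.0, 1.0, 3.0))
print(local_scale(0.5, 0.5, 0.0, 1.0, 3.0), local_scale(1.0 - 1e-5, 0.5, 0.0, 1.0, 3.0))
```

Because g'(t) = (d + 1) / (dt + 1)^2, a unit of data is stretched by d + 1 at the focus but shrunk to 1/(d + 1) at the boundary; for d = 3 the two local scales differ by a factor of 16, which is exactly the non-uniform resolution that makes correlation judgments across a scatterplot unreliable.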
Spatial reorganization techniques are based on either a heuristic or an optimal bijective spatial mapping function that redefines the spatial coordinates of the points. The main drawback of this class of algorithms is that the number of points that can be effectively displayed is limited by the number of pixels in the display.
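A minimal sketch of a heuristic spatial reorganization, assuming integer pixel coordinates inside the display and one point per pixel: each point is moved to the first free pixel found on successively larger square rings around its original pixel. The function names and ring-search order are illustrative choices, not a specific published algorithm; the up-front check makes the pixel-count limit explicit.

```python
import numpy as np


def ring(x0, y0, r):
    """Yield the pixels at Chebyshev distance exactly r from (x0, y0)."""
    if r == 0:
        yield x0, y0
        return
    for x in range(x0 - r, x0 + r + 1):
        yield x, y0 - r
        yield x, y0 + r
    for y in range(y0 - r + 1, y0 + r):
        yield x0 - r, y
        yield x0 + r, y


def displace_to_free_pixels(px, py, width, height):
    """Hypothetical heuristic: give every point its own pixel, moving collided points
    to the first free pixel on the smallest ring around their original pixel."""
    if len(px) > width * height:
        raise ValueError("more points than pixels: one point per pixel is impossible")
    occupied = np.zeros((width, height), dtype=bool)
    placed = np.empty((len(px), 2), dtype=int)
    for n, (x0, y0) in enumerate(zip(px, py)):
        r = 0
        while True:
            free = next(
                ((x, y) for x, y in ring(x0, y0, r)
                 if 0 <= x < width and 0 <= y < height and not occupied[x, y]),
                None,
            )
            if free is not None:
                break
            r += 1
        occupied[free] = True
        placed[n] = free
    return placed


# Usage: nine points that all fall on the center pixel of a 3 x 3 display.
placed = displace_to_free_pixels([1] * 9, [1] * 9, 3, 3)
print(placed.tolist())
```

Stopping at the first free pixel makes this a heuristic rather than an optimal assignment, and the layout can give every point its own pixel only because the check guarantees there are no more points than pixels.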
This paper describes a novel, fully interactive, and intuitive 3D environment for effective visualization and analysis of dense scatterplots. Conventional 2D scatterplots are transformed into a 3D terrain, called the Splatter Terrain, by adapting the idea of splatting from 3D volume rendering. The technique requires no data reduction or data perturbation, and it produces a visually intuitive, clutter-free overview of point densities that makes it easy to identify interesting regions for further drill-down analysis. The approach is motivated by Ben Shneiderman's visual information-seeking mantra: ``Overview first, zoom and filter, then details-on-demand''.
To generate the Splatter Terrain, each point in the scatterplot is subjected to a splatting kernel that distributes the influence of the point over its immediate neighborhood. A 2D image, called the ``splat image'', is generated in which each pixel holds the sum of the influences of all neighboring points affecting that spatial location. The splatting process thus converts a scatterplot consisting of a discrete distribution of points into a 2D image covering the spatial extent of the plot. This image is then used as a height map to render the final 3D Splatter Terrain, which makes it easy to visually discover underlying statistical relationships between the parameters. The height of the terrain at a location corresponds to the (kernel-weighted) number of points in the neighborhood of that location. A 2D Gaussian is commonly used as the splatting kernel, and for computational efficiency the influence exerted by a point through the kernel is limited to a finite domain centered on the point. For faster, approximate computation of the splat image, a regular grid of appropriate resolution is laid over the entire scatterplot and the original points are binned into the grid cells. The splat image is then obtained by convolving the binned counts with the splatting kernel in the frequency domain. Finally, the Splatter Terrain is texture mapped with a pseudo-colored version of the splat image to aid comprehension of the point distribution in the scatterplot.
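The splat image computation can be sketched in Python with NumPy under a few assumptions: a truncated, normalized Gaussian kernel with hypothetical parameters `sigma` and `radius` (in grid cells), binning via `np.histogram2d`, and zero-padded FFTs so that the frequency-domain product yields a linear rather than circular convolution. A direct stamping version is included as a reference; because both operate on the same binned counts they should agree to floating-point precision, while the cost of the FFT route depends on the grid size rather than on the number of points.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """2D Gaussian truncated to a (2r+1) x (2r+1) window and normalized to sum to 1."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()


def bin_points(x, y, shape, extent):
    """Count points per grid cell; extent = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = extent
    counts, _, _ = np.histogram2d(x, y, bins=shape, range=[[xmin, xmax], [ymin, ymax]])
    return counts


def splat_fft(counts, kernel):
    """Linear convolution of the binned counts with the kernel via zero-padded FFTs.

    The output has the same shape as `counts`; kernel mass that would fall
    outside the grid is discarded rather than wrapped around.
    """
    r = kernel.shape[0] // 2
    full = (counts.shape[0] + 2 * r, counts.shape[1] + 2 * r)
    spectrum = np.fft.rfft2(counts, s=full) * np.fft.rfft2(kernel, s=full)
    conv = np.fft.irfft2(spectrum, s=full)
    return conv[r:r + counts.shape[0], r:r + counts.shape[1]]


def splat_direct(counts, kernel):
    """Reference implementation: stamp the truncated kernel around every non-empty cell."""
    r = kernel.shape[0] // 2
    nx, ny = counts.shape
    out = np.zeros((nx, ny))
    for i, j in zip(*np.nonzero(counts)):
        i0, i1 = max(i - r, 0), min(i + r + 1, nx)
        j0, j1 = max(j - r, 0), min(j + r + 1, ny)
        out[i0:i1, j0:j1] += counts[i, j] * kernel[i0 - i + r:i1 - i + r, j0 - j + r:j1 - j + r]
    return out


# Usage: 200,000 correlated points splatted onto a 256 x 256 height map.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200_000)
y = 0.5 * x + rng.normal(0.0, 0.5, x.size)
counts = bin_points(x, y, (256, 256), (-4.0, 4.0, -4.0, 4.0))
height_map = splat_fft(counts, gaussian_kernel(sigma=2.0, radius=6))
print(height_map.shape, height_map.max())
```

With the kernel normalized to sum to 1, each cell of `height_map` is a kernel-weighted count of nearby points, so the terrain integrates to the number of points inside the extent, minus any kernel mass that spills past the grid border. Pseudo-coloring and 3D rendering of the height map are not shown.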
The Splatter Terrain provides an easy-to-understand overview of an arbitrarily large and dense scatterplot, but it is poorly suited to showing outliers or small clusters, which may be hard to distinguish from a flat region. To address this issue, the visualization tool displays the original 2D scatterplot along with the Splatter Terrain in a spatially registered manner. A pair of interactive, mutually orthogonal, semi-transparent planes makes it easy to visually associate any point on the surface of the Splatter Terrain with a location in the scatterplot for on-demand drill-down analysis. A heads-up display dynamically shows the coordinates in parameter space, as well as point density information, as these planes are interactively manipulated to select points on the terrain surface. This lets users view statistical information without moving their eyes away from the core visual representation. A common problem in 3D visualization is occlusion. The tool addresses this problem with a fully interactive environment that allows the user to rotate and zoom the terrain and the spatially registered scatterplot in real time. In addition, a pair of semi-transparent parallel ``measuring planes'' is provided for easy comparison of the heights of the Splatter Terrain at two different locations. This is useful for exploring regions with subtle differences in point density that may be hard to discern from the pseudo-colored splat image or from iso-contours. The measuring planes can be texture mapped with the splat image using a user-defined colormap, and their transparency can be varied interactively.
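Both the density readout of the heads-up display and the measuring-plane comparison come down to sampling the splat image at arbitrary data coordinates. The following sketch assumes a cell-centered grid layout for the splat image and uses bilinear interpolation; `probe` and `measuring_planes` are hypothetical names, since the text does not describe the tool's interaction code.

```python
import numpy as np


def probe(splat, extent, xq, yq):
    """Bilinearly interpolate the splat image (terrain height) at data coordinates (xq, yq).

    Assumes splat[i, j] is the value at the center of grid cell (i, j) and
    extent = (xmin, xmax, ymin, ymax) is the data range covered by the grid.
    Queries outside the grid are clamped to the border cells.
    """
    xmin, xmax, ymin, ymax = extent
    nx, ny = splat.shape
    # Fractional cell index; cell i is centered at xmin + (i + 0.5) * (xmax - xmin) / nx.
    fx = np.clip((np.asarray(xq, float) - xmin) / (xmax - xmin) * nx - 0.5, 0, nx - 1)
    fy = np.clip((np.asarray(yq, float) - ymin) / (ymax - ymin) * ny - 0.5, 0, ny - 1)
    i0, j0 = np.floor(fx).astype(int), np.floor(fy).astype(int)
    i1, j1 = np.minimum(i0 + 1, nx - 1), np.minimum(j0 + 1, ny - 1)
    tx, ty = fx - i0, fy - j0
    return ((1 - tx) * (1 - ty) * splat[i0, j0] + tx * (1 - ty) * splat[i1, j0]
            + (1 - tx) * ty * splat[i0, j1] + tx * ty * splat[i1, j1])


def measuring_planes(splat, extent, p, q):
    """Heights at which two horizontal measuring planes would sit, and their difference."""
    h_p, h_q = probe(splat, extent, *p), probe(splat, extent, *q)
    return float(h_p), float(h_q), float(h_p - h_q)


# Usage with a tiny synthetic height map whose value at cell (i, j) is 4 * i + j.
splat = np.arange(12, dtype=float).reshape(3, 4)
print(measuring_planes(splat, (0.0, 3.0, 0.0, 4.0), (1.5, 2.5), (0.5, 0.5)))
```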
The visualization framework has been tested with atmospheric datasets showing significant clutter in the scatterplot, and the initial response has been positive based on an informal user study.