## Navigating many views

Sometimes you have to consider way more views than you can possibly digest visually. In Multiple linked views, we explore some useful techniques for implementing the popular visualization mantra from Shneiderman (1996):

“Overview first, zoom and filter, then details-on-demand.”

In fact, Figure 3.15 from that section provides an example of this mantra put into practice. The correlation matrix provides an overview of the correlation structure between all the variables, and clicking a cell populates a scatterplot of those two specific variables. This works fine with tens or hundreds of variables, but once you have thousands or tens of thousands of variables, this technique begins to fall apart. At that point, you may be better off defining a range of correlations that you’re interested in exploring, or better yet, incorporating another measure (e.g., a test statistic), then focusing on views that match certain criteria.
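As a minimal sketch of the "define a range of correlations" idea, the base R code below lists every variable pair once and keeps only the pairs whose correlation falls in a range of interest. The `mtcars` data and the 0.8 cutoff are illustrative assumptions, not part of the original example.

```r
# Correlation matrix: the "overview" of all pairwise relationships
cors <- cor(mtcars)

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(cors), arr.ind = TRUE)
pairs <- data.frame(
  x = rownames(cors)[idx[, "row"]],
  y = colnames(cors)[idx[, "col"]],
  r = cors[idx],
  stringsAsFactors = FALSE
)

# Focus on pairs in the range of interest (assumed cutoff: |r| >= 0.8),
# ordered from strongest to weakest
strong <- pairs[abs(pairs$r) >= 0.8, ]
strong <- strong[order(-abs(strong$r)), ]
```

Each row of `strong` identifies one scatterplot worth drawing, so the number of views to inspect is governed by the cutoff rather than by the number of variables.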

Tukey and Tukey (n.d.) first described the idea of using quantitative measurements of scatterplot characteristics (e.g., correlation) to help guide exploratory analysis of many variables. This idea, coined scagnostics (short for scatterplot diagnostics), has since been made explicit, and many measures have been explored, including measures designed specifically for time series (Wilkinson, Anand, and Grossman 2005; Wilkinson and Wills 2008; Dang and Wilkinson 2012). Probably the most universally useful scagnostic is the outlying measure, which helps identify projections of the data space that contain outlying observations. Of course, the idea of associating quantitative measures with a graphical display of data can be generalized to include more than just scatterplots, and in this more general case, these measures are sometimes referred to as cognostics.
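To make the idea of attaching a number to each scatterplot concrete, here is a rough sketch in base R. The `outlying_share()` function is a hypothetical stand-in: it reports the share of points with a large Mahalanobis distance, which is not the graph-theoretic outlying measure of Wilkinson, Anand, and Grossman (2005). The choice of `mtcars` variables and the 97.5% threshold are also assumptions made for illustration.

```r
# Hypothetical stand-in for an "outlying" cognostic: the share of points
# whose squared Mahalanobis distance exceeds the 97.5% chi-squared
# quantile (2 degrees of freedom). NOT the graph-theoretic measure.
outlying_share <- function(x, y) {
  xy <- cbind(x, y)
  d2 <- mahalanobis(xy, colMeans(xy), cov(xy))
  mean(d2 > qchisq(0.975, df = 2))
}

# Score every pairwise scatterplot among a few continuous variables
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
combos <- t(combn(vars, 2))
cogs <- data.frame(
  x = combos[, 1],
  y = combos[, 2],
  outlying = mapply(
    function(a, b) outlying_share(mtcars[[a]], mtcars[[b]]),
    combos[, 1], combos[, 2],
    USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)

# Scatterplots with the largest share of flagged points come first
cogs <- cogs[order(-cogs$outlying), ]
```

Sorting by the score turns "look at every scatterplot" into "look at the top few", which is the role scagnostics play in guiding exploration.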

The same problems and principles that inspired scagnostics have also inspired work on more general divide & recombine techniques for navigating through many statistical artifacts (Cleveland and Hafen 2014; Guha et al. 2012), including visualizations (Hafen et al. 2013). The **trelliscope** package provides a system for computing arbitrary cognostics on each panel of a trellis display, as well as an interactive graphical user interface for defining (and navigating through) interesting panels based on those cognostics (Hafen 2016). This system also allows users to define the graphical method for displaying each panel, so **plotly** graphs can easily be embedded. The **trelliscope** package is currently built upon **shiny**, but as Figure 3.12 demonstrates, the **trelliscopejs** package provides lower-level tools that allow one to create trelliscope displays without **shiny** (Hafen and Schloerke, n.d.).

```
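library(ggplot2)        # provides qplot(), mpg, xlim(), ylim(), theme_bw()
library(trelliscopejs)  # provides facet_trelliscope()

# One scatterplot panel per manufacturer/class combination
qplot(cty, hwy, data = mpg) +
  xlim(7, 37) + ylim(9, 47) + theme_bw() +
  facet_trelliscope(
    ~ manufacturer + class, nrow = 2, ncol = 4,
    # render each panel with plotly; plotly_args is passed on to ggplotly()
    as_plotly = TRUE, plotly_args = list(dynamicTicks = TRUE)
  )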
```
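The per-panel cognostics that drive this kind of navigation can be approximated by hand. The sketch below splits `mpg` into the same manufacturer/class groups and computes two illustrative cognostics per panel. The cognostic names and the filtering thresholds are assumptions, and this is not how **trelliscopejs** computes or stores cognostics internally.

```r
library(ggplot2)  # used here only for the mpg data set

# One group per panel, mirroring ~ manufacturer + class;
# drop = TRUE discards combinations that contain no cars
panels <- split(mpg, list(mpg$manufacturer, mpg$class), drop = TRUE)

# Hypothetical cognostics: panel size and the cty/hwy correlation
panel_cogs <- data.frame(
  panel = names(panels),
  n = vapply(panels, nrow, integer(1)),
  cor_cty_hwy = vapply(
    panels,
    # correlation is undefined for tiny or constant panels, so allow NA
    function(d) if (nrow(d) > 2) suppressWarnings(cor(d$cty, d$hwy)) else NA_real_,
    numeric(1)
  ),
  row.names = NULL
)

# "Navigate" to panels with enough points and a weaker association
# (assumed thresholds: at least 5 cars, correlation below 0.9)
interesting <- subset(
  panel_cogs,
  n >= 5 & !is.na(cor_cty_hwy) & cor_cty_hwy < 0.9
)
```

Filtering or sorting a table like `panel_cogs` plays the same role as the filter and sort controls in the trelliscope interface: it narrows many panels down to the handful worth viewing.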

Shneiderman, Ben. 1996. “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations.” *VL Proceedings of the IEEE Symposium on Visual Languages*, January, 1–9.

Tukey, J. W., and P. A. Tukey. n.d. “Computer Graphics and Exploratory Data Analysis: An Introduction.” In *Proceedings of the Sixth Annual Conference and Exposition: Computer Graphics ’85*.

Wilkinson, Leland, Anushka Anand, and Robert Grossman. 2005. “Graph-Theoretic Scagnostics.” In *Proceedings of the 2005 IEEE Symposium on Information Visualization*, 21. INFOVIS ’05. Washington, DC, USA: IEEE Computer Society. https://doi.org/10.1109/INFOVIS.2005.14.

Wilkinson, Leland, and Graham Wills. 2008. “Scagnostics Distributions.” *Journal of Computational and Graphical Statistics*, no. 2:473–91.

Dang, Tuan Nhon, and Leland Wilkinson. 2012. “TimeSeer: Detecting Interesting Distributions in Multiple Time Series Data.” *VINCI*, October, 1–9.

Cleveland, William S., and Ryan Hafen. 2014. “Divide and Recombine (D&R): Data Science for Large Complex Data.” *Statistical Analysis and Data Mining: The ASA Data Science Journal* 7 (6): 425–33.

Guha, Saptarshi, Jeremiah Rounds, Ryan Hafen, and William S. Cleveland. 2012. “Large Complex Data: Divide and Recombine (D&R) with RHIPE.” *The ISI’s Journal for the Rapid Dissemination of Statistics Research*, August, 53–67.

Hafen, R., L. Gosink, J. McDermott, K. Rodland, K. K. V. Dam, and W. S. Cleveland. 2013. “Trelliscope: A System for Detailed Visualization in the Deep Analysis of Large Complex Data.” In *Large-Scale Data Analysis and Visualization (LDAV), 2013 IEEE Symposium on*, 105–12. https://doi.org/10.1109/LDAV.2013.6675164.