2.1 Scatter traces
A plotly visualization is composed of one (or more) trace(s), and every trace has a
type. The default trace type, “scatter”, can be used to draw a large number of geometries, and actually powers many of the
add_*() functions such as
add_polygons(). Among other things, these functions make assumptions about the mode of the scatter trace, but any valid attribute(s) listed under the scatter section of the figure reference may be used to override defaults.
The plot_ly() function has a number of arguments that make it easier to scale data values to visual aesthetics (e.g., color/colors, symbol/symbols, linetype/linetypes, size/sizes). These arguments are unique to the R package and dynamically determine which attributes in the figure reference to populate (e.g., line.color). Generally speaking, the singular form of the argument defines the domain of the scale (data) and the plural form defines the range of the scale (visuals). To make it easier to alter default visual aesthetics (e.g., change all points from blue to black), “AsIs” values (values wrapped with the I() function) are interpreted as values that already live in visual space, and thus do not need to be scaled. The next section on scatterplots explores detailed use of the color/colors, symbol/symbols, and size/sizes arguments. The section on line plots explores detailed use of the linetype/linetypes arguments.
2.1.1 Scatterplots
The scatterplot is useful for visualizing the correlation between two quantitative variables. If you supply a numeric vector for x and y in
plot_ly(), it defaults to a scatterplot, but you can also be explicit about adding a layer of markers/points via the
add_markers() function. A common problem with scatterplots is overplotting, meaning that there are multiple observations occupying the same (or similar) x/y locations. There are a few ways to combat overplotting including: alpha transparency, hollow symbols, and 2D density estimation. Figure 2.1 shows how alpha transparency and hollow symbols can provide an improvement over the default.
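The three variants compared in Figure 2.1 can be sketched roughly as follows. The mpg dataset (from ggplot2) and the alpha value of 0.2 are assumptions made for illustration, since the data behind the figure is not shown in this section.

```r
library(plotly)

# Assumption: ggplot2's mpg data stands in for the data behind Figure 2.1
data(mpg, package = "ggplot2")

fig <- subplot(
  # default markers: opaque, filled circles
  plot_ly(mpg, x = ~cty, y = ~hwy) %>%
    add_markers(name = "default"),
  # alpha transparency: overlapping points appear darker
  plot_ly(mpg, x = ~cty, y = ~hwy) %>%
    add_markers(alpha = 0.2, name = "alpha"),
  # hollow circles: pch code 1, taken as-is via I()
  plot_ly(mpg, x = ~cty, y = ~hwy) %>%
    add_markers(symbol = I(1), name = "hollow")
)
fig
```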
In Figure 2.1, hollow circles are specified via
symbol = I(1). By default, the
symbol argument (as well as the color, size, and linetype arguments) assumes value(s) are “data”, which need to be mapped to a visual palette (provided by
symbols). Wrapping values with the
I() function notifies
plot_ly() that these values should be taken “AsIs”. If you compare the result of
plot(1:25, 1:25, pch = 1:25) to Figure 2.2, you’ll see that
plot_ly() can translate R’s plotting characters (pch), but you can also use plotly.js’ symbol syntax, if you desire.
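As a small sketch of both syntaxes, the example below passes R's pch codes and a plotly.js symbol name through I(). The toy x/y values and the choice of "diamond-open" are illustrative assumptions.

```r
library(plotly)

# R plotting characters (pch codes 1 to 25), taken as-is via I()
pch_fig <- plot_ly(x = 1:25, y = 1:25) %>%
  add_markers(symbol = I(1:25), name = "pch")

# a plotly.js symbol name, also taken as-is
js_fig <- plot_ly(x = 1:5, y = 1:5) %>%
  add_markers(symbol = I("diamond-open"), name = "plotly.js")

fig <- subplot(pch_fig, js_fig)
fig
```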
When mapping a numeric variable to
symbol, it creates only one trace, so no legend is generated. If you do want one trace per symbol, make sure the variable you’re mapping is a factor, as Figure 2.3 demonstrates. When plotting multiple traces, the default plotly.js color scale will apply, but you can set the color of every trace generated from this layer with
color = I("black"), or similar.
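The difference between a numeric and a factor mapping can be sketched as follows, again assuming the mpg dataset from ggplot2 (cyl takes the values 4, 5, 6, and 8). The alpha value is an arbitrary choice.

```r
library(plotly)

# Assumption: ggplot2's mpg data is used purely for illustration
data(mpg, package = "ggplot2")
p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.3)

# numeric variable: a single trace, hence no legend
single_trace <- add_markers(p, symbol = ~cyl, name = "A single trace")

# factor: one trace per level; I("black") fixes the color of every trace
per_level <- add_markers(p, symbol = ~factor(cyl), color = I("black"))

fig <- subplot(single_trace, per_level)
fig
```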
The color argument adheres to similar rules as symbol:
If numeric, color produces one trace, but a colorbar is also generated to aid the decoding of colors back to data values. The colorbar() function can be used to customize the appearance of this automatically generated guide. The default colorscale is viridis, a perceptually uniform colorscale (even when converted to black-and-white) that is perceivable even to those with common forms of color blindness (Data Science 2016).
If discrete, color produces one trace per value, meaning a legend is generated. If an ordered factor, the default colorscale is viridis (Garnier 2016); otherwise, it is the “Set2” palette from the RColorBrewer package (Neuwirth 2014).
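A minimal sketch of the two cases, assuming the mpg dataset from ggplot2; the colorbar title is an arbitrary label.

```r
library(plotly)

# Assumption: ggplot2's mpg data is used purely for illustration
data(mpg, package = "ggplot2")
p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.5)

# numeric mapping: a colorbar is generated as the guide
numeric_color <- add_markers(p, color = ~cyl, showlegend = FALSE) %>%
  colorbar(title = "Viridis")

# discrete mapping: one trace per level, so a legend is generated
discrete_color <- add_markers(p, color = ~factor(cyl))

fig <- subplot(numeric_color, discrete_color)
fig
```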
There are a number of ways to alter the default colorscale via the
colors argument. This argument accepts: (1) a ColorBrewer palette name (see the row names of
RColorBrewer::brewer.pal.info for valid names), (2) a vector of colors to interpolate, or (3) a color interpolation function like
scales::colour_ramp(). Although this grants a lot of flexibility, one should be conscious of using a sequential colorscale for numeric variables (and ordered factors), as shown in Figure 2.5, and a qualitative colorscale for discrete variables, as shown in Figure 2.6.
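library(plotly)
# p is not defined in this section; assumed here to be a scatterplot of ggplot2's mpg data
p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.5)

# three ways to specify a sequential colorscale for a numeric variable
col1 <- c("#132B43", "#56B1F7")
col2 <- viridisLite::inferno(10)
col3 <- colorRamp(c("red", "white", "blue"))
subplot(
  add_markers(p, color = ~cyl, colors = col1) %>%
    colorbar(title = "ggplot2 default"),
  add_markers(p, color = ~cyl, colors = col2) %>%
    colorbar(title = "Inferno"),
  add_markers(p, color = ~cyl, colors = col3) %>%
    colorbar(title = "colorRamp")
) %>% hide_legend()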
# reuses p from the previous example
# three ways to specify a colorscale for a discrete variable
col1 <- "Pastel1"
col2 <- colorRamp(c("red", "blue"))
col3 <- c(`4` = "red", `5` = "black", `6` = "blue", `8` = "green")
subplot(
  add_markers(p, color = ~factor(cyl), colors = col1),
  add_markers(p, color = ~factor(cyl), colors = col2),
  add_markers(p, color = ~factor(cyl), colors = col3)
) %>% hide_legend()
For scatterplots, the
size argument controls the area of markers (unless otherwise specified via sizemode), and must be a numeric variable. The
sizes argument controls the minimum and maximum size of circles, in pixels.
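The effect of sizes can be sketched as follows, assuming the mpg dataset from ggplot2. The range c(1, 500) is an arbitrary choice that exaggerates the contrast with the default.

```r
library(plotly)

# Assumption: ggplot2's mpg data is used purely for illustration
data(mpg, package = "ggplot2")
p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.3)

fig <- subplot(
  # default size range
  add_markers(p, size = ~cyl, name = "default"),
  # custom minimum and maximum marker size
  add_markers(p, size = ~cyl, sizes = c(1, 500), name = "custom")
)
fig
```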
2.1.1.1 3D scatterplots
To make a 3D scatterplot, just add a z attribute.
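A minimal sketch of a 3D scatterplot, assuming the mpg dataset from ggplot2 and an arbitrary choice of variables for the three axes:

```r
library(plotly)

# Assumption: ggplot2's mpg data is used purely for illustration
data(mpg, package = "ggplot2")

# supplying z in addition to x and y yields a 3D marker trace
fig <- plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  add_markers(color = ~cyl)
fig
```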
2.1.1.2 Scatterplot matrices
Scatterplot matrices can be made via ggplotly(), which has a special method for translating ggmatrix objects from the GGally package to plotly objects (Schloerke et al. 2016). These objects are essentially a matrix of ggplot objects and are the underlying data structure that powers higher-level functions in GGally, such as
ggpairs() – a function for creating a generalized pairs plot (Emerson et al. 2013). The generalized pairs plot can be motivated as a generalization of the scatterplot matrix with support for categorical variables and different visual representations of the data powered by the grammar of graphics. Figure 2.9 shows an interactive version of the generalized pairs plot made via
ggplotly(). In Linking views without shiny, we explore how this framework can be extended to enable linked brushing in the generalized pairs plot.
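As a rough sketch, a generalized pairs plot can be built with ggpairs() and then passed to ggplotly(). The iris dataset is an assumption chosen for illustration; it mixes numeric and categorical columns, so several panel types appear.

```r
library(plotly)
library(GGally)

# ggpairs() returns a ggmatrix object (a matrix of ggplot objects)
pm <- ggpairs(iris)

# ggplotly() translates the ggmatrix into a single interactive plotly object
fig <- ggplotly(pm)
fig
```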
2.1.2 Dotplots & error bars
A dotplot is similar to a scatterplot, except instead of two numeric axes, one is categorical. The usual goal of a dotplot is to compare value(s) on a numerical scale over numerous categories. In this context, dotplots are preferable to pie charts since comparing position along a common scale is much easier than comparing angle or area (Cleveland and McGill 1984; Bostock 2010). Furthermore, dotplots can be preferable to bar charts, especially when comparing values within a narrow range far away from 0 (Few 2006). Also, when presenting point estimates, and uncertainty associated with those estimates, bar charts tend to exaggerate the difference in point estimates, and lose focus on uncertainty (Messing 2012).
A popular application for dotplots (with error bars) is the so-called “coefficient plot” for visualizing the point estimates of coefficients and their standard error. The
coefplot() function in the coefplot package (Lander 2016) and the
ggcoef() function in the GGally package both produce coefficient plots for many types of model objects in R using ggplot2, which we can translate to plotly via
ggplotly(). Since these packages use points and segments to draw the coefficient plots, the hover information is not the best, and it’d be better to use error objects. Figure 2.10 uses the
tidy() function from the broom package (Robinson 2016) to obtain a data frame with one row per model coefficient, and produce a coefficient plot with error bars along the x-axis.
library(plotly)
library(dplyr)

m <- lm(Sepal.Length ~ Sepal.Width * Petal.Length * Petal.Width, data = iris)
# to order categories sensibly, arrange by estimate, then coerce term to a factor
d <- broom::tidy(m) %>%
  arrange(desc(estimate)) %>%
  mutate(term = factor(term, levels = term))
# one marker per coefficient, with standard errors as horizontal error bars
plot_ly(d, x = ~estimate, y = ~term) %>%
  add_markers(error_x = ~list(value = std.error)) %>%
  layout(margin = list(l = 200))
2.1.3 Line plots
This section surveys useful applications of add_lines() and add_paths(). The only difference between these functions is that add_lines() connects x/y pairs from left to right, rather than in the order in which the data appears. Both functions understand the color, linetype, and alpha attributes, as well as groupings defined by group_by().
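The difference between the two functions is easiest to see on data that is not sorted by x. The three-row data frame below is a made-up example for illustration.

```r
library(plotly)

# made-up data whose rows are deliberately not sorted by x
d <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

# add_paths() connects points in row order: (3,1) -> (1,2) -> (2,3)
as_path <- plot_ly(d, x = ~x, y = ~y) %>%
  add_paths(name = "add_paths()")

# add_lines() connects points from left to right: (1,2) -> (2,3) -> (3,1)
as_line <- plot_ly(d, x = ~x, y = ~y) %>%
  add_lines(name = "add_lines()")

fig <- subplot(as_path, as_line)
fig
```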
Figure 1.2 uses
group_by() to plot one line per city in the
txhousing dataset using a single trace. Since there can only be one tooltip per trace, hovering over that plot does not reveal useful information. Although plotting many traces can be computationally expensive, it is necessary in order to display better information on hover. Since the
color argument produces one trace per value (if the variable, here city, is discrete), hovering on Figure 2.11 reveals the top ~10 cities at a given x value. Since 46 colors are too many to perceive in a single plot, Figure 2.11 also restricts the set of possible
colors to black.
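The approach described for Figure 2.11 can be sketched as follows. The alpha value of 0.2 is an assumption added so that overlapping lines remain distinguishable.

```r
library(plotly)

# txhousing ships with ggplot2
data(txhousing, package = "ggplot2")

# one trace per city (color is mapped to a discrete variable),
# with the color palette restricted to black
fig <- plot_ly(txhousing, x = ~date, y = ~median) %>%
  add_lines(color = ~city, colors = "black", alpha = 0.2)
fig
```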
Generally speaking, it’s hard to perceive more than 8 different colors/linetypes/symbols in a given plot, so sometimes we have to filter data to use these effectively. Here we use the dplyr package to find the top 5 cities in terms of average monthly sales (
top5), then effectively filter the original data to contain just these cities via
semi_join(). As Figure 2.12 demonstrates, once the data is filtered, mapping city to
linetype is trivial. The color palette can be altered via the
colors argument, and follows the same rules as scatterplots. The linetype palette can be altered via the
linetypes argument, and accepts R’s
lty values or plotly.js dash values.
library(plotly)
library(dplyr)

# top 5 cities in terms of average monthly sales
top5 <- txhousing %>%
  group_by(city) %>%
  summarise(m = mean(sales, na.rm = TRUE)) %>%
  arrange(desc(m)) %>%
  top_n(5)

# keep only the rows of txhousing that belong to those cities
p <- semi_join(txhousing, top5, by = "city") %>%
  plot_ly(x = ~date, y = ~median)

subplot(
  add_lines(p, color = ~city),
  add_lines(p, linetype = ~city),
  shareX = TRUE, nrows = 2
)
2.1.3.1 Density plots
In Bars & histograms, we leveraged a number of algorithms in R for computing the “optimal” number of bins for a histogram, via
hist(), and routing those results to
add_bars(). We can leverage the
density() function for computing kernel density estimates in a similar way, and route the results to
add_lines(), as is done in Figure 2.13.
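A minimal sketch of this idea, assuming the median column of txhousing as the variable of interest and three of the kernels that density() supports:

```r
library(plotly)

# txhousing ships with ggplot2
data(txhousing, package = "ggplot2")

# illustrative subset of the kernels accepted by density()
kerns <- c("gaussian", "epanechnikov", "rectangular")

fig <- plot_ly()
for (k in kerns) {
  # compute the kernel density estimate in R, then draw it as a line
  d <- density(txhousing$median, kernel = k, na.rm = TRUE)
  fig <- add_lines(fig, x = d$x, y = d$y, name = k)
}
fig <- layout(fig, xaxis = list(title = "Median monthly price"))
fig
```

Computing the estimate in R rather than in the browser keeps the choice of kernel and bandwidth under the control of density().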