1.1 A case study of housing sales in Texas

The plotly package depends on ggplot2, which bundles a data set on monthly housing sales in Texan cities acquired from the TAMU Real Estate Center. After loading the package, the data is “lazily loaded” into your session, so you may reference it by name:

library(plotly)  # also attaches ggplot2, which provides txhousing
txhousing
#> # A tibble: 8,602 × 9
#>      city  year month sales   volume median listings inventory  date
#>     <chr> <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
#> 1 Abilene  2000     1    72  5380000  71400      701       6.3  2000
#> 2 Abilene  2000     2    98  6505000  58700      746       6.6  2000
#> 3 Abilene  2000     3   130  9285000  58100      784       6.8  2000
#> 4 Abilene  2000     4    98  9730000  68600      785       6.9  2000
#> 5 Abilene  2000     5   141 10590000  67300      794       6.8  2000
#> 6 Abilene  2000     6   156 13910000  66900      780       6.6  2000
#> # ... with 8,596 more rows
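Because the data set lives in the ggplot2 namespace, it can also be referenced without attaching any package. The short check below is a sketch that assumes only that ggplot2 is installed (the variable name tx is illustrative); it confirms the columns printed above and that each row records a single city in a single month.

```r
# Reference the data set through the ggplot2 namespace (no library() call needed)
tx <- ggplot2::txhousing

# Column names should match the printed tibble
names(tx)

# One row per city and month: (city, year, month) should never repeat
anyDuplicated(tx[c("city", "year", "month")]) == 0

# Number of distinct cities covered
length(unique(tx$city))
```

Knowing that the data is one observation per city per month is what makes grouping lines by city meaningful in the plots that follow.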

In an attempt to understand house price behavior over time, we could plot date on the x-axis and median (the median sale price) on the y-axis, grouping the lines that connect these x/y pairs by city. Using ggplot2, we can initiate a ggplot object with the ggplot() function, which accepts a data frame and a mapping from data variables to visual aesthetics. Initiating the object alone does not tell ggplot2 how to represent the mapping geometrically; for that, we add a layer via one of the geom_*() (or stat_*()) functions, in this case geom_line(). It is also a good idea to set alpha transparency to help avoid overplotting: with alpha = 0.2, an isolated line looks faint, and a region appears dark only where several lines overlap.
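To see how alpha = 0.2 accumulates, it helps to work out the combined opacity of stacked lines. Assuming the renderer uses standard "over" alpha compositing, n overlapping layers that each have opacity alpha produce a combined opacity of 1 - (1 - alpha)^n. The base-R sketch below (stacked_opacity() is a hypothetical helper, not part of ggplot2) tabulates this for one to ten lines.

```r
# Hypothetical helper: combined opacity of n overlapping lines drawn with the
# same alpha, assuming standard "over" compositing: 1 - (1 - alpha)^n
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

n <- 1:10
data.frame(lines = n, opacity = round(stacked_opacity(0.2, n), 3))
```

Under this assumption, five overlapping lines are about 67% opaque and ten are about 89% opaque, so dense regions of the plot stand out while individual lines stay in the background.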

If you’re new to ggplot2, the ggplot2 cheatsheet provides a quick overview. The online docs and the R Graphics Cookbook are helpful for learning by example, and the ggplot2 book explains the conceptual underpinnings.

# Map date to x and median price to y, then draw one faint line per city
p <- ggplot(txhousing, aes(date, median)) +
  geom_line(aes(group = city), alpha = 0.2)
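The distinction between initiating a plot and adding a layer can be checked directly. This sketch assumes a ggplot2 version in which plot objects expose their layers as p$layers (as ggplot2's plot objects have long done); the name base is illustrative.

```r
library(ggplot2)

# Initiating the object records the data and the aesthetic mapping, but no layers
base <- ggplot(txhousing, aes(date, median))
length(base$layers)

# Adding geom_line() appends one layer that knows how to draw the mapping
p <- base + geom_line(aes(group = city), alpha = 0.2)
length(p$layers)

# The layer's geom is a line geom
class(p$layers[[1]]$geom)
```

Building the plot incrementally like this is why a bare ggplot() call renders only an empty panel: the mapping exists, but no geometry has been requested yet.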

1.1.1 The ggplotly() function

Now that we have a valid ggplot2 object, p, we can pass it to the ggplotly() function from the plotly package, which converts a ggplot object into a plotly object. By default, ggplotly() supplies the entire aesthetic mapping to the tooltip, but the tooltip argument restricts the tooltip to a subset of that mapping. Furthermore, when a layer's statistic is something other than the identity (e.g., geom_bin2d() and geom_hex()), relevant “intermediate” variables computed by the statistic are also supplied to the tooltip. This provides a convenient way to decode visual aesthetics (e.g., color) that represent a measure of interest (e.g., count/value). Figure 1.1 demonstrates tooltip functionality for a number of scenarios and uses the subplot() function from the plotly package (discussed in more detail in Arranging multiple views) to concisely display several interactive versions of ggplot objects.

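# Arrange four interactive views in a 2 x 2 grid
subplot(
  p,                                # entire aesthetic mapping in the tooltip
  ggplotly(p, tooltip = "city"),    # tooltip restricted to city
  ggplot(txhousing, aes(date, median)) + geom_bin2d(),  # tooltip adds bin counts
  ggplot(txhousing, aes(date, median)) + geom_hex(),    # requires the hexbin package
  nrows = 2, shareX = TRUE, shareY = TRUE,
  titleY = FALSE, titleX = FALSE
)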
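The “intermediate” variables that appear in the binned tooltips come from the layer's statistic, and ggplot2's layer_data() function exposes them before any conversion to plotly. In this sketch, which uses only ggplot2 (the names b and binned are illustrative), the data computed for geom_bin2d() carries a count column, which is the value encoded by fill color. Rows with a missing median are dropped up front to avoid the warning the stat would otherwise emit.

```r
library(ggplot2)

# Drop rows with a missing median so the binning stat has only finite values
tx <- subset(txhousing, !is.na(median))

# A 2D binning layer: the stat computes a count for every non-empty bin
b <- ggplot(tx, aes(date, median)) + geom_bin2d()

# layer_data() returns the layer's data after the stat and scales are applied
binned <- layer_data(b, 1)

# 'count' is the computed variable; 'fill' is the color it was mapped to
head(binned[, c("count", "fill")])
summary(binned$count)
```

How ggplotly() labels and formats these values in the tooltip is up to plotly; the sketch only shows where the numbers originate.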