1.1 A case study of housing sales in Texas
The plotly package depends on ggplot2, which bundles a data set on monthly housing sales in Texan cities acquired from the TAMU real estate center. After loading the package, the data is “lazily loaded” into your session, so you may reference it by name:
library(plotly)
txhousing
#> # A tibble: 8,602 x 9
#>   city     year month sales   volume median listings inventory  date
#>   <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
#> 1 Abilene  2000     1  72.0  5380000  71400      701      6.30  2000
#> 2 Abilene  2000     2  98.0  6505000  58700      746      6.60  2000
#> 3 Abilene  2000     3 130    9285000  58100      784      6.80  2000
#> 4 Abilene  2000     4  98.0  9730000  68600      785      6.90  2000
#> 5 Abilene  2000     5 141   10590000  67300      794      6.80  2000
#> 6 Abilene  2000     6 156   13910000  66900      780      6.60  2000
#> # ... with 8,596 more rows
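As a quick check of the lazy-loading behavior described above, the sketch below reads the data set straight from the ggplot2 namespace without attaching any package. It assumes only that ggplot2 is installed. The dimensions it reports should match the printout above (8,602 rows, 9 columns), though they could differ if your ggplot2 version ships a different copy of the data.

```r
# Lazily loaded data can be reached through the package namespace,
# so library() is not strictly required just to inspect it.
tx <- ggplot2::txhousing

# Rows and columns; the printout above reports 8,602 x 9.
dim(tx)

# Column names, matching the header of the printout above.
names(tx)
```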
In an attempt to understand house price behavior over time, we could plot date on x, median on y, and group the lines connecting these x/y pairs by city. Using ggplot2, we can initiate a ggplot object with the ggplot() function, which accepts a data frame and a mapping from data variables to visual aesthetics. Initiating the object alone is not enough: ggplot2 won’t know how to geometrically represent the mapping until we add a layer to the plot via one of the geom_*() (or stat_*()) functions (in this case, we want geom_line()). It is also a good idea to specify alpha transparency so that it takes about 5 lines plotted on top of each other to appear close to solid black, which helps combat overplotting.
If you’re new to ggplot2, the ggplot2 cheatsheet provides a nice quick overview. The online docs or R graphics cookbook are helpful for learning by example, and the ggplot2 book provides a nice overview of the conceptual underpinnings.
p <- ggplot(txhousing, aes(date, median)) +
  geom_line(aes(group = city), alpha = 0.2)
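To make the role of layers concrete, the following sketch inspects the ggplot object before and after geom_line() is added. The name base is introduced here purely for illustration, and the code assumes the object's layers are reachable via $layers, which holds for the ggplot2 versions I am aware of but is an implementation detail rather than a documented contract.

```r
library(ggplot2)

# Data and aesthetic mapping only: no layer has been added yet,
# so printing this object draws axes but no lines.
base <- ggplot(txhousing, aes(date, median))
length(base$layers)

# Adding geom_line() appends a layer that tells ggplot2 how to
# draw the mapping; group = city gives one line per city.
p <- base + geom_line(aes(group = city), alpha = 0.2)
length(p$layers)

# The geom attached to that layer.
class(p$layers[[1]]$geom)[1]
```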
Now that we have a valid ggplot2 object, p, the plotly package provides the ggplotly() function, which converts a ggplot object to a plotly object. By default, it supplies the entire aesthetic mapping to the tooltip, but the tooltip argument provides a way to restrict tooltip info to a subset of that mapping. Furthermore, in cases where the statistic of a layer is something other than the identity function (e.g., geom_hex()), relevant “intermediate” variables generated in the process are also supplied to the tooltip. This provides a nice mechanism for decoding visual aesthetics (e.g., color) used to represent a measure of interest (e.g., count/value). Figure 1.1 demonstrates tooltip functionality for a number of scenarios, and uses the subplot() function from the plotly package (discussed in more detail in Arranging multiple views) to concisely display numerous interactive versions of ggplot objects.
subplot(
  p,
  ggplotly(p, tooltip = "city"),
  ggplot(txhousing, aes(date, median)) + geom_bin2d(),
  ggplot(txhousing, aes(date, median)) + geom_hex(),
  nrows = 2, shareX = TRUE, shareY = TRUE,
  titleY = FALSE, titleX = FALSE
)
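The conversion step and the “intermediate” variables mentioned above can also be examined outside of a subplot. The sketch below is illustrative only: the names gg, binned, and computed are hypothetical, and geom_bin2d() is used in place of geom_hex() because the latter additionally requires the hexbin package. It relies on ggplot2's layer_data() helper to list the columns a layer's statistic produces. Expect a warning about removed rows, since median contains missing values.

```r
library(plotly)

p <- ggplot(txhousing, aes(date, median)) +
  geom_line(aes(group = city), alpha = 0.2)

# Convert to a plotly object, restricting the tooltip to the city name.
gg <- ggplotly(p, tooltip = "city")
inherits(gg, "plotly")

# A layer with a non-identity statistic: geom_bin2d() bins the x/y
# plane and computes a count for each bin.
binned <- ggplot(txhousing, aes(date, median)) + geom_bin2d()

# layer_data() returns the data after the statistic has been applied;
# its columns include computed variables such as count.
computed <- layer_data(binned)
names(computed)
```

Computed columns such as count are the kind of intermediate variable that the text above says ggplotly() passes along to the tooltip.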