October 15, 2015

Intro

Dave Childers

  • Data Scientist with Powerley
  • Before that: Data Science Consulting
  • Before that: CSCAR at UM

Example Data Set

All Cocaine Seizures in 2007

state  potency  weight  month  price
WA     77       217     1      5000
CT     51       248     1      4800
FL     68       43      1      3500
OH     69       123     1      3500
MS     75       118     1      3400

Motivation

  • Graphics moving to web
  • Interactivity
  • Fast Exploratory Data Analysis

Outline

  1. Dependencies
  2. Syntax
  3. Exploratory Data Analysis
  4. Interactivity

ggvis dependencies

pipe operator

read code like a book

# easy to read/write
cocaine %>%
  mutate(log_weight = log(weight)) %>%
  filter(weight <= 10)

# hard to read/write
filter(mutate(cocaine, log_weight = log(weight)), weight <= 10)
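A quick way to check that the two forms are equivalent is to run both on a small table. This is only a sketch: the `seizures` data frame is a hypothetical stand-in built from the five sample rows shown earlier, and the threshold of 200 is arbitrary, chosen so that some rows survive the filter.

```r
library(dplyr)  # dplyr also makes the %>% pipe available

# Hypothetical toy data: the five sample rows from the example data slide
seizures <- data.frame(
  state   = c("WA", "CT", "FL", "OH", "MS"),
  potency = c(77, 51, 68, 69, 75),
  weight  = c(217, 248, 43, 123, 118),
  month   = c(1, 1, 1, 1, 1),
  price   = c(5000, 4800, 3500, 3500, 3400)
)

# Piped form: read top to bottom
piped <- seizures %>%
  mutate(log_weight = log(weight)) %>%
  filter(weight <= 200)

# Nested form: read inside out
nested <- filter(mutate(seizures, log_weight = log(weight)), weight <= 200)

# Both forms should produce the same data frame
same_result <- identical(piped, nested)
```

The pipe passes the left-hand side as the first argument of the function on the right, which is why the two versions agree.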

pipe introduction

Syntax

ggplot

ggplot(cocaine, aes(x = weight, y = price)) +
  geom_point()

ggvis

ggvis(cocaine, x = ~weight, y = ~price) %>%
  layer_points

ggplot v ggvis

Syntax Comparison

ggplot(cocaine, aes(x = weight, y = price)) +
  geom_point()

ggvis(cocaine, x = ~weight, y = ~price) %>%
  layer_points

Similarities

  • Define graphic by composing small functions
  • Mapping variables to visual properties
  • Property Inheritance

Differences (ggplot -> ggvis)

  1. plus -> pipe
  2. geom -> layer
  3. point -> points
  4. aes() -> ~

Table of Geoms/Layers

geom layer
geom_bar layer_bars
geom_histogram layer_histograms
geom_density layer_densities
geom_line layer_lines
geom_smooth layer_smooths
geom_text layer_text

No Layers

geom layer
geom_abline TBD
geom_errorbar TBD
geom_jitter TBD
geom_freqpoly TBD

Setting Constants, Mapping Variables

# mapping state -> color
ggplot(cocaine, aes(x = weight, y = price)) +
  geom_point(aes(color = state))

# setting color to a constant value
ggplot(cocaine, aes(x = weight, y = price)) +
  geom_point(color = "orange")

Variables/Constants Mapping/Setting

      Variable         Constant
Map   fill = ~state    fill = state
Set   fill := ~state   fill := state
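The ggplot example above could be written in ggvis roughly as follows. This is a sketch that assumes the `cocaine` data set shipped with the ggvis package; the object names `p_map` and `p_set` are made up for illustration.

```r
library(ggvis)

# Map: `=` with a formula sends the state variable through a fill scale,
# so each state gets its own colour
p_map <- cocaine %>%
  ggvis(x = ~weight, y = ~price) %>%
  layer_points(fill = ~state)

# Set: `:=` with a constant bypasses the scale, so every point is orange
p_set <- cocaine %>%
  ggvis(x = ~weight, y = ~price) %>%
  layer_points(fill := "orange")
```

Printing `p_map` or `p_set` in an interactive session draws the plot.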

Exploratory Data Analysis

Cocaine Seizures

Histograms

ggvis(cocaine, x = ~price) %>% 
  layer_histograms(width = 100, center = 50)

Densities

ggvis(cocaine, x = ~price) %>% layer_densities %>% 
  add_axis("y", title_offset = 50)

Bar Plot

ggvis(cocaine, x = ~state) %>% layer_bars

Bar Plot #2

What happens if we say text = ~n ?

Bar Plot Code

cocaine %>% count(state, sort = TRUE) %>%
  ggvis(x = ~reorder(state, -n), y = ~n) %>%
  filter(n >= 100) %>%
  add_axis("x", title = "State") %>%
  layer_bars %>%
  layer_text(text := ~n, fontSize := 20) 
  # what happens if we say text = ~n ?

Title?

A Hack

cocaine %>% count(state, sort = TRUE) %>%
  ggvis(x = ~reorder(state, -n), y = ~n) %>%
  filter(n >= 100) %>%
  layer_bars %>%
  layer_text(text := ~n) %>%
  add_axis("x", title = "State") %>%
  add_axis(
    "x", 
    title = "2007 Cocaine Seizures by State", 
    orient = "top",
    ticks = 0,
    properties = axis_props(
      axis = list(stroke = "white"),
      labels = list(fontSize = 0)
    )
  )

Not (Yet) Implemented

  • ggtitle()
  • coord_flip()
  • themes
  • Faceting

Smooths

%>% layer_smooths

Linear Model

%>% layer_model_predictions(model = "lm", formula = price ~ weight)

Smooths by state

Code

ggvis(cocaine, ~weight, ~price, fill = ~state, stroke = ~state) %>%
  filter(state %in% c("FL", "IN", "NY")) %>%
  mutate(log_weight = log(weight), log_price = log(price)) %>%
  layer_points(~log_weight, ~log_price, opacity := 0.2) %>%
  auto_group() %>%
  layer_model_predictions(
    model = "lm", 
    formula = log_price ~ log_weight
  ) %>%
  add_axis("x", title = "Log Weight") %>%
  add_axis("y", title = "Log Price")

dplyr methods on ggvis objects

methods(class = "ggvis")
##  [1] arrange_                 compute_align           
##  [3] compute_bin              compute_boxplot         
##  [5] compute_count            compute_density         
##  [7] compute_model_prediction compute_stack           
##  [9] compute_tabulate         distinct_               
## [11] explain                  filter_                 
## [13] group_by_                groups                  
## [15] knit_print               mutate_                 
## [17] print                    rename_                 
## [19] select_                  slice_                  
## [21] summarise_               transmute_              
## [23] ungroup                 
## see '?methods' for accessing help and source code

ggvis scales options

type scale
time scale_datetime
categorical scale_nominal
numeric scale_numeric
ordered scale_ordinal

Interactivity

Control Size and Model

[Interactive plot with controls for point size and model; rendered as a static image in this document]

Interactive Control Commands

%>% layer_points(
    size := input_slider(min = 50, max = 500, value = 50, step = 50)
)
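In a full pipeline the slider fragment might look like the sketch below. It assumes the `cocaine` data set from the ggvis package. The second slider, on the smoother span, is an illustrative addition rather than the control used in the talk, and the labels are made up.

```r
library(ggvis)

# Build a plot whose point size and smoother span are driven by sliders.
# The controls only appear when the plot is printed in an interactive
# session (e.g. RStudio); in a knitr document ggvis falls back to a
# static plot.
p_interactive <- cocaine %>%
  ggvis(x = ~weight, y = ~price) %>%
  layer_points(
    size := input_slider(min = 50, max = 500, value = 50, step = 50,
                         label = "Point size")
  ) %>%
  layer_smooths(
    span = input_slider(0.5, 1, value = 1, label = "Smoother span")
  )
```

Note that `size` uses `:=` because the slider value is a raw size, while `span` is an ordinary argument to the smoother.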

Interactive Controls: Shiny & ggvis

Shiny ggvis
checkboxGroupInput input_checkboxgroup
checkboxInput input_checkbox
radioButtons input_radiobuttons
numericInput input_numeric
selectInput input_select
sliderInput input_slider
textInput input_text
dateInput [NA]

Tooltip

[Interactive plot with tooltips; rendered as a static image in this document]

Tooltip Commands

%>% add_tooltip(function(x) x$id) 
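A slightly fuller tooltip might report the values of the hovered point. The sketch below assumes the `cocaine` data set from ggvis and that the hovered row exposes the mapped variables under their own names; `seizure_tooltip` is a hypothetical helper, not part of ggvis.

```r
library(ggvis)

# Hypothetical helper: build the tooltip HTML from the hovered point's data.
# ggvis calls the function with NULL when no mark is under the cursor.
seizure_tooltip <- function(x) {
  if (is.null(x)) return(NULL)
  paste0("Weight: ", x$weight, "<br />Price: ", x$price)
}

p_tooltip <- cocaine %>%
  ggvis(x = ~weight, y = ~price) %>%
  layer_points() %>%
  add_tooltip(seizure_tooltip, "hover")
```

The function returns an HTML string, so line breaks are written as `<br />`.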

ggvis interactivity

Limitations

  • cannot switch between data sets

  • cannot add/remove layers

  • need Shiny for full interactivity

add ggvis to Shiny

type    server.R      ui.R
plot    renderPlot    plotOutput
ggvis   bind_shiny    ggvisOutput
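Put together, a single-file Shiny app using the ggvis pair from the table might look like this. It is a minimal sketch assuming the `cocaine` data set from ggvis; the ids `"state"` and `"seizure_plot"` are made-up names, and the three states are the ones used in the earlier smoothing example.

```r
library(shiny)
library(ggvis)

# ui side: ggvisOutput() reserves a slot for the plot
ui <- fluidPage(
  selectInput("state", "State", choices = c("FL", "IN", "NY")),
  ggvisOutput("seizure_plot")
)

# server side: bind_shiny() connects the ggvis plot to that slot
server <- function(input, output, session) {
  # Reactive subset of the data, driven by the Shiny select box
  selected <- reactive({
    cocaine[cocaine$state == input$state, ]
  })

  selected %>%
    ggvis(x = ~weight, y = ~price) %>%
    layer_points() %>%
    bind_shiny("seizure_plot")
}

# Create the app object; start it with runApp(app) in an interactive session
app <- shinyApp(ui, server)
```

Because the data is a reactive expression, the plot updates whenever a different state is selected.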

Why ggvis?

  • More interactive than ggplot

  • Faster exploratory analysis than shiny

  • Be cautious about using in production

Stack Overflow Tags

Thanks!

Resources