The Grammar of Graphics: Designing and Transforming Data

Sandhya Kambhampati
June 4, 2016

Follow along

bit.ly/dataharvestggplot

What is R?


“R is a language and environment for statistical computing and graphics”

Source: R-project.org

What is ggplot2?

  • ggplot2 is a data visualization package for R
    • developed by Hadley Wickham
    • inspired by Leland Wilkinson's “The Grammar of Graphics”
    • provides an organizing philosophy for building graphs – a structured approach to graphing

“The emphasis in ggplot2 is reducing the amount of thinking time by making it easier to go from the plot in your brain to the plot on the page.”

Wickham, 2012

The philosophy of ggplot2

  • A ggplot2 graph is built up from a few basic elements:
    • Data: the raw data you want to plot
    • Aesthetics: how variables map to visual properties, e.g., which variable is on the x-axis? the y-axis? Should the color, size, or position of the plotted data be mapped to some variable?
    • Geometries: the geometric shapes that represent the data
    • Statistics: statistical transformations that are used to summarize the data

Source: Hopper (2014)
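
As a minimal sketch of how these four elements line up in code, the call below names each one in a comment. It uses the built-in Seatbelts data introduced later in this talk; the choice of Year and DriversKilled is only an illustration.

```r
library(ggplot2)

# Data: the built-in Seatbelts series as a plain data frame
ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

p <- ggplot(data = ts,                           # data
            aes(x = Year, y = DriversKilled)) +  # aesthetics
  geom_point() +                                 # geometry
  stat_smooth(method = "lm")                     # statistic
print(p)
```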

How to install ggplot2


install.packages('ggplot2')
library('ggplot2')


Make sure you have the most recent version of R to get the most recent version of ggplot2
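
If you rerun the script often, a small sketch like this avoids reinstalling every time and shows which versions you actually have. It assumes a CRAN mirror is reachable when the package is missing.

```r
# Install ggplot2 only when it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Report the versions in use
R.version.string
packageVersion("ggplot2")
```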

The Data: Road Casualties in Great Britain 1969–84

data(Seatbelts)
s <- as.data.frame(Seatbelts)

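Add Year and Month columns from the time series index

# Year and Month come from the series' time index; as.numeric() drops the ts class
ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)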
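
To see where Year and Month come from, here is a short sketch of what time() and cycle() return for this series. The names yr and mon are only used for this illustration.

```r
# time() gives each observation's position as a decimal year
head(time(Seatbelts), 3)
# cycle() gives the position within the year (1 = Jan, ..., 12 = Dec)
head(cycle(Seatbelts), 3)

yr  <- floor(time(Seatbelts))                         # calendar year
mon <- factor(cycle(Seatbelts), labels = month.abb)   # month name

range(yr)    # first and last year in the series
table(mon)   # number of observations per month
```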

The Data: Road Casualties in Great Britain 1969–84


  • DriversKilled: car drivers killed
  • drivers: monthly totals of car drivers in Great Britain killed or seriously injured Jan 1969 to Dec 1984
  • front: front-seat passengers killed or seriously injured
  • rear: rear-seat passengers killed or seriously injured
  • kms: distance driven
  • PetrolPrice: price of petrol
  • VanKilled: number of van (‘light goods vehicle’) drivers killed
  • law: 0/1 indicator: was the seat belt law in effect that month?

    Source: UK Driver Deaths via R datasets

Look at your data

head(ts)
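
head() shows only the first rows. A minimal sketch of a few other base R checks that help before plotting (column types, value ranges, and how many months fall under the law):

```r
ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

str(ts)                     # column names and types
summary(ts$DriversKilled)   # range and quartiles of one variable
table(ts$law)               # months without (0) and with (1) the law
```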

How to plot your data: quick plot

qplot(data = ts, x = Year, y = DriversKilled, main = "Drivers Killed by Year")

Basic Features of ggplot2

  • Scatter Plot
  • Bar Graph
  • Line Graph

Scatter Plot

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled)) + 
  geom_point() +
  ggtitle("Drivers Killed by Year")

Notice something?

qplot(data = ts,
      x = Year,
      y = DriversKilled,
      main = "Drivers Killed by Year")

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled)) + 
  geom_point() +
  ggtitle("Drivers Killed by Year")

Both calls draw the same scatter plot: qplot() is a shortcut for the full ggplot() call.

Bar Graph - simple

ggplot(data = ts, 
       aes(x = Year, 
           y = VanKilled)) +
  geom_bar(stat = 'identity')
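
Because the data has 12 rows per year, the bars above are stacks of monthly values, so each bar's height is the yearly total. A sketch that makes this explicit by summing first; it assumes a ggplot2 version that provides geom_col() (2.2.0 or later), and the name yearly is only for illustration.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

# Sum the monthly counts into one row per year
yearly <- aggregate(VanKilled ~ Year, data = ts, FUN = sum)

# geom_col() draws bars whose heights are the values in the data
p <- ggplot(data = yearly,
            aes(x = Year,
                y = VanKilled)) +
  geom_col()
print(p)
```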

Line Graph

ggplot(data = ts, 
       aes(x = Year, 
           y = front)) +
  geom_point() +
  geom_line()
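
With Year on the x-axis, twelve monthly values share each x position, so the line runs up and down within every year. One possible alternative, sketched here, is to plot against a continuous time value; the Time column is an addition for this example and is not part of the original data frame.

```r
library(ggplot2)

# Time holds the decimal year of each month (e.g. 1969.083 for Feb 1969)
ts <- data.frame(Time = as.numeric(time(Seatbelts)),
                 Seatbelts)

p <- ggplot(data = ts,
            aes(x = Time,
                y = front)) +
  geom_line()
print(p)
```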

Now, let's make the charts a little easier to read

Scatter Plot - set y-axis limits

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled)) + 
  geom_point() +
  scale_y_continuous(limits = c(50,200))
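
limits only changes the range of the axis. To change the axis text as well, one option is the name argument of the same scale, sketched below with a label wording chosen for this example. Note that points outside limits are dropped with a warning; coord_cartesian(ylim = ...) zooms without dropping data.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

p <- ggplot(data = ts,
            aes(x = Year,
                y = DriversKilled)) +
  geom_point() +
  # name sets the axis title, limits sets the axis range
  scale_y_continuous(name = "Car drivers killed per month",
                     limits = c(50, 200))
print(p)
```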

Scatter Plot - color mapped to month

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled, 
           color = Month)) + 
  geom_point() 
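
Putting color inside aes() maps it to a variable. Putting it outside aes() sets one fixed color for every point. A short sketch of the difference; "steelblue" is an arbitrary choice.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

# Mapped: one color per month, with a legend
p_mapped <- ggplot(ts, aes(x = Year, y = DriversKilled, color = Month)) +
  geom_point()

# Set: every point gets the same color, no legend
p_fixed <- ggplot(ts, aes(x = Year, y = DriversKilled)) +
  geom_point(color = "steelblue")

print(p_mapped)
print(p_fixed)
```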

Bar Graph - simple, transparent background

ggplot(data = ts, 
       aes(x = Year, 
           y = VanKilled)) +
  geom_bar(stat = 'identity') +
  theme(panel.background = element_blank())
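
element_blank() removes the gray panel behind the bars; the area around the panel keeps its default white fill. If you also want that area to be transparent, a sketch like the following may help. Whether the saved image really comes out transparent depends on the graphics device you save with.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

p <- ggplot(data = ts,
            aes(x = Year,
                y = VanKilled)) +
  geom_bar(stat = "identity") +
  theme(panel.background = element_blank(),            # no panel fill
        plot.background = element_rect(fill = "transparent",
                                       colour = NA),   # no outer fill or border
        axis.line = element_line(colour = "black"))    # keep axes visible
print(p)
```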

The philosophy of ggplot2 - recap

  • A ggplot2 graph is built up from a few basic elements:
    • Data: the raw data you want to plot
    • Aesthetics: how variables map to visual properties, e.g., which variable is on the x-axis? the y-axis? Should the color, size, or position of the plotted data be mapped to some variable?
    • Geometries: the geometric shapes that represent the data
    • Statistics: statistical transformations that are used to summarize the data

Source: Hopper (2014)

Resources

Advanced Features

  • Text Labels
  • Facetting
  • Fitted Lines

Text Labels

ggplot(data = ts, 
       aes(x = Month, 
           y = DriversKilled)) +
  geom_text(aes(label = Year))
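
Labeling all 192 points gets crowded. One possible variation, sketched here, draws every point but labels only the highest months. The cutoff of 170 and the name peaks are arbitrary choices for this example.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

# Keep only the months with unusually many deaths (hypothetical cutoff)
peaks <- subset(ts, DriversKilled > 170)

p <- ggplot(data = ts,
            aes(x = Month,
                y = DriversKilled)) +
  geom_point(colour = "grey60") +
  # the text layer uses the smaller data frame; overlapping labels are skipped
  geom_text(data = peaks,
            aes(label = Year),
            vjust = -0.5,
            check_overlap = TRUE)
print(p)
```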

Facetting

  • Facetting splits your data into panels by one or two variables
  • facet_grid() lays panels out in a grid: one variable defines the rows (vertical), the other the columns (horizontal)
  • facet_wrap() places facets next to each other, wrapping into a set number of rows and/or columns

facet_grid(vertical ~ horizontal)

facet_wrap(~ variable, nrow = ___, ncol = ___)
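
A sketch of both templates filled in with the Seatbelts columns. Using law for the rows and four columns for the months are choices made only for this illustration.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

base <- ggplot(data = ts,
               aes(x = Month,
                   y = DriversKilled)) +
  geom_point()

# One row of panels per value of law (0 = before, 1 = after); "." means no column split
p_grid <- base + facet_grid(law ~ .)

# One panel per month, wrapped into 4 columns
p_wrap <- ggplot(ts, aes(x = Year, y = DriversKilled)) +
  geom_point() +
  facet_wrap(~ Month, ncol = 4)

print(p_grid)
print(p_wrap)
```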

Advanced Line Graph

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled)) + 
  geom_line() +
  facet_wrap(~ Month)

Fitted Lines

  • How do we draw a fitted line through the points of a scatter plot?

Fitted Lines

ggplot(data = ts, 
       aes(x = Year, 
           y = DriversKilled)) + 
  geom_point() +
  stat_smooth(method = 'lm')
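
With method = 'lm', stat_smooth() fits a straight line of y on x by least squares and draws it with a confidence band. As a rough sketch, you can fit the same kind of model yourself with lm() to read off the slope, and pass se = FALSE if you want the line without the band.

```r
library(ggplot2)

ts <- data.frame(Year = as.numeric(floor(time(Seatbelts))),
                 Month = factor(cycle(Seatbelts), labels = month.abb),
                 Seatbelts)

# Fit the straight line directly and inspect intercept and slope
fit <- lm(DriversKilled ~ Year, data = ts)
coef(fit)

# Same plot as above, but without the shaded confidence band
p <- ggplot(data = ts,
            aes(x = Year,
                y = DriversKilled)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)
print(p)
```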

Thank You + Questions