Review
Static results presentation
Automatic table creation
Plotting best practices
ggplot2 for general graphing
21 October 2016
Today we will learn how to communicate your research findings with automatically generated tables and static plots.
Why automatically generate?
Saves time: don't have to re-enter numbers by hand into a table or restyle a graph each time you change the data/analysis.
Easier to find and correct errors: the source code that creates every table and figure is linked to the output, which is updated whenever corrections are made.
More reproducible: everything is clearly linked together.
In general, include the functions that create the tables/figures in a code chunk.
Include in the code chunk head: echo=FALSE, warning=FALSE, error=FALSE, message=FALSE.
You may also need to include results='asis' for some table functions.
See weeks 4 and 5 for figure code chunk options.
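As a rough sketch, the same options can also be set once for the whole document rather than repeated in every chunk head. This assumes the knitr package is installed and that the call is placed in a setup chunk near the top of the R Markdown file; results='asis' is left to the individual table chunks that need it.

```r
library(knitr)

# Document-wide defaults: do not show code, warnings or messages in the
# knitted output, and stop knitting on an error instead of printing it
opts_chunk$set(echo = FALSE, warning = FALSE, error = FALSE, message = FALSE)

# Inspect the defaults that are now in force (returns a named list)
chunk_defaults <- opts_chunk$get(c("echo", "warning", "error", "message"))
print(chunk_defaults)
```

Options given in an individual chunk head still override these defaults for that chunk.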
There are a number of tools for automatically generating tables in R/R Markdown.
We will focus on kable and stargazer.
kable is a good, simple tool for creating tables from data frames (or matrices).
stargazer is useful for creating more complex tables of regression model output.
Example Docs: HertieDataScience/Examples/PaperWithRegressionTables
kable example: predicted probabilities
Set up (from Lecture 9):

    # Load data
    URL <- 'http://www.ats.ucla.edu/stat/data/binary.csv'
    Admission <- read.csv(URL)

    # Estimate model
    Logit1 <- glm(admit ~ gre + gpa + as.factor(rank),
                  data = Admission, family = 'binomial')

    # Create fitted data
    fitted <- with(Admission,
                   data.frame(gre = mean(gre),
                              gpa = mean(gpa),
                              rank = factor(1:4)))

kable example: predicted probabilities

    library(knitr)
    fitted$predicted <- predict(Logit1, newdata = fitted, type = 'response')
    kable(fitted)

gre | gpa | rank | predicted |
---|---|---|---|
587.7 | 3.3899 | 1 | 0.5166016 |
587.7 | 3.3899 | 2 | 0.3522846 |
587.7 | 3.3899 | 3 | 0.2186120 |
587.7 | 3.3899 | 4 | 0.1846684 |
kable example: predicted probabilities
You can stylise the table.

    kable(fitted, align = 'c', digits = 2,
          caption = 'Predicted Probabilities for Fitted Values')

gre | gpa | rank | predicted |
---|---|---|---|
587.7 | 3.39 | 1 | 0.52 |
587.7 | 3.39 | 2 | 0.35 |
587.7 | 3.39 | 3 | 0.22 |
587.7 | 3.39 | 4 | 0.18 |
Don't show more digits to the right of the decimal than are statistically and substantively meaningful.
A rule of thumb: more than one or two digits are rarely meaningful.
See also: http://andrewgelman.com/2012/07/02/moving-beyond-hopeless-graphics/
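To make the rule of thumb concrete, here is a minimal base R sketch of what limiting the digits does to the predicted probabilities in the table above. The values are copied from that table; the variable names are illustrative only.

```r
# Predicted probabilities copied from the unrounded table above
predicted <- c(0.5166016, 0.3522846, 0.2186120, 0.1846684)

# Round to two decimal places, similar in effect to digits = 2 in kable
predicted_2dp <- round(predicted, 2)

# Fix the number of printed decimals so that trailing zeros are kept
predicted_txt <- formatC(predicted_2dp, format = "f", digits = 2)
print(predicted_txt)
```

Rounding only at the presentation step keeps the full precision available for any further calculations.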
stargazer
kable is limited if we want to create regression output tables, especially for multiple models.
stargazer is good for this.
stargazer example
Estimate models

    L1 <- glm(admit ~ gre,
              data = Admission, family = 'binomial')
    L2 <- glm(admit ~ gre + gpa,
              data = Admission, family = 'binomial')
    L3 <- glm(admit ~ gre + gpa + as.factor(rank),
              data = Admission, family = 'binomial')

stargazer example HTML
When you are creating a table for an HTML doc with stargazer use: type = 'html'

    # Create cleaner covariate labels
    labels <- c('GRE Score', 'GPA Score', '2nd Ranked School',
                '3rd Ranked School', '4th Ranked School', '(Intercept)')

    stargazer::stargazer(L1, L2, L3, covariate.labels = labels,
        title = 'Logistic Regression Estimates of Grad School Acceptance',
        digits = 2, type = 'html')

[Figure: stargazer example HTML]
stargazer example PDF
When you are creating a PDF use the arguments: type = 'latex' and header = FALSE

    # Create cleaner covariate labels
    labels <- c('GRE Score', 'GPA Score', '2nd Ranked School',
                '3rd Ranked School', '4th Ranked School', '(Intercept)')

    stargazer::stargazer(L1, L2, L3, covariate.labels = labels,
        title = 'Logistic Regression Estimates of Grad School Acceptance',
        digits = 2, type = 'latex', header = FALSE)

[Figure: stargazer output in PDF]
stargazer plain text output
To compare multiple models at once in your R console, use stargazer with type = 'text':

    stargazer(L1, L2, L3, type = 'text')

Tables are important to include so that readers can explore details, but are usually not the best way to show your results.
Figures are often more effective.
(A Selection of) Tufte's Principles for Excellent Statistical Graphics (2001, 13):
Show the data
Encourage the eye to compare differences in the data
Serve a clear purpose
Avoid distorting the data
Be closely integrated with the text
Show the data, not other things like silly graphics or unnecessary words.
Have a high data ink ratio:
\[ \mathrm{Data\:Ink\:Ratio} = \frac{\mathrm{data\:ink}}{\mathrm{total\:ink}} \]
How did the budgets change? (Orange is 2013, Blue is 2012)
In general: Avoid using the size of a circle to mean something!
So, avoid:
bubble charts
pie charts
Circles can distort data.
It is difficult to compare their size.
The Ebbinghaus Illusion!
Order the circles from smallest to largest.
The circles are on a scale of 0-100, so what are their values?
Which circle is bigger?
Which square is darkest?
Only give graphical features (e.g. bars in a bar chart) different colours if the difference means something in the data.
Colours should be used to:
highlight particular data,
group items,
encode quantitative values
Values of continuous variables should be represented using increasingly dark or intense shades of the same colour.
Categorical variables should be represented with different colours. (rule of thumb: avoid using more than about 7 colours in a plot)
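The difference between the two kinds of colour scale can be sketched in base R. The end-point colours of the sequential scale are illustrative choices, and the four categorical colours are example hex codes of the kind ColorBrewer provides; neither is prescribed by the lecture.

```r
# Sequential palette for a continuous variable: one colour, light to dark
# (the two end-point colours are illustrative assumptions)
sequential <- colorRampPalette(c("#DEEBF7", "#08519C"))(5)

# Qualitative palette for a categorical variable: clearly distinct colours
qualitative <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")

# Draw simple swatches to compare the two palettes
op <- par(mfrow = c(2, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, 5), col = sequential, axes = FALSE, main = "Sequential")
barplot(rep(1, 4), col = qualitative, axes = FALSE, main = "Qualitative")
par(op)
```

In the sequential swatch the ordering of values is visible at a glance, while the qualitative swatch implies no ordering at all.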
Color Blindness
People who are colour blind can have difficulty distinguishing between red and green, or between blue and yellow.
About 5-8% of men are colour blind.
We need to choose colour schemes for our graphics that are colour blind friendly.
For more information see http://www.usability.gov/get-involved/blog/2010/02/color-blindness.html.
Color Brewer is a great resource for selecting colours: http://colorbrewer2.org/.
"gg" means "Grammar of Graphics".
"2" just means that it is the second one.
Each plot is made of layers. Layers include the coordinate system (x-y), points, labels, etc.
Each layer has aesthetics (aes) including the x & y, size, shape, and colour.
The main layer types are called geometric objects (geom). These include lines, points, density plots, bars, and text.
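The layering idea can be sketched with a small example. It uses the mtcars data set that ships with R purely as a stand-in, so the variable names are not those of the lecture data, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Base of the plot: the data plus aesthetics that map variables
# to x, y and colour (mtcars is only a stand-in data set)
p <- ggplot(data = mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))

# Add one geom layer of points, then a second geom layer of fitted lines
p <- p +
    geom_point(size = 2) +
    geom_smooth(method = "lm", se = FALSE)

# Axis labels are added on top in the same way
p <- p + xlab("Weight (1000 lbs)") + ylab("Miles per gallon")

print(p)
```

Because each piece is added with +, a layer can be removed or swapped without touching the rest of the plot.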

    library(devtools)
    library(ggplot2)

    # Load the MortalityGDP data
    source_url("http://bit.ly/OTWEGS")

    # Create data with no missing values of infant mortality
    InfantNoMiss <- subset(MortalityGDP, !is.na(InfantMortality))

    # Create high/low infant mortality variable
    InfantNoMiss$DumMort[InfantNoMiss$InfantMortality >= 15] <- "high"
    InfantNoMiss$DumMort[InfantNoMiss$InfantMortality < 15] <- "low"


    ggplot(data = MortalityGDP, aes(x = InfantMortality, y = GDPperCapita)) +
        geom_point()

    ggplot(data = MortalityGDP, aes(x = InfantMortality, y = GDPperCapita)) +
        geom_point() +
        theme_bw(base_size = 13)

There are a number of ways to specify colours in ggplot2.
The simplest way is to let ggplot choose the colours for you.

    ggplot(data = InfantNoMiss, aes(log(InfantMortality), log(GDPperCapita))) +
        geom_point(aes(colour = income)) +
        theme_bw()

There are many ways to pick specific colors.
In this class we will mainly use hexadecimal colours.
This is probably the most commonly used system for choosing colours on the web.
Every colour is given a six-digit hexadecimal code: two digits each for red, green, and blue.
A good website for getting hexadecimal colour schemes is: http://colorbrewer2.org/.
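As a small illustration of how a hexadecimal colour is built up, base R can convert between a hex code and its red, green and blue components. The colour used here is just one example code.

```r
# Split a hex colour into its red, green and blue components (0-255)
rgb_values <- col2rgb("#1B9E77")
print(rgb_values)

# Rebuild the hex code from the three components
hex_again <- rgb(27, 158, 119, maxColorValue = 255)
print(hex_again)
```

Each pair of hex digits encodes one component, so 1B is 27 for red, 9E is 158 for green, and 77 is 119 for blue.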

    # Create colour vector
    Colours <- c("#1B9E77", "#D95F02", "#7570B3",
                 "#E7298A", "#66A61E", "#E6AB02")

    # Create graph
    ggplot(data = InfantNoMiss, aes(log(InfantMortality), log(GDPperCapita))) +
        geom_point(aes(colour = income)) +
        scale_color_manual(values = Colours) +
        xlab("\nLog Infant Mortality") +
        ylab("Log GDP/Capita\n") +
        ggtitle("Log Transformed Data\n") +
        theme_bw()


    # Create a violin plot
    ggplot(InfantNoMiss, aes(factor(DumMort), log(GDPperCapita))) +
        geom_violin(fill = "#E7298A", colour = "#E7298A", alpha = I(0.5)) +
        geom_jitter(color = "#7570B3") +
        xlab("\n Infant Mortality") +
        ylab("Log GDP Per Capita\n") +
        theme_bw(base_size = 16)

Create tables and visualisations of descriptive statistics for your final project in an R Markdown document using the techniques covered in class.
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd ed. Cheshire, CT: Graphics Press.