Collaborative Research Project
Next Class
Review
Static maps with ggmap
Dynamic results presentation
- Static website hosting with gh-pages
18 April 2016
Next class: the first two hours as usual, then 14:00-15:00.
Purposes: Pose an interesting research question and try to answer it using data analysis and standard academic practices. Effectively communicate your results to a variety of audiences in a variety of formats.
Deadline:
Presentation: In-class Monday 2 May
Website/Paper: 13 May 2016
The project can be thought of as a 'dry run' for your thesis with multiple presentation outputs.
Presentation: 10 minutes maximum. Engagingly present your research question and key findings to a general academic audience (fellow students).
Paper: 5,000 words maximum. Standard academic paper, properly cited, laying out your research question, literature review, data, methods, and findings.
Website: An engaging website designed to convey your research to a general audience.
Project total: 50% of your final mark.
10% presentation
10% website
30% paper
As always, you should submit one GitHub repository with all of the materials needed to completely reproduce your data gathering, analysis, and presentation documents.
Note: Because you've had two assignments already to work on parts of the project, I expect high quality work.
Find one other group to be a discussant for your presentation.
The discussants will provide a quick (max 2 minute) critique of your presentation: ideas for things you can improve in your paper, and questions for you.
I will have normal office hours every week for the rest of the term.
Please take advantage of this opportunity to improve your final project.
Be prepared.
What is the data-ink ratio? Why is it important for effective plotting?
Why should you avoid using the size of circles to represent continuous variables?
Why not use red-green colour contrasts to indicate contrasting data?
How many decimal places should you report in a table and why?
Last class we didn't have time to cover mapping with ggmap.
We've already seen how ggmap can be used to find latitude and longitude.
library(ggmap)
places <- c('Bavaria', 'Seoul', '6 Pariser Platz, Berlin')
geocode(places)
##         lon      lat
## 1  11.49789 48.79045
## 2 126.97797 37.56654
## 3  13.37854 52.51701
qmap(location = 'Berlin', zoom = 15)
Example from: Kahle and Wickham (2013)
Use crime data set that comes with ggmap
names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"
## [13] "day"      "location" "address"  "lon"      "lat"
# find a reasonable spatial extent
qmap('houston', zoom = 13)
# gglocator(2) see in RStudio
# only violent crimes
violent_crimes <- subset(crime,
  offense != "auto theft" & offense != "theft" & offense != "burglary")
# order violent crimes
violent_crimes$offense <- factor(violent_crimes$offense,
  levels = c("robbery", "aggravated assault", "rape", "murder"))
# restrict to downtown
violent_crimes <- subset(violent_crimes,
  -95.39681 <= lon & lon <= -95.34188 &
   29.73631 <= lat & lat <= 29.78400)
# Set up base map
HoustonMap <- qmap("houston", zoom = 14, source = "stamen", maptype = "toner")
# Add points
FinalMap <- HoustonMap +
  geom_point(aes(x = lon, y = lat, colour = offense), data = violent_crimes) +
  xlab('') + ylab('') +
  theme(axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank()) +
  guides(size = guide_legend(title = 'Offense'),
         colour = guide_legend(title = 'Offense'))
print(FinalMap)
When your output documents are in HTML, you can create interactive visualisations.
These are potentially more engaging and let users explore the data on their own.
Big distinction:
Client Side: Plots are created on the user's (client's) computer. Often JavaScript in the browser. You simply send them static HTML/JavaScript needed for their browser to create the plots.
Server Side: Data manipulations and/or plots (e.g. with Shiny Server) are done on a server in R. Browsers don't come with R built in.
There are lots of free services (e.g. GitHub Pages) for hosting webpages for client side plot rendering.
You usually have to use a paid service for server side data manipulation and plotting.
You can (relatively) easily create server side web applications with R.
To do this use Shiny.
We are not going to cover Shiny in the class as it usually requires a paid service to host.
You already know how to create HTML documents with R Markdown.
Set results='asis' in the code chunk head (not needed for some packages).
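As a rough illustration of what results='asis' changes, the sketch below builds an HTML fragment as a plain string and writes it with cat(). The helper name make_html_list is hypothetical and only base R is used. In a chunk whose header includes results='asis', knitr passes the printed text through unchanged, so the browser renders it as a list; without the option the same text would be shown as code output.

```r
# Hypothetical helper: build a small HTML list as a character string.
make_html_list <- function(items) {
  li <- paste0("  <li>", items, "</li>", collapse = "\n")
  paste0("<ul>\n", li, "\n</ul>")
}

fragment <- make_html_list(c("Static maps", "Interactive plots"))

# In a chunk with the header {r, results='asis'}, cat() sends this markup
# straight into the HTML document instead of a '##'-prefixed output block.
cat(fragment, "\n")
```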
There is a growing set of tools for interactive plotting, e.g. the plotly, googleVis and networkD3 packages that appear below.
These packages simply create an interface between R and (usually) JavaScript.
Debugging often requires some knowledge of JavaScript and the DOM.
In sum: usually simple, but can be mysteriously difficult without a good knowledge of JavaScript/HTML.
The plotly package allows you to convert (most) ggplot2 plots to JavaScript.
Simply create your ggplot2 object, then pass it to ggplotly.
Using an example from last class:
mort_plot <- ggplot(data = MortalityGDP, aes(x = InfantMortality, y = GDPperCapita)) + geom_point()
Then . . .
library(plotly)
ggplotly(mort_plot)
# plotly >= 4.0 expects data columns to be given as formulas (~)
plot_ly(MortalityGDP, x = ~InfantMortality, y = ~GDPperCapita, type = 'scatter', mode = 'markers')
ggplotly works with simGLM:
# sim_gpa created with http://hertiedatascience.github.io/Examples/
ggplotly(sim_gpa)
The googleVis package can create Google plots from R.
# Create fake data
fake_compare <- data.frame(
  year = c('2010', '2011', '2012'),
  US = c(10, 13, 14),
  GB = c(23, 12, 32))
(Example modified from googleVis Vignettes.)
library(googleVis)
line_plot <- gvisLineChart(fake_compare)
print(line_plot, tag = 'chart')
Note: To show in interactive R use plot instead of print and don't include tag = 'chart'.
library(WDI)
co2 <- WDI(indicator = 'EN.ATM.CO2E.PC', start = 2010, end = 2010)
co2 <- co2[, c('iso2c','EN.ATM.CO2E.PC')]
# Clean
names(co2) <- c('iso2c', 'CO2 Emissions per Capita')
co2[, 2] <- round(log(co2[, 2]), digits = 2)
# Plot
co2_map <- gvisGeoChart(co2, locationvar = 'iso2c',
                        colorvar = 'CO2 Emissions per Capita',
                        options = list(
                          colors = "['#fff7bc', '#d95f0e']"
                        ))
CO2 Emissions (metric tons per capita)
print(co2_map, tag = 'chart')
More examples are available at: http://HertieDataScience.github.io/Examples/
Any HTML file called index.html in a GitHub repository branch called gh-pages will become a hosted website.
The URL will be:
http://GITHUB_USER_NAME.github.io/REPO_NAME
Note: you can use a custom URL if you own one. See https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages/
First, create a new branch called gh-pages in your repository.
Then sync your branch with the local version of the repository.
Finally switch to the gh-pages branch.
You can use R Markdown to create the index.html page.
Simply place a new .Rmd file in the repository called index.Rmd and knit it to HTML. Then sync it.
Your website will now be live.
Every time you push to the gh-pages branch, the website will be updated.
Note: branches in git repositories can have totally different files from one another.
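The branch and publishing steps can also be run from an R session. The following is only a sketch under stated assumptions: git is installed and on the PATH, the GitHub remote is named origin, and index.Rmd sits in the repository root with an HTML output format in its header. The helpers run_git, create_gh_pages and publish_index are hypothetical names, not part of any package.

```r
# Hypothetical helper: run one git command inside repo_dir, stop on failure.
run_git <- function(repo_dir, args) {
  status <- system2("git", c("-C", shQuote(repo_dir), args))
  if (status != 0) {
    stop("git ", paste(args, collapse = " "), " failed")
  }
  invisible(status)
}

# Create the gh-pages branch locally and switch to it.
create_gh_pages <- function(repo_dir = ".") {
  run_git(repo_dir, c("checkout", "-b", "gh-pages"))
}

# Knit index.Rmd to index.html, commit both files, and push the branch.
# Assumes the rmarkdown package is installed and the remote is 'origin'.
publish_index <- function(repo_dir = ".") {
  # Uses the output format declared in the file's header (assumed HTML).
  rmarkdown::render(file.path(repo_dir, "index.Rmd"))
  run_git(repo_dir, c("add", "index.Rmd", "index.html"))
  run_git(repo_dir, c("commit", "-m", shQuote("Update website")))
  run_git(repo_dir, c("push", "-u", "origin", "gh-pages"))
}
```

A typical session would call create_gh_pages() once, then publish_index() each time index.Rmd changes.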
Example: networkD3
You can create interactive 'dashboards' for displaying an information overview using the flexdashboard package.
The package is not on CRAN yet, so install with:
devtools::install_github("rstudio/flexdashboard")
flexdashboard builds on R Markdown.
To set a .Rmd file as a flexdashboard, in the header use:
output: flexdashboard::flex_dashboard
Each element of the dashboard is delimited with the Markdown third level header: ###.
You can create different columns and rows with:
Column
-------------------------------------
Row
-------------------------------------
A minimal code example is available at: https://raw.githubusercontent.com/HertieDataScience/flexdashboard_example/gh-pages/index.Rmd
The output is at: http://hertiedatascience.github.io/flexdashboard_example/
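To make the layout rules concrete, here is a small sketch that writes a two-column dashboard skeleton to a temporary index.Rmd. The title, panel names and placeholder text are invented for illustration; only the output line, the Column delimiters and the ### headers come from the description above.

```r
# Lines of a hypothetical two-column flexdashboard skeleton.
dashboard_rmd <- c(
  "---",
  "title: \"Project dashboard\"",
  "output: flexdashboard::flex_dashboard",
  "---",
  "",
  "Column",
  "-------------------------------------",
  "",
  "### Key results table",
  "",
  "Placeholder: a table or R code chunk goes here.",
  "",
  "Column",
  "-------------------------------------",
  "",
  "### Map",
  "",
  "Placeholder: an interactive plot goes here.",
  ""
)

# Write the skeleton to a temporary file rather than the working directory.
rmd_path <- file.path(tempdir(), "index.Rmd")
writeLines(dashboard_rmd, rmd_path)
```

Knitting the resulting file requires the flexdashboard package to be installed.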
You can host these on GitHub Pages as before.
They can also be integrated with Shiny.
Begin to create a website for your project with RMarkdown and graphics (either static or interactive).
If relevant include:
A table of key results
A googleVis map
A bar or line chart with plotly or other package
A simulation plot created with Zelig, simGLM or other tool showing key results from your analysis.
Push to the gh-pages branch.