class: center, middle, inverse, title-slide

# Intro to data visualisation
### Jakub Nowosad
nowosad.jakub@gmail.com
### 2017-04-26

---

## Data visualisation in R

- R has several systems for making graphics
- **base R**, the **lattice** package, and the **ggplot2** package are among the most popular ones

![](Intro_to_data_visualisation_files/figure-html/unnamed-chunk-1-1.png)<!-- -->![](Intro_to_data_visualisation_files/figure-html/unnamed-chunk-1-2.png)<!-- -->![](Intro_to_data_visualisation_files/figure-html/unnamed-chunk-1-3.png)<!-- -->

---

## **ggplot2** - basic information

- **ggplot2** is an implementation of the grammar of graphics (described in the book ["Grammar of Graphics"](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448) by Leland Wilkinson)
- the input data is (most of the time) a data frame
- the **ggplot2** package has two functions for plot creation - `qplot()` and `ggplot()`
- **ggplot2** documentation is available at http://docs.ggplot2.org

```r
install.packages("ggplot2")
```

```r
library('ggplot2')
```

---

## **ggplot2** - the first plot

```r
df <- data.frame(A = c(1:10), B = seq(22, 4, by= -2))
ggplot(data = df, aes(x = A, y = B)) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## **ggplot2** - basic dictionary

- `geom` - type of a plot ("histogram", "boxplot", "point", etc.)
- aesthetics (`aes`) - visual properties of geoms, such as color, size, shape, ...
- faceting (`facet`) - panels with different subsets of the data

---

## Dataset

```r
install.packages('gapminder')
library('gapminder')
```

or

```r
gapminder <- readRDS('data/gapminder.rds')
```

http://www.gapminder.org/data/

http://github.com/jennybc/gapminder

http://www.youtube.com/watch?v=jbkSRLYSojo

```r
gapminder2007 <- subset(gapminder, year==2007)
```

---

## Histogram

```r
ggplot(data=gapminder2007, aes(x=gdpPercap)) +
  geom_histogram()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

## Histogram

.pull-left[
```r
ggplot(data=gapminder2007, aes(x=gdpPercap)) +
  geom_histogram(binwidth=10000)
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-9-1.png" width="400px" />
]

.pull-right[
```r
ggplot(data=gapminder2007, aes(x=gdpPercap)) +
  geom_histogram(binwidth=800)
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-10-1.png" width="400px" />
]

---

## Bar plot

There are two types of bar plots in **ggplot2**:

- `geom_bar()` makes the height of each bar proportional to the number of cases in each group
- `geom_col()` makes the heights of the bars represent values in the data

```r
ggplot(data=gapminder2007, aes(x=continent)) +
  geom_bar()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

## Bar plot

There are two types of bar plots in **ggplot2**:

- `geom_bar()` makes the height of each bar proportional to the number of cases in each group
- `geom_col()` makes the heights of the bars represent values in the data

```r
library('dplyr')
gapminder2007_2 <- gapminder2007 %>%
  group_by(continent) %>%
  summarise(number=n())
ggplot(data=gapminder2007_2, aes(x=continent, y=number)) +
  geom_col()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

## Line plot

```r
gapminder2 <- gapminder %>%
  group_by(year) %>%
  summarise(mean.lifeExp=mean(lifeExp, na.rm=TRUE))
ggplot(data=gapminder2, aes(x=year, y=mean.lifeExp)) +
  geom_line()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

## Scatter plot

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp)) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## Box plot

```r
ggplot(data=gapminder2007, aes(x=continent, y=lifeExp)) +
  geom_boxplot()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

## Aesthetic attributes for quantitative variables (1)

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, color=pop)) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---

## Aesthetic attributes for quantitative variables (2)

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, size=pop)) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

## Aesthetic attributes for qualitative variables (1)

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, color=continent)) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

---

## Aesthetic attributes for qualitative variables (2)

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, shape=continent)) +
  geom_point() +
  scale_shape(solid = FALSE)
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## Scales

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, color=pop)) +
  geom_point() +
  scale_colour_gradientn(colours=rainbow(5))
```

<img
src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

---

## Data transformations

```r
ggplot(data=gapminder2007, aes(x=gdpPercap, y=lifeExp, color=log10(pop))) +
  geom_point()
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

---

## Facets

- Two functions, `facet_grid()` and `facet_wrap()`, can be used to plot subsets of the data side by side

```r
ggplot(data=gapminder2007, aes(x=lifeExp)) +
  geom_histogram() +
  facet_wrap(~continent)
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---

## Plot customizing - labs

```r
p <- ggplot(data=gapminder2007, aes(x=lifeExp, y=gdpPercap, color=continent)) +
  geom_point()
p <- p + labs(x='Life expectancy', y=NULL, title='GDP vs Life expectancy', color='Continent: ')
p
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---

## Plot customizing - themes

```r
p <- p + theme_bw()
p
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

---

## Saving plots

```r
p <- ggplot(data=gapminder2007, aes(x=lifeExp, y=gdpPercap, color=continent)) +
  geom_point() +
  labs(x='Life expectancy', y=NULL, title='GDP vs Life expectancy', color='Continent: ') +
  theme_bw()
p
```

<img src="Intro_to_data_visualisation_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

```r
ggsave(filename = "Plot.pdf", plot = p)
```

```r
ggsave(filename = "Plot.png", plot = p, dpi = 300)
```

---

## That's not all (folks)

- More geoms:
    - `geom_density()`
    - `geom_jitter()`
    - `geom_text()`
    - ...
- Customizing axes and legends
- Additional packages (**ggplot2** extensions) - http://www.ggplot2-exts.org/
- Spatial data visualisation
- ...
and many more

---

## Interactive plots

- R has many interfaces to interactive visualization libraries, such as `plotly`, `dygraphs`, `rCharts`, `googleVis`, etc.
- What's even better is that R allows for building interactive web applications using the `shiny` package

---

## `plotly`

```r
# devtools::install_github("ropensci/plotly")
library('plotly')
ggplotly(p=p)
```
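---

## `plotly` - `plot_ly()`

A minimal sketch, assuming the `plotly` and `gapminder` packages are installed: a similar interactive scatter plot can also be built directly with `plot_ly()`, without creating a **ggplot2** object first. The object name `p_direct` is only illustrative and does not come from the original slides.

```r
library('plotly')
library('gapminder')

# The same 2007 subset as used on the earlier slides
gapminder2007 <- subset(gapminder, year==2007)

# The ~ prefix maps columns of the data frame to plot properties
p_direct <- plot_ly(data=gapminder2007, x=~lifeExp, y=~gdpPercap,
                    color=~continent, type='scatter', mode='markers')
p_direct
```

Both `ggplotly()` and `plot_ly()` return the same kind of object, so the choice is mostly a matter of whether a **ggplot2** plot already exists.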
---

## `dygraphs`

```r
library('dygraphs')
library('tidyr')
gapminder3 <- gapminder %>%
  group_by(year, continent) %>%
  summarise(mean.lifeExp=mean(lifeExp, na.rm=TRUE)) %>%
  spread(continent, mean.lifeExp)
dygraph(gapminder3) %>%
  dyRangeSelector()
```
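---

## `dygraphs` - input data

For numeric (non time series) data, `dygraph()` reads the first column as the x axis and draws every remaining column as a separate series, which is why the data is reshaped from long to wide format before plotting. The sketch below shows the same reshaping with `pivot_wider()`, the newer **tidyr** counterpart of `spread()`; it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, and the name `gapminder_wide` is only illustrative.

```r
library('dplyr')
library('tidyr')
library('dygraphs')
library('gapminder')

# One row per year, one column per continent (wide format)
gapminder_wide <- gapminder %>%
  group_by(year, continent) %>%
  summarise(mean.lifeExp=mean(lifeExp, na.rm=TRUE), .groups='drop') %>%
  pivot_wider(names_from=continent, values_from=mean.lifeExp)

# The first column (year) becomes the x axis, the others become series
names(gapminder_wide)

dygraph(gapminder_wide) %>%
  dyRangeSelector()
```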
---

## Resources:

- [ggplot2](http://ggplot2.tidyverse.org/) - the official website of the `ggplot2` package
- [An introduction to ggplot2](https://rawgit.com/eco-data-science/VisualizingData/master/ggplot2_intro.html) - a very good introduction to `ggplot2`, with more information about how to deal with colors
- [Cookbook for R >> Graphs](http://www.cookbook-r.com/Graphs/) - a clear introduction to the most useful functions of `ggplot2`
- [Data Visualization with ggplot2 Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2015/05/ggplot2-cheatsheet.pdf) - an official `ggplot2` cheatsheet
- [Beautiful plotting in R: A ggplot2 cheatsheet](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/) - a ton of examples of how to customize `ggplot2` plots
- [R Graph Catalog](http://shiny.stat.ubc.ca/r-graph-catalog/) - a catalog of good and bad plots
- [A Compendium of Clean Graphs in R](http://shinyapps.org/apps/RGraphCompendium/index.php) - examples of informative plots
- [Great examples of ggplot2 plots](https://rud.is/b/category/ggplot) - a lot of them...
- [Interactive visualizations with R - a minireview](http://ouzor.github.io/blog/2014/11/21/interactive-visualizations.html) - an overview of R's interactive capabilities
- [plotly for R](https://cpsievert.github.io/plotly_book/) - an open online book about the `plotly` package
- [dygraphs for R](https://rstudio.github.io/dygraphs/) - the official documentation of the `dygraphs` package
- [Shiny by RStudio](http://shiny.rstudio.com/) - the official documentation of the `shiny` package