Advanced topics in markup languages and literate programming.
R Markdown Headers
Footnotes
BibTeX/Pandoc and citing R Packages
Time consuming analyses: caching and Make files
22 February 2016
Proposal for your Collaborative Research Project.
Deadline: 25 March
Submit: A (max) 2,000 word proposal created with R Markdown. The proposal will:
State your research question and justify why it is interesting.
Provide a basic literature review (properly cited with BibTeX).
Identify data sources and appropriate research methodologies for answering your question.
As always, submit the entire GitHub repo.
Definitely see me with your ideas/draft.
Start thinking about types of statistical models that you want to use. I can include these in Lecture 8 (Statistical Modeling with R).
An example of a paper + analysis + data project using many of the tools we cover today is available at:
An R Markdown file is just a text file with markup instructions that RStudio understands.
The key to document-consistent formatting is the header.
It is at the start of a file and comes between two --- lines.
The header is written in YAML.
YAML is a human-readable data format.
Elements are separated from values with a colon (:).
Each element is separated by new lines.
Hierarchy is maintained with indentation. Use spaces: YAML does not allow tab characters for indentation.
---
title: 'MPP-E1180 Lecture 5'
author: "Christopher Gandrud"
date: "22 February 2016"
output:
  ioslides_presentation:
    css: css/font-awesome.min.css
    logo: img/logo.png
  beamer_presentation: default
---
YAML is a recursive acronym: "YAML Ain't Markup Language".
By default, R Markdown uses the ioslides HTML presentation slides style.
You can also use reveal.js.
First install the revealjs R package:
devtools::install_github("jjallaire/revealjs")
Then in the YAML header use:
output: revealjs::revealjs_presentation
For further styling see https://github.com/jjallaire/revealjs
You can add a table of contents and numbered sections to your PDF output:
output:
  pdf_document:
    toc: true
    number_sections: true
    fig_caption: true
To do the same for HTML, also include the information under html_document.
Create consistent figure formatting:
output:
  pdf_document:
    fig_width: 7
    fig_height: 6
    fig_caption: true
fig_caption: true attaches captions to figures.
To set the actual caption label, use the fig.cap='SOME CAPTION' code chunk option.
R Markdown can use Pandoc footnotes.
In-text: In the text place a unique footnote key in the format: [^KEY]
At the end of your document put the full footnote starting with the key, e.g.
[^KEY]: This is a footnote.
BibTeX allows you to create a database of all of the literature/packages you cite.
You can then insert them into your text and they will:
Be automatically formatted consistently.
Generate an appropriately ordered, consistently formatted reference list at the end of your document with only the works you actually cited.
A BibTeX database is just a text file with the extension .bib.
Each entry follows a standard format depending on the type of media.
@DOCUMENT_TYPE{CITE_KEY,
    title = {TITLE},
    author = {AUTHOR},
    . . . = {. . .},
}
Note: Commas are very important!
The cite key links a specific citation in your presentation document to a specific BibTeX database entry.
They must be unique.
It does not matter what order your BibTeX entries are in the .bib file.
@article{Acemoglu2000,
    author = {Acemoglu, Daron and Robinson, James A.},
    title = {Why Did the West Extend the Franchise? Democracy, Inequality, and Growth in Historical Perspective},
    journal = {The Quarterly Journal of Economics},
    year = {2000},
    volume = {115},
    number = {4},
    pages = {1167--1199},
}
@book{Cox1997,
    title = {Making Votes Count: Strategic Coordination in the World's Electoral Systems},
    author = {Gary W. Cox},
    year = {1997},
    volume = {7},
    publisher = {Cambridge University Press},
    address = {Cambridge}
}
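Because cite keys must be unique, it can help to check a .bib file for repeats before knitting. The following is a minimal sketch in base R, not part of any package: duplicate_cite_keys is a hypothetical helper name, and the pattern assumes each entry header sits on its own line in the form @TYPE{KEY, as in the examples above.

```r
# Hypothetical helper: return any cite keys that occur more than once.
# Assumes entry headers look like "@article{Acemoglu2000," on a single line.
duplicate_cite_keys <- function(bib_lines) {
  header_pattern <- "^[[:space:]]*@[[:alnum:]]+[[:space:]]*\\{"
  headers <- grep(header_pattern, bib_lines, value = TRUE)
  # Strip everything except the key that follows the opening brace
  keys <- sub(
    "^[[:space:]]*@[[:alnum:]]+[[:space:]]*\\{[[:space:]]*([^,[:space:]]*).*$",
    "\\1", headers
  )
  unique(keys[duplicated(keys)])
}

# Small in-memory example; Cox1997 is deliberately entered twice
bib <- c(
  "@article{Acemoglu2000,",
  "  year = {2000},",
  "}",
  "@book{Cox1997,",
  "  year = {1997},",
  "}",
  "@book{Cox1997,",
  "  year = {1997},",
  "}"
)
duplicate_cite_keys(bib)
```

For a real database, pass the file's lines instead, e.g. duplicate_cite_keys(readLines('BIB_FILE_NAME.bib')).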
For more media types and entry fields see http://en.wikipedia.org/wiki/BibTeX.
Google scholar generates BibTeX entries.
On an entry click Cite > BibTeX.
For a YouTube how-to see https://www.youtube.com/watch?v=SsJSR2b4_qc.
Sometimes they need to be cleaned a little.
To link your .bib file to your R Markdown document add to the header:
bibliography:
    - BIB_FILE_NAME.bib
    - ANOTHER_BIB_FILE_NAME.bib
Note: The files should be in the same directory as your R Markdown file.
R Markdown uses Pandoc syntax to include a citation in-text.
General format: @CITE_KEY.
So if the cite key is Box1973 then @Box1973 will return Box and Tiao (1973) in the text of the presentation document.
Markup | Result |
---|---|
[@Box1973] | (Box and Tiao 1973) |
[see @Box1973] | (see Box and Tiao 1973) |
[see @Box1973, 33-40] | (see Box and Tiao 1973, 33–40) |
[@Box1973; @Acemoglu2000] | (Box and Tiao 1973; Acemoglu and Robinson 2000) |
@Box1973 [33-40] | Box and Tiao (1973, 33–40) |
A reference list with the full bibliographic details of all cited documents will be automatically created at the end of your document.
Tip: Put # References at the very end of your R Markdown document to have a section heading before the reference list.
Why cite?
Give credit to the software authors (just like when citing literature).
Enable reproducible research: identify which software you used and which version.
Base R way: print citation, copy BibTeX entry into your .bib file.
Cite R:
toBibtex(citation())
## @Manual{,
##   title = {R: A Language and Environment for Statistical Computing},
##   author = {{R Core Team}},
##   organization = {R Foundation for Statistical Computing},
##   address = {Vienna, Austria},
##   year = {2015},
##   url = {https://www.R-project.org/},
## }
Cite R Packages:
toBibtex(citation('dplyr'))
## @Manual{,
##   title = {dplyr: A Grammar of Data Manipulation},
##   author = {Hadley Wickham and Romain Francois},
##   year = {2015},
##   note = {R package version 0.4.3},
##   url = {https://CRAN.R-project.org/package=dplyr},
## }
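Note that the printed entries have no cite key (the first line is @Manual{,), so one has to be added before the entry can be cited. A minimal base R sketch of that step follows. The helper name bib_with_key is hypothetical, the key CiteR is an arbitrary choice, and the code assumes the package has a single citation entry.

```r
# Hypothetical helper: BibTeX entry for a package with a cite key filled in.
# citation() leaves the key blank ("@Manual{,"), so we insert one ourselves.
# Assumes the package has a single citation entry.
bib_with_key <- function(pkg, key) {
  entry <- as.character(toBibtex(citation(pkg)))
  # Only the first line holds the (empty) key slot
  entry[1] <- sub("{,", paste0("{", key, ","), entry[1], fixed = TRUE)
  entry
}

# citation("base") returns the citation for R itself
r_entry <- bib_with_key("base", "CiteR")
writeLines(r_entry)  # print the entry, ready to paste into a .bib file
```

To save the entry rather than print it, writeLines(r_entry, 'SOME_FILE.bib') writes it to a file of your choosing.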
The dynamic literate programming way: Use LoadandCite from the repmis package.
Load all of the packages at the beginning of your R Markdown file in a chunk with include=FALSE.
LoadandCite loads the packages and creates a BibTeX file with all of the citations.
pkgs <- c('dplyr', 'ggplot2')
repmis::LoadandCite(pkgs, file = 'RpackageCitations.bib')
Note: Use a file name that is different from your literature BibTeX file!
Include the .bib file in your R Markdown header.
Each cite key follows the pattern R-PKG_NAME.
R itself has the key CiteR.
So @R-dplyr and @CiteR create the citations:
Wickham and Francois (2015)
R Core Team (2015)
Knitting your analysis and presentation documents together by placing all of your R code into code chunks can sometimes be problematic:
When they are time consuming: they require a lot of computational time.
When they access files over the internet: it is bad practice to make many repeated calls to the same URL. Heavy repeated requests can overload the site, much like a denial-of-service attack.
When they are many lines long.
Many lines: use source() to run R code stored in other files.
Caching for time/computationally intensive work: the cache=TRUE code chunk option reruns a chunk only when the chunk's code changes.
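The idea behind caching can also be sketched by hand in base R. The example below is not how knitr implements cache=TRUE; it is a simpler, swapped-in technique that saves a result to disk with saveRDS() and reuses it on later runs. The helper name cached and the file name slow_result.rds are hypothetical.

```r
# Hypothetical helper: compute a result once, then reuse the saved copy
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the stored result
  }
  result <- compute()      # the slow step runs only when no copy exists
  saveRDS(result, path)
  result
}

cache_file <- file.path(tempdir(), "slow_result.rds")
unlink(cache_file)  # start from a clean slate for the demonstration

slow_mean <- function() {
  Sys.sleep(1)  # stand-in for a long computation or a download
  mean(c(2, 4, 6))
}

first  <- cached(cache_file, slow_mean)  # runs the slow step
second <- cached(cache_file, slow_mean)  # read back from disk
```

Unlike the chunk option, this sketch does not notice when the code changes: delete the saved file to force a fresh run.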
Make files are the ultimate solution to these problems.
Make is a command line program.
Big Idea: run a make file that runs a list of specific files in order.
Files are only run if they have been changed since the make file was last run.
See Ch. 6 of RRRR if you might want to do this.
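The "only run what changed" rule can be illustrated in base R by comparing file modification times. This is a sketch of the idea only, not of Make itself or its syntax. The helper name needs_rebuild and the file names are hypothetical.

```r
# Hypothetical helper: TRUE when the target is missing or older than
# any of the files it depends on
needs_rebuild <- function(target, deps) {
  if (!file.exists(target)) return(TRUE)
  any(file.mtime(deps) > file.mtime(target))
}

# Temporary files standing in for a cleaning script and its output
demo_dir <- tempfile("make-demo")
dir.create(demo_dir)
script <- file.path(demo_dir, "clean.R")
output <- file.path(demo_dir, "clean.csv")
writeLines("# cleaning code", script)

before <- needs_rebuild(output, script)   # output does not exist yet

writeLines("x", output)
Sys.setFileTime(output, Sys.time() + 10)  # mark output as newer than script
after <- needs_rebuild(output, script)    # output is up to date

Sys.setFileTime(script, Sys.time() + 20)  # simulate editing the script
changed <- needs_rebuild(output, script)  # script is now newer than output
```

A script could then be run conditionally, e.g. if (needs_rebuild(output, script)) source(script).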
Clone the HertieDataScience/Examples repo and play around with SimplePaperWithAnalysis.
Begin working with your partner on your research proposal.
Identify the research area and key literature.
Create a new repo and R Markdown document for your proposal.
Begin building a BibTeX database for your key literature and try including them in your proposal.
Begin identifying data sources.
Acemoglu, Daron, and James A. Robinson. 2000. “Why Did the West Extend the Franchise? Democracy, Inequality, and Growth in Historical Perspective.” The Quarterly Journal of Economics 115 (4): 1167–99.
Box, G. E. P., and G. C. Tiao. 1973. Bayesian Inference in Statistical Analysis. New York: Wiley Classics.
R Core Team. 2015. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Wickham, Hadley, and Romain Francois. 2015. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.