30 September 2016

Objectives for the topic

  • Review

  • What is literate programming? Why is it important for reproducible research?

  • Introduction to Markdown

  • Introduction to R Markdown.

    • Simple webpages

    • PDF papers

    • Presentations

Class change!

4 November class -> 18 November

Reversed order of Automatic Table/Plot Generation and Statistical Modelling Topics.

Pair Assignment 1

  • Due: Friday 7 October

  • Learning objectives: develop your understanding of

    • file structures,

    • version control,

    • basic R data structures and descriptive statistics.

Pair Assignment 1

Each pair will create a new public GitHub repository

  • Must be fully documented, including with a descriptive README.md file. Your code must be human readable and clearly commented.

  • Include R source code files that:

    • Access at least two data sets from the R core data sets and/or fivethirtyeight

    • Illustrate the datas' distributions using a variety of relevant descriptive statistics

    • Two files must be dynamically linked

  • Another pair makes a pull request. And this is discussed/merged.

Review (1)

  • In R, what is the difference between a matrix and a data frame?

  • What is the assignment operator?

  • What is the component selector?

  • What two things do you need to describe the distribution of a continuous variable?

Review (2)

With a partner:

  • What is the difference between relative and absolute file paths?

  • What is a commit?

  • What does it mean to pull and push a repo?

Note

What is Literate Programming?

Donald Knuth (1992): explanation of a program using natural language interspersed with code snippets that are compilable by a computer.

This produces two representations of the program:

  • A formatted easily human readable document (e.g. a paper).

  • Source code that can be compiled by a computer.

Benefits

Creates better programs. Programmers have to explicitly state thoughts and in so doing find flaws.

Clear documentation so that others can understand and build on the program more easily.

How literate programming relates to research?

Quantitative social science is computer programming.

You are creating a program that gathers and analyses data.

You then advertise this work (a paper) in a way that is completely understandable to others.


Added benefit: allows you to automatically update documents when there are changes.

Implementing literate programming

In addition to the computer language, we need:

  1. Natural language part formatted using a markup language. Markup language: typesetting instructions. E.g. Markdown, \(\LaTeX\), HTML.

  2. A way to tangle or weave the computer language part into the natural language part.

Tangle/Weave for R

In R you can use Yihui Xie's knitr package.

  • Language dependent:

    • .Rmd .html (using Markdown)

    • .Rnw .pdf (using \(\LaTeX\))


Note to use knitr in RStudio you need go to Preferences > Sweave > Weave Rnw files using: knitr

Knitr

Two parts:

  • Natural language part written in intended markup language.

  • R code (or almost any other language on your system) written in code chunks.

Latex Example

Note that in a knitr-\(\LaTeX\) document (also known as R Sweave and has the file extension .Rnw) code chunks are delimited with << >>= and @.

PDF Output

The knitr process

More recently

Most of the focus is on RStudio's R Markdown.

  • Directly builds on knitr (Yihui works at RStudio now).

    • Code chunk syntax is almost identical to Markdown in knitr.
  • But uses Pandoc to be more output agnostic.

    • You can write in (easy) R Markdown and output to many different formats.

Markdown

Originally created by John Gruber to be an easy way to:

  • write HTML files

  • that are human readable as text files.

Markdown: HTML less painful

HTML:

<h1>A header</h1>

<p>This is some text with a <a href="http://www.example.com">link</a></p>

<p>Here is some <strong>bold</strong> text.</p>

Markdown:

# A header

This is some text with a [link](http://www.example.com).

Here is some **bold** text.

Markdown syntax: Headers

# Header 1

## Header 2

### Header 3

And so on.

Markdown syntax

Horizontal lines:

---

Bold text:

**bold**

Italics:

*italics*

Markdown syntax

Links:

[link](http://www.example.com)

Images:

![text description](FILE/PATH.png)

Markdown syntax

Unordered Lists:

- An item

- An item

- An item

Ordered Lists:

1. Item one

2. Item two

3. Item three

Tables

| Name   | Something |
| ------ | --------- |
| Stuff  | Things    |
| Things | Stuff     |


Name Something
Stuff Things
Things Stuff

Math

R Markdown from RStudio supports MathJax. So, you can write any \(\LaTeX\) math with R Markdown.

Inline equations have one dollar sign $s^2 =
\frac{\sum(x - \bar{x})^2}{n - 1}$.

Inline equations have one dollar sign \(s^2 = \frac{\sum(x - \bar{x})^2}{n - 1}\).

Display equations have two dollar signs:

$$s^2 = \frac{\sum(x - \bar{x})^2}{n - 1}$$

\[s^2 = \frac{\sum(x - \bar{x})^2}{n - 1}\]

Expansion

You can include any HTML syntax in a Markdown document. You can also change the formatting by adding a custom CSS file (just like a website).

However, this will only render in HTML output.

If you are using \(\LaTeX\) (other than math syntax), you can also include \(\LaTeX\) syntax in your RMarkdown document for rendering as a PDF.

Code chunks Inline

To use syntax highlighting on code chunks inline with the text, surround your text with ``

Knitable inline chunks with a back-tick then r.

For example:

Two plus two equals `r 2 + 2`.

Produces:

Two plus two equals 4.

Code chunks in Display

Use three ticks (```) to start and end a code chunk that is not run.

Create a knit-able code chunk begin the chunk with ```{r}.

Automatic table generation (basics)

You can turn any matrix or data frame into a well formatted table with the knitr function kable.

knitr::kable(mtcars)

Make sure that the code chunk option results='asis'.

Output Web

Output PDF or Word

This R Markdown file can be compiled to PDF (via \(\LaTeX\)) or MS Word with RStudio.

choose output

Output PDF

Output Word

Chunk options

Change how R Markdown chunks behave with options. Place options in the chunk head: ```{r echo=FALSE, error=FALSE}

Option What it Does
echo=FALSE Does not print the code only the output
error=FALSE Does not print errors
include=FALSE Does not include the code or output, but does run the code
fig.width Sets figure width
cache=TRUE Cache the chunk. It is only run when the contents change.

Many others at http://yihui.name/knitr/options

RMarkdown Presentations

These lecture slides are created using R Markdown.

presentation select

RMarkdown Presentations

All of the syntax is the same, except:

## Does not mean Header 2. It is creates a new slide and title.

You can create a slide with no title using ---.

R Markdown Header

The header lets you make changes to the whole document.

This presentation's head is:

---
title: 'MPP-E1180 Lecture 4: Intro to Markup Lang. & Literate Programming (1)'
author: "Christopher Gandrud"
date: "22 February 2016"
output:
  ioslides_presentation:
    css: https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css
    logo: https://raw.githubusercontent.com/christophergandrud/Hertie_Collab_Data_Science/master/img/HertieCollaborativeDataLogo_v1.png
  beamer_presentation: default
---

We'll look at headers more next lecture

More reference

Seminar: Create R Markdown Documents

Convert what work you have done on your Pair Assignment 1 to R Markdown and output to multiple formats.

  • Add relevant equations.

Create a basic R Markdown presentation.

Seminar: Collaborative Research Project



Begin trying to find a partner for the Collaborative Research Project.

Discuss topics you might be interested in researching.