R Notebook Workflows

What We’ll Cover

  1. Productivity: Navigation and output tips
  2. Internals: How R Notebooks really work
  3. Ecosystem: Publishing, sharing, and version control

Basic familiarity with R Notebooks is presumed.

Getting Work Done

Notebook Recap

notebook demo

Notebook Recap

  • Notebooks are R Markdown documents + new interaction model
  • Code chunks are executed individually
  • Code output appears beneath the chunk and is saved with the document
  • Combines iterative approach with reproducible result

Notebook Recap

If you have never used notebooks before, watch the R Notebook Webinar.

webinar

Output Management

output-management

Reproducibility

  • Notebooks can be less reproducible than traditional R Markdown documents.
  • Chunks can be run in any order.
  • Chunks can access the global environment.

Simulating a Knit

restart and run

Consistency

run chunks above

How R Notebooks Really Work

Input and Output

  1. When you execute chunks in your notebook, the output is simultaneously displayed by RStudio and stored in a local cache in the .Rproj.user folder.
  2. When you save your notebook, the output and your R Markdown document are combined into a single, self-contained .nb.html file.
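
Because the .nb.html file carries the R Markdown document inside it, the source can in principle be pulled back out from R. The sketch below assumes the rmarkdown package is installed and that `rmarkdown::parse_html_notebook()` returns the embedded source in an `rmd` element; `extract_rmd`, `nb_path`, and `rmd_path` are hypothetical names used only for illustration.

```r
# Sketch only: assumes the rmarkdown package is installed.
extract_rmd <- function(nb_path,
                        rmd_path = sub("\\.nb\\.html$", ".Rmd", nb_path)) {
  parsed <- rmarkdown::parse_html_notebook(nb_path)
  # `parsed$rmd` is assumed to hold the embedded R Markdown source,
  # one element per line.
  writeLines(parsed$rmd, rmd_path)
  invisible(rmd_path)
}

# Usage (not run):
# extract_rmd("foo.nb.html")   # would write foo.Rmd next to the notebook file
```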

Publishing and Collaborating

Publishing

Two choices:

  1. Publish the notebook file directly (HTML)
  2. Render to a separate output format

Publishing: Notebook HTML

  • Self-contained
  • Compatible with any hosting service
  • No viewer required
  • Can be reopened in RStudio as a full notebook ("hydrated"; more on this later)

Publishing: Another Format

---
title: "A Monte-Carlo Analysis of Monte Carlo"
output:
  html_notebook: default
  pdf_document:
    fig_width: 9
    fig_height: 5
---
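
With a header like this one, a specific format can also be requested from the R console rather than the Knit menu. A minimal sketch, assuming the rmarkdown package is installed (plus a LaTeX installation for PDF); the file name `analysis.Rmd` and the helper names are hypothetical.

```r
# Sketch only: assumes rmarkdown is installed; PDF output also needs LaTeX.
render_pdf <- function(input = "analysis.Rmd") {
  # Passing a format name selects that entry from the YAML `output:` field,
  # so the fig_width / fig_height options declared there are applied.
  rmarkdown::render(input, output_format = "pdf_document")
}

render_all_formats <- function(input = "analysis.Rmd") {
  # "all" renders every format listed under `output:` in the header.
  rmarkdown::render(input, output_format = "all")
}

# Usage (not run):
# render_pdf("analysis.Rmd")
```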

Understanding Multiple Formats

Publishing: RStudio Connect

  • One-click publish inside your org
  • Fine-grained access control
  • Execute on the server
  • Schedule executions

Collaborating: Plain Files

To: analyst@contoso.com
From: customer@northwind.com

I need some help with this analysis. Could you 
take a look at what I have so far?

Thanks,
Charles

[ Attachment: foo.nb.html ]

Collaborating: Execution in Reverse

Collaborating: Opening Code in RStudio

download R markdown

Collaborating: Opening Notebook in RStudio

save as notebook

Collaborating: Open Inside RStudio

pre open

Collaborating: Open Notebook

hydrated

Collaborating: Version Control #1

  • Add *.nb.html to your .gitignore (or your version control system's equivalent ignore file)
  • Check in only the .Rmd file
  • All diffs are plain text
  • Encourages reproducibility & independent verification of results
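
For a Git project, the first bullet can be done from R. This is a base-R sketch, not an RStudio feature; the helper name `ignore_notebook_html` is hypothetical.

```r
# Append the notebook-output pattern to .gitignore if it is not already listed.
ignore_notebook_html <- function(dir = ".") {
  path <- file.path(dir, ".gitignore")
  pattern <- "*.nb.html"
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  if (!pattern %in% existing) {
    writeLines(c(existing, pattern), path)
  }
  invisible(path)
}

# Usage (not run):
# ignore_notebook_html()   # operates on the current project directory
```

Calling it twice is harmless: the pattern is only written when it is missing.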

Collaborating: Version Control #2

  • Check in the .nb.html file and the .Rmd file
  • Diffs are noisier
  • RStudio loads outputs from the .nb.html file if it is newer than the local cache
  • Outputs and inputs are versioned together
  • No need to re-execute lengthy or fragile computations

Versioned Output

When a notebook is opened:

  • The local cache's modification time is compared to that of the .nb.html file
  • If .nb.html is older, it is ignored
  • If .nb.html is newer, it replaces the local cache
  • No merging or conflict management is performed!
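
The rule above amounts to a single timestamp comparison. The sketch below models it in base R; it is not RStudio's actual implementation. The function name `choose_output_source` is hypothetical, and the handling of missing files and of equal timestamps is an assumption, since the slide does not cover those cases.

```r
# Decide which copy of the outputs wins when a notebook is opened.
# Both arguments are POSIXct modification times, e.g. from file.mtime(),
# which returns NA for a file that does not exist.
choose_output_source <- function(cache_mtime, nb_html_mtime) {
  if (is.na(nb_html_mtime)) return("cache")     # assumption: no .nb.html present
  if (is.na(cache_mtime)) return("nb.html")     # assumption: no local cache yet
  # Newer .nb.html replaces the cache; otherwise it is ignored.
  # Equal timestamps keep the cache (assumption).
  if (nb_html_mtime > cache_mtime) "nb.html" else "cache"
}

t0 <- as.POSIXct("2017-01-01 12:00:00", tz = "UTC")
choose_output_source(cache_mtime = t0, nb_html_mtime = t0 + 60)   # "nb.html"
choose_output_source(cache_mtime = t0 + 60, nb_html_mtime = t0)   # "cache"
```

Because the whole cache is swapped on one comparison, a collaborator's newer .nb.html silently replaces local outputs, which is why no merging takes place.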

The End
