March 10, 2016

About Me

  • Data Science and Software Engineering at Seelio
  • Have worked with R for nearly a decade
  • Use Docker extensively at Seelio

What to Expect

  • What is Docker?
  • Docker architecture
  • Key components
  • Images, containers, networks, and volumes
  • Example: Launching your first Docker container
  • Example: Running R within a Docker container
  • Example: Building your own custom image
  • Example: Using Docker Compose

Software Versions Used

  • Mac OS X El Capitan (10.11.3)
  • Docker Engine 1.10.2
  • Docker Compose 1.6.2 (must use at least docker-compose 1.6.0 to support the version 2 format of docker-compose.yml)
  • Docker Machine 0.6.0

Docker Concepts

What is Docker?

From https://www.docker.com/what-docker:

Docker containers wrap up a piece of software in a complete filesystem that
contains everything it needs to run: code, runtime, system tools, system
libraries – anything you can install on a server. This guarantees that it will
always run the same, regardless of the environment it is running in.
  • Docker daemon uses Linux-specific kernel features, which means containers only run on Linux
  • Docker daemon is run on Mac OS X and Windows via a Linux VM
  • Best practice is to have one application/service per container

Why Use Containers?

  • Same arguments as one would use for microservices: encourages you to compose functional units of software to create your final product
  • Consistent environment (package application with its configuration and dependencies together)
  • Secure (containers isolated from one another)
  • Lightweight (containers share same operating system kernel)
  • Scalable (if you make your services stateless, can easily scale up/down as needed)
  • Shareable (as you generate releases, push tagged images to your image repository on Docker Hub)
  • Develop in an environment that mimics your production environment as closely as possible

How is Docker Different From a VM?

  • VMs also create an isolated environment; what makes containers different?
  • Containers offer resource isolation and allocation benefits similar to those of virtual machines, but they share the host's kernel instead of each bundling a full guest operating system, which makes them much more portable and efficient

Docker Architecture

  • On Windows and Mac OS X, the Docker host is a VM

Key Components

  • Docker Machine (docker-machine): used on Windows and Mac OS X to manage the Linux VM on which the Docker daemon runs, or to create a Docker swarm
  • Docker Engine (docker): the client you use to issue commands, together with the daemon that receives those commands and manages containers
  • Docker Hub: public Docker registry in which built images can be stored
  • Docker Compose (docker-compose): tool for defining and running multi-container Docker applications (Compose not yet supported on Windows)
  • Docker Swarm: native clustering for Docker; it turns a pool of Docker hosts into a single, virtual Docker host (outside the scope of this talk)

In a development environment, the docker client and docker daemon will usually run on the same physical system. However, this need not be true.

Images and Containers

  • Docker uses a layered filesystem. Layers are combined to form a unified filesystem; later layers hide the content of earlier layers at the same path
  • Images are pre-built, read-only templates that are used to create containers
  • Containers add a read-write layer to the filesystem
  • A Docker container holds everything that is needed for an application to run
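
The lookup rule for layers can be illustrated with a toy model. This is only a sketch: plain directories stand in for layers, the function and directory names are hypothetical, and Docker's actual storage drivers are implemented differently.

```shell
# Toy model only: plain directories stand in for layers.

# resolve_in_layers PATH LAYER_DIR...
# Layers are listed oldest first; the newest layer containing PATH wins.
resolve_in_layers() {
    local path="$1"
    shift
    local layer found=""
    for layer in "$@"; do
        if [ -e "$layer/$path" ]; then
            found="$layer/$path"
        fi
    done
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    else
        return 1
    fi
}

# Two read-only "image" layers plus one read-write "container" layer
# (hypothetical directory names).
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/base/etc" "$demo_root/app/etc" "$demo_root/container/etc"
echo "from base layer" > "$demo_root/base/etc/motd"
echo "from app layer"  > "$demo_root/app/etc/motd"
echo "base only"       > "$demo_root/base/etc/hostname"

# The app layer hides the base layer's copy of etc/motd.
cat "$(resolve_in_layers etc/motd "$demo_root/base" "$demo_root/app" "$demo_root/container")"

# A write in the container layer hides both image layers without changing them.
echo "edited in container" > "$demo_root/container/etc/motd"
cat "$(resolve_in_layers etc/motd "$demo_root/base" "$demo_root/app" "$demo_root/container")"
```

The first cat prints the app layer's file and the second prints the container's edit, while base/etc/motd on disk is untouched. That is the sense in which images stay read-only and each container gets its own writable layer.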

Networks and Volumes

  • For different containers to be able to communicate with one another, they must be on the same Docker network
  • You create a network either via the networks directive within your docker-compose.yml file or via the docker network create command
  • Volumes store data outside a container's read-write layer, so the data persists after the container is removed
  • You create a volume either via the volumes directive within your docker-compose.yml file or via the docker volume create command
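
A minimal command-line sketch of the points above. The names demo-net, demo-data, and demo-db are hypothetical, and the flags follow the Docker 1.10 CLI used in this talk. Treat it as an illustration rather than a recommended setup.

```shell
# Hypothetical names throughout: demo-net, demo-data, demo-db.

create_network_and_volume() {
    # User-defined network that containers can join.
    docker network create demo-net
    # Named volume. Docker 1.10 takes the name via --name; newer releases
    # also accept it as a positional argument.
    docker volume create --name demo-data
}

run_on_network() {
    # Attach a long-running container to the network and mount the volume at /data.
    docker run -d --name demo-db --net demo-net -v demo-data:/data debian sleep 3600
}

check_name_resolution() {
    # A second container on the same network should be able to look up
    # the first one by its container name.
    docker run --rm --net demo-net debian getent hosts demo-db
}
```

Calling create_network_and_volume, run_on_network, and check_name_resolution in that order exercises both ideas. Clean up afterwards with docker rm -f demo-db, docker volume rm demo-data, and docker network rm demo-net.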

Applying Docker

VM Setup for Windows and Mac

Assumes Docker Engine and Docker Machine have already been installed

  1. Create the virtual machine:
    • docker-machine create --driver virtualbox default
    • On my Mac, I also found that adding --engine-storage-driver overlay to the above docker-machine create command worked better
  2. Configure your local environment to be able to communicate with the Docker Host:
    • eval "$(docker-machine env default)"
  3. (optional) Verify that your setup is working properly
    • docker run hello-world
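
The three steps above could be wrapped in one re-runnable function. This is a sketch under assumptions: the function name setup_docker_host is hypothetical, and the machine is assumed to be named default and to use the virtualbox driver, as in the steps above.

```shell
# setup_docker_host [MACHINE]
# Sketch: creates the VM if needed, points this shell at it, runs a smoke test.
setup_docker_host() {
    local machine="${1:-default}"

    # Create the VM only if docker-machine does not already list it.
    if ! docker-machine ls -q | grep -qx "$machine"; then
        docker-machine create --driver virtualbox "$machine" || return 1
    fi

    # Point the docker client in this shell at the daemon inside the VM.
    eval "$(docker-machine env "$machine")" || return 1

    # Smoke test: pulls and runs the hello-world image.
    docker run hello-world
}
```

Run setup_docker_host in the shell you intend to keep using, since the eval step only sets environment variables for the current shell session.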

Demo Setup

Example: Launching your first Docker container

docker run -it --rm debian bash

  1. Downloads the debian image tagged latest from Docker Hub, if it is not already available locally
  2. Creates a container
  3. Allocates a filesystem and mounts a read-write layer
  4. Allocates a network interface
  5. Sets up an IP address
  6. Runs your command (in this case bash)

Example: Running R within a Docker container

docker run -it --rm r-base

  • r-base image available on Docker Hub
  • Maintained by Carl Boettiger and Dirk Eddelbuettel (Rcpp)
  • r-base includes base R and littler, and is primarily intended as a starting point for other images that include additional libraries and packages

Details about the r-base image can be found on Docker Hub

docker run -it --rm -v "$(pwd)":/home/docker -w /home/docker -u docker r-base bash

  • The command above will put you in a bash shell in the Docker container with your local directory available at /home/docker within the container
  • You can run R from the command line (use R interactively or batch)
  • Instead of specifying bash as the command to docker run, you could run R directly: R CMD ...
  • Can also leverage littler to run scripts instead (e.g., 'r hello.R')
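
For batch use, the same mount options can be combined with littler in a small helper. This is a sketch: run_r_script is a hypothetical function name and hello.R a hypothetical script in the current directory.

```shell
# run_r_script SCRIPT [ARGS...]
# Runs an R script from the current directory in a throwaway r-base
# container, using littler's r front end. No -it is needed because
# nothing here is interactive.
run_r_script() {
    docker run --rm \
        -v "$(pwd)":/home/docker \
        -w /home/docker \
        -u docker \
        r-base r "$@"
}
```

Usage: run_r_script hello.R. Because the current directory is mounted at /home/docker, any files the script writes there should show up in your local directory as well.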

Other R images available:

  • rocker/shiny: r-base plus Shiny Server
  • rocker/r-devel: r-base plus R-devel
  • rocker/rstudio: r-base plus RStudio Server
  • rocker/hadleyverse: rstudio plus Hadley packages and LaTeX
  • rocker/ropensci: hadleyverse plus rOpenSci packages

Links to the rocker GitHub and Docker Hub accounts can be found in the Resources section at the end of the presentation

Example: Building your own custom image

Example Dockerfile:

# Builds image upon the existing rocker/hadleyverse image
FROM rocker/hadleyverse

# Example of installing a library
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libpqxx-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/ \
    && rm -rf /tmp/downloaded_packages/ /tmp/*.rds

# Example of installing additional R packages
RUN install2.r --error \
    -r "https://cran.rstudio.com" \
    RCurl

Building your image: docker build -t <name>:<tag> .

Assumes Dockerfile resides in the current directory
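
A sketch of building the image and then checking that it works. The function name build_and_check and the image name my-r-image:0.1 are hypothetical. The check assumes the example Dockerfile above, which installs RCurl, and assumes the base image lets you override its default command.

```shell
# build_and_check NAME TAG
# Builds the Dockerfile in the current directory, then loads RCurl in a
# throwaway container to confirm the package was installed.
build_and_check() {
    local image="$1:$2"
    docker build -t "$image" . || return 1
    docker run --rm "$image" Rscript -e 'library(RCurl)'
}
```

Usage: build_and_check my-r-image 0.1. A non-zero exit status from the second step would suggest the package did not install as expected.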

Push the image to Docker Hub via docker push (beyond the scope of this talk; see docker push --help for more info)

Example: Docker Compose

Example docker-compose.yml available on GitHub

  1. Execute docker-compose up -d from the directory in which docker-compose.yml resides
  2. Populate the Postgres database service by running the populate_postgres.sh script, available on GitHub as part of this presentation
  3. Launch your web browser
  4. Execute docker-machine ip to find the IP address of your Docker host (only needed on Mac and Windows; on Linux you can just use localhost)
  5. Navigate to http://docker_host, replacing docker_host with the IP address from the previous step
  6. Enter rstudio/rstudio as the user/pass to log in to RStudio
  7. Check out all the goodness!

To shut down services, execute docker-compose down
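
The start-up steps could be scripted roughly as follows. This is a sketch: the function names are hypothetical, and it assumes RStudio is published on the default HTTP port, as the URL in the steps above suggests.

```shell
# rstudio_url: prints the address to open in the browser.
rstudio_url() {
    # With Docker Machine (Mac and Windows) the services live on the VM's IP;
    # without it (Linux) they are reachable on localhost.
    if command -v docker-machine >/dev/null 2>&1; then
        printf 'http://%s\n' "$(docker-machine ip)"
    else
        printf 'http://localhost\n'
    fi
}

# start_stack: run from the directory containing docker-compose.yml.
start_stack() {
    docker-compose up -d || return 1
    # List the services and their state before opening the browser.
    docker-compose ps
    rstudio_url
}
```

Usage: start_stack, then open the printed URL and log in. docker-compose down still shuts everything down afterwards.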

Further Reading

Questions?