George Vega Yon
Garrett Weaver
vegayon@usc.edu
gmweaver@usc.edu
USC Integrative Methods of Analysis for Genomic Epidemiology (IMAGE)
Department of Preventive Medicine
March 29, 2018
High-Performance Computing: An overview
Parallel computing in R
Examples:
Exercises
Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:
Big data: How to work with data that doesn’t fit your computer
Parallel computing: How to take advantage of multiple core systems
Compiled code: Write your own low-level code (if R doesn’t has it yet…)
(Checkout CRAN Task View on HPC)
Buy a bigger computer/RAM memory (not the best solution!)
Use out-of-memory storage, i.e., don’t load all your data in the RAM. e.g. The bigmemory, data.table, HadoopStreaming R packages
Store it more efficiently, e.g.: Sparse Matrices (take a look at the dgCMatrix
objects from the Matrix R package)
Why are we still using CPUs instead of GPUs?
GPUs have far more processor cores than CPUs, but because each GPU core runs significantly slower than a CPU core and do not have the features needed for modern operating systems, they are not appropriate for performing most of the processing in everyday computing. They are most suited to compute-intensive operations such as video processing and physics simulations. (bwDraco at superuser)
Why use OpenMP if GPU is suited to compute-intensive operations? Well, mostly because OpenMP is VERY easy to implement (easier than CUDA, which is the easiest way to use GPU).
While there are several alternatives (just take a look at the High-Performance Computing Task View), we’ll focus on the following R-packages/tools for explicit parallelism:
R packages
parallel: R package that provides ‘[s]upport for parallel computation, including random-number generation’.
foreach: ‘[A] new looping construct for executing R code repeatedly […] that supports parallel execution.’
iterators: ‘tools for iterating over various R data structures’
RcppArmadillo + OpenMP
RcppArmadillo: ‘Armadillo is a C++ linear algebra library, aiming towards a good balance between speed and ease of use.’ ‘[RcppArmadillo] brings the power of Armadillo to R.’
OpenMP: ‘Open Multi-Processing is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, macOS, and Windows.’ (Wiki)
Implicit parallelism, on the other hand, are out-of-the-box tools that allow the programmer not to worry about parallelization, e.g. such as gpuR for Matrix manipulation using GPU.
Create a cluster:
PSOCK Cluster: makePSOCKCluster
: Creates brand new R Sessions (so nothing is inherited from the master), even in other computers!
Fork Cluster: makeForkCluster
: Using OS Forking, copies the current R session locally (so everything is inherited from the master up to that point). Not available on Windows.
Other: makeCluster
passed to snow
Copy/prepare each R session:
Copy objects with clusterExport
Pass expressions with clusterEvalQ
Set a seed
Do your call:
mclapply
, mcmapply
if you are using Fork
parApply
, parLapply
, etc. if you are using PSOCK
Stop the cluster with clusterStop
# 1. CREATING A CLUSTER
library(parallel)
cl <- makePSOCKcluster(2)
# 2. PREPARING THE CLUSTER
clusterSetRNGStream(cl, 123) # Equivalent to `set.seed(123)`
# 3. DO YOUR CALL
ans <- parSapply(cl, 1:2, function(x) runif(1e3))
(ans0 <- var(ans))
# [,1] [,2]
# [1,] 0.0861888293 -0.0001633431
# [2,] -0.0001633431 0.0853841838
# I want to get the same!
clusterSetRNGStream(cl, 123)
ans1 <- var(parSapply(cl, 1:2, function(x) runif(1e3)))
all.equal(ans0, ans1) # All equal!
# [1] TRUE
# 4. STOP THE CLUSTER
stopCluster(cl)
In the case of makeForkCluster
# 1. CREATING A CLUSTER
library(parallel)
# The fork cluster will copy the -nsims- object
nsims <- 1e3
cl <- makeForkCluster(2)
# 2. PREPARING THE CLUSTER
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
# 3. DO YOUR CALL
ans <- do.call(cbind, mclapply(1:2, function(x) {
runif(nsims) # Look! we use the nsims object!
# This would have fail in makePSOCKCluster
# if we didn't copy -nsims- first.
}))
(ans0 <- var(ans))
# Same sequence with same seed
set.seed(123)
ans1 <- var(do.call(cbind, mclapply(1:2, function(x) runif(nsims))))
ans0 - ans1 # A matrix of zeros
# 4. STOP THE CLUSTER
stopCluster(cl)
We know that \(\pi = \frac{A}{r^2}\). We approximate it by randomly adding points \(x\) to a square of size 2 centered at the origin.
So, we approximate \(\pi\) as \(\Pr\{\|x\| \leq 1\}\times 2^2\)
The R code to do this
pisim <- function(i, nsim) { # Notice we don't use the -i-
# Random points
ans <- matrix(runif(nsim*2), ncol=2)
# Distance to the origin
ans <- sqrt(rowSums(ans^2))
# Estimated pi
(sum(ans <= 1)*4)/nsim
}
# Setup
cl <- makePSOCKcluster(10)
clusterSetRNGStream(cl, 123)
# Number of simulations we want each time to run
nsim <- 1e5
# We need to make -nsim- and -pisim- available to the
# cluster
clusterExport(cl, c("nsim", "pisim"))
# Benchmarking: parSapply and sapply will run this simulation
# a hundred times each, so at the end we have 1e5*100 points
# to approximate pi
rbenchmark::benchmark(
parallel = parSapply(cl, 1:100, pisim, nsim=nsim),
serial = sapply(1:100, pisim, nsim=nsim), replications = 1
)[,1:4]
# test replications elapsed relative
# 1 parallel 1 0.455 1.000
# 2 serial 1 1.842 4.048
ans_par <- parSapply(cl, 1:100, pisim, nsim=nsim)
ans_ser <- sapply(1:100, pisim, nsim=nsim)
stopCluster(cl)
# par ser R
# 3.141561 3.141247 3.141593
The ‘foreach’ package provides a looping construct to execute R code repeatedly in parallel
The general syntax of foreach
is similar to a standard for loop
# With parallelization --> %dopar%
output <- foreach(i = 'some object to iterate over', 'options') %dopar% {some r code}
# Without parallelization --> %do%
output <- foreach(i = 'some object to iterate over', 'options') %do% {some r code}
As a first example, we use foreach
without parallelization
result <- foreach(x = c(4, 9, 16, 25)) %do% sqrt(x)
result
# [[1]]
# [1] 2
#
# [[2]]
# [1] 3
#
# [[3]]
# [1] 4
#
# [[4]]
# [1] 5
The default object returned is a list that contains the individual results compiled across all iterations
class(result)
# [1] "list"
Method 1: makeCluster
+ registerDoParallel
# Create cluster
myCluster <- makeCluster("# of cores", type = "PSOCK" or "FORK")
# Register cluster
registerDoParallel(myCluster)
Method 2: registerDoParallel
+ cores =
option
# Create and register cluster
registerDoParallel(cores = "# of cores")
Use foreach
with %dopar%
result <- foreach(x = c(4, 9, 16, 25)) %dopar% sqrt(x)
Stop the cluster (only required if you used Method 1 above)
stopCluster(myCluster)
By default, Method 2 will create a SOCK cluster on Windows systems and a FORK cluster on Unix systems
getDoParWorkers()
can be used to verify the number of workers (cores)
With some extra work and packages (Rmpi, doMPI), we can also use an MPI backend to enable parallelization on multiple nodes
The .combine
option is used to specify the function used to combine results
# Combine the results in a vector
foreach(x = c(4, 9, 16, 25), .combine = 'c') %dopar% sqrt(x)
# [1] 2 3 4 5
# Add the results together
foreach(x = c(4, 9, 16, 25), .combine = '+') %dopar% x
# [1] 54
By default, .combine
assumes that the function accepts two arguments (except for c
, cbind
, rbind
)
# Custom combine that creates a running product
customCombine <- function(i, j) {
cat(i, "\n")
c(i, i[length(i)] * j)
}
foreach(x = c(2, 4, 6, 8, 10), .combine = customCombine) %dopar% x
# 2
# 2 8
# 2 8 48
# 2 8 48 384
# [1] 2 8 48 384 3840
.multicombine = TRUE
specifies that our combine function can accept more than two arguments
.maxcombine
sets the max number of arguments that can be passed to the combine function
The ‘iterators’ package provides functions to generate: “A special type of object that supplies data on demand, one element at a time.”
iter()
is a generic function to create iterators over common R objects (vector, list, data frame)
myIterator <- iter(object_to_iterate_over, by = "How to iterate over object", checkFunc = "optional function", recycle = FALSE)
Example: A simple vector iterator over odd integers
vector_iterator <- iter(c(5, 10, 15, 20), checkFunc = function(x) x %% 2 != 0)
str(vector_iterator)
# List of 4
# $ state :<environment: 0x3d2d748>
# $ length : int 4
# $ checkFunc:function (x)
# ..- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 55 1 77 55 77 1 1
# .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x3dc5c58>
# $ recycle : logi FALSE
# - attr(*, "class")= chr [1:2] "containeriter" "iter"
cat("obj:", vector_iterator$state$obj, " i:", vector_iterator$state$i, "\n")
# obj: 5 10 15 20 i: 0
cat("current element:", nextElem(vector_iterator), " i:", vector_iterator$state$i, "\n")
# current element: 5 i: 1
cat("current element:", nextElem(vector_iterator), " i:", vector_iterator$state$i, "\n")
# current element: 15 i: 3
An iterator that traverses over blocks of columns of a matrix
iblkcol <- function(mat, num_blocks) {
# initialize variables to hold num of cols and current col index
nc <- ncol(mat)
i <- 1
# define nextElem function
next_element <- function() {
if (num_blocks <= 0 || nc <= 0) stop('StopIteration') # stop when we have no more columns
block_size <- ceiling(nc / num_blocks) # determine block size
col_idx <- seq(i, length = block_size) # get indices
i <<- i + block_size # update current col index
nc <<- nc - block_size # update num cols remaining
num_blocks <<- num_blocks - 1 # update num blocks remaining
mat[, col_idx, drop = FALSE] # return next set of columns
}
# output list element with necessary class definitions
itr <- list(nextElem = next_element)
class(itr) <- c('iblkcol', 'abstractiter', 'iter')
itr
}
myMatrix <- matrix(runif(200), nrow = 2, ncol = 100)
splitMatrix <- iblkcol(myMatrix, num_blocks = 25)
nextElem(splitMatrix)
# [,1] [,2] [,3] [,4]
# [1,] 0.1090886 0.9651406 0.01187498 0.8530483
# [2,] 0.8045434 0.7274180 0.08226454 0.5665807
Note: The nextElem() function in splitMatrix is an example of a closure (encloses the environment of its parent function)
Example pseudocode for a simple bootstrap to estimate the standard error for a sample statistic
# vector to hold estimates across bootstrap samples
est <- vector(mode = "numeric", length = Num_Bootstrap_Samples)
for (i in 1:Num_Bootstrap_Samples) {
# Sample with replacement from original data (bootstrap sample size same as original data)
boot_sample <- sample(1:n, n, replace = TRUE)
# Compute statistic in bootstrap sample using some function g()
est[i] <- g(dat[boot_sample])
}
# Compute standard error of estimate based on bootstrap samples
se_est <- sd(est)
Suppose we have the following sample of 100 observations for which we want to estimate the sample median
set.seed(123)
dat <- rnorm(n = 100, mean = 2.3, sd = 1.5)
dat[1:5]
# [1] 0.8471109 3.3591636 4.5335320 -0.4226388 2.7956144
We can modify the previous code to run 10,000 bootstrap replicates and obtain an estimate of the standard error for the median
# vector to hold estimates across bootstrap samples
n <- length(dat)
num_reps <- 10000
median_est <- vector(mode = "numeric", length = num_reps)
for (i in 1:num_reps) {
# Sample with replacement from original data (bootstrap sample size same as original data)
boot_sample <- sample(1:n, n, replace = TRUE)
# Compute statistic in bootstrap sample
median_est[i] <- median(dat[boot_sample])
}
# Compute mean and standard error of median estimate based on bootstrap sample statistics
mean(median_est)
# [1] 1.820074
sd(median_est)
# [1] 0.7335949
The foreach loop is very similar
median_est_fe <- foreach(i = seq.int(num_reps), .combine = c) %dopar% {
boot_sample <- sample(1:n, n, replace = TRUE)
median(dat[boot_sample])
}
mean(median_est_fe)
# [1] 1.811854
sd(median_est_fe)
# [1] 0.7306325
Timing: foreach is slower due to communication overheard and the low computational burden of the individual tasks
summary(microbenchmark::microbenchmark(
for_loop = for (i in 1:num_reps) {
boot_sample <- sample(1:n, n, replace = TRUE)
median_est[i] <- median(dat[boot_sample])
},
foreach = foreach(i = seq.int(num_reps), .combine = c) %dopar% {
boot_sample <- sample(1:n, n, replace = TRUE)
median(dat[boot_sample])
},
times = 1)
)[,c("expr","mean","median")]
# expr mean median
# 1 for_loop 389.4132 389.4132
# 2 foreach 3288.3632 3288.3632
To speed up, you may also consider “chunking” tasks to reduce communication overhead
To simplify the example, we sample both the outcome and predictors (i.e. the entire observation), this is known as case sampling
library(boot)
# get last observation point for each person (i.e. last time point tested for bacteria)
bact <- do.call(rbind, by(MASS::bacteria, MASS::bacteria$ID, function(x) x[which.max(x$week), ]))
# function to compute coefficents
boot_fun <- function(dat, idx, fmla) {
coef(glm(fmla, data = dat[idx, ], family = binomial))
}
Rather than write a for loop, the ‘boot’ package provides functions to easily apply our function across bootstrap replicates
set.seed(123)
boot(bact, boot_fun, R = 10000, fmla = as.formula(y ~ ap))
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = bact, statistic = boot_fun, R = 10000, fmla = as.formula(y ~
# ap))
#
#
# Bootstrap Statistics :
# original bias std. error
# t1* 0.7985077 0.0395186 0.4309512
# t2* 0.3646431 0.1520327 1.6066249
We can replicate the results from ‘boot’ with ‘foreach’
set.seed(123)
n <- nrow(bact)
# generate bootstrap sample indices
indices <- sample.int(n, n * 10000, replace = TRUE)
dim(indices) <- c(10000, n)
# fit all bootstrap logistic regression models and extract coefficients into a matrix
results_foreach <- foreach(x = iter(indices, by = 'row'), .combine = cbind, .inorder = FALSE) %dopar% {
coef(glm(y ~ ap, data = bact[x, ], family = binomial))
}
# standard error estimates
apply(results_foreach, 1, sd)
# (Intercept) app
# 0.4309512 1.6066249
The ‘doRNG’ package allows us to easily handle random number generation (note that the results will not match the previous)
library(doRNG)
results_foreach <- foreach(i = seq.int(10000), .combine = cbind, .inorder = FALSE) %dorng% {
boot_samp <- sample.int(n, n, replace = TRUE)
coef(glm(y ~ ap, data = bact[boot_samp, ], family = binomial))
}
apply(results_foreach, 1, sd)
# (Intercept) app
# 0.4349333 1.6874428
Timing: With increased computational burden for each task, we see an improvement by using the parallel approach
summary(microbenchmark::microbenchmark(
boot = boot::boot(bact, boot_fun, R = 10000, fmla = as.formula("y ~ ap")),
foreach = foreach(i = seq.int(10000), .combine = cbind, .inorder = FALSE) %dopar% {
boot_samp <- sample.int(n, n, replace = TRUE)
coef(glm(y ~ ap, data = bact[boot_samp, ], family = binomial))
},
times = 1, unit = 's')
)[,c("expr","mean","median")]
# expr mean median
# 1 boot 18.381147 18.381147
# 2 foreach 9.703243 9.703243
stopCluster(cl)
Below we generate two classes and a set of potential predictors
# Random Forest Example
# Number of observations and predictor variables
n <- 1500
p <- 500
# Predictor data simulated as MVN(0,sigma) with AR(1) correlation
means_x <- rep(0, p)
var_x <- 1
rho <- 0.8
sigma_x <- matrix(NA, nrow = p, ncol = p)
for(i in 1:p){
for(j in 1:p){
sigma_x[i,j] <- var_x*rho^abs(i-j)
}
}
X <- MASS::mvrnorm(n = n, mu = means_x, Sigma = sigma_x)
# Outcome is binary (two classes)
y <- gl(2, 750)
Two changes in the call to foreach
# random forest sequentially
rf <- randomForest(X, y, ntree = 1000, nodesize = 3)
# random forest parallel (split up tree generation)
rf_par <- foreach(ntree = rep(250, 4),
.combine = combine,
.packages = "randomForest") %dopar% {
randomForest(X, y, ntree = ntree, nodesize = 3)
}
.combine = combine
.packages
option must be used to export the ‘randomForest’ package to all the coresforeach
are exported.export
and .noexport
can be used to control which objects are exportedTiming
# expr mean median
# 1 rf 41.90121 41.90121
# 2 rf_parallel 11.57544 11.57544
Friendlier than RcppParallel… at least for ‘I-use-Rcpp-but-don’t-actually-know-much-about-C++’ users (like myself!).
Must run only ‘Thread-safe’ calls, so calling R within parallel blocks can cause problems (almost all the time).
Use arma
objects, e.g. arma::mat
, arma::vec
, etc. Or, if you are used to them std::vector
objects as these are thread safe.
Pseudo Random Number Generation is not very straight forward… But C++11 has a nice set of functions that can be used together with OpenMP
Need to think about how processors work, cache memory, etc. Otherwise you could get into trouble… if your code is slower when run in parallel, then you probably are facing false sharing
If R crashes… try running R with a debugger (see Section 4.3 in Writing R extensions):
~$ R --debugger=valgrind
Add the following to your C++ source code to use OpenMP, and tell Rcpp that you need to include that in the compiler:
#include <omp.h>
// [[Rcpp::plugins(openmp)]]
Tell the compiler that you’ll be running a block in parallel with openmp
#pragma omp [directives] [options]
{
...your neat parallel code...
}
You’ll need to specify how OMP should handle the data:
shared
: Default, all threads access the same copy.private
: Each thread has its own copy (although not initialized).firstprivate
Each thread has its own copy initialized.lastprivate
Each thread has its own copy. The last value is the one stored in the main program.Setting default(none)
is a good practice.
Compile!
#include <omp.h>
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(openmp)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::mat dist_par(arma::mat X, int cores = 1) {
// Some constants
int N = (int) X.n_rows;
int K = (int) X.n_cols;
// Output
arma::mat D(N,N);
D.zeros(); // Filling with zeros
// Setting the cores
omp_set_num_threads(cores);
#pragma omp parallel for shared(D, N, K, X) default(none)
for (int i=0; i<N; i++)
for (int j=0; j<i; j++) {
for (int k=0; k<K; k++)
D.at(i,j) += pow(X.at(i,k) - X.at(j,k), 2.0);
// Computing square root
D.at(i,j) = sqrt(D.at(i,j));
D.at(j,i) = D.at(i,j);
}
// My nice distance matrix
return D;
}
# Compiling the function
Rcpp::sourceCpp("dist.cpp")
# Simulating data
set.seed(1231)
K <- 5000
n <- 500
x <- matrix(rnorm(n*K), ncol=K)
# Are we getting the same?
table(as.matrix(dist(x)) - dist_par(x, 10)) # Only zeros
#
# 0
# 250000
# Benchmarking!
rbenchmark::benchmark(
dist(x), # stats::dist
dist_par(x, cores = 1), # 1 core
dist_par(x, cores = 4), # 4 cores
dist_par(x, cores = 10), # 10 cores
replications = 1, order="elapsed"
)[,1:4]
# test replications elapsed relative
# 4 dist_par(x, cores = 10) 1 0.512 1.000
# 3 dist_par(x, cores = 4) 1 1.180 2.305
# 2 dist_par(x, cores = 1) 1 2.358 4.605
# 1 dist(x) 1 5.463 10.670
#include <omp.h>
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(openmp)]]
// C++11 provides several RNGs algorithms that can be set up to be thread safe.
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
double sim_pi(int m, int cores = 1, int seed = 100) {
// Setting the cores
omp_set_num_threads(cores);
int n = m / cores;
double ans = 0.0, d;
double val = 4.0/m;
double piest;
int i;
#pragma omp parallel default(none) shared(ans, cores) \
firstprivate(val, n, m, seed) \
private(piest, i, d)
{
// Which core are we
int core_num = omp_get_thread_num();
// Setting up the RNG
// - The first line creates an engine that uses the 64-bit Mersenne Twister by
// Matsumoto and Nishimura, 2000. One seed per core.
// - The second line creates a function based on the real uniform between -1
// and 1. This receives as argument the engine
std::mt19937_64 engine((core_num + seed)*10);
std::uniform_real_distribution<double> my_runif(-1.0, 1.0);
double p0, p1;
piest = 0.0;
for (i = n*core_num; i < (n + n*core_num); i++) {
// Random number generation (see how we pass the engine)
p0 = my_runif(engine);
p1 = my_runif(engine);
d = sqrt(pow(p0, 2.0) + pow(p1, 2.0));
if (d <= 1.0)
piest += val;
}
// This bit of code is executed one thread at a time.
// Instead of -atomic-, we could have use -critical-, but that has
// a higher overhead.
#pragma omp atomic
ans += piest;
}
return ans;
}
# Compiling c++
Rcpp::sourceCpp("simpi.cpp")
# Does the seed work?
sim_pi(1e5, cores = 4, seed = 50)
# [1] 3.1478
sim_pi(1e5, cores = 4, seed = 50)
# [1] 3.1478
# Benchmarking
nsim <- 1e8
rbenchmark::benchmark(
pi01 = sim_pi(nsim, 1),
pi04 = sim_pi(nsim, 4),
pi10 = sim_pi(nsim, 10),
replications = 1
)[,1:4]
# test replications elapsed relative
# 1 pi01 1 2.578 6.095
# 2 pi04 1 0.915 2.163
# 3 pi10 1 0.423 1.000
No big speed gains… but at least you know how to use it now :)! Nice speed gains!
rslurm
packageThe rslurm
package (Marchand, 2017) provides a wrapper of Slurm in R.
Without the need of knowing much about the syntax of slurm
, this R package does the following:
Writes an R source file that sets up each node with your current config (packages, libpath, etc.). The outputs are stored in a known folder so these can be fetched out later.
Writes a bash file with the call to sbatch
(you can specify options).
Executes the bash file and returns the jobid (you can query its status interatively).
Here is a simple example with our sim_pi
function (so we are mixing OpenMP with Slurm!):
library(rslurm)
# How many nodes are we going to be using
nnodes <- 2L
# The slurm_apply function is what makes all the work
sjob <- slurm_apply(
# We first define the job as a function
f = function(n) {
# Compiling Rcpp
Rcpp::sourceCpp("~/simpi.cpp")
# Returning pi
sim_pi(1e9, cores = 8, seed = n*100)
},
# The parameters that `f` receives must be passed as a data.frame
params = data.frame(n = 1:nnodes), jobname = "sim-pi",
# How many cpus we want to use (this when calling mcapply)
cpus_per_node = 1,
# Here we are asking for nodes with 8 CPUS
slurm_options = list(`cpus-per-task` = 8),
nodes = nnodes,
submit = TRUE
)
# We save the image so that later we can use the `sjob` object to retrieve the
# results
save.image("~/sim-pi.rda")
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: CentOS Linux 7 (Core)
#
# Matrix products: default
# BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
# [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods
# [8] base
#
# other attached packages:
# [1] randomForest_4.6-14 doParallel_1.0.11 iterators_1.0.9
# [4] foreach_1.4.4
#
# loaded via a namespace (and not attached):
# [1] Rcpp_0.12.16 codetools_0.2-15 digest_0.6.15 rprojroot_1.3-2
# [5] backports_1.1.2 magrittr_1.5 evaluate_0.10.1 highr_0.6
# [9] stringi_1.1.7 rmarkdown_1.9 tools_3.4.3 stringr_1.3.0
# [13] yaml_2.1.18 compiler_3.4.3 htmltools_0.3.6 knitr_1.20
Generating multivariate normal random samples using parallel and foreach (random-mvn.R
for pseudo-code, and random-mvn-solution.R
for the implementation).
Fibonacci with Rcpp (fib.R
for pseudo-code, and fib-solution.R
for the implementation).
Rewriting the scale
function using Rcpp and OpenMP (scale.cpp
for pseudo-code, and scale-solution.cpp
for the implementation).
Another example is provided in the file wordcount.R
(you’ll need the dataset ulysses.txt
)