Slides: [jumpingrivers.com]

Who am I

  • Dr Colin Gillespie

Jumping Rivers

  • Statistical and R consultancy
  • R, Scala, python, & Stan training
  • Predictive analytics
  • Dashboard development
  • Questionnaires

My R code is slow

Use the byte compiler!

Byte compiler

  • The compiler package has been part of R since version 2.13.0
    • It translates R functions into another language that can be interpreted by a very fast interpreter
  • Since R 2.14.0, all of the standard functions and packages in R are pre-compiled into byte-code

Byte compiler: the mean() function

#> function (x, ...) 
#> UseMethod("mean")
#> <bytecode: 0x73098b8>
#> <environment: namespace:base>

Note the bytecode line: it shows that mean() has been byte-compiled.
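
One rough way to check for that line programmatically is to capture the printed function and search for the bytecode marker. This is only a sketch: is_byte_compiled() is a hypothetical helper, not part of base R or the compiler package.

```r
library("compiler")

# Hypothetical helper: TRUE if printing f shows a <bytecode: ...> line
is_byte_compiled = function(f) {
  any(grepl("^<bytecode: ", capture.output(print(f))))
}

is_byte_compiled(mean)                       # a pre-compiled base function
is_byte_compiled(cmpfun(function(x) x + 1))  # compiled explicitly with cmpfun()
```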

Byte compiler

  • We can compile our own R functions to obtain a byte-code version that may run faster.

Example: Bad mean

mean_r = function(x) {
  total = 0
  n = length(x)
  for(i in 1:n)
    total = total + x[i]/n
  total
}

Compiled version

library("compiler")
cmp_mean_r = cmpfun(mean_r)
cmp_mean_r  
#> function(x) {
#>   total = 0
#>   n = length(x)
#>   for(i in 1:n)
#>     total = total + x[i]/n
#>   total
#> }
#> <bytecode: 0x5608fa8>

Benchmarks

# Generate some data
x = rnorm(1000)
microbenchmark::microbenchmark(times = 10, unit = "ms", # milliseconds
          mean_r(x), cmp_mean_r(x), mean(x))
#> Unit: milliseconds
#>           expr   min    lq  mean median    uq  max neval cld
#>      mean_r(x) 0.358 0.361 0.370  0.363 0.367 0.43    10   c
#>  cmp_mean_r(x) 0.050 0.051 0.052  0.051 0.051 0.07    10  b 
#>        mean(x) 0.005 0.005 0.008  0.007 0.008 0.03    10 a  
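
Before comparing timings, it is worth confirming that the versions return the same answer. A minimal check, re-defining mean_r() so the block is self-contained; the seed is an arbitrary choice.

```r
library("compiler")

mean_r = function(x) {
  total = 0
  n = length(x)
  for(i in 1:n)
    total = total + x[i]/n
  total
}
cmp_mean_r = cmpfun(mean_r)

set.seed(1)  # arbitrary seed, for reproducibility only
x = rnorm(1000)

# all.equal() compares values up to a small numerical tolerance
same_as_base = isTRUE(all.equal(mean_r(x), mean(x)))
same_as_cmp  = isTRUE(all.equal(mean_r(x), cmp_mean_r(x)))
```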

Compiling code

There are a number of ways to compile code.

  • Compile individual functions using cmpfun()
  • Enable just-in-time (JIT) compilation
    • At the top of your R code add compiler::enableJIT(N), where \(N\) indicates the level of optimisation (\(0\) to \(3\))
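
The JIT level is set with enableJIT() from the compiler package. A minimal sketch, assuming the highest level is wanted for the whole script:

```r
library("compiler")

# enableJIT() returns the previous JIT level, so it can be restored later
old_level = enableJIT(3)  # 3 is the highest level; 0 switches the JIT off

# ... code that should run with JIT compilation enabled ...

enableJIT(old_level)      # restore the earlier setting
```

From R 3.4.0 onwards the JIT is enabled by default at level 3, so on a recent R this call mainly matters when the level has been lowered elsewhere.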

Compiling code

  • If you create a package, you can have it byte-compiled automatically on installation by adding ByteCompile: true to the DESCRIPTION file

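  • Before R 3.5.0, most R packages installed using install.packages() were not byte-compiled; from R 3.5.0 onwards, packages are byte-compiled on installation by default
    • On older versions, we can force packages to be compiled by setting the environment variable R_COMPILE_PKGS before starting R
    • For example, add R_COMPILE_PKGS=3 to ~/.Renviron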
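
The two settings above can be inspected from R itself. In this sketch "mypkg" and its version are hypothetical, and the DESCRIPTION file is written to a temporary location purely for illustration.

```r
# Write a minimal DESCRIPTION for a hypothetical package called "mypkg"
desc_file = tempfile("DESCRIPTION")
writeLines(c("Package: mypkg",
             "Version: 0.1.0",
             "ByteCompile: true"), desc_file)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files
byte_compile = read.dcf(desc_file, fields = "ByteCompile")[1, "ByteCompile"]

# Empty string if R_COMPILE_PKGS is not set for this session
compile_pkgs = Sys.getenv("R_COMPILE_PKGS")
```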

Compiling packages

## Windows users need Rtools
install.packages("ggplot2", 
                 type = "source", 
                 INSTALL_opts = "--byte-compile") 

My R code is slow

Change your BLAS library

Basic Linear Algebra Subprograms (BLAS)

  • R uses BLAS for linear algebra operations
    • Anything involving matrices
  • Switching to a different BLAS library may speed up your R code.
    • Easy for Linux/Apple, but can be tricky for Windows users
  • Two open source alternative BLAS libraries are ATLAS and OpenBLAS.

Issues

  • ATLAS and OpenBLAS use multiple cores
    • Occasionally this can be a problem (for example, when the linear algebra is called from code that is already running in parallel)
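
To see which BLAS R is linked against, and to get a rough timing to compare after switching, something like the following can be used. The matrix size and seed are arbitrary choices, and the BLAS and LAPACK entries of sessionInfo() are only available on R 3.4.0 or later.

```r
# sessionInfo() reports the BLAS/LAPACK libraries in use (R >= 3.4.0)
si = sessionInfo()
si$BLAS
si$LAPACK

# Time a BLAS-heavy operation; re-run after switching BLAS to compare
set.seed(1)  # arbitrary seed
a = matrix(rnorm(500 * 500), nrow = 500)
timing = system.time(b <- crossprod(a))  # b = a' * a
timing[["elapsed"]]
```

With OpenBLAS, the number of threads can usually be limited by setting the OPENBLAS_NUM_THREADS environment variable before starting R, which may help when the code is already parallelised.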

A quick break

install.packages("benchmarkme")

My R code is slow

Buy a better computer!

Or should you?

benchmarkme

library("benchmarkme")
get_ram()
#> 16.3 GB
get_cpu()
#> $vendor_id
#> [1] "GenuineIntel"
#> 
#> $model_name
#> [1] "Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz"
#> 
#> $no_of_cores
#> [1] 8

benchmarkme

library("benchmarkme") ## On CRAN
## Tests based on a script by
## Simon Urbanek & Douglas Bates
res = benchmark_std(runs = 3)

benchmarkme

library("benchmarkme")
res = benchmark_std(runs = 2)
# # Programming benchmarks (5 tests):
#     3,500,000 Fibonacci numbers calculation (vector calc): 0.52 (sec).
#     Grand common divisors of 1,000,000 pairs (recursion): 0.965 (sec).
#     Creation of a 3500x3500 Hilbert matrix (matrix calc): 0.306 (sec).
#     Creation of a 3000x3000 Toeplitz matrix (loops): 11.5 (sec).
#     Escoufier's method on a 60x60 matrix (mixed): 1.17 (sec).
# # Matrix calculation benchmarks (5 tests):
#    Creation, transp., deformation of a 5000x5000 matrix: 0.794 (sec).
#    2500x2500 normal distributed random matrix ^1000: 0.522 (sec).
#    Sorting of 7,000,000 random values: 0.598 (sec).
#    2500x2500 cross-product matrix (b = a' * a): 6.56 (sec).
#    Linear regr. over a 3000x3000 matrix (c = a \ b'): 4.5 (sec).
# # Matrix function benchmarks (5 tests):

benchmarkme

# Upload results +
# RAM, CPU, 
# OS, byte-compile, BLAS
upload_results(res)

benchmarkme

plot(res)

Uploaded results

Hardware: RAM

Results: Programming benchmarks

And the winner is….

Results: Matrix benchmarks

Intel CPU Differences (relative times)

Input/output benchmark_io

Adding benchmarkme to your package

  • upload_results() takes a five-column matrix
    • Columns 1 to 3: system.time() output
    • Columns 4 & 5: benchmark labels
  • Easy to add to your own package
    • Results will be automatically incorporated into future benchmarkme releases
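
A sketch of what one row of such a result might look like. The column names, the labels, the timed task, and the use of a data frame rather than a matrix are all assumptions made for illustration; check the benchmarkme documentation for the exact layout it expects.

```r
# Time an arbitrary task from your own package (here: sorting random numbers)
timing = system.time(sort(runif(1e5)))

# Columns 1 to 3 hold the system.time() output, columns 4 and 5 the labels
res = data.frame(user       = timing[["user.self"]],
                 system     = timing[["sys.self"]],
                 elapsed    = timing[["elapsed"]],
                 test       = "sort",        # hypothetical test label
                 test_group = "my_package",  # hypothetical group label
                 stringsAsFactors = FALSE)
res
```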

Summary

  • Upgrade hardware
  • Byte-compiling and BLAS are easy optimisations
    • No-one byte compiles!
  • Network drives are slow

Links