Users of lower-level languages will be familiar with memory-allocation issues. R handles this for us, and performs garbage collection of objects we are no longer using (though not perfectly, in my experience); as a consequence one may fail to appreciate that allocating memory and copying objects carry a computational cost.

```
library(microbenchmark)

## function to calculate sin (of course, sin() is already
## vectorized, so this is not actually useful)
sin1 = function(x) {
  out = c() # start with an empty vector and grow it
  for (i in seq_along(x)) out = c(out, sin(x[i]))
  out
}
sin2 = function(x) {
  out = numeric(length(x)) # allocate memory in advance
  for (i in seq_along(x)) out[i] = sin(x[i])
  out
}
x = rnorm(1e3)
microbenchmark(sin1(x), sin2(x), times=100)
```

```
## Unit: microseconds
##     expr    min     lq median     uq   max neval
##  sin1(x) 1684.8 1779.7 1886.7 2097.8 17506   100
##  sin2(x)  783.4  833.6  864.8  899.7  1072   100
```

The same is true for `cbind()` and (even more so) for `rbind()`. When you use one of these functions, R allocates an entirely new space in memory, and then copies the old object into the new space. If you allocate the space in advance, none of this is necessary.

```
## functions to make a multiplication table
mult_table1 = function(k) {
  out = matrix(NA, nrow=0, ncol=k) # start with zero rows and grow
  for (i in seq_len(k)) out = rbind(out, (1:k)*i)
  out
}
mult_table2 = function(k) {
  out = matrix(NA, nrow=k, ncol=k) # allocate the full matrix up front
  for (i in seq_len(k)) out[i,] = (1:k)*i
  out
}
mult_table1(5)
```

```
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    2    4    6    8   10
## [3,]    3    6    9   12   15
## [4,]    4    8   12   16   20
## [5,]    5   10   15   20   25
```

```
microbenchmark(mult_table1(100), mult_table2(100), times=100)
```

```
## Unit: microseconds
##             expr    min     lq median   uq    max neval
## mult_table1(100) 1539.1 1642.7 1902.0 2103 2395.1   100
## mult_table2(100)  339.1  358.8  368.9  389  976.9   100
```

A common symptom of running out of memory, or simply using too much of it, is that your program *slows down* as it runs.

If you want to learn more about the way R deals with memory use, see Hadley Wickham’s page on the subject.

In some other languages it is possible to see exactly when an object is being copied, and therefore to identify code bottlenecks. Life in R is a little more complicated. If we do something like

```
library(pryr)
x = rnorm(1e4)
y = x
c(address(x), address(y))
```

`## [1] "0x3bac310" "0x3bac310"`

then R doesn’t immediately make a new copy of `x` and call it `y`; it just remembers that `x` and `y` are the same. But if we *later* modify one of `x` or `y`, R is forced to make a copy.

```
y[1] = 0
c(address(x), address(y))
```

`## [1] "0x3bac310" "0x3f8cd20"`
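Base R’s `tracemem()` offers another way to watch this copy happen: it prints a message each time the object it tracks is duplicated (this requires an R built with memory profiling, which the standard CRAN binaries are).

```r
x = rnorm(1e4)
tracemem(x)    # start reporting whenever x is duplicated
y = x          # no message: x and y still share the same memory
y[1] = 0       # a tracemem[...] line is printed: the copy happens here
untracemem(x)  # stop watching x
```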

There are a few tricks to speed things up which are useful to remember. If you use a function like `matrix()` to change the dimensions of an object, it will create an entirely new object in order to return it. An alternative is to edit the dimensions directly.

```
mat_direct = function(n) {
  x = matrix(1:n^2, n, n) # builds a brand-new matrix object
  x
}
mat_edit_dim = function(n) {
  x = 1:n^2
  dim(x) = c(n, n) # reshape in place by editing the dim attribute
  x
}
microbenchmark(mat_direct(100), mat_edit_dim(100))
```

```
## Unit: microseconds
##              expr    min     lq median     uq    max neval
##   mat_direct(100) 87.415 100.45 101.96 103.67 724.31   100
## mat_edit_dim(100)  6.877  12.98  14.29  15.26  18.02   100
```
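As a sanity check, editing `dim` yields an object identical to the one `matrix()` builds:

```r
n = 4
x = 1:n^2          # note ^ binds tighter than :, so this is 1:(n^2)
dim(x) = c(n, n)   # reshape in place by setting the dim attribute
stopifnot(identical(x, matrix(1:n^2, n, n)))
```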

Like many programming languages, R allows different functions with the same name to perform different tasks depending upon the input they are given. The function `solve()`, for example, has two basic methods: one for a matrix and one for the QR decomposition of a matrix.

`solve`

```
## function (a, b, ...)
## UseMethod("solve")
## <bytecode: 0x4205328>
## <environment: namespace:base>
```

`methods(solve)`

`## [1] solve.default solve.qr`

If we call the default method directly, we can reduce the overhead slightly.

```
A = matrix(rnorm(9), 3, 3)
microbenchmark(solve(A), solve.default(A))
```

```
## Unit: microseconds
##             expr   min    lq median    uq    max neval
##         solve(A) 21.95 22.59  22.93 23.38  45.22   100
## solve.default(A) 17.59 18.31  18.64 19.28 457.22   100
```

The difference becomes less marked if the final computation is more demanding.

```
A = matrix(rnorm(1e4), 100, 100)
microbenchmark(solve(A), solve.default(A))
```

```
## Unit: microseconds
##             expr   min    lq median    uq  max neval
##         solve(A) 598.0 602.8  606.2 628.4 1177   100
## solve.default(A) 592.8 597.5  601.6 616.6 1061   100
```

Using `unique.default()` instead of `unique()` can produce more dramatic improvements.

```
x = sample(1:10, replace=TRUE)
microbenchmark(unique(x), unique.default(x))
```

```
## Unit: microseconds
##              expr   min    lq median    uq   max neval
##         unique(x) 4.830 5.094  5.372 5.654 39.36   100
## unique.default(x) 1.745 2.056  2.202 2.381 14.70   100
```

Other functions often have stripped-down versions which can be discovered by looking at the documentation. Some examples are:

generic function | faster versions
---|---
`sort()` | `sort.int()`
`rep()` | `rep.int()`, `rep_len()`
`seq()`, `:` | `seq_len()`, `seq_along()`
`t()` | `t.default()`
`solve()` | `solve.default()`
`unique()` | `unique.default()`
`rowSums()`, `colSums()` | `.rowSums()`, `.colSums()`
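The stripped-down versions skip S3 dispatch and some argument checking, but they should return the same values; a quick spot check for a few of the pairs above:

```r
## each fast version agrees with its generic on well-formed input
stopifnot(identical(rep(1:4, times=1000), rep.int(1:4, 1000)))
stopifnot(identical(t(diag(3)), t.default(diag(3))))
stopifnot(identical(unique(c(1, 1, 2)), unique.default(c(1, 1, 2))))
```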

Sometimes even the overhead cost of the correct method is annoyingly high: this usually happens if you have to run a very easy problem a very large number of times (say, in the middle of a loop). In such a case it may be beneficial to go straight to the lower-level routine which does the job you want. In the case of `sort.int(x)`, inspection of the code reveals that the function calls `.Internal(qsort(x, decreasing))` to do the work.

`(x = sample(1:10))`

`## [1] 8 6 5 2 10 4 1 9 3 7`

`.Internal(qsort(x, FALSE))`

`## [1] 1 2 3 4 5 6 7 8 9 10`

It runs orders of magnitude faster than the wrapper function, because it doesn’t check the content of `x` at all.

```
microbenchmark(sort.int(x),
               .Internal(qsort(x, FALSE)),
               times=1e5)
```

```
## Unit: nanoseconds
##                       expr   min    lq median    uq      max neval
##                sort.int(x) 20415 23221  24064 25003 27517262 1e+05
## .Internal(qsort(x, FALSE))   154   237    394   468    14182 1e+05
```

Strictly speaking **we shouldn’t write code which calls internal routines**, as they may not behave the way we expect, their functionality may change, and we are skipping the checks on our data that the wrapper functions make for us. For this reason, you can’t submit packages to CRAN which use internal calls. However, sometimes the gains are too good to pass up.
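If you do go down this road, one compromise is to run your own checks once up front and only then call the internal routine inside the hot loop. A sketch (the helper name `fast_sorts` is mine, not a standard function):

```r
## validate the whole input once, then apply the unchecked internal
## quicksort to each element
fast_sorts = function(xs) {
  stopifnot(is.list(xs), all(vapply(xs, is.numeric, logical(1))))
  lapply(xs, function(x) .Internal(qsort(x, FALSE)))
}
fast_sorts(list(c(3, 1, 2), c(2, 2, 1)))
```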

R comes with the package `compiler`, which allows you to compile functions in order to (hopefully) make them run faster.

```
library(compiler)
sin3 = cmpfun(sin1)
x = rnorm(10)
microbenchmark(sin1(x), sin3(x), times=1000)
```

```
## Unit: microseconds
##    expr   min   lq median    uq   max neval
## sin1(x) 5.057 5.54  5.766 6.073 15.81  1000
## sin3(x) 3.927 4.30  4.531 4.822 12.04  1000
```

The advantage of this compilation is that it’s very quick: if it doesn’t work, then no harm was done by trying.

We won’t cover this in detail, but many computations can be structured so that separate pieces are dispatched to different processors at the same time (i.e. in parallel). Every call to an `apply()`-family function is of this kind, for example.

The `parallel` package, which ships with R, can do this:

```
library(parallel)
detectCores()
```

`## [1] 8`

So there are up to 8 processors we can use in parallel.

```
x = replicate(1e4, rnorm(10))
microbenchmark(lapply(x, mean),
               mclapply(x, mean, mc.cores=8),
               times=10)
```

```
## Unit: milliseconds
##                            expr   min    lq median    uq   max neval
##                 lapply(x, mean) 560.7 569.3  585.7 594.6 658.7    10
## mclapply(x, mean, mc.cores = 8) 199.6 253.3  298.5 371.3 405.0    10
```

The number of cores to use is specified with `mc.cores`, which should not exceed the number on your machine. A disadvantage is that performance will vary widely between computers, and the `mclapply()` function doesn’t work on Windows, which undermines the portability that is one of R’s attractions.

Certain problems can’t be structured in a parallel way without some loss of efficiency: simulating from a Markov chain, for example, is inherently sequential, because each step depends on the one before it.