Users of lower-level languages will be familiar with memory-allocation issues. R handles this for us, and performs garbage collection of objects we are no longer using (though not perfectly, in my experience); as a consequence one may fail to appreciate that allocating memory and copying objects carry a computational cost.

```
library(microbenchmark)

## function to calculate sin (of course, sin() is already
## vectorized, so this is not actually useful)
sin1 = function(x) {
  out = c() # start with an empty vector and grow it
  for (i in seq_along(x)) out = c(out, sin(x[i]))
  out
}
sin2 = function(x) {
  out = numeric(length(x)) # allocate memory in advance
  for (i in seq_along(x)) out[i] = sin(x[i])
  out
}
x = rnorm(1e3)
microbenchmark(sin1(x), sin2(x), times=100)
```

```
## Unit: microseconds
##     expr    min     lq median     uq   max neval
##  sin1(x) 1684.8 1779.7 1886.7 2097.8 17506   100
##  sin2(x)  783.4  833.6  864.8  899.7  1072   100
```

The same is true for `cbind()` and (even more so) for `rbind()`. When you use one of these functions, R allocates an entirely new space in memory, and then copies the old object into the new space. If you allocate the space in advance, none of this is necessary.

```
## functions to make a multiplication table
mult_table1 = function(k) {
  out = matrix(NA, nrow=0, ncol=k) # start with zero rows and grow
  for (i in seq_len(k)) out = rbind(out, (1:k)*i)
  out
}
mult_table2 = function(k) {
  out = matrix(NA, nrow=k, ncol=k) # allocate the full matrix up front
  for (i in seq_len(k)) out[i,] = (1:k)*i
  out
}
mult_table1(5)
```

```
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    2    4    6    8   10
## [3,]    3    6    9   12   15
## [4,]    4    8   12   16   20
## [5,]    5   10   15   20   25
```

```
microbenchmark(mult_table1(100), mult_table2(100), times=100)
```

```
## Unit: microseconds
##             expr    min     lq median   uq    max neval
## mult_table1(100) 1539.1 1642.7 1902.0 2103 2395.1   100
## mult_table2(100)  339.1  358.8  368.9  389  976.9   100
```

A common symptom of running out of memory, or simply using too much of it, is that your program *slows down* as it runs.

If you want to learn more about the way R deals with memory use, see Hadley Wickham’s page on the subject.

In some other languages it is possible to see exactly when an object is being copied, and therefore to identify code bottlenecks. Life in R is a little more complicated. If we do something like

```
library(pryr)
x = rnorm(1e4)
y = x
c(address(x), address(y))
```

`## [1] "0x3bac310" "0x3bac310"`

then R doesn’t immediately make a new copy of `x` and call it `y`; it just remembers that `x` and `y` are the same. But if we *later* modify one of `x` or `y`, R is forced to make a copy.

```
y[1] = 0
c(address(x), address(y))
```

`## [1] "0x3bac310" "0x3f8cd20"`
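Base R’s `tracemem()` offers another way to watch this copy happen: it prints a message each time the object it tracks is duplicated (this requires an R built with memory profiling, which the standard CRAN binaries are).

```r
x = rnorm(1e4)
tracemem(x)    # start reporting whenever x is duplicated
y = x          # no message: x and y still share the same memory
y[1] = 0       # a tracemem[...] line is printed: the copy happens here
untracemem(x)  # stop watching x
```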

There are a few tricks to speed things up which are useful to remember. If you use a function like `matrix()` to change the dimensions of an object, it will create an entirely new object in order to return it. An alternative is to edit the dimensions directly.

```
mat_direct = function(n) {
  x = matrix(1:n^2, n, n) # builds a brand-new matrix object
  x
}
mat_edit_dim = function(n) {
  x = 1:n^2
  dim(x) = c(n, n) # reshape in place by editing the dim attribute
  x
}
microbenchmark(mat_direct(100), mat_edit_dim(100))
```

```
## Unit: microseconds
##              expr    min     lq median     uq    max neval
##   mat_direct(100) 87.415 100.45 101.96 103.67 724.31   100
## mat_edit_dim(100)  6.877  12.98  14.29  15.26  18.02   100
```
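As a sanity check, editing `dim` yields an object identical to the one `matrix()` builds:

```r
n = 4
x = 1:n^2          # note ^ binds tighter than :, so this is 1:(n^2)
dim(x) = c(n, n)   # reshape in place by setting the dim attribute
stopifnot(identical(x, matrix(1:n^2, n, n)))
```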

Like many programming languages, R allows different functions with the same name to perform different tasks depending upon the input they are given. The function `solve()`, for example, has two basic methods: one for a matrix and one for the QR decomposition of a matrix.

`solve`

```
## function (a, b, ...)
## UseMethod("solve")
## <bytecode: 0x4205328>
## <environment: namespace:base>
```

`methods(solve)`

`## [1] solve.default solve.qr`

If we call the default method directly, we can reduce the overhead slightly.

```
A = matrix(rnorm(9), 3, 3)
microbenchmark(solve(A), solve.default(A))
```

```
## Unit: microseconds
##             expr   min    lq median    uq    max neval
##         solve(A) 21.95 22.59  22.93 23.38  45.22   100
## solve.default(A) 17.59 18.31  18.64 19.28 457.22   100
```

The difference becomes less marked if the final computation is more demanding.

```
A = matrix(rnorm(1e4), 100, 100)
microbenchmark(solve(A), solve.default(A))
```

```
## Unit: microseconds
##             expr   min    lq median    uq  max neval
##         solve(A) 598.0 602.8  606.2 628.4 1177   100
## solve.default(A) 592.8 597.5  601.6 616.6 1061   100
```

Using `unique.default()` instead of `unique()` can produce more dramatic improvements.

```
x = sample(1:10, replace=TRUE)
microbenchmark(unique(x), unique.default(x))
```

```
## Unit: microseconds
##              expr   min    lq median    uq   max neval
##         unique(x) 4.830 5.094  5.372 5.654 39.36   100
## unique.default(x) 1.745 2.056  2.202 2.381 14.70   100
```

Other functions often have stripped-down versions which can be discovered by looking at the documentation. Some examples are:

generic function | faster versions
---|---
`sort()` | `sort.int()`
`rep()` | `rep.int()`, `rep_len()`
`seq()`, `:` | `seq_len()`, `seq_along()`
`t()` | `t.default()`
`solve()` | `solve.default()`
`unique()` | `unique.default()`
`rowSums()`, `colSums()` | `.rowSums()`, `.colSums()`
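The stripped-down versions skip S3 dispatch and some argument checking, but they should return the same values; a quick spot check for a few of the pairs above:

```r
## each fast version agrees with its generic on well-formed input
stopifnot(identical(rep(1:4, times=1000), rep.int(1:4, 1000)))
stopifnot(identical(t(diag(3)), t.default(diag(3))))
stopifnot(identical(unique(c(1, 1, 2)), unique.default(c(1, 1, 2))))
```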

Sometimes even the overhead cost of the correct method is annoyingly high: this usually happens if you have to run a very easy problem a very large number of times (say, in the middle of a loop). In such a case it may be beneficial to go straight to the lower-level routine which does the job you want. In the case of `sort.int(x)`, inspection of the code reveals that the function calls `.Internal(qsort(x, decreasing))` to do the work.

`(x = sample(1:10))`

`## [1] 8 6 5 2 10 4 1 9 3 7`

`.Internal(qsort(x, FALSE))`

`## [1] 1 2 3 4 5 6 7 8 9 10`

It runs orders of magnitude faster than the wrapper function, because it doesn’t check the content of `x` at all.

```
microbenchmark(sort.int(x),
               .Internal(qsort(x, FALSE)),
               times=1e5)
```

```
## Unit: nanoseconds
##                       expr   min    lq median    uq      max neval
##                sort.int(x) 20415 23221  24064 25003 27517262 1e+05
## .Internal(qsort(x, FALSE))   154   237    394   468    14182 1e+05
```

Strictly speaking **we shouldn’t write code which calls internal routines**, as they may not behave the way we expect, their functionality may change, and we are skipping the checks on our data that the wrapper functions make for us. For this reason, you can’t submit packages to CRAN which use internal calls. However, sometimes the gains are too good to pass up.
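If you do go down this road, one compromise is to run your own checks once up front and only then call the internal routine inside the hot loop. A sketch (the helper name `fast_sorts` is mine, not a standard function):

```r
## validate the whole input once, then apply the unchecked internal
## quicksort to each element
fast_sorts = function(xs) {
  stopifnot(is.list(xs), all(vapply(xs, is.numeric, logical(1))))
  lapply(xs, function(x) .Internal(qsort(x, FALSE)))
}
fast_sorts(list(c(3, 1, 2), c(2, 2, 1)))
```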

R comes with the package `compiler`, which allows you to compile functions in order to (hopefully) make them run faster.

```
library(compiler)
sin3 = cmpfun(sin1)
x = rnorm(10)
microbenchmark(sin1(x), sin3(x), times=1000)
```

```
## Unit: microseconds
##    expr   min   lq median    uq   max neval
## sin1(x) 5.057 5.54  5.766 6.073 15.81  1000
## sin3(x) 3.927 4.30  4.531 4.822 12.04  1000
```

The advantage of this compilation is that it’s very quick: if it doesn’t work, then no harm was done by trying.

We won’t cover this in detail, but many computations can be structured so that separate pieces are dispatched to different processors at the same time (i.e. in parallel). Every call to an `apply()`-family function is of this kind, for example.

The `parallel` package, which ships with R, can do this:

```
library(parallel)
detectCores()
```

`## [1] 8`

So there are up to 8 processors we can use in parallel.

```
x = replicate(1e4, rnorm(10))
microbenchmark(lapply(x, mean),
               mclapply(x, mean, mc.cores=8),
               times=10)
```

```
## Unit: milliseconds
##                            expr   min    lq median    uq   max neval
##                 lapply(x, mean) 560.7 569.3  585.7 594.6 658.7    10
## mclapply(x, mean, mc.cores = 8) 199.6 253.3  298.5 371.3 405.0    10
```

The number of cores to use is specified with `mc.cores`, which should not exceed the number on your machine. A disadvantage is that performance will vary widely between computers, and the `mclapply()` function doesn’t work on Windows, which undermines the portability that is one of R’s attractions.

Certain problems can’t be structured in a parallel way without some loss of efficiency: simulating from a Markov chain, for example, is inherently sequential, because each step depends on the one before it.