Newton’s Algorithm finds the root of a function. Let $$\boldsymbol f(\boldsymbol x)$$ be a multivariate function, and define a sequence by $x_{k+1} = x_k - \left(\frac{\partial \boldsymbol f(x_k)}{\partial \boldsymbol x}\right)^{-1} f(x_k).$ Under reasonable conditions, if this sequence converges then it converges to a root of $$f$$. We can write an R function to take generic vector-valued function and apply a Newton step to it. We will use a numerical approximation for the gradient.

## step()
##' Obtain one step of a Newton algorithm
##'
##' @param FUN    function taking single vector argument
##' @param x      current value of argument to \code{FUN}
##' @param eps    length of step size used for calculating numerical derivative
##'
##' @return numeric vector giving next \code{x} value
step = function(FUN, x, ..., eps = 1e-8) {
val = FUN(x, ...)
deriv = matrix(NA, length(val), length(val))

for (i in 1:length(val)) {
eps_vector = rep(0, length(val))
eps_vector[i] = eps

deriv[,i] = (FUN(x + eps_vector, ...) - FUN(x, ...)) / eps
}
out = -c(solve(deriv) %*% val)
return(out)
}

## newton()
##' Run Newton algorithm to find root of function
##'
##' @param FUN    function taking single vector argument
##' @param start  starting value of argument to FUN
##' @param tol    maximum value of function for convergence
##' @param eps    length of step size used for calculating numerical derivative
##'
##' @return numeric vector of point achieved after convergence
##' of the Newton algorithm
newton = function(FUN, start, ..., tol = 1e-8, eps = 1e-8) {
x = start

while (max(abs(FUN(x, ...))) > tol) {
move = step(FUN, x, ..., eps=eps)
x = x + move
}

return(x)
}

The syntax in the comments is chosen carefully to work with the R package Roxygen for generating documentation, but we don’t need to worry about that yet.

A variation of Newton’s algorithm is used to find the MLEs for generalized linear models (Fisher Scoring). Consider a Poisson GLM with $$Y_i \sim \text{Poisson}(\mu_i)$$ with $\log \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots \beta_k x_{ik}.$ You can check that the derivative of the log-likelihood for $$\beta_i$$ is $\dot{l}(\beta_0, \boldsymbol\beta) = X^T Y - X^T \exp(\beta_0 + X \boldsymbol\beta)$ where $$\boldsymbol\beta = (\beta_1, \cdots, \beta_k)^T$$. In order to use our Newton’s algorithm to solve the likelihood equations, we can write a function for $$\dot{l}$$ whose first argument is the vector $$(\beta_0, \boldsymbol\beta)$$.

## log-likelihood function for Poisson
Dloglik = function(beta, X, y) {
c(sum(y - exp(beta[1] + X %*% beta[-1])),
t(X) %*% y - t(X) %*% exp(beta[1] + X %*% beta[-1]))
}

Now let’s generate some data: 1,000 observations.

set.seed(94) # reproducible results
X = matrix(rnorm(3000), ncol=3)
y = rpois(1000, lambda = exp(1 - 0.5*X[,1] + 0.2*X[,3]))

Dloglik(c(0,0,0,0),X,y)
## [1]  2169 -1704    44   669

Now let’s run our Newton algorithm:

newton(Dloglik, c(0,0,0,0), X=X, y=y)
## [1]  0.99221 -0.53425 -0.00411  0.17601

We can check the solution with glm():

glm(y ~ X, family="poisson")$coef ## (Intercept) X1 X2 X3 ## 0.99221 -0.53425 -0.00411 0.17601 #### How quick is our function? The quickest way to check the time taken by some code is to use system.time(). system.time(newton(Dloglik, c(0,0,0,0), X=X, y=y)) ## user system elapsed ## 0.005 0.001 0.006 The microbenchmark package allows more sophisticated comparisons. library(microbenchmark) microbenchmark( newton(Dloglik, c(0,0,0,0), X=X, y=y), glm(y ~ X, family="poisson")$coef,
times=10)
## Unit: milliseconds
##                                          expr  min   lq mean median   uq   max neval
##  newton(Dloglik, c(0, 0, 0, 0), X = X, y = y) 5.59 5.75 7.41    6.7 7.42 12.77    10
##           glm(y ~ X, family = "poisson")$coef 2.35 2.40 3.37 3.4 4.08 4.48 10 It seems our solution is a bit slower than using glm(). #### What’s Slowing it Down? Use Rprof() to take a quick look at where your code takes time. This records information in a file whose default name is Rprof.out. It’s good practice to use a temporary file for this instead. tmp = tempfile() # get a temporary file Rprof(tmp, interval=0.001) newton(Dloglik, c(0,0,0,0), X=X, y=y) Rprof(NULL) summaryRprof(tmp) This displays something like $by.self
self.time self.pct total.time total.pct
"exp"            0.46    46.94       0.46     46.94
"%*%"            0.40    40.82       0.40     40.82
"-"              0.06     6.12       0.06      6.12
"t.default"      0.04     4.08       0.04      4.08
"sum"            0.02     2.04       0.02      2.04

$by.total total.time total.pct self.time self.pct "FUN" 0.98 100.00 0.00 0.00 "newton" 0.98 100.00 0.00 0.00 "step" 0.90 91.84 0.00 0.00 "exp" 0.46 46.94 0.46 46.94 "%*%" 0.40 40.82 0.40 40.82 "-" 0.06 6.12 0.06 6.12 "t.default" 0.04 4.08 0.04 4.08 "t" 0.04 4.08 0.00 0.00 "sum" 0.02 2.04 0.02 2.04$sample.interval
[1] 0.02

\$sampling.time
[1] 0.98

Most of the function’s time is spent evaluating FUN (i.e. Dloglik), so we should start with that. There are several obvious inefficiencies:

Dloglik2 = function(beta, X, y) {
yxB = y - exp(beta[1] + X %*% beta[-1])
c(sum(yxB), t(X) %*% yxB)
}
## roughly halves the time taken
microbenchmark(newton(Dloglik, c(0,0,0,0), X=X, y=y),
newton(Dloglik2, c(0,0,0,0), X=X, y=y), times=10)
## Unit: milliseconds
##                                           expr  min   lq mean median   uq  max neval
##   newton(Dloglik, c(0, 0, 0, 0), X = X, y = y) 4.13 4.17 4.53   4.64 4.81 4.90    10
##  newton(Dloglik2, c(0, 0, 0, 0), X = X, y = y) 2.22 2.39 3.37   2.72 3.50 7.76    10

Notice also that the function step() evaluates the function at x repeatedly, which is unnecessary; in fact this is already calculated by newton(), so we could use this instead:

step = function(FUN, x, val, ..., eps = 1e-8) {
if(missing(val)) val = FUN(x, ...)
deriv = matrix(NA, length(val), length(val))

for (i in 1:length(val)) {
eps_vector = rep(0, length(val))
eps_vector[i] = eps

deriv[,i] = (FUN(x + eps_vector, ...) - val) / eps
}
out = -c(solve(deriv) %*% val)
return(out)
}

newton = function(FUN, start, ..., tol = 1e-8, eps = 1e-8) {
x = start
val = FUN(x, ...)

while (max(abs(val)) > tol) {
move = step(FUN, x, val=val, ..., eps=eps)
x = x + move
val = FUN(x, ...)
}

return(x)
}

microbenchmark(newton(Dloglik2, c(0,0,0,0), X=X, y=y), times=10)
## Unit: milliseconds
##                                           expr  min   lq mean median   uq  max neval
##  newton(Dloglik2, c(0, 0, 0, 0), X = X, y = y) 1.19 1.19 2.29   1.24 1.32 11.3    10

We have shaved another third off the time taken!