Newton’s algorithm finds a root of a function. Let \(\boldsymbol f(\boldsymbol x)\) be a multivariate function, and define a sequence by \[ \boldsymbol x_{k+1} = \boldsymbol x_k - \left(\frac{\partial \boldsymbol f(\boldsymbol x_k)}{\partial \boldsymbol x}\right)^{-1} \boldsymbol f(\boldsymbol x_k). \] Under reasonable conditions, if this sequence converges then it converges to a root of \(\boldsymbol f\). We can write an R function that takes a generic vector-valued function and applies a Newton step to it, using a finite-difference approximation for the Jacobian.
## step()
##' Obtain one step of a Newton algorithm
##'
##' @param FUN function taking single vector argument
##' @param x current value of argument to \code{FUN}
##' @param eps length of step size used for calculating numerical derivative
##'
##' @return numeric vector giving the Newton step (the change to add to \code{x})
step = function(FUN, x, ..., eps = 1e-8) {
  val = FUN(x, ...)
  deriv = matrix(NA, length(val), length(val))
  for (i in 1:length(val)) {
    eps_vector = rep(0, length(val))
    eps_vector[i] = eps
    deriv[, i] = (FUN(x + eps_vector, ...) - FUN(x, ...)) / eps
  }
  out = -c(solve(deriv) %*% val)
  return(out)
}
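As a quick illustration (not part of the original code), we can apply step() to a simple one-dimensional problem: one Newton step towards \(\sqrt 2\), the positive root of \(f(x) = x^2 - 2\).

```r
## one Newton step for f(x) = x^2 - 2 starting from x = 1;
## analytically the step is -f(1)/f'(1) = -(-1)/2 = 0.5,
## so the next iterate should be close to 1.5
f = function(x) x^2 - 2
x = 1
x + step(f, x)   # approximately 1.5
```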
## newton()
##' Run Newton algorithm to find root of function
##'
##' @param FUN function taking single vector argument
##' @param start starting value of argument to FUN
##' @param tol convergence tolerance: iterate until every component of \code{FUN} is at most \code{tol} in absolute value
##' @param eps length of step size used for calculating numerical derivative
##'
##' @return numeric vector of point achieved after convergence
##' of the Newton algorithm
newton = function(FUN, start, ..., tol = 1e-8, eps = 1e-8) {
  x = start
  while (max(abs(FUN(x, ...))) > tol) {
    move = step(FUN, x, ..., eps = eps)
    x = x + move
  }
  return(x)
}
The syntax in the comments is chosen carefully to work with the R package roxygen2 for generating documentation, but we don’t need to worry about that yet.
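Before moving on, a small sanity check (an added illustration): newton() should recover \(\sqrt 2\) as a root of \(x^2 - 2\).

```r
## iterate to convergence from a starting value of 1;
## the positive root of x^2 - 2 is sqrt(2)
newton(function(x) x^2 - 2, start = 1)   # approximately 1.414214
```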
A variation of Newton’s algorithm is used to find the MLEs for generalized linear models (Fisher scoring). Consider a Poisson GLM with \(Y_i \sim \text{Poisson}(\mu_i)\) and \[ \log \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}. \] You can check that the gradient of the log-likelihood with respect to \((\beta_0, \boldsymbol\beta)\) is \[ \dot{l}(\beta_0, \boldsymbol\beta) = \begin{pmatrix} \boldsymbol 1^T \{ Y - \exp(\beta_0 + X \boldsymbol\beta) \} \\ X^T Y - X^T \exp(\beta_0 + X \boldsymbol\beta) \end{pmatrix}, \] where \(\boldsymbol\beta = (\beta_1, \ldots, \beta_k)^T\) and the exponential is applied componentwise. To use our Newton algorithm to solve the likelihood equations \(\dot{l} = 0\), we can write a function for \(\dot{l}\) whose first argument is the vector \((\beta_0, \boldsymbol\beta)\).
## score (derivative of the log-likelihood) for the Poisson GLM
Dloglik = function(beta, X, y) {
  c(sum(y - exp(beta[1] + X %*% beta[-1])),
    t(X) %*% y - t(X) %*% exp(beta[1] + X %*% beta[-1]))
}
Now let’s generate some data: 1,000 observations.
set.seed(94) # reproducible results
X = matrix(rnorm(3000), ncol=3)
y = rpois(1000, lambda = exp(1 - 0.5*X[,1] + 0.2*X[,3]))
Dloglik(c(0,0,0,0),X,y)
## [1] 2169 -1704 44 669
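As a sketch of how one might double-check the score function, we can compare it with a finite-difference gradient of the log-likelihood itself. The loglik helper below is an added illustration (the Poisson log-likelihood, dropping the \(\log y_i!\) term, which does not depend on \(\beta\)).

```r
## Poisson log-likelihood up to an additive constant in beta
loglik = function(beta, X, y) {
  eta = beta[1] + X %*% beta[-1]
  sum(y * eta - exp(eta))
}

## finite-difference gradient at beta = 0
beta = c(0, 0, 0, 0)
eps = 1e-6
num_grad = sapply(1:4, function(i) {
  e = replace(rep(0, 4), i, eps)
  (loglik(beta + e, X, y) - loglik(beta, X, y)) / eps
})
round(num_grad)   # should agree with Dloglik(beta, X, y) up to rounding
```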
Now let’s run our Newton algorithm:
newton(Dloglik, c(0,0,0,0), X=X, y=y)
## [1] 0.99221 -0.53425 -0.00411 0.17601
We can check the solution with glm():
glm(y ~ X, family="poisson")$coef
## (Intercept) X1 X2 X3
## 0.99221 -0.53425 -0.00411 0.17601
The quickest way to check the time taken by some code is to use system.time().
system.time(newton(Dloglik, c(0,0,0,0), X=X, y=y))
## user system elapsed
## 0.005 0.001 0.006
The microbenchmark package allows more sophisticated comparisons.
library(microbenchmark)
microbenchmark(
newton(Dloglik, c(0,0,0,0), X=X, y=y),
glm(y ~ X, family="poisson")$coef,
times=10)
## Unit: milliseconds
## expr min lq mean median uq max neval
## newton(Dloglik, c(0, 0, 0, 0), X = X, y = y) 5.59 5.75 7.41 6.7 7.42 12.77 10
## glm(y ~ X, family = "poisson")$coef 2.35 2.40 3.37 3.4 4.08 4.48 10
It seems our solution is a bit slower than using glm().
Use Rprof() to take a quick look at where your code takes time. This records information in a file whose default name is Rprof.out. It’s good practice to use a temporary file for this instead.
tmp = tempfile() # get a temporary file
Rprof(tmp, interval=0.001)
newton(Dloglik, c(0,0,0,0), X=X, y=y)
Rprof(NULL)
summaryRprof(tmp)
This displays something like
$by.self
self.time self.pct total.time total.pct
"exp" 0.46 46.94 0.46 46.94
"%*%" 0.40 40.82 0.40 40.82
"-" 0.06 6.12 0.06 6.12
"t.default" 0.04 4.08 0.04 4.08
"sum" 0.02 2.04 0.02 2.04
$by.total
total.time total.pct self.time self.pct
"FUN" 0.98 100.00 0.00 0.00
"newton" 0.98 100.00 0.00 0.00
"step" 0.90 91.84 0.00 0.00
"exp" 0.46 46.94 0.46 46.94
"%*%" 0.40 40.82 0.40 40.82
"-" 0.06 6.12 0.06 6.12
"t.default" 0.04 4.08 0.04 4.08
"t" 0.04 4.08 0.00 0.00
"sum" 0.02 2.04 0.02 2.04
$sample.interval
[1] 0.02
$sampling.time
[1] 0.98
Most of the function’s time is spent evaluating FUN (i.e. Dloglik), so we should start with that. There are several obvious inefficiencies: in particular, the vector exp(beta[1] + X %*% beta[-1]) is computed twice on every call, as is t(X). Computing each of these once gives:
Dloglik2 = function(beta, X, y) {
  yxB = y - exp(beta[1] + X %*% beta[-1])
  c(sum(yxB), t(X) %*% yxB)
}
## roughly halves the time taken
microbenchmark(newton(Dloglik, c(0,0,0,0), X=X, y=y),
newton(Dloglik2, c(0,0,0,0), X=X, y=y), times=10)
## Unit: milliseconds
## expr min lq mean median uq max neval
## newton(Dloglik, c(0, 0, 0, 0), X = X, y = y) 4.13 4.17 4.53 4.64 4.81 4.90 10
## newton(Dloglik2, c(0, 0, 0, 0), X = X, y = y) 2.22 2.39 3.37 2.72 3.50 7.76 10
Notice also that the function step() evaluates the function at x repeatedly, which is unnecessary; in fact this value is already calculated by newton(), so we can pass it in instead:
step = function(FUN, x, val, ..., eps = 1e-8) {
  if (missing(val)) val = FUN(x, ...)
  deriv = matrix(NA, length(val), length(val))
  for (i in 1:length(val)) {
    eps_vector = rep(0, length(val))
    eps_vector[i] = eps
    deriv[, i] = (FUN(x + eps_vector, ...) - val) / eps
  }
  out = -c(solve(deriv) %*% val)
  return(out)
}
newton = function(FUN, start, ..., tol = 1e-8, eps = 1e-8) {
  x = start
  val = FUN(x, ...)
  while (max(abs(val)) > tol) {
    move = step(FUN, x, val = val, ..., eps = eps)
    x = x + move
    val = FUN(x, ...)
  }
  return(x)
}
microbenchmark(newton(Dloglik2, c(0,0,0,0), X=X, y=y), times=10)
## Unit: milliseconds
## expr min lq mean median uq max neval
## newton(Dloglik2, c(0, 0, 0, 0), X = X, y = y) 1.19 1.19 2.29 1.24 1.32 11.3 10
We have shaved another third off the time taken!
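As a final sanity check (an added illustration), it is worth confirming that the faster version still gives the same estimates as glm():

```r
## both solvers should agree up to numerical tolerance,
## since both drive the same score equations close to zero
fit_newton = newton(Dloglik2, c(0, 0, 0, 0), X = X, y = y)
fit_glm = unname(glm(y ~ X, family = "poisson")$coef)
all.equal(fit_newton, fit_glm, tolerance = 1e-6)   # should report TRUE
```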