Reversible Jump

We want to perform inference in the following transdimensional problem. For \(k=1\) the “model” is \(\mathcal{N}(0,1)\). For \(k=2\) the “model” is \(\mathcal{N}\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}1 &0 \\ 0 & 1\end{pmatrix}\right)\). The value of \(\mathbb{P}(k = 1)\) is unknown, and is one of the quantities we want to estimate. We assume that, within each model, we can evaluate the target density only up to a normalising constant. Hence in model \(1\) we can evaluate \[\pi(\theta \mid k = 1) = \exp\left(-\frac{1}{2}\theta^2\right)\] and in model \(2\) we can evaluate \[\pi(\theta \mid k = 2) = \exp\left(-\frac{1}{2}(\theta_1^2 + \theta_2^2)\right).\] We introduce a random walk Metropolis-Hastings kernel within each model. For the first model we use the following target log-density function.

# log-density of N(0,1), normalising constant included
target_1_slow <- function(x) dnorm(x = x, mean = 0, sd = 1, log = TRUE)
# or a bit faster, dropping the additive constant
target_1 <- function(x) -0.5 * x^2
# compare execution speed
library(microbenchmark)
microbenchmark(target_1(rnorm(1)), target_1_slow(rnorm(1)), times = 10)
## Unit: microseconds
##                     expr   min    lq   mean median    uq    max neval cld
##       target_1(rnorm(1)) 1.638 1.858 3.9460 2.0715 4.385 12.259    10   a
##  target_1_slow(rnorm(1)) 2.226 2.257 9.3064 2.4675 3.531 67.827    10   a

We can thus implement the following Metropolis-Hastings kernel within the first model.

# symmetric random walk proposal; its density cancels in the acceptance ratio
proposal_1 <- function(x) rnorm(n = 1, mean = x, sd = 1)
# MH kernel
MHstep1 <- function(current) {
    prop <- proposal_1(current)
    prop_logdensity <- target_1(prop)
    current_logdensity <- target_1(current)
    if (log(runif(1)) < (prop_logdensity - current_logdensity)) {
        return(prop)
    } else {
        return(current)
    }
}
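
As a quick sanity check (a minimal sketch, with an arbitrary chain length), we can run this kernel on its own and verify that the empirical moments of the chain are close to those of \(\mathcal{N}(0,1)\).

# sanity check: run the within-model-1 kernel alone
check_chain <- numeric(5000)
for (t in 2:5000) check_chain[t] <- MHstep1(check_chain[t - 1])
c(mean = mean(check_chain), sd = sd(check_chain))  # should be close to 0 and 1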

For the second model we have the following target log-density function.

mu <- c(0, 0)
Sigma <- matrix(c(1, 0, 0, 1), nrow = 2)
library(mvtnorm)
target_2_slow <- function(x) dmvnorm(x, mean = mu, sigma = Sigma, log = TRUE)
# or much faster, dropping the additive constant
target_2 <- function(x) -0.5 * (x[1] - mu[1])^2 - 0.5 * (x[2] - mu[2])^2
# let's compare execution speed
library(microbenchmark)
microbenchmark(target_2(rnorm(2)), target_2_slow(rnorm(2)), times = 10)
## Unit: microseconds
##                     expr    min     lq     mean  median     uq     max
##       target_2(rnorm(2))  3.246  3.337   9.5316  4.6315  6.134  37.299
##  target_2_slow(rnorm(2)) 66.175 69.372 141.8584 76.7335 95.202 717.455
##  neval cld
##     10   a
##     10   a

We can thus implement the following Metropolis-Hastings kernel within the second model.

# symmetric random walk proposal in two dimensions
proposal_2 <- function(x) rnorm(n = 2, mean = x, sd = 1)
MHstep2 <- function(current) {
    prop <- proposal_2(current)
    prop_logdensity <- target_2(prop)
    current_logdensity <- target_2(current)
    if (log(runif(1)) < (prop_logdensity - current_logdensity)) {
        return(prop)
    } else {
        return(current)
    }
}
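
The same kind of quick check can be done for the second kernel (again just a sketch): the empirical mean and covariance of the chain should be close to those of the standard bivariate normal.

# sanity check: run the within-model-2 kernel alone
check_chain2 <- matrix(0, nrow = 5000, ncol = 2)
for (t in 2:5000) check_chain2[t, ] <- MHstep2(check_chain2[t - 1, ])
colMeans(check_chain2)  # should be close to (0, 0)
cov(check_chain2)       # should be close to the 2x2 identity matrix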

We now design “between-model” kernels, first from model 1 to model 2. We need to introduce an auxiliary variable for dimension matching; we must be able to sample from its distribution and to evaluate its probability density function, in order to compute the reversible jump acceptance ratio.

For instance we can sample \(u\sim \varphi\) in \(\mathbb{R}\) for dimension matching, and then take as deterministic mapping the identity \(G_{1\to 2}(\theta, u) = (\theta, u)\), whose Jacobian is equal to \(1\). For the distribution \(\varphi\) we (arbitrarily) choose a Cauchy distribution.
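
With these choices the reversible jump acceptance probability for the move from model \(1\) to model \(2\) takes a simple form, since the probabilities of attempting a jump in each direction are equal and cancel, and the Jacobian is \(1\): \[\alpha_{1\to 2}(\theta, u) = \min\left(1, \frac{\pi((\theta, u) \mid k = 2)}{\pi(\theta \mid k = 1)\,\varphi(u)}\right).\] The code below computes this ratio on the log scale.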

# Auxiliary variable for dimension matching.
rauxiliary <- function() rcauchy(1, location = 0, scale = 1)
dauxiliary <- function(x) dcauchy(x, location = 0, scale = 1, log = TRUE)
auxiliary <- list(generate = rauxiliary, logdensity = dauxiliary)
# Reversible Jump kernel for moves from 1 to 2
RJ12step <- function(current, auxiliary) {
    extension <- auxiliary$generate()
    prop <- c(current, extension)
    prop_logdensity <- target_2(prop)
    current_logdensity <- target_1(current) + auxiliary$logdensity(extension)
    accept <- (log(runif(1)) < (prop_logdensity - current_logdensity))
    if (accept) {
        return(list(accept = accept, value = prop))
    } else {
        return(list(accept = accept, value = current))
    }
}

Then from model 2 to model 1. In this case we do not need to sample additional variables. We can take as deterministic mapping the identity \(G_{2\to 1}(\theta_1, \theta_2) = (\theta_1, \theta_2)\), whose Jacobian is again equal to \(1\): the first component \(\theta_1\) becomes the parameter of model \(1\), and the second component is discarded.
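
The acceptance probability of this reverse move uses the reciprocal of the previous ratio: \[\alpha_{2\to 1}(\theta_1, \theta_2) = \min\left(1, \frac{\pi(\theta_1 \mid k = 1)\,\varphi(\theta_2)}{\pi((\theta_1, \theta_2) \mid k = 2)}\right),\] where the discarded component \(\theta_2\) plays the role of the auxiliary variable \(u\); this is why \(\varphi\) appears in the code below even though nothing is sampled from it.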

RJ21step <- function(current, auxiliary) {
    prop <- current[1]
    prop_logdensity <- target_1(prop) + auxiliary$logdensity(current[2])
    current_logdensity <- target_2(current)
    accept <- (log(runif(1)) < (prop_logdensity - current_logdensity))
    if (accept) {
        return(list(accept = accept, value = prop))
    } else {
        return(list(accept = accept, value = current))
    }
}

We can now implement the full RJ algorithm. I make it a function of the auxiliary distribution because I want to try out a few different ones for comparison. The algorithm attempts a “between-model” move with probability “proba_modeljump”, set to \(0.1\); otherwise, with probability 1 - “proba_modeljump”, it performs a “within-model” move, that is, a standard Metropolis-Hastings move. Along the way I count the number of “between-model” move attempts and the number of successes among them, so that I can report the acceptance rate of these attempts.

I also report the ratio of the number of visits to model \(2\) over the number of visits to model \(1\), which converges to the ratio of the normalising constants associated with the two models. Here this ratio is equal to \[\frac{\int \int \exp\left(-\frac{1}{2}(\theta_1^2 + \theta_2^2)\right) d\theta_1 d\theta_2}{\int \exp\left(-\frac{1}{2}\theta^2\right)d\theta} = \frac{2\pi}{\sqrt{2\pi}} = \sqrt{2\pi} \approx 2.5066.\]
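
We can double-check this value numerically (a quick sketch using base R's integrate; the two-dimensional integral factorises into a product of two one-dimensional ones).

# numerical check of the ratio of normalising constants
Z1 <- integrate(function(x) exp(-0.5 * x^2), lower = -Inf, upper = Inf)$value
Z2 <- Z1^2  # the 2-d integral factorises into a product of two 1-d integrals
Z2/Z1  # equals sqrt(2*pi), approximately 2.5066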

ReversibleJump <- function(niterations, auxiliary) {
    current_model <- 1  # starting value for the model index
    current_theta1 <- 0  # starting value for the parameter
    proba_modeljump <- 0.1  # probability of attempting a model jump
    chain1 <- matrix(NA, ncol = 1, nrow = niterations)  # store the parameters when in model 1
    chain1[1, ] <- current_theta1  # here we start from model 1
    chain2 <- matrix(NA, ncol = 2, nrow = niterations)  # store the parameters when in model 2
    model_indices <- rep(1, niterations)  # store the model indices
    n_modeljumpattempts <- 0  # store the number of attempted model jumps 
    n_modeljumpsuccess <- 0  # store the number of successful model jumps 
    for (t in 2:niterations) {
        if (runif(1) < proba_modeljump) {
            n_modeljumpattempts <- n_modeljumpattempts + 1
            # do a transdimensional move: if in model 1, we try to jump to model 2
            if (current_model == 1) {
                result <- RJ12step(current_theta1, auxiliary)
                if (result$accept) {
                  n_modeljumpsuccess <- n_modeljumpsuccess + 1
                  current_model <- 2
                  current_theta2 <- result$value
                  chain2[t, ] <- current_theta2
                } else {
                  current_theta1 <- result$value
                  chain1[t, ] <- current_theta1
                }
            } else {
                # if in model 2, we try to jump to model 1
                result <- RJ21step(current_theta2, auxiliary)
                if (result$accept) {
                  n_modeljumpsuccess <- n_modeljumpsuccess + 1
                  current_model <- 1
                  current_theta1 <- result$value
                  chain1[t, ] <- current_theta1
                } else {
                  current_theta2 <- result$value
                  chain2[t, ] <- current_theta2
                }
            }
        } else {
            # do a move within the current model, that is a standard Metropolis-Hastings
            if (current_model == 1) {
                current_theta1 <- MHstep1(current_theta1)
                chain1[t, ] <- current_theta1
            } else {
                current_theta2 <- MHstep2(current_theta2)
                chain2[t, ] <- current_theta2
            }
        }
        model_indices[t] <- current_model
    }
    number_visits <- tabulate(model_indices)
    cat("The chain spent", round(number_visits[2]/number_visits[1], 2), "times more time in model 2 than in model 1.\n")
    cat("The ratio of normalizing constants is ", sqrt((2 * pi)), ".\n", sep = "")
    cat("Acceptance probability of between-model moves: ", round(100 * n_modeljumpsuccess/n_modeljumpattempts, 
        2), "%. \n", sep = "")
    
    return(list(model_indices = model_indices, chain1 = chain1, chain2 = chain2))
}
niterations <- 10000  # number of iterations to perform
rjresults <- ReversibleJump(niterations, auxiliary)  # launch the algorithm
## The chain spent 2.39 times more time in model 2 than in model 1.
## The ratio of normalizing constants is 2.506628.
## Acceptance probability of between-model moves: 43.55%.
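
Since the chain visits each model in proportion to its normalising constant, the proportion of time spent in model \(1\) estimates \(\mathbb{P}(k=1) = \sqrt{2\pi}/(\sqrt{2\pi}+2\pi) = 1/(1+\sqrt{2\pi}) \approx 0.285\), the quantity we declared unknown at the start. A one-line estimate from the chain of model indices (a small post-processing snippet):

# estimated posterior probabilities of k = 1 and k = 2
table(rjresults$model_indices)/niterations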

We can plot the histogram of the parameters obtained for each model. For the first model we obtain the following histogram, with the true target density overlaid in red.

library(ggplot2)
g <- qplot(x = rjresults$chain1[rjresults$model_indices == 1, 1], geom = "blank")
g <- g + geom_histogram(aes(y = ..density..)) + xlab(expression(theta)) + stat_function(fun = dnorm, 
    colour = "red", size = 2) + xlim(-3, 3)
print(g)

For the second model we obtain the following histograms.

library(gridExtra)  # for grid.arrange
g1 <- qplot(x = rjresults$chain2[rjresults$model_indices == 2, 1], geom = "blank")
g1 <- g1 + geom_histogram(aes(y = ..density..)) + xlab(expression(theta[1])) + 
    stat_function(fun = dnorm, colour = "red", size = 2) + xlim(-3, 3)
g2 <- qplot(x = rjresults$chain2[rjresults$model_indices == 2, 2], geom = "blank")
g2 <- g2 + geom_histogram(aes(y = ..density..)) + xlab(expression(theta[2])) + 
    stat_function(fun = dnorm, colour = "red", size = 2) + xlim(-3, 3)
grid.arrange(g1, g2)

It’s also informative to look at the traceplot of the model index, to check that the chain managed to move easily from \(1\) to \(2\) and back. That is also why I reported the acceptance rate of the “between-model” move attempts.

gk <- qplot(x = 1:niterations, y = rjresults$model_indices, geom = "line") + 
    ylab("k") + xlab("iteration") + scale_y_continuous(breaks = 1:2)
print(gk)

It worked fine with our choice of auxiliary distribution, which was a Cauchy distribution. What if we make a bad choice? For instance, with \(\mathcal{N}(5,1)\) we end up proposing points \((\theta, u)\) that are very unlikely under model \(2\), so the “between-model” move attempts tend to be rejected more often. We can see this by plotting the model index, as follows.

rauxiliary <- function() rnorm(1, mean = 5, sd = 1)
dauxiliary <- function(x) dnorm(x, mean = 5, sd = 1, log = TRUE)
auxiliary <- list(generate = rauxiliary, logdensity = dauxiliary)
rjresults2 <- ReversibleJump(niterations, auxiliary)
## The chain spent 3.99 times more time in model 2 than in model 1.
## The ratio of normalizing constants is 2.506628.
## Acceptance probability of between-model moves: 1.47%.
gk2 <- qplot(x = 1:niterations, y = rjresults2$model_indices, geom = "line") + 
    ylab("k") + xlab("iteration") + scale_y_continuous(breaks = 1:2)
print(gk2)
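
To quantify the difference between the two runs, we can also count how many times each chain actually switched model (a small diagnostic complementing the traceplots above). Note that the visit ratio reported for the second run (3.99) is far from the true value \(\sqrt{2\pi} \approx 2.51\): with so few switches, the chain has not mixed over the model index.

# number of model switches under each auxiliary distribution
c(cauchy = sum(diff(rjresults$model_indices) != 0),
  normal = sum(diff(rjresults2$model_indices) != 0))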