Example from Gelman and Hill (2007) - more info in lecture notes.

The measurement \(y\) is the log of the radon level.

Measurements can be taken in the basement or on the ground floor: \(x=0\) for basement, \(x=1\) for ground floor.

Background radon levels vary from county to county, depending on local geology. It is natural to suppose that this background determines a baseline level \(\alpha_j\) (in county \(j\)) and each house varies around that local mean. Baseline levels are not unrelated – they are assumed to vary around the Minnesota-mean level \(\alpha\).

This leads us to a hierarchical model (lectures) with a random effect on the intercept.

[There is another covariate \(u_j\) giving the (log of the) soil uranium yield in county \(j\). We ignore this.]

```
options(digits = 3)
rad <- read.table("http://www.stats.ox.ac.uk/~laws/LMs/data/radon2.txt", header = TRUE)
rad[c(1:5, 917:919), ]
```

```
## radon floor uranium county
## 1 2.2 1 0.502 1
## 2 2.2 0 0.502 1
## 3 2.9 0 0.502 1
## 4 1.0 0 0.502 1
## 5 3.1 0 0.429 2
## 917 5.0 0 0.914 84
## 918 3.7 0 1.427 85
## 919 2.9 0 1.427 85
```

```
attach(rad)
y <- log (ifelse (radon==0, .1, radon))
n <- length(y)
x <- floor
u <- log(uranium)
```

`summary(rad)`

```
## radon floor uranium county
## Min. : 0.0 Min. :0.000 Min. :0.414 Min. : 1.0
## 1st Qu.: 1.9 1st Qu.:0.000 1st Qu.:0.622 1st Qu.:21.0
## Median : 3.6 Median :0.000 Median :0.908 Median :44.0
## Mean : 4.8 Mean :0.166 Mean :0.934 Mean :43.5
## 3rd Qu.: 6.0 3rd Qu.:0.000 3rd Qu.:1.201 3rd Qu.:70.0
## Max. :48.2 Max. :1.000 Max. :1.696 Max. :85.0
```

`plot(rad)`

```
par(mfrow = c(1, 2))
plot(radon ~ county)
plot(log(radon) ~ county)
```

Fit a hierarchical model with a random effect on the intercept:

```
library(nlme)
rad.lme1 <- lme(y ~ x, random = ~1 | county)
summary(rad.lme1)
```

```
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 2179 2199 -1086
##
## Random effects:
## Formula: ~1 | county
## (Intercept) Residual
## StdDev: 0.328 0.756
##
## Fixed effects: y ~ x
## Value Std.Error DF t-value p-value
## (Intercept) 1.462 0.0516 833 28.34 0
## x -0.693 0.0704 833 -9.84 0
## Correlation:
## (Intr)
## x -0.288
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.39892 -0.61553 0.00291 0.64048 3.42808
##
## Number of Observations: 919
## Number of Groups: 85
```

In terms of the model

\[y_{ij}=\alpha+b_{j}+\beta x_{ij} +\epsilon_{ij}, \quad \epsilon_{ij}\sim N(0,\sigma_y^2), \quad b_j\sim N(0,\sigma_{\alpha}^2)\]

our estimates are \(\hat{\sigma}_{\alpha} =\) 0.328, \(\hat{\sigma}_{y} =\) 0.756, \(\hat{\alpha} =\) 1.462, \(\hat{\beta} =\) -0.693

and our “estimates” for the \(J=85\) random effects \(\hat{b}_j\) can be extracted using `random.effects()`

or `ranef()`

`head(ranef(rad.lme1))`

```
## (Intercept)
## 1 -0.2701
## 2 -0.5340
## 3 0.0176
## 4 0.0429
## 5 -0.0154
## 6 0.0186
```

The fitted values \(\hat{y}_{ij} = \hat{\alpha} + \hat{b}_j + \hat{\beta}x_{ij}\) can be obtained using `fitted()`

`head(fitted(rad.lme1))`

```
## 1 1 1 1 2 2
## 0.499 1.192 1.192 1.192 0.928 0.928
```

The default fitting method is REML (as indicated in the output above). We can fit using maximum likelihood instead:

`rad.lme2 <- lme(y ~ x, random = ~1 | county, method = "ML")`

We can use `rad.lme2`

in a likelihood ratio test. We can compare the model `rad.lme2`

which has random effects for each county, to the model `rad.lm2`

which has fixed effects for each county:

```
rad.lm2 <- lm(y ~ x + as.factor(county))
anova(rad.lme2, rad.lm2)
```

```
## Model df AIC BIC logLik Test L.Ratio p-value
## rad.lme2 1 4 2172 2191 -1082
## rad.lm2 2 87 2179 2598 -1002 1 vs 2 159 <.0001
```

AIC favours the model with random effects. The likelihood ratio test rejects the (“simpler”) random effects model. Which model you go with will depend on how you intend to use the model, and wider physical considerations.

For a little more on this example you could look at http://www.stats.ox.ac.uk/~nicholls/sb1a/L14-15.R