Example from Gelman and Hill (2007) - more info in lecture notes.

The measurement \(y\) is the log of the radon level.

Measurements can be taken in the basement or on the ground floor: \(x=0\) for basement, \(x=1\) for ground floor.

Background radon levels vary from county to county, depending on local geology. It is natural to suppose that this background determines a baseline level \(\alpha_j\) (in county \(j\)) and each house varies around that local mean. Baseline levels are not unrelated – they are assumed to vary around the Minnesota-mean level \(\alpha\).

This leads us to a hierarchical model (lectures) with a random effect on the intercept.

[There is another covariate \(u_j\) giving the (log of the) soil uranium yield in county \(j\). We ignore this.]

```
options(digits = 3)
rad <- read.table("http://www.stats.ox.ac.uk/~laws/LMs/data/radon2.txt", header = TRUE)
rad[c(1:5, 917:919), ]
```

```
## radon floor uranium county
## 1 2.2 1 0.502 1
## 2 2.2 0 0.502 1
## 3 2.9 0 0.502 1
## 4 1.0 0 0.502 1
## 5 3.1 0 0.429 2
## 917 5.0 0 0.914 84
## 918 3.7 0 1.427 85
## 919 2.9 0 1.427 85
```

```
attach(rad)
y <- log (ifelse (radon==0, .1, radon))
n <- length(y)
x <- floor
u <- log(uranium)
```

`summary(rad)`

```
## radon floor uranium county
## Min. : 0.0 Min. :0.000 Min. :0.414 Min. : 1.0
## 1st Qu.: 1.9 1st Qu.:0.000 1st Qu.:0.622 1st Qu.:21.0
## Median : 3.6 Median :0.000 Median :0.908 Median :44.0
## Mean : 4.8 Mean :0.166 Mean :0.934 Mean :43.5
## 3rd Qu.: 6.0 3rd Qu.:0.000 3rd Qu.:1.201 3rd Qu.:70.0
## Max. :48.2 Max. :1.000 Max. :1.696 Max. :85.0
```

`plot(rad)`

```
par(mfrow = c(1, 2))
plot(radon ~ county)
plot(log(radon) ~ county)
```

Fit a hierarchical model with a random effect on the intercept:

`library(nlme)`

`## Warning: package 'nlme' was built under R version 3.4.4`

```
rad.lme1 <- lme(y ~ x, random = ~1 | county)
summary(rad.lme1)
```

```
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 2179 2199 -1086
##
## Random effects:
## Formula: ~1 | county
## (Intercept) Residual
## StdDev: 0.328 0.756
##
## Fixed effects: y ~ x
## Value Std.Error DF t-value p-value
## (Intercept) 1.462 0.0516 833 28.34 0
## x -0.693 0.0704 833 -9.84 0
## Correlation:
## (Intr)
## x -0.288
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.39892 -0.61553 0.00291 0.64048 3.42808
##
## Number of Observations: 919
## Number of Groups: 85
```

In terms of the model

\[y_{ij}=\alpha+b_{j}+\beta x_{ij} +\epsilon_{ij}, \quad \epsilon_{ij}\sim N(0,\sigma_y^2), \quad b_j\sim N(0,\sigma_{\alpha}^2)\]

our estimates are \(\hat{\sigma}_{\alpha} =\) 0.328, \(\hat{\sigma}_{y} =\) 0.756, \(\hat{\alpha} =\) 1.462, \(\hat{\beta} =\) -0.693

and our “estimates” for the \(J=85\) random effects \(\hat{b}_j\) can be extracted using `random.effects()`

or `ranef()`

`head(ranef(rad.lme1))`

```
## (Intercept)
## 1 -0.2701
## 2 -0.5340
## 3 0.0176
## 4 0.0429
## 5 -0.0154
## 6 0.0186
```

The fitted values \(\hat{y}_{ij} = \hat{\alpha} + \hat{b}_j + \hat{\beta}x_{ij}\) can be obtained using `fitted()`

`head(fitted(rad.lme1))`

```
## 1 1 1 1 2 2
## 0.499 1.192 1.192 1.192 0.928 0.928
```

The default fitting method is REML (as indicated in the output above). We can fit using maximum likelihood instead:

`rad.lme2 <- lme(y ~ x, random = ~1 | county, method = "ML")`

We can compare the model `rad.lme2`

which has random effects for each county, to the simpler model `rad.lm`

which makes no allowance for county (i.e. it assumes all counties are the same, this is the pooled model):

```
rad.lm <- lm(y ~ x)
anova(rad.lme2, rad.lm)
```

```
## Model df AIC BIC logLik Test L.Ratio p-value
## rad.lme2 1 4 2172 2191 -1082
## rad.lm 2 3 2253 2267 -1124 1 vs 2 83.4 <.0001
```

Here the test from `anova()`

is a likelihood ratio test: the model `rad.lm`

is nested within `rad.lme2`

, we are testing \(\sigma^2_\alpha=0\). The test favours the more general model, it rejects the hypothesis \(\sigma^2_\alpha=0\). It compares the likelihood ratio value to an assumed asymptotic \(\chi^2_1\) distribution (the \(\chi^2\) has 1 degree of freedom as the single parameter \(\sigma^2_\alpha\) is held at zero under the null hypothesis). The test is approximate, it requires several assumptions: as well as being asymptotic (i.e. requiring a large sample size), another important assumption is that the parameters under the null are not on the boundary of the parameter space – but here the value \(\sigma^2_\alpha=0\) is on the boundary of the set \([0, \infty)\) of possible \(\sigma^2_\alpha\) values! The boundary effect means that such tests tend to be conservative (Faraway, 2016), i.e. the approximate p-values tend to be larger than they should be, so if the test using the \(\chi^2\) approximation is significant then we can be fairly confident that it is actually significant. Faraway (2016, Chapter 10) assesses the significance of such tests using the parametric bootstrap – so we stop this discussion at this point as the bootstrap doesn’t appear until next term in the Computational Statistics course.

Not surprisingly, in a model with fixed effects for each county (plus a fixed effect for explanatory variable \(x\)), the effect of county is also significant:

```
rad.lm2 <- lm(y ~ x + as.factor(county))
anova(rad.lm2)
```

```
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x 1 48 48.0 83.86 < 2e-16 ***
## as.factor(county) 84 144 1.7 2.99 1.5e-15 ***
## Residuals 833 477 0.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

As for linear models we can look at diagnostic plots that use the residuals \(e = y - \hat{y}\), or standardised versions of \(e\):

```
# "p" below is short for "pearson" and gives standardised residuals,
# without type = "p" we get non-standardised residuals
qqnorm(rad.lme1, ~ resid(., type = "p"))
```

`plot(rad.lme1, resid(., type = "p") ~ fitted(.))`

Also to assess the assumed normality of the random effects:

`qqnorm(rad.lme1, ~ ranef(., type = "p"))`

For a little more on this example you could look at http://www.stats.ox.ac.uk/~nicholls/sb1a/L14-15.R