Radon data

Example from Gelman and Hill (2007) - more info in lecture notes.

The measurement \(y\) is the log of the radon level.

Measurements can be taken in the basement or on the ground floor: \(x=0\) for basement, \(x=1\) for ground floor.

Background radon levels vary from county to county, depending on local geology. It is natural to suppose that this background determines a baseline level \(\alpha_j\) (in county \(j\)) and each house varies around that local mean. Baseline levels are not unrelated – they are assumed to vary around the Minnesota-mean level \(\alpha\).

This leads us to a hierarchical model (lectures) with a random effect on the intercept.

[There is another covariate \(u_j\) giving the (log of the) soil uranium yield in county \(j\). We ignore this.]

options(digits = 3)
rad <- read.table("http://www.stats.ox.ac.uk/~laws/LMs/data/radon2.txt", header = TRUE)
rad[c(1:5, 917:919), ]
##     radon floor uranium county
## 1     2.2     1   0.502      1
## 2     2.2     0   0.502      1
## 3     2.9     0   0.502      1
## 4     1.0     0   0.502      1
## 5     3.1     0   0.429      2
## 917   5.0     0   0.914     84
## 918   3.7     0   1.427     85
## 919   2.9     0   1.427     85
attach(rad)
y <- log (ifelse (radon==0, .1, radon))
n <- length(y)
x <- floor
u <- log(uranium)
summary(rad)
##      radon          floor          uranium          county    
##  Min.   : 0.0   Min.   :0.000   Min.   :0.414   Min.   : 1.0  
##  1st Qu.: 1.9   1st Qu.:0.000   1st Qu.:0.622   1st Qu.:21.0  
##  Median : 3.6   Median :0.000   Median :0.908   Median :44.0  
##  Mean   : 4.8   Mean   :0.166   Mean   :0.934   Mean   :43.5  
##  3rd Qu.: 6.0   3rd Qu.:0.000   3rd Qu.:1.201   3rd Qu.:70.0  
##  Max.   :48.2   Max.   :1.000   Max.   :1.696   Max.   :85.0
plot(rad)

par(mfrow = c(1, 2))
plot(radon ~ county)
plot(log(radon) ~ county)

Fit a hierarchical model with a random effect on the intercept:

library(nlme)
rad.lme1 <- lme(y ~ x, random = ~1 | county)
summary(rad.lme1)
## Linear mixed-effects model fit by REML
##  Data: NULL 
##    AIC  BIC logLik
##   2179 2199  -1086
## 
## Random effects:
##  Formula: ~1 | county
##         (Intercept) Residual
## StdDev:       0.328    0.756
## 
## Fixed effects: y ~ x 
##              Value Std.Error  DF t-value p-value
## (Intercept)  1.462    0.0516 833   28.34       0
## x           -0.693    0.0704 833   -9.84       0
##  Correlation: 
##   (Intr)
## x -0.288
## 
## Standardized Within-Group Residuals:
##      Min       Q1      Med       Q3      Max 
## -4.39892 -0.61553  0.00291  0.64048  3.42808 
## 
## Number of Observations: 919
## Number of Groups: 85

In terms of the model

\[y_{ij}=\alpha+b_{j}+\beta x_{ij} +\epsilon_{ij}, \quad \epsilon_{ij}\sim N(0,\sigma_y^2), \quad b_j\sim N(0,\sigma_{\alpha}^2)\]

our estimates are \(\hat{\sigma}_{\alpha} =\) 0.328, \(\hat{\sigma}_{y} =\) 0.756, \(\hat{\alpha} =\) 1.462, \(\hat{\beta} =\) -0.693

and our “estimates” for the \(J=85\) random effects \(\hat{b}_j\) can be extracted using random.effects() or ranef()

head(ranef(rad.lme1))
##   (Intercept)
## 1     -0.2701
## 2     -0.5340
## 3      0.0176
## 4      0.0429
## 5     -0.0154
## 6      0.0186

The fitted values \(\hat{y}_{ij} = \hat{\alpha} + \hat{b}_j + \hat{\beta}x_{ij}\) can be obtained using fitted()

head(fitted(rad.lme1))
##     1     1     1     1     2     2 
## 0.499 1.192 1.192 1.192 0.928 0.928

The default fitting method is REML (as indicated in the output above). We can fit using maximum likelihood instead:

rad.lme2 <- lme(y ~ x, random = ~1 | county, method = "ML")

We can use rad.lme2 in a likelihood ratio test. We can compare the model rad.lme2 which has random effects for each county, to the model rad.lm2 which has fixed effects for each county:

rad.lm2 <- lm(y ~ x + as.factor(county))
anova(rad.lme2, rad.lm2)
##          Model df  AIC  BIC logLik   Test L.Ratio p-value
## rad.lme2     1  4 2172 2191  -1082                       
## rad.lm2      2 87 2179 2598  -1002 1 vs 2     159  <.0001

AIC favours the model with random effects. The likelihood ratio test rejects the (“simpler”) random effects model. Which model you go with will depend on how you intend to use the model, and wider physical considerations.

For a little more on this example you could look at http://www.stats.ox.ac.uk/~nicholls/sb1a/L14-15.R