A plot of the residuals \(e\) against the fitted values \(\widehat{y}\) allows us to check multiple model assumptions with a single plot. (We plot the points \((\widehat{y}_i,e_i)\) for \(i=1,\dots,n\).)

The residuals \(e\) and the fitted values \(\widehat{y}\) are independent under the model, so a plot of \(e\) against \(\widehat{y}\) should show no correlation.

The variance of each \(y_i\) is assumed to be \(\sigma^2\), so this variance should not change with the value of \(y\) (or with any explanatory \(x\)). So the \(e\) versus \(\widehat{y}\) plot should not show e.g. increasing variance as \(\widehat{y}\) increases.

The \(e\) versus \(\widehat{y}\) plot can also help detect if the relationship between \(y\) and \(x_1,\dots,x_p\) is not the linear function (\(\beta_1x_1+\dots+\beta_px_p\)) that we have assumed it to be.

```
ohp3 <- read.table("../data/ohp3.txt", header = TRUE)
ohp.lm3 <- lm(price ~ month * type, data = ohp3)
plot(resid(ohp.lm3) ~ fitted(ohp.lm3), data = ohp3, col = colour, pch = symbol,
xlab = "Fitted values", ylab = "Residuals")
```

The model `ohp.lm3`

can maybe be simplified (e.g. two of the “non-parallel lines” looked almost identical) but the residual plot above reveals an underlying problem: the residuals show non-constant variance and correlation with \(\widehat{y}\), the variance appears to be increasing as \(\widehat{y}\) increases.

Ignoring correlation often leads to incorrect (over-significant) p-values. This problem with the residuals will not be fixed by simplifying the model.

```
set.seed(123)
x <- runif(20, min = 0, max = 3)
eps <- rnorm(20)
y <- x + x^2 + eps
lmod <- lm(y ~ x)
plot(y ~ x)
abline(lmod)
curve(x + x^2, add = TRUE, lty = 2)
```

The data come from \(y = x + x^2 + \epsilon\) where \(\epsilon\sim N(0, 1)\). We fit a straight-line model \(y = \beta_0 + \beta_1 x\) and look at the residuals. The Resdiuals versus Fitted values plot below reveals the curvature in the data.

`plot(rstudent(lmod) ~ fitted(lmod), xlab = "Fitted values", ylab = "(Studentised) residuals")`

```
library(alr4)
set.seed(321)
(samp <- sample(1:nrow(Heights), size = 50))
```

```
## [1] 1315 1288 328 350 536 468 620 397 617 1102 827 495 1045 62
## [15] 806 275 860 549 395 869 862 1341 1260 657 777 1015 1340 581
## [29] 168 800 284 955 908 1333 889 588 962 35 899 723 1096 369
## [43] 610 1152 1117 456 189 1203 298 235
```

`H50 <- Heights[samp, ]`

So `H50`

contains 50 cases, chosen at random, from the original Heights data.

```
h.lm <- lm(dheight ~ mheight, data = H50)
plot(dheight ~ mheight, data = H50)
abline(h.lm)
```

(Studentised) residuals versus Fitted values, and a Q-Q plot of (Studentised) residuals:

```
par(mfrow = c(1, 2))
plot(rstudent(h.lm) ~ fitted(h.lm), main = "(Studentised) residuals vs Fitted Values",
xlab = "Fitted values", ylab = "(Studentised) residuals")
qqnorm(resid(h.lm), main = "Q-Q plot of (Studentised) residuals")
qqline(resid(h.lm))
```

```
set.seed(123)
par(mfrow = c(2, 2))
n <- 50
x <- rnorm(n); qqnorm(x, main = "Normal data"); qqline(x)
x <- exp(rnorm(n)); qqnorm(x, main = "Lognormal data (skewed)"); qqline(x)
x <- rcauchy(n); qqnorm(x, main = "Cauchy = t(1) data (long-tailed)"); qqline(x)
x <- runif(n); qqnorm(x, main = "Uniform data (short-tailed)"); qqline(x)
```

```
par(mfrow = c(1, 2))
plot(rstandard(ohp.lm3) ~ fitted(ohp.lm3), data = ohp3, col = colour, pch = symbol,
xlab = "Fitted values", ylab = "Standardised residuals")
plot(rstudent(ohp.lm3) ~ fitted(ohp.lm3), data = ohp3, col = colour, pch = symbol,
xlab = "Fitted values", ylab = "Studentised residuals")
```

```
pigs <- data.frame(Gain = c(89, 78, 114, 79, 68, 59, 85, 61, 62, 61, 83, 82),
Litter = rep(c("I", "II", "III", "IV"), 3),
Diet = rep(c("A", "B", "C"), each = 4))
pigs.lm <- lm(Gain ~ Litter + Diet, data = pigs)
par(mfrow = c(1, 2))
plot(rstandard(pigs.lm) ~ fitted(pigs.lm), data = pigs, col = as.numeric(Litter), pch = as.numeric(Diet),
xlab = "Fitted values", ylab = "Standardised residuals")
plot(rstudent(pigs.lm) ~ fitted(pigs.lm), data = pigs, col = as.numeric(Litter), pch = as.numeric(Diet),
xlab = "Fitted values", ylab = "Studentised residuals")
```

```
par(mfrow = c(1, 2))
plot(hatvalues(ohp.lm3), col = ohp3$colour, pch = ohp3$symbol,
main = "ohp", ylab = "Leverage")
plot(hatvalues(pigs.lm), col = as.numeric(pigs$Litter), pch = as.numeric(pigs$Diet),
main = "pigs", ylab = "Leverage")
```

```
par(mfrow = c(1, 2))
plot(cooks.distance(ohp.lm3) ~ month, data = ohp3, col = colour, pch = symbol,
main = "ohp", ylab = "Cook's distance")
n <- nrow(ohp3)
p <- ohp.lm3$rank
abline(h = 8/(n - 2*p))
plot(cooks.distance(pigs.lm), col = as.numeric(pigs$Litter), pch = as.numeric(pigs$Diet),
main = "pigs", ylab = "Cook's distance")
```