This example follows Faraway (2015); see also Venables and Ripley (2002).

The data frame `whiteside`, in the `MASS` package, records weekly gas consumption (in thousands of cubic feet) and outside temperature (in degrees Celsius) for two winters. The first winter was before cavity wall insulation was installed, and the second was after.

```
data(whiteside, package = "MASS")
str(whiteside)
```

```
## 'data.frame':    56 obs. of  3 variables:
##  $ Insul: Factor w/ 2 levels "Before","After": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Temp : num  -0.8 -0.7 0.4 2.5 2.9 3.2 3.6 3.9 4.2 4.3 ...
##  $ Gas  : num  7.2 6.9 6.4 6 5.8 5.8 5.6 4.7 5.8 5.2 ...
```

`head(whiteside)`

```
##    Insul Temp Gas
## 1 Before -0.8 7.2
## 2 Before -0.7 6.9
## 3 Before  0.4 6.4
## 4 Before  2.5 6.0
## 5 Before  2.9 5.8
## 6 Before  3.2 5.8
```

`tail(whiteside)`

```
##    Insul Temp Gas
## 51 After  7.2 2.8
## 52 After  7.5 2.6
## 53 After  8.0 2.7
## 54 After  8.7 2.8
## 55 After  8.8 1.3
## 56 After  9.7 1.5
```

```
# Colour points by insulation status: colour 1 = Before, colour 2 = After
plot(Gas ~ Temp, data = whiteside, main = "Gas consumption vs Outside temperature",
     col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
```

The variable `Insul` records whether the observation was made `Before` or `After` the insulation was installed. It is a categorical variable (a factor) with two levels. To find out which level is the baseline:

`levels(whiteside$Insul)`

`## [1] "Before" "After"`

The first of these, `Before`, is the baseline. By default, R orders the levels of a factor alphabetically, which would put `After` first, so the `MASS` package has evidently set up this factor with the levels in the opposite order.

```
options(digits = 3)
# Parallel lines model: a common slope for Temp, with an intercept shift for Insul
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
(b1 <- coef(gas1.lm))
```

```
## (Intercept)        Temp  InsulAfter
##       6.551      -0.337      -1.565
```

```
plot(Gas ~ Temp, data = whiteside, col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
abline(b1[1], b1[2])                   # Before: intercept b1[1], slope b1[2]
abline(b1[1] + b1[3], b1[2], col = 2)  # After: intercept shifted by b1[3], same slope
```

Interpretations:

- Both before and after installation, predicted gas consumption falls by 0.337 (thousand cubic feet) for each 1 degree C increase in temperature.
- The interpretation of the other two estimates is somewhat problematic, since they describe predicted consumption when temperature is zero, and zero is at the edge of the observed temperature range. Ignoring this issue, the predicted consumption at a temperature of zero is 6.551 before installation, and 1.565 less than this, i.e. 4.986, after installation. For other datasets, a value of zero for a continuous predictor (such as temperature) might lie far outside the observed range, and interpretations like these might be meaningless.
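As a check on these interpretations, the following sketch refits the parallel lines model, looks at the observed temperature range, and uses `predict()` to obtain the fitted consumption at a temperature of zero for each insulation level. The names `new0` and `pred0` are introduced here for illustration only.

```r
data(whiteside, package = "MASS")
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)

# Observed temperature range: zero lies near its lower end
range(whiteside$Temp)

# Predictions at Temp = 0 for each insulation level
new0 <- data.frame(
  Temp  = 0,
  Insul = factor(c("Before", "After"), levels = levels(whiteside$Insul))
)
pred0 <- predict(gas1.lm, newdata = new0)
names(pred0) <- c("Before", "After")
pred0

# The gap between the two predictions is the InsulAfter coefficient
unname(pred0["After"] - pred0["Before"])
```

The two predictions should agree with the intercept and with the intercept plus the `InsulAfter` coefficient, respectively.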

To address the problem with interpreting estimates at a temperature of zero, we can centre temperature at its mean value and refit the model:

`mean(whiteside$Temp)`

`## [1] 4.88`

```
whiteside$ctemp <- whiteside$Temp - mean(whiteside$Temp)
gas1c.lm <- lm(Gas ~ ctemp + Insul, data = whiteside)
(b1c <- coef(gas1c.lm))
```

```
## (Intercept)       ctemp  InsulAfter
##       4.910      -0.337      -1.565
```

At the average temperature (i.e. at 4.875 degrees C), the predicted consumption is 4.91 before installation, and 1.565 less than this after installation, i.e. 3.345 after installation.

The fall in consumption per degree increase in temperature is as before.
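Centring only moves the point at which the intercept is evaluated: writing the fitted line as \(b_0 + b_1 \, \text{Temp} = (b_0 + b_1 \bar{t}) + b_1 (\text{Temp} - \bar{t})\), the centred intercept is the uncentred intercept plus the slope times the mean temperature \(\bar{t}\), while the slope and the insulation shift are unchanged. A minimal sketch of this check follows (the name `tbar` is introduced here for illustration):

```r
data(whiteside, package = "MASS")
tbar <- mean(whiteside$Temp)
whiteside$ctemp <- whiteside$Temp - tbar

b1  <- coef(lm(Gas ~ Temp + Insul, data = whiteside))
b1c <- coef(lm(Gas ~ ctemp + Insul, data = whiteside))

# Centred intercept = uncentred intercept + slope * mean temperature
all.equal(unname(b1c["(Intercept)"]),
          unname(b1["(Intercept)"] + b1["Temp"] * tbar))

# Slope and insulation shift are the same in both fits
all.equal(unname(b1c[c("ctemp", "InsulAfter")]),
          unname(b1[c("Temp", "InsulAfter")]))
```

Both comparisons should return `TRUE`, up to the numerical tolerance used by `all.equal()`.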

```
# Separate lines model: Temp * Insul expands to Temp + Insul + Temp:Insul
gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)
summary(gas2.lm)
```

```
##
## Call:
## lm(formula = Gas ~ Temp * Insul, data = whiteside)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.9780 -0.1801  0.0376  0.2093  0.6380
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)       6.8538     0.1360   50.41  < 2e-16 ***
## Temp             -0.3932     0.0225  -17.49  < 2e-16 ***
## InsulAfter       -2.1300     0.1801  -11.83  2.3e-16 ***
## Temp:InsulAfter   0.1153     0.0321    3.59  0.00073 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.323 on 52 degrees of freedom
## Multiple R-squared: 0.928, Adjusted R-squared: 0.924
## F-statistic: 222 on 3 and 52 DF, p-value: <2e-16
```

```
b2 <- coef(gas2.lm)
plot(Gas ~ Temp, data = whiteside, col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
abline(b2[1], b2[2])                           # Before: intercept b2[1], slope b2[2]
abline(b2[1] + b2[3], b2[2] + b2[4], col = 2)  # After: intercept and slope both shifted
```

Less gas is used after installation, and the difference `Before - After` varies with temperature, i.e. the gradients of the two lines differ. In the summary above, all coefficients are significant; in particular, the interaction coefficient is significant (p = 0.00073), so there is evidence that the gradients of the two lines differ. We therefore prefer this model to the parallel lines model above.
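One way to make this comparison explicit is an F test between the two nested fits using `anova()`. Because the models differ by a single coefficient, this test is equivalent to the t test on the interaction term. The sketch below also evaluates the estimated saving `Before - After` at a few temperatures; the helper `saving()` and the temperatures 0, 5 and 10 are chosen here purely for illustration.

```r
data(whiteside, package = "MASS")
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)

# F test of the parallel lines model against the model with an interaction
cmp <- anova(gas1.lm, gas2.lm)
cmp

# Estimated saving (Before - After) as a function of temperature
b2 <- coef(gas2.lm)
saving <- function(temp) {
  unname(-(b2["InsulAfter"] + b2["Temp:InsulAfter"] * temp))
}
saving(c(0, 5, 10))
```

Since the interaction coefficient is positive, the estimated saving shrinks as the outside temperature rises.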

We again give interpretations after centring temperature:

```
gas2c.lm <- lm(Gas ~ ctemp * Insul, data = whiteside)
(b2c <- coef(gas2c.lm))
```

```
##      (Intercept)            ctemp       InsulAfter ctemp:InsulAfter
##            4.937           -0.393           -1.568            0.115
```

For each 1 degree C increase in temperature:

- consumption falls by 0.393 before installation
- and by \(0.393 - 0.115 = 0.278\) after installation.

At the average temperature (4.875 degrees C), the predicted consumption is 4.937 before installation, and 1.568 less than this, i.e. 3.369, after installation.
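Because the interaction model gives each winter its own intercept and slope, its two fitted lines coincide with those from separate simple regressions on each winter's data. The following sketch checks this; the names `before_line`, `after_line`, `before.lm` and `after.lm` are introduced here for illustration.

```r
data(whiteside, package = "MASS")
gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)
b2 <- coef(gas2.lm)

# Intercept and slope of each line implied by the interaction model
before_line <- unname(c(b2["(Intercept)"], b2["Temp"]))
after_line  <- unname(c(b2["(Intercept)"] + b2["InsulAfter"],
                        b2["Temp"] + b2["Temp:InsulAfter"]))

# Separate simple regressions, one for each winter
before.lm <- lm(Gas ~ Temp, data = whiteside, subset = Insul == "Before")
after.lm  <- lm(Gas ~ Temp, data = whiteside, subset = Insul == "After")

all.equal(before_line, unname(coef(before.lm)))
all.equal(after_line,  unname(coef(after.lm)))
```

The coefficient estimates agree, but the standard errors generally do not, since the combined model pools the residual variance across both winters.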