Gas consumption data

This example follows Faraway (2015); see also Venables and Ripley (2002).

The data frame whiteside, in the MASS package, records weekly gas consumption (in thousands of cubic feet) and average outside temperature (in degrees Celsius) at a house over two winters. The first winter was before cavity-wall insulation was installed, and the second was after.

data(whiteside, package = "MASS")
str(whiteside)
## 'data.frame':    56 obs. of  3 variables:
##  $ Insul: Factor w/ 2 levels "Before","After": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Temp : num  -0.8 -0.7 0.4 2.5 2.9 3.2 3.6 3.9 4.2 4.3 ...
##  $ Gas  : num  7.2 6.9 6.4 6 5.8 5.8 5.6 4.7 5.8 5.2 ...
head(whiteside)
##    Insul Temp Gas
## 1 Before -0.8 7.2
## 2 Before -0.7 6.9
## 3 Before  0.4 6.4
## 4 Before  2.5 6.0
## 5 Before  2.9 5.8
## 6 Before  3.2 5.8
tail(whiteside)
##    Insul Temp Gas
## 51 After  7.2 2.8
## 52 After  7.5 2.6
## 53 After  8.0 2.7
## 54 After  8.7 2.8
## 55 After  8.8 1.3
## 56 After  9.7 1.5
plot(Gas ~ Temp, data = whiteside, main = "Gas consumption vs Outside temperature",
     col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
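Before modelling, it can help to check how the 56 weeks split between the two winters and how average consumption compares. The sketch below uses only base R's table() and tapply() on the whiteside data.

```r
data(whiteside, package = "MASS")
# Number of weeks recorded before and after the insulation was installed
table(whiteside$Insul)
# Mean gas consumption (thousands of cubic feet) in each period
tapply(whiteside$Gas, whiteside$Insul, mean)
```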

The variable Insul records whether the observation was made Before or After the insulation was installed; it is a categorical variable (a factor) with two levels. To find out which level is the baseline:

levels(whiteside$Insul)
## [1] "Before" "After"

The first of these, i.e. Before, is the baseline. Unless instructed otherwise, R orders the levels of a factor alphabetically (strictly, in the sort order of the current locale), which would put After first. The MASS package has therefore deliberately set up this factor with the levels in the opposite order.
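A short sketch of how the level order, and hence the baseline, is controlled in base R: factor() sorts levels by default, the levels argument fixes the order explicitly, and relevel() moves a chosen level to the front. The small vectors f and g are purely illustrative.

```r
# factor() sorts the levels by default, so "After" comes first
f <- factor(c("Before", "After", "Before"))
levels(f)
# Supplying levels explicitly fixes the order, making "Before" the baseline
g <- factor(c("Before", "After", "Before"), levels = c("Before", "After"))
levels(g)
# relevel() moves a chosen level to the front, e.g. to make "After" the baseline
data(whiteside, package = "MASS")
levels(relevel(whiteside$Insul, ref = "After"))
```

Refitting a model with the relevelled factor would report the Before-minus-After difference instead, i.e. the same comparison with the sign reversed.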

No interaction term, parallel lines

options(digits = 3)
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
(b1 <- coef(gas1.lm))
## (Intercept)        Temp  InsulAfter 
##       6.551      -0.337      -1.565
plot(Gas ~ Temp, data = whiteside, col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
abline(b1[1], b1[2])                   # Before: intercept b1[1], slope b1[2]
abline(b1[1] + b1[3], b1[2], col = 2)  # After: intercept shifted by b1[3], same slope
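The two lines drawn above come from R's default treatment coding of Insul: the design matrix has an intercept column, Temp, and a 0/1 indicator InsulAfter. The sketch below inspects one Before row and one After row with model.matrix() and checks that a fitted value for an After week equals (b1[1] + b1[3]) + b1[2] * Temp.

```r
data(whiteside, package = "MASS")
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
b1 <- coef(gas1.lm)
# Design matrix: intercept column, Temp, and a 0/1 indicator for "After"
X <- model.matrix(gas1.lm)
X[c(1, 56), ]  # row 1 is a Before week, row 56 an After week
# Fitted value for an After week, computed two ways
fitted(gas1.lm)[56]
b1[1] + b1[3] + b1[2] * whiteside$Temp[56]
```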

Interpretations:

  • Both before and after installation, predicted gas consumption falls by 0.337 (thousand cubic feet) for each 1 degree C increase in temperature.

  • The other two estimates are harder to interpret, since they relate to a temperature of zero, which is near the lower edge of the observed temperature range. Ignoring this issue, the predicted consumption at 0 degrees C is 6.551 before installation, and 1.565 less than this after installation, i.e. 4.986. (Because the lines are parallel, the difference of 1.565 is the same at every temperature.) In other datasets, a value of zero for a continuous predictor such as temperature may lie far outside the observed range, and interpretations like these may be meaningless.
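These numbers can be checked with predict() on a small, purely illustrative grid of new data: both insulation levels at 0 and at 5 degrees C. In the parallel-lines model the After minus Before difference equals the InsulAfter coefficient at every temperature.

```r
data(whiteside, package = "MASS")
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
# Both insulation levels at 0 and at 5 degrees C
new <- data.frame(Temp = c(0, 0, 5, 5),
                  Insul = factor(c("Before", "After", "Before", "After"),
                                 levels = levels(whiteside$Insul)))
p <- predict(gas1.lm, newdata = new)
p
# After - Before at 0 and at 5 degrees C: identical, and equal to InsulAfter
p[2] - p[1]
p[4] - p[3]
```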

To address the problem in the second bullet point above, we can centre temperature using its mean value and refit the model:

mean(whiteside$Temp)
## [1] 4.88
whiteside$ctemp <- whiteside$Temp - mean(whiteside$Temp)
gas1c.lm <- lm(Gas ~ ctemp + Insul, data = whiteside)
(b1c <- coef(gas1c.lm))
## (Intercept)       ctemp  InsulAfter 
##       4.910      -0.337      -1.565

  • At the average temperature (4.875 degrees C, printed as 4.88 above because of options(digits = 3)), the predicted consumption is 4.910 before installation, and 1.565 less than this after installation, i.e. 3.345.

  • The fall in consumption per degree increase in temperature (0.337) is unchanged, since centring only shifts the origin of the temperature scale.
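Writing Temp = ctemp + mean(Temp) in the model shows why: b0 + b1 * Temp + b2 * InsulAfter = (b0 + b1 * mean(Temp)) + b1 * ctemp + b2 * InsulAfter. So centring changes only the intercept. A quick numerical check:

```r
data(whiteside, package = "MASS")
b1 <- coef(lm(Gas ~ Temp + Insul, data = whiteside))
whiteside$ctemp <- whiteside$Temp - mean(whiteside$Temp)
b1c <- coef(lm(Gas ~ ctemp + Insul, data = whiteside))
# Centred intercept = uncentred intercept + slope * mean temperature
b1[1] + b1[2] * mean(whiteside$Temp)
b1c[1]
# Slope and Insul coefficient are unchanged by centring
b1[2:3]
b1c[2:3]
```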

Add an interaction term, non-parallel lines

gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)
summary(gas2.lm)
## 
## Call:
## lm(formula = Gas ~ Temp * Insul, data = whiteside)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9780 -0.1801  0.0376  0.2093  0.6380 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       6.8538     0.1360   50.41  < 2e-16 ***
## Temp             -0.3932     0.0225  -17.49  < 2e-16 ***
## InsulAfter       -2.1300     0.1801  -11.83  2.3e-16 ***
## Temp:InsulAfter   0.1153     0.0321    3.59  0.00073 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.323 on 52 degrees of freedom
## Multiple R-squared:  0.928,  Adjusted R-squared:  0.924 
## F-statistic:  222 on 3 and 52 DF,  p-value: <2e-16
b2 <- coef(gas2.lm)
plot(Gas ~ Temp, data = whiteside, col = as.numeric(Insul), pch = 16)
legend("topright", c("Before", "After"), col = c(1, 2), pch = 16)
abline(b2[1], b2[2])                           # Before: intercept b2[1], slope b2[2]
abline(b2[1] + b2[3], b2[2] + b2[4], col = 2)  # After: intercept and slope both shifted

Less gas is used after installation, and the difference Before - After varies with temperature, i.e. the gradients of the two lines differ. The summary above shows that all coefficients are significant; in particular, the interaction coefficient is significant (p = 0.00073), so there is evidence that the gradients of the two lines differ. We therefore prefer this model to the parallel-lines model above.
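Another way to compare the two models, sketched here with base R's anova(), is an F test of the parallel-lines model against the interaction model. Because the models differ by a single parameter, this test is equivalent to the t test on the interaction coefficient (F equals t squared).

```r
data(whiteside, package = "MASS")
gas1.lm <- lm(Gas ~ Temp + Insul, data = whiteside)
gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)
# Nested-model F test: does adding the interaction improve the fit?
cmp <- anova(gas1.lm, gas2.lm)
cmp
# With one extra parameter, F equals the square of the interaction t value
summary(gas2.lm)$coefficients["Temp:InsulAfter", "t value"]^2
```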

We again give interpretations after centring temperature:

gas2c.lm <- lm(Gas ~ ctemp * Insul, data = whiteside)
(b2c <- coef(gas2c.lm))
##      (Intercept)            ctemp       InsulAfter ctemp:InsulAfter 
##            4.937           -0.393           -1.568            0.115

For each 1 degree C increase in temperature:

  • consumption falls by 0.393 before installation

  • and by 0.393 - 0.115 = 0.278 after installation.
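The two slopes can be read directly from the coefficients of the interaction model. The sketch below also checks the After slope against a separate straight-line fit to the After weeks only. The two agree because the interaction model gives each group its own intercept and slope.

```r
data(whiteside, package = "MASS")
b2 <- coef(lm(Gas ~ Temp * Insul, data = whiteside))
# Slope before installation: the Temp coefficient
slope_before <- unname(b2["Temp"])
# Slope after installation: Temp coefficient plus the interaction
slope_after <- unname(b2["Temp"] + b2["Temp:InsulAfter"])
c(before = slope_before, after = slope_after)
# A separate line fitted to the After weeks gives the same slope
sep_after <- coef(lm(Gas ~ Temp, data = subset(whiteside, Insul == "After")))
sep_after["Temp"]
```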

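At the average temperature (4.875 degrees C), the predicted consumption is 4.937 before installation, and 1.568 less than this after installation, i.e. 3.369. Because of the interaction, this difference of 1.568 applies only at the average temperature: the After - Before difference increases by 0.115 for each 1 degree C rise, so the saving from insulation shrinks as the weather gets warmer.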
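The same predictions can be obtained from the uncentred model gas2.lm with predict() at the mean temperature. This avoids refitting, and the results match the intercept and InsulAfter coefficient of the centred model.

```r
data(whiteside, package = "MASS")
gas2.lm <- lm(Gas ~ Temp * Insul, data = whiteside)
tbar <- mean(whiteside$Temp)
# Both insulation levels at the mean temperature
new <- data.frame(Temp = tbar,
                  Insul = factor(c("Before", "After"), levels = levels(whiteside$Insul)))
p <- predict(gas2.lm, newdata = new)
p
# After - Before at the mean temperature
p[2] - p[1]
```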