# Introductory examples

### Heights data

Possible model: $$y = \beta_0 + \beta_1 x + \epsilon$$, where y is the daughter's height (dheight) and x is the mother's height (mheight).

library(alr4)
plot(dheight ~ mheight, data = Heights)

### Cars data

Possible model: $$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$$, where y is the stopping distance (dist) and x is the speed.

plot(dist ~ speed, data = cars)
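As a minimal sketch of how the quadratic model above might be fitted, the code below uses the built-in `cars` data. The object names `cars.lm` and `grid` are illustrative choices, not taken from the notes. The `I()` wrapper is needed because `^` has a special meaning inside an R model formula.

```r
# Fit dist = b0 + b1 * speed + b2 * speed^2 + error by least squares
cars.lm <- lm(dist ~ speed + I(speed^2), data = cars)

# Scatterplot with the fitted curve evaluated on a fine grid of speeds
plot(dist ~ speed, data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 100))
lines(grid$speed, predict(cars.lm, newdata = grid), lwd = 2)

# Estimated coefficients, in the order b0, b1, b2
print(coef(cars.lm))
```

Predicting on a grid rather than at the observed speeds gives a smooth curve even where the data are sparse.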

### Advertising data

Example from James, Witten, Hastie and Tibshirani (2013).

The data are measurements of

• sales (thousands of units)

• TV, radio and newspaper budgets (thousands of dollars)

for 200 different markets.

We may want to model the product sales as a function of the TV, radio and newspaper advertising budgets.
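By analogy with the earlier examples, one possible model (an assumption here, not a model stated in the source) treats sales as linear in the three budgets:

$$\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,\text{newspaper} + \epsilon$$

In R's formula notation this would correspond to `sales ~ TV + radio + newspaper`.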

# data from http://www-bcf.usc.edu/~gareth/ISL/
str(adv)
## 'data.frame':    200 obs. of  4 variables:
##  $ TV       : num  230.1 44.5 17.2 151.5 180.8 ...
##  $ radio    : num  37.8 39.3 45.9 41.3 10.8 48.9 32.8 19.6 2.1 2.6 ...
##  $ newspaper: num  69.2 45.1 69.3 58.5 58.4 75 23.5 11.6 1 21.2 ...
##  $ sales    : num  22.1 10.4 9.3 18.5 12.9 7.2 11.8 13.2 4.8 10.6 ...
head(adv)
##      TV radio newspaper sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2
pairs(adv, upper.panel = NULL)

### Oxford house price data

Is there any difference between the price trends for the three types of houses?

# data, now edited slightly, from http://www.stats.ox.ac.uk/~nicholls/sb1a/
str(ohp3)
## 'data.frame':    300 obs. of  6 variables:
##  $ price : int  276 206 246 228 235 238 299 230 252 261 ...
##  $ type  : Factor w/ 3 levels "Flat","Semi-Detached",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ month : int  100 99 98 97 96 95 94 93 92 91 ...
##  $ sales : int  11 23 23 44 19 23 25 33 34 54 ...
##  $ colour: int  3 3 3 3 3 3 3 3 3 3 ...
##  $ symbol: int  15 15 15 15 15 15 15 15 15 15 ...
head(ohp3)
##   price type month sales colour symbol
## 1   276 Flat   100    11      3     15
## 2   206 Flat    99    23      3     15
## 3   246 Flat    98    23      3     15
## 4   228 Flat    97    44      3     15
## 5   235 Flat    96    19      3     15
## 6   238 Flat    95    23      3     15
plot(price ~ month, data = ohp3, col = colour, pch = symbol)
legend("topleft", c("Flat", "Semi-Detached", "Terraced"), col = c(3, 1, 2), pch = c(15, 16, 1))

One single line for all of the data:

ohp.lm1 <- lm(price ~ month, data = ohp3)
plot(price ~ month, data = ohp3, col = colour, pch = symbol,
main = "Single line")
legend("topleft", c("Flat", "Semi-Detached", "Terraced"), col = c(3, 1, 2), pch = c(15, 16, 1))
ab1 <- ohp.lm1$coef
abline(ab1[1], ab1[2], lwd = 3)

Three parallel lines:

ohp.lm2 <- lm(price ~ month + type, data = ohp3)
plot(price ~ month, data = ohp3, col = colour, pch = symbol,
main = "Parallel lines")
legend("topleft", c("Flat", "Semi-Detached", "Terraced"), col = c(3, 1, 2), pch = c(15, 16, 1))
ab2 <- ohp.lm2$coef
abline(ab2[1], ab2[2], col = "green")
abline(ab2[1] + ab2[3], ab2[2], col = "black")
abline(ab2[1] + ab2[4], ab2[2], col = "red")
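To see where the intercepts in the `abline` calls come from, it may help to write the parallel-lines model out. This assumes R's default treatment contrasts, with Flat (the first factor level in the `str` output) as the baseline. The 0/1 indicators $$I_{\text{Semi}}$$ and $$I_{\text{Terr}}$$ are notation introduced here for illustration.

$$\text{price} = \beta_0 + \beta_1\,\text{month} + \beta_2\,I_{\text{Semi}} + \beta_3\,I_{\text{Terr}} + \epsilon$$

Under that assumption all three lines share the slope $$\beta_1$$, and the intercepts are $$\beta_0$$ for flats, $$\beta_0 + \beta_2$$ for semi-detached houses and $$\beta_0 + \beta_3$$ for terraced houses. With R's 1-based indexing, $$\beta_0, \ldots, \beta_3$$ are `ab2[1]`, ..., `ab2[4]`.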

Three non-parallel lines:

ohp.lm3 <- lm(price ~ month * type, data = ohp3)
plot(price ~ month, data = ohp3, col = colour, pch = symbol,
main = "Non-parallel lines")
legend("topleft", c("Flat", "Semi-Detached", "Terraced"), col = c(3, 1, 2), pch = c(15, 16, 1))
ab3 <- ohp.lm3$coef
abline(ab3[1], ab3[2], col = "green")
abline(ab3[1] + ab3[3], ab3[2] + ab3[5], col = "black")
abline(ab3[1] + ab3[4], ab3[2] + ab3[6], col = "red")
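One way to compare the three nested fits is a sequence of F tests via `anova()`. The sketch below is self-contained, so it runs on simulated data: the data frame `sim`, its intercepts, slopes and noise level are all hypothetical and are not the Oxford data. With the real data, the same call would presumably be `anova(ohp.lm1, ohp.lm2, ohp.lm3)`.

```r
# Simulated stand-in for ohp3 (hypothetical values, NOT the Oxford data)
set.seed(1)
sim <- data.frame(
  month = rep(1:100, times = 3),
  type  = factor(rep(c("Flat", "Semi-Detached", "Terraced"), each = 100))
)

# A different intercept and slope for each type, plus Gaussian noise
icpt  <- c(150, 200, 170)[as.integer(sim$type)]
slope <- c(1.0, 1.5, 1.2)[as.integer(sim$type)]
sim$price <- icpt + slope * sim$month + rnorm(nrow(sim), sd = 20)

m1 <- lm(price ~ month, data = sim)         # single line
m2 <- lm(price ~ month + type, data = sim)  # three parallel lines
m3 <- lm(price ~ month * type, data = sim)  # three non-parallel lines

# Each row after the first tests a model against the one above it
cmp <- anova(m1, m2, m3)
print(cmp)
```

The second row tests whether separate intercepts improve on a single line, and the third tests whether separate slopes improve on parallel lines. Each step adds two parameters.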

### Pig diet data

Thirty-two pigs were divided into eight groups of four, in such a way that the pigs in any one group were expected to gain weight at equal rates if fed in the same way. Four diets were compared by randomly assigning them to pigs, subject to each diet occurring once in each group. (Davison, 2003.)

The average daily weight gains are in the table below.

Is there evidence that the groups differ? Is there evidence that the diets differ?

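| Diet | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 | Group 7 | Group 8 |
|------|---------|---------|---------|---------|---------|---------|---------|---------|
| I    | 1.40 | 1.79 | 1.72 | 1.47 | 1.26 | 1.28 | 1.34 | 1.55 |
| II   | 1.31 | 1.30 | 1.21 | 1.08 | 1.45 | 0.95 | 1.26 | 1.14 |
| III  | 1.40 | 1.47 | 1.37 | 1.15 | 1.22 | 1.48 | 1.31 | 1.27 |
| IV   | 1.96 | 1.77 | 1.62 | 1.76 | 1.88 | 1.50 | 1.60 | 1.49 |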
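The two questions above could be approached with an additive model in group and diet. The sketch below enters the table by hand and fits one such model. The names `pigs`, `pig.lm` and `pig.aov` are illustrative, and the additive model is an assumption made here rather than an analysis given in the notes.

```r
# Average daily weight gains, entered row by row from the table
gain <- c(
  1.40, 1.79, 1.72, 1.47, 1.26, 1.28, 1.34, 1.55,  # Diet I
  1.31, 1.30, 1.21, 1.08, 1.45, 0.95, 1.26, 1.14,  # Diet II
  1.40, 1.47, 1.37, 1.15, 1.22, 1.48, 1.31, 1.27,  # Diet III
  1.96, 1.77, 1.62, 1.76, 1.88, 1.50, 1.60, 1.49   # Diet IV
)
pigs <- data.frame(
  gain  = gain,
  diet  = factor(rep(c("I", "II", "III", "IV"), each = 8)),
  group = factor(rep(1:8, times = 4))
)

# Additive model: a group (block) effect plus a diet effect
pig.lm  <- lm(gain ~ group + diet, data = pigs)
pig.aov <- anova(pig.lm)
print(pig.aov)
```

Because every diet occurs exactly once in every group, the design is balanced, so the F tests for group and diet do not depend on the order of the terms in the formula.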