Twelve piglets were fed one of three different diets (A, B and C). Four litters (I, II, III and IV), each of 3 piglets, were used with each diet being allocated at random to one piglet in each litter. (Lunn, 2007.)

The question of interest: is there evidence that the diets differ?

The litters are the *blocks*.

The diets are the *treatments*.

And because the three diets are randomly allocated within each block, the experiment is referred to as a *randomised block design*. The design is complete and balanced (every treatment occurs at least once, an equal number of times in fact, in every block).

```
pigs <- data.frame(Gain = c(89, 78, 114, 79, 68, 59, 85, 61, 62, 61, 83, 82),
Litter = rep(c("I", "II", "III", "IV"), 3),
Diet = rep(c("A", "B", "C"), each = 4))
pigs
```

```
## Gain Litter Diet
## 1 89 I A
## 2 78 II A
## 3 114 III A
## 4 79 IV A
## 5 68 I B
## 6 59 II B
## 7 85 III B
## 8 61 IV B
## 9 62 I C
## 10 61 II C
## 11 83 III C
## 12 82 IV C
```

`str(pigs)`

```
## 'data.frame': 12 obs. of 3 variables:
## $ Gain : num 89 78 114 79 68 59 85 61 62 61 ...
## $ Litter: Factor w/ 4 levels "I","II","III",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ Diet : Factor w/ 3 levels "A","B","C": 1 1 1 1 2 2 2 2 3 3 ...
```

Note that Litter I is the baseline litter, and Diet A is the baseline diet.

```
plot(Gain ~ as.numeric((Diet)), data = pigs, xaxt = "n", xlab = "Diet",
col = as.numeric(Litter), pch = as.numeric(Diet))
axis(1, at = c(1, 2, 3), label = c("A", "B", "C"))
legend("topright", c("Litter I", "Litter II", "Litter III", "Litter IV"),
col = as.numeric(pigs$Litter), pch = as.numeric(pigs$Diet))
```

Suppose we index observations by \((i, j)\) where \(i\) is the litter (\(i=1,\dots,4\)) and \(j\) is the diet (\(j=1,2,3\)). Then the model with

an intercept \(\alpha\)

main effects \(\gamma_i\) for each litter, with \(\gamma_1=0\) as the first litter (Litter I) is the baseline

main effects \(\tau_j\) for each diet, with \(\tau_1=0\) as the first diet (Diet A) is the baseline

is:

\[y_{ij} = \alpha + \gamma_i + \tau_j + \epsilon_{ij}\]

for \(i=1,\dots,4\) and \(j=1,2,3\). We are looking for an effect (a mean shift in weight gain) due to diet, and we want to allow for variation (another mean shift) due to litter.

```
pigs.lm <- lm(Gain ~ Litter + Diet, data = pigs)
(X <- model.matrix(Gain ~ Litter + Diet, data = pigs))
```

```
## (Intercept) LitterII LitterIII LitterIV DietB DietC
## 1 1 0 0 0 0 0
## 2 1 1 0 0 0 0
## 3 1 0 1 0 0 0
## 4 1 0 0 1 0 0
## 5 1 0 0 0 1 0
## 6 1 1 0 0 1 0
## 7 1 0 1 0 1 0
## 8 1 0 0 1 1 0
## 9 1 0 0 0 0 1
## 10 1 1 0 0 0 1
## 11 1 0 1 0 0 1
## 12 1 0 0 1 0 1
## attr(,"assign")
## [1] 0 1 1 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$Litter
## [1] "contr.treatment"
##
## attr(,"contrasts")$Diet
## [1] "contr.treatment"
```

```
options(digits = 3)
summary(pigs.lm)
```

```
##
## Call:
## lm(formula = Gain ~ Litter + Diet, data = pigs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.250 -4.937 -0.375 2.938 12.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 86.25 5.76 14.97 5.6e-06 ***
## LitterII -7.00 6.65 -1.05 0.3333
## LitterIII 21.00 6.65 3.16 0.0197 *
## LitterIV 1.00 6.65 0.15 0.8855
## DietB -21.75 5.76 -3.77 0.0092 **
## DietC -18.00 5.76 -3.12 0.0205 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.15 on 6 degrees of freedom
## Multiple R-squared: 0.857, Adjusted R-squared: 0.738
## F-statistic: 7.18 on 5 and 6 DF, p-value: 0.0162
```

It is clear from the summary that Litter I and Diet A have been taken as the baseline levels for the two categorical variables.

Is `Diet`

predictive for `Gain`

? For this we want to test \(\tau_2 = \tau_3 = 0\).

`anova(pigs.lm)`

```
## Analysis of Variance Table
##
## Response: Gain
## Df Sum Sq Mean Sq F value Pr(>F)
## Litter 3 1304 435 6.55 0.025 *
## Diet 2 1082 541 8.14 0.020 *
## Residuals 6 399 66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

So, yes, diet is predictive at 5% significance.

The above analysis is a *two-way analysis of variance* (two-way ANOVA, there are two factors).

The following shows the *orthogonality* of this balanced design:

```
# %*% is matrix multiplication
# t(X) %*% X is X^T X
solve(t(X) %*% X) # the inverse of X^T X
```

```
## (Intercept) LitterII LitterIII LitterIV DietB DietC
## (Intercept) 0.500 -0.333 -0.333 -0.333 -0.25 -0.25
## LitterII -0.333 0.667 0.333 0.333 0.00 0.00
## LitterIII -0.333 0.333 0.667 0.333 0.00 0.00
## LitterIV -0.333 0.333 0.333 0.667 0.00 0.00
## DietB -0.250 0.000 0.000 0.000 0.50 0.25
## DietC -0.250 0.000 0.000 0.000 0.25 0.50
```

Note the zeros in the DietX-LitterY entries of this matrix. Recall that \(\text{var}(\widehat\beta) = \sigma^2 (X^T X)^{-1}\), so the zeros show that \((\widehat\gamma_2, \widehat\gamma_3, \widehat\gamma_4)\) and \((\widehat\tau_2, \widehat\tau_3)\) are independent – blocks and treatments are orthogonal.

A consequence of this orthogonality is that the following two ANOVAs are essentially the same even though the sequence of models fitted is different – the first two rows are just swapped, because of orthogonality.

`anova(lm(Gain ~ Litter + Diet, data = pigs)) # the same as done above`

```
## Analysis of Variance Table
##
## Response: Gain
## Df Sum Sq Mean Sq F value Pr(>F)
## Litter 3 1304 435 6.55 0.025 *
## Diet 2 1082 541 8.14 0.020 *
## Residuals 6 399 66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`anova(lm(Gain ~ Diet + Litter, data = pigs)) # Diet and Litter now in the opposite order`

```
## Analysis of Variance Table
##
## Response: Gain
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 2 1082 541 8.14 0.020 *
## Litter 3 1304 435 6.55 0.025 *
## Residuals 6 399 66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Treatment effects are separated from block effects in this orthogonal/complete design. The treatment effects (the Sum Sq for treatments) are not changed by having the block effects in the model. But the block effects do contribute to the estimate \(s^2\) of the error variance.

Suppose we had no records of which piglets were from which litter. Or suppose we had the litter information but we ignored it.

`anova(lm(Gain ~ Diet, data = pigs))`

```
## Analysis of Variance Table
##
## Response: Gain
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 2 1082 541 2.86 0.11
## Residuals 9 1703 189
```

This is fitting the model:

\[y_{ij} = \alpha + \tau_j + \epsilon_{ij}\] with \(\tau_1=0\). This is a *one-way analysis of variance* (there is one factor, just the diet variable).

In this analysis, diet is not significant. This is the “wrong analysis” in this case – Litter has been omitted when it should have been included. The estimate of \(\sigma^2\) is much higher in this case (189, was 66 previously). This feeds into the denominator of the F value, the resulting F value is small and not significant. We have “forgotten” to allow for a blocking effect (the different Litters) and this has made a critical difference.