The putting data below record the fraction of successful putts as a function of distance in feet. Gelman and Nolan (2001) model these data; see http://www.stat.columbia.edu/~gelman/research/published/golf.pdf

```
putts <- data.frame(Dist = 2:20,
                    Prop = c(0.93, 0.83, 0.74, 0.59, 0.55, 0.53, 0.46, 0.32, 0.34, 0.32,
                             0.26, 0.24, 0.31, 0.17, 0.13, 0.16, 0.17, 0.14, 0.16))
head(putts)
```

```
##   Dist Prop
## 1    2 0.93
## 2    3 0.83
## 3    4 0.74
## 4    5 0.59
## 5    6 0.55
## 6    7 0.53
```

`plot(Prop ~ Dist, data = putts)`

We often transform proportion data as \(\log(p/(1-p))\), since the log-odds is the canonical link function for a Bernoulli random variable (see GLMs). It is a monotone map from \(p\in (0,1)\) to \((-\infty,\infty)\). In this case the odds of failure, \((1-p)/p\), are the natural object, since they increase with distance. A log, however, turns out to be the wrong transformation to get linear dependence on distance. Instead we use the Box-Cox method to find a power transformation of the response that is linear in distance.
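For reference, the Box-Cox family of power transformations is usually written as follows. This is the standard textbook form, stated here as background rather than quoted from the source:

\[
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1ex]
\log y, & \lambda = 0.
\end{cases}
\]

So \(\lambda = 0\) is the log transformation, \(\lambda = 1\) leaves the response unchanged up to a shift, and \(\lambda = 0.5\) is a square root up to an affine rescaling, which does not affect whether the response is linear in distance.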

```
putts$y <- (1 - putts$Prop)/putts$Prop
par(mfrow = c(1, 2))
plot(y ~ Dist, data = putts)
plot(log(y) ~ Dist, data = putts)
```

We can estimate the best value of \(\lambda\) by maximising a likelihood using the `boxcox()` function in the `MASS` package:

```
options(digits = 4)
library(MASS)
putts.bc <- boxcox(y ~ Dist, data = putts)
```

`putts.bc$y[60:65]`

`## [1] -3.842 -3.661 -3.658 -3.820 -4.132 -4.573`

`putts.bc$x[60:65]`

`## [1] 0.3838 0.4242 0.4646 0.5051 0.5455 0.5859`

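From the values above we see that the profile log-likelihood peaks at around \(\lambda = 0.46\), so this is approximately the MLE. The confidence interval nevertheless covers \(\lambda = 0.5\), a square-root transformation, which is easier to interpret.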
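Rather than reading the maximum off a slice of the output, the estimate and an approximate 95% interval can be extracted directly. The following is a minimal sketch, not part of the original analysis: the names `bc`, `lambda.hat` and `lambda.ci` and the 401-point grid are assumptions made for illustration, and the cutoff is the usual chi-squared one (half the 95% quantile on one degree of freedom below the maximum).

```r
library(MASS)

# Data and response as defined earlier in this section
putts <- data.frame(Dist = 2:20,
                    Prop = c(0.93, 0.83, 0.74, 0.59, 0.55, 0.53, 0.46, 0.32, 0.34, 0.32,
                             0.26, 0.24, 0.31, 0.17, 0.13, 0.16, 0.17, 0.14, 0.16))
putts$y <- (1 - putts$Prop) / putts$Prop

# Profile log-likelihood on a fine grid (grid size is an illustrative choice);
# plotit = FALSE returns the values without drawing the plot
bc <- boxcox(y ~ Dist, data = putts,
             lambda = seq(-2, 2, length.out = 401), plotit = FALSE)

# Grid value of lambda that maximises the profile log-likelihood
lambda.hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: all lambda within qchisq(0.95, 1)/2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
lambda.ci <- range(bc$x[bc$y >= cutoff])

lambda.hat
lambda.ci
```

Because the maximum is taken over a grid, `lambda.hat` is only accurate to the grid spacing, which is ample for choosing between interpretable values such as 0 and 0.5.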

We therefore transform the response using \(\lambda = 0.5\), that is, we fit the model

\[\sqrt{y} = \beta_1 + \beta_2 x + \epsilon\]

where \(y = (1-p)/p\) as above and \(x\) is distance.

```
options(digits = 3)
putts.lm <- lm(sqrt(y) ~ Dist, data = putts)
confint(putts.lm)
```

```
##               2.5 % 97.5 %
## (Intercept) -0.0637  0.351
## Dist         0.1061  0.140
```

```
plot(sqrt(y) ~ Dist, data = putts)
abline(putts.lm)
```

Note that \(\beta_1\) is not significantly different from zero: its confidence interval includes zero. Enforcing \(\beta_1=0\) is also natural on physical grounds, as the odds of failure should go to zero for very short putts. With \(\beta_1 = 0\) the model reads \(\sqrt{y} = \beta_2 x\), that is \(y = \beta_2^2 x^2\), so we conclude that the odds of putt failure increase as the square of the distance. (The re-fitted model with \(\beta_1=0\) enforced is not shown here.)
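For readers who want to try the constrained fit, a minimal sketch follows. It is an illustration under stated assumptions rather than the original authors' code: the names `putts.lm0`, `beta2` and `Fit` are hypothetical, and the back-transformation simply inverts \(\sqrt{(1-p)/p} = \beta_2 x\) to give \(p = 1/(1 + \beta_2^2 x^2)\).

```r
# Data and response as defined earlier in this section
putts <- data.frame(Dist = 2:20,
                    Prop = c(0.93, 0.83, 0.74, 0.59, 0.55, 0.53, 0.46, 0.32, 0.34, 0.32,
                             0.26, 0.24, 0.31, 0.17, 0.13, 0.16, 0.17, 0.14, 0.16))
putts$y <- (1 - putts$Prop) / putts$Prop

# "- 1" in the formula drops the intercept, enforcing beta_1 = 0
putts.lm0 <- lm(sqrt(y) ~ Dist - 1, data = putts)
beta2 <- unname(coef(putts.lm0)["Dist"])  # hypothetical name for the slope estimate

# Back-transform to the success probability: p = 1 / (1 + (beta2 * x)^2)
putts$Fit <- 1 / (1 + (beta2 * putts$Dist)^2)

# Compare the fitted curve with the observed proportions
plot(Prop ~ Dist, data = putts)
lines(putts$Dist, putts$Fit)
```

Plotting on the original proportion scale is a design choice: it shows how well the one-parameter model tracks the raw data, which the transformed plot can obscure.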