*p. 18 l.-14*
The unconditional misclassification probability should be
P(\hat c(X) \ne C \hbox{ or } \hat c(X) = \mathcal{D}).

*p. 143 l.-3*
Clarification: f = 1 means f is the identity function, not the constant function.

*p. 176 l.-1*
|\tilde f(\omega)| : add the absolute value signs.

*p. 248 l.4*
>disjoint< subsets A, B, C

*p. 250 l.-4*
clique that contains >C<,

*p. 266 l.-1*
Clarification: the `subgraph of the ancestors' is the `smallest
ancestral set'.

*p. 354 `uniform convergence'*
Purists who know what this means should read `max' as `sup' (but then
they should know this definition anyway).

*p. 5 l.-6*
>are< misdiagnosed

*p. 20 l.-16*
R(\hat c)

*p. 44 (2.30)*
p(2 | c)

*p. 78 l.23*
`worst' not `worse'

*p. 85 l.19*
Replace eta by hat eta_2, twice.

*p. 110 second display*
theta in the denominator should not have a hat

*p. 146 l.-13*
`nor' should be `or'

*p. 177 first display*
replace w by omega in the range of the integral

*p. 177 l.12*
delete `r' before C_{f}

*p. 177 l.13*
>a< convex combination

*p. 248 l.-9*
`though' should be `through'

*p. 22*
I confused origins for the vectors of probabilities here. So:

The optimal rule is to allocate to class 1 if X <= 12, class 2 if
13 <= X <= 17 and to class 3 if X >= 18.

With two measurements the rule is to allocate to class 1 if Xbar <= 12,
class 2 if 12.5 <= Xbar <= 17 and to class 3 if Xbar >= 17.5.

The numbers given for class-wise success rates and overall error
rates are correct.
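For concreteness, the corrected rules can be sketched in code (the function names are illustrative, not from the book; X is taken to be integer-valued and Xbar a half-integer, so the gaps between thresholds do not arise):

```python
def allocate_single(x):
    """Corrected optimal rule for one measurement X (p. 22)."""
    if x <= 12:
        return 1
    elif x <= 17:          # i.e. 13 <= X <= 17
        return 2
    else:                  # X >= 18
        return 3

def allocate_mean(xbar):
    """Corrected rule for the mean of two measurements."""
    if xbar <= 12:
        return 1
    elif xbar <= 17:       # i.e. 12.5 <= Xbar <= 17
        return 2
    else:                  # Xbar >= 17.5
        return 3
```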

*p. 25 l.13*
Change d p(x) to (1-d)p(x) on the right-hand side.

*p. 100* There are typos in the displays (and an error in
the original reference). In the first display n-1 should be n-K, and there
is a sign error in the second line. In LaTeX:

\begin{eqnarray*}
\Delta^2_{jc} &\leftarrow& \Delta^2_{jc} \times \frac{n-K-1}{n-K}
  \left(\frac{n_c}{n_c-1}\right)^2 \Bigm/
  \left[1 - \frac{n_c}{(n_c-1)(n-K)}\Delta^2_{jc} \right]\\
\Delta^2_{jk} &\leftarrow& \Delta^2_{jk} \times \frac{n-K-1}{n-K}
  \left[1 + \frac{\{(\bx_j - \bmu_c)^T\hat\Sigma^{-1}(\bx_j - \bmu_k)\}^2}
  {\{(n-K)(n_c-1)/n_c - \Delta^2_{jc}\}\Delta^2_{jk}}\right]
\end{eqnarray*}

The second display gives the updated value of \hat\Sigma_c. In this and the final display n_k should be n_c.
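As a check, the corrected first line of the display transcribes directly into code (a sketch only; the variable names are illustrative, not from the book):

```python
def update_delta2_jc(d2_jc, n, K, n_c):
    """Cross-validatory update of Delta^2_{jc}, transcribed from the
    corrected first line of the p. 100 display:
    new = old * (n-K-1)/(n-K) * (n_c/(n_c-1))^2
              / [1 - n_c/((n_c-1)(n-K)) * old]."""
    factor = (n - K - 1) / (n - K) * (n_c / (n_c - 1)) ** 2
    denom = 1 - n_c * d2_jc / ((n_c - 1) * (n - K))
    return d2_jc * factor / denom
```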

*p. 177 l.3 *
replace pi in 2 pi C_{f} by r.

*p. 225 l.20*
the *largest* value of alpha ... (and hence the smallest tree)

*p. 362 *
Ciampi et al (1987): Recursive *partition*.

*p. 377 *
[Second printing] Mathieson (1996) has page numbers 523-536.

*p. 22 l.5*
delete small ( after large (.

*p. 33 l.-4*
Replace \partial\log p(x;\theta_0)/\partial\theta by
\partial\log p(X;\theta)/\partial\theta.

*p. 35 l.15*
Change X to x to get U(x, \theta) = \partial \log p(x;
\theta)/\partial \theta.

*p. 45 l.2*
The formula for \hat\beta should be \hat\Sigma^{-1} (\hat\mu_k - \hat\mu_1)

*p. 288 caption *
Cushing's

*p. 360*
Bridle (1990a): an editor's name is spelled Fogelman.

The following apply only to the first printing:

*p. iv *
The copyright is owned by the author, not the publisher.

*pp. 7, 357, 391 *
For Angulin read Angluin.

*p. 13*
For Freemantle read Fremantle.

*p. 44 (2.31)*
log has been omitted before the two probabilities p(2 | x_{i}) and
[1 - p(2 | x_{i})].

*p. 59 l.10*
\ell \pi_{d} n_{n} / \pi_{n} n_{d} (interchange d and n).

*p. 63*
For Evans & Swantz read Evans & Swartz.

*p. 83 (2.53) *
The constant is 4 e^{4 epsilon + 4 epsilon^2}.

*p. 87 second display *
The factor 1/4 should be on the first not second term.

*pp. 91, 121 *
Dietterich & Bakiri (1991, 1995), not 1994.

*p. 104 line -11*
(n - g)/n_{t} + 2 log \pi_{t}.

*p. 114 Examples line 2*
on the *seven* explanatory inputs.

*p. 118 *
The convergence time stated for Mansfield's method applies to
binary inputs.

*p. 149 l.-9 *
Minus the log-likelihood ....

*p. 163 line -9 *
`standard deviance' should be `standard deviation'.

*p. 165 second para *
... by simulating **w**.

*p. 166 display *
The exponent is -n_{w}/2.

*p. 192 Proposition 6.1 *
A clearer statement is:

`Then the error rate of the nearest neighbour rule averaged over
training sets converges as the size of the training set increases ...'

*p. 193 line 1 *
This line should read:

Thus E_1 = E e_1(X) where e_1(**x**) = \sum_{i \ne j} p(i | **x**) p(j | **x**)
= 1 - \sum_i p(i | **x**)^2.
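The final equality is the standard algebraic identity sum_{i != j} p_i p_j = 1 - sum_i p_i^2 for a probability vector; a quick numerical check (the function name is illustrative):

```python
def e1(p):
    """e_1(x) = sum over i != j of p(i|x) * p(j|x)."""
    n = len(p)
    return sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)

p = [0.5, 0.3, 0.2]            # an arbitrary posterior vector
lhs = e1(p)                    # the double sum
rhs = 1 - sum(q * q for q in p)  # the closed form 1 - sum p_i^2
```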

*p. 196 lines 11 and 12 *
... for odd $k$ $E_{k-1, \lceil k/2 \rceil} = E'_{k-1} \le E^*$,
which suggests bounding the Bayes risk by the achieved performance of
the $(k-1, \lceil k/2\rceil)$-nn rule.

(Thus we require a strict majority for the rule with even k-1.)

*p. 196 Proposition 6.3 *
It is not made clear how ties are to be handled in the 3-nn rule:
if the neighbours are of three different classes (only possible for
K > 2) `doubt' is declared, so this is a (3,2)-nn rule.

*p. 270 *
Figure 8.8. The link from `d` to `f` has not shown up;
please reinstate it.

*pp. 292, 359, 391*
For Bourland read Bourlard.

*p. 294 line 8*
The suffix of lambda should be j in both sums.

*p. 307 line -6*
The bold **x**'s are the rows of X
viewed as *row* vectors, hence **x**_{r}** x**_{s}^{T} is a scalar.

*p. 320 line -10*
Macnaughton-Smith *et al.* (1964) not 1984.

*p. 336 line 22*
Q(phi, phi') = E[log p(phi, psi | X) | phi', X]: that is, phi was
omitted from the right-hand side.

*p. 350*
In the definition of Mahalanobis distance there should be a transpose
on the first (**x** - **y**).
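With the transpose restored the definition reads d^2(x, y) = (x - y)^T Sigma^{-1} (x - y); a minimal sketch (the function name is illustrative, and a diagonal Sigma is assumed in the example so its inverse is trivial):

```python
def mahalanobis2(x, y, sigma_inv):
    """Squared Mahalanobis distance (x - y)^T Sigma^{-1} (x - y).
    sigma_inv is the inverse covariance matrix as a list of rows;
    in this scalar form the transpose is implicit in the double sum."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[i] * sigma_inv[i][j] * d[j]
               for i in range(n) for j in range(n))

# Example with Sigma = diag(2, 1), so Sigma^{-1} = diag(0.5, 1):
d2 = mahalanobis2([1.0, 2.0], [3.0, 5.0], [[0.5, 0.0], [0.0, 1.0]])
```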

*p. 356*
In Akaike (1985) the second editor is Fienberg.

*p. 358*
The first page of Baum (1988) is 193.

*p. 361*
Carroll & Dickinson (1989) ... using the Radon transform.

*p. 362 *
Clark & Niblett (1989)'s last page is (probably) 283.

*p. 364 *
Dietterich, T. G. & Bakiri, G. (1994) appeared in 1995
in Journal of Artificial Intelligence Research **2**, 263-286.

*p. 365 *
The paper by Evans & Swartz (not Swantz) appeared in volume **10**, 254-272,
in August 1996 but with a 1995 date. The discussion appeared in
volume **11**, 54-64 in February 1997 with a 1996 date.

*p. 365 *
Fagin (1977)'s title is Multivalued dependencies and a new
normal form for relational databases.

*p. 369 *
Hastie, T. & Tibshirani, R. (1996) appeared in volume
**58**, 155-176, with title ending as `Gaussian mixtures'.

*p. 370 *
Hjort, N. L. & Glad, I. K. (1995) appeared in volume
**23**, 882-904.

*p. 370 *
Hjort, N. L. & Jones, M. C. (1995) appeared in 1996 in volume
**24**, 1619-1647.

*p. 371 *
My copy of Hwang et al. (1994b) is undated, but it seems the
report was first distributed in 1993. A version appeared as
`The cascaded correlation learning: A projection pursuit
perspective.' IEEE Transactions on Neural Networks **7**(2),
278-289, March 1996.

*p. 371 *
Intrator & Gold (1993). The correct title is:
Three-dimensional object recognition using an unsupervised BCM
network: the usefulness of distinguishing features.

*p. 374 *
In Kung & Diamantaras, Albuquerque is misspelt.

*p. 375 *
Maass (1995). This is a Jan 1995 technical report, but the collection
in which it appeared is dated Oct 1994! The page numbers are
153-172.

*p. 375 *
Macintyre & Sontag (1993). First page is (probably) 325.

*p. 376*
Macnaughton-Smith *et al.* (1964) not 1984.

*p. 376 *
Mathieson (1995) appeared (very belatedly) in August 1996
with page numbers 523-536 and editors
A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody & A. Weigend.

*p. 377 *
McCulloch & Pitts (1943). Replace `neural' by `nervous'.

*p. 380 *
Quinlan (1979). Replace `classes' by `collections'.

*p. 383, 397 *
D. W. Scott (not D. F.)

*p. 386 *
The details for Tarassenko et al. (1995) are
pages 442-447 of IEE Conference Publication 409.

*p. 388 *
Weigend, Rumelhart & Huberman (1990) is probably
Weigend, Huberman & Rumelhart.

*p. 389 *
Widrow & Hoff (1960). First page is 96.

*p. 390 *
The precise reference is: Young, T. Y. & Calvert, T. W.
(1974) *Classification, Estimation and Pattern Recognition*. New York:
American Elsevier.

*p. ix l.-14 *
... or `clone' (delete `to')

*p. 3 l.-4 *
delete `which'

*p. 5 l.-5 *
airliner, not airline.

*p. 27 l.-1 *
parameters *than*

*p. 58 l.-16*
Delete second of `the the'.

*p. 62 l.-8*
... seems not *to* be widely ....

*p. 202 l.-15 *
`minimize'

*p. 228 Examples l.1 *
... Pima Indians have ...

*p. 228 Examples l.8 *
worse *than* the ...

*p. 335 l.6 *
... missing observations of a few ... (insert space).

Last edited on Mon 9 November 1998 by Brian Ripley ripley@stats.ox.ac.uk