# Errata for 'Pattern Recognition and Neural Networks'

## First, second and third printings

p. 18 l.-14 The unconditional misclassification probability should be P(\hat c(X) \ne C or \mathcal{D}).

p. 143 l.-3 Clarification: f = 1 means f is the identity function, not the constant.

p. 176 l.-1 | tilde f(omega) | : add the absolute value signs.

p. 248 l.4 >disjoint< subsets A, B, C

p. 250 l.-4 clique that contains >C<,

p. 266 l.-1 Clarification: 'the subgraph of the ancestors' is 'the smallest ancestral set'.

p. 354 'uniform convergence' Purists who know what this means should read 'max' as 'sup' (but then they should know this definition anyway).

### Typos

p. 5 l.-6 >are< misdiagnosed

p. 20 l.-16 R(\hat c)

p. 44 (2.30) p(2 | c)

p. 78 l.23 'worst' not 'worse'

p. 85 l.19 eta by hat eta_2, twice

p. 110 second display theta in the denominator should not have a hat

p. 146 l.-13 'nor' should be 'or'

p. 177 first display replace w by omega in range of the integral

p. 177 l.12 delete 'r' before Cf

p. 177 l.13 >a< convex combination

p. 248 l.-9 'though' should be 'through'

## First and second printings

p. 22 I confused origins for the vectors of probabilities here. So:
The optimal rule is to allocate to class 1 if X <= 12, class 2 if 13 <= X <= 17 and to class 3 if X >= 18.
With two measurements the rule is to allocate to class 1 if Xbar <= 12, class 2 if 12.5 <= Xbar <= 17 and to class 3 if Xbar >= 17.5.
The numbers given for class-wise success rates and overall error rates are correct.
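The corrected cut-offs can be sketched as a small function. This is only an illustration: the name `allocate` is hypothetical, and it assumes each measurement is integer-valued (so that the mean of two measurements is a multiple of 0.5), which is what the gaps between the stated thresholds suggest.

```python
def allocate(measurements):
    """Allocate to class 1, 2 or 3 from one or two measurements.

    Assumption (not stated in the erratum): each measurement is an integer,
    so the mean of one or two of them never falls strictly between
    12 and 12.5 or between 17 and 17.5.
    """
    xbar = sum(measurements) / len(measurements)
    if xbar <= 12:
        return 1   # class 1 if X (or Xbar) <= 12
    if xbar <= 17:
        return 2   # class 2 if 13 <= X <= 17, or 12.5 <= Xbar <= 17
    return 3       # class 3 if X >= 18, or Xbar >= 17.5
```

For example, `allocate([12, 13])` has mean 12.5 and so falls in class 2, while `allocate([17, 18])` has mean 17.5 and falls in class 3.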

p. 25 l.13 Change d p(x) to (1-d)p(x) on the right-hand side.

p. 100 There are typos in the displays (and an error in the original reference). In the first display n-1 should be n-K and there is a sign error in the second line. In LaTeX:

\begin{eqnarray*}
\Delta^2_{jc} &\leftarrow& \Delta^2_{jc} \times
\frac{n-K-1}{n-K} \left(\frac{n_c}{n_c-1}\right)^2 \Bigm /
\left[1 -  \frac{n_c}{(n_c-1)(n-K)}\Delta^2_{jc} \right ]\\
\Delta^2_{jk} &\leftarrow& \Delta^2_{jk} \times \frac{n-K-1}{n-K}
\left[ 1 + \frac{\{(\bx_j - \bmu_c)^T\hat\Sigma^{-1}(\bx_j - \bmu_k)\}^2}
{\{(n-K)(n_c-1)/n_c - \Delta^2_{jc}\}\Delta^2_{jk}}\right]
\end{eqnarray*}


The second display gives the updated value of \hat\Sigma_c. In this and the final display n_k should be n_c.

p. 177 l.3 replace pi in 2 pi Cf by r.

p. 225 l.20 the largest value of alpha ... (and hence the smallest tree)

p. 362 Ciampi et al (1987): Recursive partition.

p. 377 [Second printing] Mathieson (1996) has page numbers 523-536.

### Typos

p. 22 l.5 delete small ( after large (.

p. 33 l.-4 Replace \partial\log p(x;\theta_0)/\partial\theta by \partial\log p(X;\theta)/\partial\theta.

p. 35 l.15 Change X to x to get U(x, \theta) = \partial \log p(x; \theta)/\partial \theta.

p. 45 l.2 The formula for \hat\beta should be \hat\Sigma^{-1} (\hat\mu_k - \hat\mu_1)

p. 288 caption Cushing's

p. 360 Bridle (1990a): an editor's name is spelled Fogelman.

## First printing

The following apply only to the first printing:

p. iv The copyright is owned by the author, not the publisher.

pp. 7, 357, 391 For Angulin read Angluin.

p. 44 (2.31) log has been omitted before the two probabilities p(2 | xi) and [1 - p(2 | xi)].

p. 59 l.10 \ell \pi_d n_n / \pi_n n_d (interchange d and n).

p. 63 For Evans & Swantz read Evans & Swartz.

p. 83 (2.53) The constant is 4 e^{4 epsilon + 4 epsilon^2}.

p. 87 second display The factor 1/4 should be on the first not second term.

pp. 91, 121 Dietterich & Bakiri (1991, 1995), not 1994.

p. 104 line -11 (n - g)/n_t + 2 \log \pi_t.

p. 114 Examples line 2 on the seven explanatory inputs.

p. 118 The convergence time stated for Mansfield's method applies to binary inputs.

p. 149 l.-9 Minus the log-likelihood ....

p. 163 line -9 'standard deviance' should be 'standard deviation'.

p. 165 second para ... by simulating w.

p. 166 display The exponent is -nw/2.

p. 192 Proposition 6.1 A clearer statement is:
'Then the error rate of the nearest neighbour rule averaged over training sets converges as the size of the training set increases ...'

p. 193 line 1 This line should read:
Thus E_1 = E e_1(X) where e_1(x) = \sum_{i \ne j} p(i | x) p(j | x) = 1 - \sum_i p(i | x)^2.
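The identity in the corrected line holds because the posterior probabilities sum to one, so the sum over all ordered pairs i \ne j equals 1 minus the sum of squares. A quick numerical check, as a sketch only (the function name `e1_two_ways` is hypothetical):

```python
def e1_two_ways(posterior):
    """Evaluate e_1(x) from a vector of posterior probabilities p(i | x).

    Returns both sides of the identity
        sum_{i != j} p(i|x) p(j|x)  =  1 - sum_i p(i|x)^2,
    which relies on the probabilities summing to one.
    """
    if abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to 1")
    # Left-hand side: sum over all ordered pairs of distinct classes.
    pairwise = sum(
        p_i * p_j
        for i, p_i in enumerate(posterior)
        for j, p_j in enumerate(posterior)
        if i != j
    )
    # Right-hand side: one minus the sum of squared posteriors.
    complement = 1.0 - sum(p * p for p in posterior)
    return pairwise, complement
```

With posteriors (0.5, 0.3, 0.2) both expressions evaluate to 0.62, up to floating-point rounding.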

p. 196 lines 11 and 12 ... for odd $k$ $E_{k-1, \lceil k/2 \rceil} = E'_{k-1} \le E^*$, which suggests bounding the Bayes risk by the achieved performance of the $(k-1, \lceil k/2\rceil)$-nn rule.
(Thus we require a strict majority for the rule with even k-1.)

p. 196 Proposition 6.3 It is not made clear how ties are to be handled in the 3-nn rule: if the neighbours are of three different classes (only possible for K > 2) 'doubt' is declared, so this is a (3,2)-nn rule.
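The tie-handling described here can be illustrated with a minimal sketch of the voting step of a (k, l)-nn rule. The names `kl_nn_vote` and `DOUBT` are hypothetical, the neighbour labels are assumed to have been found already, and l is assumed to exceed k/2 so that at most one class can reach the threshold.

```python
from collections import Counter

DOUBT = "doubt"  # hypothetical marker for the 'doubt' decision


def kl_nn_vote(neighbour_labels, l):
    """Voting step of a (k, l)-nn rule.

    neighbour_labels: non-empty list of the class labels of the k nearest
    neighbours. Returns the class with at least l votes, or DOUBT if no
    class reaches l. Assumes l > k/2, so the winner (if any) is unique.
    """
    label, votes = Counter(neighbour_labels).most_common(1)[0]
    return label if votes >= l else DOUBT
```

For the (3,2)-nn rule, `kl_nn_vote(["a", "b", "a"], 2)` returns `"a"`, whereas three neighbours of three different classes give `DOUBT`.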

p. 270 Figure 8.8. The link from d to f has not shown up; please reinstate it.

pp. 292, 359, 391 For Bourland read Bourlard.

p. 294 line 8 The suffix of lambda should be j in both sums.

p. 307 line -6 The bold x's are the rows of X viewed as row vectors, hence x_r x_s^T is a scalar.

p. 320 line -10 Macnaughton-Smith et al. (1964) not 1984.

p. 336 line 22 Q(phi, phi') = E[log p(phi, psi | X) | phi', X]: that is phi was omitted from the right-hand side.

p. 350 In the definition of Mahalanobis distance there should be a transpose on the first (x - y).
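To see why the transpose matters, the quadratic form (x - y)^T \Sigma^{-1} (x - y) is a row vector times a matrix times a column vector, and hence a scalar. The sketch below computes that quadratic form in plain Python. The function name and the choice to pass the inverse covariance directly (as `precision`) are assumptions made for illustration; whether the distance is this quantity or its square root depends on the convention used in the definition.

```python
def mahalanobis_squared(x, y, precision):
    """Quadratic form (x - y)^T P (x - y), with P the inverse covariance.

    x, y: sequences of equal length; precision: square matrix given as a
    list of rows (assumed here to be Sigma^{-1}, already computed).
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Row vector d^T times matrix P times column vector d gives a scalar.
    return sum(d[i] * precision[i][j] * d[j] for i in range(n) for j in range(n))
```

With the identity matrix as `precision` this reduces to squared Euclidean distance, for example 5 for x = (1, 2) and y = (0, 0).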

### References

'(probably)' means that I have been unable to check the original source.

p. 356 In Akaike (1985) the second editor is Fienberg.

p. 358 The first page of Baum (1988) is 193.

p. 361 Carroll & Dickinson (1989) ... using the Radon transform.

p. 362 Clark & Niblett (1989)'s last page is (probably) 283.

p. 364 Dietterich, T. G. & Bakiri, G. (1994) appeared in 1995 in Journal of Artificial Intelligence Research 2, 263-286.

p. 365 The paper by Evans & Swartz (not Swantz) appeared in volume 10, 254-272, in August 1996 but with a 1995 date. The discussion appeared in volume 11, 54-64 in February 1997 with a 1996 date.

p. 365 Fagin (1977)'s title is Multivalued dependencies and a new normal form for relational databases.

p. 369 Hastie, T. & Tibshirani, R. (1996) appeared in volume 58, 155-176, with title ending as 'Gaussian mixtures'.

p. 370 Hjort, N. L. & Glad, I. K. (1995) appeared in volume 23, 882-904.

p. 370 Hjort, N. L. & Jones, M. C. (1995) appeared in 1996 in volume 24, 1619-1647.

p. 371 My copy of Hwang et al. (1994b) is undated, but it seems the report was first distributed in 1993. A version appeared as 'The cascaded correlation learning: A projection pursuit perspective.' IEEE Transactions on Neural Networks 7(2), 278-289, March 1996.

p. 371 Intrator & Gold (1993). The correct title is: Three-dimensional object recognition using an unsupervised BCM network: the usefulness of distinguishing features.

p. 374 In Kung & Diamantaras, Albuquerque is misspelt.

p. 375 Maass (1995). This is a Jan 1995 technical report, but the collection in which it appeared is dated Oct 1994! The page numbers are 153-172.

p. 375 Macintyre & Sontag (1993). First page is (probably) 325.

p. 376 Macnaughton-Smith et al. (1964) not 1984.

p. 376 Mathieson (1995) appeared (very belatedly) in August 1996 with page numbers 523-536 and editors A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody & A. Weigend.

p. 377 McCulloch & Pitts (1943). Replace 'neural' by 'nervous'.

p. 380 Quinlan (1979). Replace 'classes' by 'collections'.

pp. 383, 397 D. W. Scott (not D. F.)

p. 386 The details for Tarassenko et al. (1995) are pages 442-447 of IEE Conference Publication 409.

p. 388 Weigend, Rumelhart & Huberman (1990) is probably Weigend, Huberman & Rumelhart.

p. 389 Widrow & Hoff (1960). First page is 96.

p. 390 The precise reference is: Young, T. Y. & Calvert, T. W. (1974) Classification, Estimation and Pattern Recognition. New York: American Elsevier.

### Typos

p. ix l.-14 ... 'or clone' (delete 'to')

p. 3 l.-4 delete 'which'

p. 5 l.-5 airliner, not airline.

p. 27 l.-1 parameters than

p. 58 l.-16 Delete the second of 'the the'.

p. 62 l.-8 ... seems not to be widely ....

p. 202 l.-15 'minimize'

p. 228 Examples l.1 ... Pima Indians have ...

p. 228 Examples l.8 worse than the ...

p. 335 l.6 ... missing observations of a few ... (insert space).

Last edited on Mon 9 November 1998 by Brian Ripley ripley@stats.ox.ac.uk