Note that prior to 2005-6 there was an 8-hour course in Multivariate Analysis, but much of the material has since been moved to Statistical Data Mining or Further Statistical Methods. The purpose of this module is to give an overview and to relate the concepts found in those two courses.
What `multivariate analysis' is (and is not).
Graphical methods. Brush and Spin, Projection pursuit.
Principal component and factor analysis.
[Factor analysis is covered in Further Statistical Methods, and
PCA in the optional Statistical Data Mining.
This lecture will go over PCA from several viewpoints and explain why it is
frequently confused with factor analysis.]
Discrete methods, including correspondence analysis.
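As a small illustration of how PCA, the SVD and metric scaling relate, here is a minimal R sketch. It uses the built-in USArrests data purely for convenience (an assumption; it is not one of the course datasets), and the variable names are illustrative only.

```r
# Centre the data (no rescaling), as prcomp() does by default.
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
n <- nrow(X)

# View 1: PCA via prcomp().
pca <- prcomp(USArrests)

# View 2: singular value decomposition of the centred data matrix.
sv <- svd(X)
sdev_svd   <- sv$d / sqrt(n - 1)     # component standard deviations
scores_svd <- sv$u %*% diag(sv$d)    # component scores (up to sign)

# View 3: classical (metric) scaling of the Euclidean distances.
mds <- cmdscale(dist(X), k = 2)

# The three views agree up to the sign of each column.
print(all.equal(pca$sdev, sdev_svd))
print(all.equal(abs(unname(pca$x)), abs(scores_svd)))
print(all.equal(abs(unname(pca$x[, 1:2])), abs(unname(mds))))
```

Factor analysis (for example via factanal) fits a different model with specific variances for each variable, so its loadings will not in general coincide with the principal components.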
`Finding Needles in Haystacks: Tools for Finding Structure in Large Datasets' Slides for visualization.
`Principal Component Analysis and Factor Analysis'
`SVD, PCA and Metric Scaling' A very mathematical account of the underlying theory.
`Discrete Multivariate Analysis' Correspondence analysis.
`Visualization --- Crop Viruses' Data on 61 viruses, data frame virus.
`University League Tables' Datasets Times, ft and tfl.
All these datasets are contained in the file mult.RData. This is an R save file, and you can use load on it, or drag-and-drop it onto an R console window.
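A minimal sketch of loading the file from R, assuming mult.RData has been downloaded to the current working directory. The helper name load_course_data is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: load an R save file and return the names of the
# objects it restored (an empty character vector if the file is missing).
load_course_data <- function(path = "mult.RData", envir = globalenv()) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(character(0)))
  }
  # load() returns, invisibly, the names of the objects it created.
  invisible(load(path, envir = envir))
}

loaded <- load_course_data()
print(loaded)   # should include names such as virus, Times, ft and tfl

# Inspect one of the data frames if it was restored.
if ("virus" %in% loaded) str(virus)
```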
We will be making use of GGobi. You can download GGobi from here.
Some of the demos are worth viewing if you have QuickTime installed, in particular those for tours and brushing part 2.
Here are some notes on how to manipulate the GGvis plugin inside GGobi.
GGobi can be driven from R via package rggobi, which you can install like any other R package. You will also want package DescribeDisplay to print out plots. If you install these from the menus you will get all the dependencies; if you install them manually you also need package RGtk2. Note that RGtk2 has lots of small HTML help files and so takes a long time to install.
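A manual installation from the R prompt might look like the sketch below. It assumes the packages are still available from your configured repository, which may have changed since this page was written. The helper missing_pkgs is hypothetical, and mtcars is only a stand-in dataset.

```r
# Hypothetical helper: which of 'pkgs' cannot currently be loaded?
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_pkgs(c("rggobi", "DescribeDisplay"))

# dependencies = TRUE asks R to pull in required packages such as RGtk2.
# Only attempted in an interactive session, where a mirror can be chosen.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}

# Start GGobi on a data frame (GGobi itself must also be installed).
if (interactive() && requireNamespace("rggobi", quietly = TRUE)) {
  g <- rggobi::ggobi(mtcars)
}
```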
If you want to use (two variants of) Chernoff faces from R you need package TeachingDemos and its dependency tkrplot.
3D rotations, including rotations of surfaces, are covered by package rgl.
Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. I. (2002) The Analysis and Interpretation of Multivariate Data for Social Scientists. Chapman & Hall / CRC.
Cook, D. and Swayne, D. F. (2007) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Springer. Book's website.
Gower, J. C. and Hand, D. J. (1996) Biplots. Chapman & Hall.
Krzanowski, W. J. (1988) Principles of Multivariate Analysis. OUP.
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. CUP. (Sections 9.1 and 9.2.)
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer. (Sections 11.1, 11.3 and 11.4.)
Cleveland, W. (1993) Visualizing Data. Hobart Press.
Wilkinson, L. (1999, 2005) The Grammar of Graphics. Springer.
Unwin, A., Theus, M. and Hofmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.