Note that before 2005-6 there was an 8-hour course in Multivariate Analysis, but much of its material has since moved to Statistical Data Mining or Further Statistical Methods. The purpose of this module is to give an overview and to relate the concepts found in those two courses.
What `multivariate analysis' is (and is not).
Graphical methods: brush and spin, projection pursuit.
Principal component and factor analysis.
[Factor analysis is covered in Further Statistical Methods, and PCA in the optional Statistical Data Mining course. This lecture will go over PCA from several viewpoints and explain why it is frequently confused with factor analysis.]
Discrete methods, including correspondence analysis.
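Two of the viewpoints on PCA can be compared numerically. The first takes the eigenvectors of the sample covariance matrix. The second takes the singular value decomposition (SVD) of the column-centred data matrix. Both give the same component variances and the same axes, up to the sign of each axis. PCA has no error model, which sets it apart from factor analysis. The sketch below is written in Python with NumPy, which is an assumption: the course itself uses R. In R, `prcomp` takes the SVD route and `princomp` the covariance-eigen route. The function name `pca_two_ways` and the simulated data are purely illustrative.

```python
import numpy as np

def pca_two_ways(X):
    """Illustrative helper: PCA of the rows of X computed by two routes.

    Route 1: eigen-decomposition of the sample covariance matrix.
    Route 2: SVD of the column-centred data matrix.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # centre each variable

    # Route 1: covariance matrix (divisor n - 1) and its eigen-decomposition.
    S = Xc.T @ Xc / (n - 1)
    evals, evecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]      # reorder to descending variance
    evals, evecs = evals[order], evecs[:, order]

    # Route 2: Xc = U diag(d) V^T, so S = V diag(d^2 / (n - 1)) V^T.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    svd_vars = d**2 / (n - 1)            # component variances from singular values

    return evals, evecs, svd_vars, Vt.T

# Simulated correlated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.0, 0.0],
                                         [0.5, 1.0, 0.0],
                                         [0.0, 0.3, 0.2]])
evals, evecs, svd_vars, V = pca_two_ways(X)
print(np.allclose(evals, svd_vars))              # compare component variances
print(np.allclose(np.abs(evecs), np.abs(V)))     # compare axes, ignoring sign
```

The sign ambiguity is inherent: each principal axis is defined only up to a factor of -1. This is why the axes are compared in absolute value.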
`Finding Needles in Haystacks: Tools for Finding Structure in Large Datasets' Slides for visualization.
`Principal Component Analysis and Factor Analysis'
`SVD, PCA and Metric Scaling' A very mathematical account of the underlying theory.
`Discrete Multivariate Analysis' Correspondence analysis.
`Visualization --- Crop Viruses' Data on 61 viruses, data frame virus.
`University League Tables' Datasets Times, ft and tfl.
All these datasets are contained in the file mult.RData. This is an R save file: you can read it with load(), or drag and drop it onto an R console window.
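The handout `SVD, PCA and Metric Scaling' connects these methods. One standard result is that classical (Torgerson) metric scaling of a Euclidean distance matrix recovers the principal component scores, up to the sign of each axis. Double-centring the squared distances gives B = Xc Xc^T, whose eigenvectors are the left singular vectors of the centred data. The sketch below assumes Python with NumPy, whereas the course uses R, where classical scaling is `cmdscale`. The names `classical_mds` and `pca_scores` and the simulated data are hypothetical.

```python
import numpy as np

def classical_mds(D, k):
    """Hypothetical helper: classical metric scaling of an n x n distance matrix into k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D**2) @ J                    # doubly-centred squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:k]          # indices of the k largest eigenvalues
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue); clip tiny negatives.
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))

def pca_scores(X, k):
    """Hypothetical helper: first k principal component scores via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * d[:k]                      # equals Xc @ V[:, :k]

# Simulated data with clearly separated variances (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(axis=2))  # Euclidean distances
Y = classical_mds(D, 2)
P = pca_scores(X, 2)
print(np.allclose(np.abs(Y), np.abs(P)))         # compare configurations, ignoring axis signs
```

The equivalence holds only for Euclidean distances. With other dissimilarities, B may have negative eigenvalues and the configuration no longer corresponds to a PCA of any data matrix.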
We will be making use of GGobi. The availability of the GGobi website varies from day to day, but when it is accessible there is a lot of information on it, including a draft book (by Cook & Swayne, see below), which is a lot more usable than the manual.
You can download GGobi from here. Note that you also need GTK for Windows; the download can be done from inside R. The version of GTK on their link is much more than you need: this one suffices, and here is a local copy of the GGobi for Windows installer.
Some of the demos are worth viewing if you have QuickTime installed (the lab machines currently do not), in particular those for tours and for brushing (part 2).
Here are some notes on how to manipulate the GGvis plugin inside GGobi.
GGobi can be driven from R via package rggobi, which you can install like any other R package. You will also want package DescribeDisplay to print out plots. If you install these from the menus you will get all the dependencies; if installing manually you need the packages DescribeDisplay, ggplot, RColorBrewer, reshape, rggobi and RGtk2. Note that RGtk2 has lots of small HTML help files and so takes a long time to install.
If you want to use (two variants of) Chernoff faces from R you need package TeachingDemos and its dependency tkrplot.
3D rotations, including rotations of surfaces, are covered by package rgl.
Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. I. (2002) The Analysis and Interpretation of Multivariate Data for Social Scientists. Chapman & Hall / CRC.
Cook, D. and Swayne, D. F. (2007?) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Online draft of nearly complete book.
Gower, J. C. and Hand, D. J. (1996) Biplots. Chapman & Hall.
Krzanowski, W. J. (1988) Principles of Multivariate Analysis. OUP.
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. CUP. (Sections 9.1 and 9.2.)
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer. (Sections 11.1, 11.3 and 11.4.)
Cleveland, W. (1993) Visualizing Data. Hobart Press.
Wilkinson, L. (1999, 2005) The Grammar of Graphics. Springer.
Unwin, A., Theus, M. and Hofmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.