Multivariate Analysis, HT2008

A 3-hour module for the M.Sc. in Applied Statistics, given in weeks 3, 5 and 6 of Hilary Term. It follows the sections of Venables & Ripley (2002) specified under `relevant books'.

Note that prior to 2005-6 there was an 8-hour course in Multivariate Analysis, but much of its material has been moved to Statistical Data Mining or Further Statistical Methods. The purpose of this module is to give an overview and to relate concepts found in those two courses.

Synopsis

What `multivariate analysis' is (and is not).

Graphical methods. Brush and spin, projection pursuit.

Principal component and factor analysis.
[Factor analysis is covered in Further Statistical Methods, and PCA in the optional Statistical Data Mining. This lecture will go over PCA from several viewpoints and explain why it is frequently confused with factor analysis.]
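
As a minimal illustration of two such viewpoints (a sketch, not taken from the lecture slides), the R code below computes a PCA once with prcomp, which works from the singular value decomposition of the centred data, and once from an eigendecomposition of the covariance matrix. The built-in USArrests data frame is used purely as a stand-in; it is not one of the course datasets.

```r
# Stand-in data: USArrests ships with R and is NOT a course dataset.
X <- scale(USArrests)              # centre each column and scale to unit variance

# Viewpoint 1: PCA via the SVD of the centred data (what prcomp does).
pca_svd <- prcomp(X)
var_svd <- pca_svd$sdev^2          # variances of the principal components

# Viewpoint 2: PCA via the eigendecomposition of the covariance matrix.
eig <- eigen(cov(X))
var_eig <- eig$values              # eigenvalues, largest first

# The component variances agree up to numerical tolerance.
same_variances <- isTRUE(all.equal(var_svd, var_eig))

# The loadings agree up to the sign of each column, so compare absolute values.
same_loadings <- isTRUE(all.equal(abs(unname(pca_svd$rotation)),
                                  abs(eig$vectors)))

print(c(variances = same_variances, loadings = same_loadings))
```

The sign ambiguity is the reason for comparing absolute values: a principal component is defined only up to a change of sign.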

Discrete methods, including correspondence analysis.

Lecture material

`Finding Needles in Haystacks: Tools for Finding Structure in Large Datasets'
slides for visualization.

`Principal Component Analysis and Factor Analysis'

Background material

All PDF documents.

`SVD, PCA and Metric Scaling' A very mathematical account of the underlying theory.

`Discrete Multivariate Analysis' Correspondence analysis.

Datasets

`Visualization --- Crop Viruses' Data on 61 viruses, data frame virus.

`University League Tables' Datasets Times, ft and tfl.

All these datasets are contained in the file mult.RData. This is an R save file, and you can use load on it, or drag-and-drop it onto an R console window.
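
How load behaves can be sketched as follows. So that the sketch runs without the course file, it first writes a small save file of its own; the names demo.RData and virus_demo are hypothetical and are not part of mult.RData.

```r
# Hypothetical stand-in for mult.RData, written to a temporary directory.
tmp <- file.path(tempdir(), "demo.RData")
virus_demo <- data.frame(id = 1:3, size = c(2.5, 3.1, 4.0))
save(virus_demo, file = tmp)
rm(virus_demo)                     # remove it so that load() has to restore it

# load() recreates the saved objects under their original names and
# (invisibly) returns a character vector of those names.
restored <- load(tmp)
print(restored)
str(virus_demo)
```

For the course file the corresponding call would be load("mult.RData"), assuming the file is in R's current working directory.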

Software for use on your own machine

We will be making use of GGobi. You can download GGobi from here.

Some of the demos are worth viewing if you have QuickTime installed, in particular those for tours and brushing part 2.

Here are some notes on how to manipulate the GGvis plugin inside GGobi.

GGobi can be driven from R via package rggobi, which you can install like any other R package. You will also want package DescribeDisplay to print out plots. If you install these from the menus you will get all the dependencies; if you install them manually you also need package RGtk2. Note that RGtk2 has lots of small HTML help files and so takes a long time to install.

If you want to use (two variants of) Chernoff faces from R you need package TeachingDemos and its dependency tkrplot.

3D rotations, including rotations of surfaces, are covered by package rgl.

Warning

This software is not as stable as R or the packages you have been using hitherto. Be careful to save your work (image, editor scripts, history) frequently. The most vulnerable times seem to be when you close GGobi windows or GGobi itself.

Examples

Some examples of driving GGobi from the R package rggobi.

Week 8 Practical

The practical contains four problems, three based on the scripts in the previous section and one on this dataset.

Relevant books

Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. I. (2002) The Analysis and Interpretation of Multivariate Data for Social Scientists. Chapman & Hall / CRC.

Cook, D. and Swayne, D. F. (2007) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Springer. Book's website

Gower, J. C. and Hand, D. J. (1996) Biplots. Chapman & Hall.

Krzanowski, W. J. (1988) Principles of Multivariate Analysis. OUP.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. CUP. (Sections 9.1 and 9.2.)

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer. (Sections 11.1, 11.3 and 11.4.)

Visualization

Cleveland, W. (1993) Visualizing Data. Hobart Press.

Wilkinson, L. (1999, 2005) The Grammar of Graphics. Springer.

Unwin, A., Theus, M. and Hofmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.


Last edited on 3 March 2008 by Prof Brian Ripley (ripley@stats.ox.ac.uk)