Multivariate Analysis, 2011

A 3-hour module for the M.Sc. in Applied Statistics in Hilary Term. It follows the sections of Venables & Ripley (2002) specified under "relevant books". Lectures will be held at 10am on Wednesday of weeks 3, 4 and 5.

Note that prior to 2005-6 there was an 8-hour course in Multivariate Analysis, but much of its material has been moved to Statistical Data Mining or Further Statistical Methods. The purpose of this module is to give an overview and to relate the concepts found in those two courses.

In recent years the course has been taught by Professor Brian Ripley, who prepared most of the material below. The course website from 2008 can be found here.

Synopsis

What `multivariate analysis' is (and is not).

Graphical methods. Brush and Spin, Projection pursuit.

Multi-dimensional scaling and principal component analysis.

Factor analysis and discrete methods, including correspondence analysis. [Factor analysis is covered in Further Statistical Methods, and PCA in Statistical Data Mining. We will go over PCA from several viewpoints and explain why it is frequently confused with factor analysis.]
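
The contrast between PCA and factor analysis can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the built-in `ability.cov` data and the choice of a single factor are not taken from the course material.

```r
# Illustrative data (an assumption, not a course dataset): covariance
# matrix of six ability tests from the 'datasets' package
R <- cov2cor(ability.cov$cov)

# PCA viewpoint: an eigendecomposition of the correlation matrix.
# It re-expresses all of the variance and involves no error model.
ev <- eigen(R, symmetric = TRUE)
pc1 <- ev$vectors[, 1]
total_variance <- sum(ev$values)   # equals the number of variables

# Factor analysis viewpoint: a model with one common factor plus a
# separate error variance (uniqueness) for each variable
fa <- factanal(factors = 1, covmat = ability.cov)
loadings1 <- unclass(fa$loadings)[, 1]
communality <- loadings1^2

# Each unit variance is split into communality + uniqueness
print(round(cbind(communality, uniqueness = fa$uniquenesses), 3))

# First principal component direction next to the factor loadings
# (the sign of an eigenvector is arbitrary)
print(round(cbind(PC1 = pc1, Factor1 = loadings1), 3))
```

The two columns in the last table often look similar up to scale and sign, which is one reason the methods get confused, but only factor analysis estimates variable-specific error variances.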

Lecture material

Graphical Methods, PCA.
Accompanying R code.
Here is the R code for one way to carry out the exercises suggested at the end of the lecture.

Projection Pursuit and MDS.

Factor Analysis and methods for discrete data. Here are some notes giving a brief summary, as well as suggested reading.

Background material

All PDF documents.

`SVD, PCA and Metric Scaling' A very mathematical account of the underlying theory.

`Discrete Multivariate Analysis' Correspondence analysis.
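
The link between SVD, PCA and metric scaling named in the first document above can be checked numerically. The following is a minimal sketch in base R, assuming the built-in `iris` measurements as example data (not one of the course datasets): the principal component scores, the scaled left singular vectors of the centred data and the classical scaling configuration from Euclidean distances agree up to the sign of each axis.

```r
# Illustrative data (an assumption): four numeric columns of iris
X  <- as.matrix(iris[, 1:4])
Xc <- scale(X, center = TRUE, scale = FALSE)   # centre columns only

# 1. PCA on the covariance scale
pca <- prcomp(X)

# 2. SVD of the centred data, Xc = U D V'; the scores are U D
s <- svd(Xc)
scores_svd <- s$u %*% diag(s$d)

# 3. Classical (metric) scaling of the Euclidean distance matrix
mds <- cmdscale(dist(X), k = 2)

# Compare absolute values, since each axis is defined only up to sign
diff_svd <- max(abs(abs(pca$x) - abs(scores_svd)))
diff_mds <- max(abs(abs(pca$x[, 1:2]) - abs(mds)))
print(c(diff_svd = diff_svd, diff_mds = diff_mds))

# Component standard deviations are the singular values / sqrt(n - 1)
sd_match <- all.equal(pca$sdev, s$d / sqrt(nrow(X) - 1))
print(sd_match)
```

The agreement with `cmdscale` relies on the distances being Euclidean; with other dissimilarities, classical scaling no longer reproduces the PCA scores.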

Datasets

`Visualization --- Crop Viruses' Data on 61 viruses, data frame virus.

`University League Tables' Datasets Times, ft and tfl.

All these datasets are contained in the file mult.RData.

Software for use on your own machine

We will be making use of GGobi. You can download GGobi from here.

Some of the demos are worth viewing if you have QuickTime installed, in particular those for tours and brushing part 2.

Here are some notes on how to manipulate the GGvis plugin inside GGobi.

GGobi can be driven from R via package rggobi, which you can install like any other R package. You will also want package DescribeDisplay to print out plots. If you install these from the menus you will get all the dependencies; if you install them manually you also need the package RGtk2. Note that RGtk2 has lots of small HTML help files and so takes a long time to install.

If you want to run rggobi on your own machine, then please take care to follow these instructions. If using rggobi on the machines in the computer labs, make sure to start R using the "R for GGobi" shortcut on the desktop.

If you want to use (two variants of) Chernoff faces from R you need package TeachingDemos and its dependency tkrplot.

3D rotations, including those of surfaces, are covered by package rgl.

Warning

This software is not as stable as R or the other packages you have been using so far. Be careful to save your work (image, editor scripts, history) frequently. It seems to be particularly vulnerable when you shut GGobi windows or close GGobi itself. I also recommend that you do not copy and paste more than one command at a time into rggobi.

Examples

Some examples of driving GGobi from the R package rggobi.

Week 5 Practical

The practical contains six problems, four based on the scripts in the previous section. You will need the mult.RData and dermatology datasets. You should write a report on your analysis of question 6 and submit it to the front office, SPR 1, by 10 am on Monday week 6. You should download the schools data and description. Some general comments on the submitted reports are now available.

Relevant books

Bartholomew, D. J., Steele, F., Moustaki, I. and Galbraith, J. I. (2002) The Analysis and Interpretation of Multivariate Data for Social Scientists. Chapman & Hall / CRC.

Cook, D. and Swayne, D. F. (2007) Interactive and Dynamic Graphics for Data Analysis: With Examples Using R and GGobi. Book's website

Gower, J. C. and Hand, D. J. (1996) Biplots. Chapman & Hall.

Krzanowski, W. J. (1988) Principles of Multivariate Analysis. OUP.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. CUP. (Sections 9.1 and 9.2.)

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer. (Sections 11.1, 11.3 and 11.4.)

Visualization

Cleveland, W. (1993) Visualizing Data. Hobart Press.

Wilkinson, L. (1999, 2005) The Grammar of Graphics. Springer.

Unwin, A., Theus, M. and Hofmann, H. (2006) Graphics of Large Datasets: Visualizing a Million. Springer.