Course lecturer: Professor Jonathan Marchini (marchini@stats.ox.ac.uk)
The aim of the course is to introduce students to the theory and practice of unsupervised learning. Unsupervised learning can be described as finding structure in datasets, and has applications in many areas such as finance, retail, medical imaging, sports performance analysis, genetics, medicine, studies of the environment and social networks. Unsupervised learning methods are important parts of Computational Statistics, Machine Learning, Artificial Intelligence and Big Data.
Raw dataset: 300 x 8686 matrix of gene expression measurements from Pollen et al (2014) Nature Biotechnology 32, 1053-1058. Viewing the raw data, it is very difficult to see any clear structure or similarity between the samples.
3D projection and clustering: The method of Principal Components Analysis (PCA) has been applied to the dataset in order to uncover structure. A clustering method (k-means) has then been applied to group the observations into distinct clusters. Students will learn the theory and practical skills to reproduce this analysis.
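The two steps described above (project with PCA, then cluster with k-means) can be sketched in a few lines. This is only an illustrative sketch, not the code used to produce the analysis on this page: it runs on a small synthetic 300 x 50 matrix rather than the Pollen et al. data, it uses Python with NumPy, and the function names `pca_project` and `kmeans`, the choice of three clusters and the random seeds are all assumptions made for the example.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # centre each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # n_samples x n_components scores


def kmeans(Z, k, n_iter=100, seed=0):
    """Basic k-means (Lloyd's algorithm) with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for an expression matrix: 300 samples, 50 features,
# three hypothetical groups that differ in different blocks of features.
rng = np.random.default_rng(1)
means = np.zeros((3, 50))
means[0, :10] = 4.0
means[1, 10:20] = 4.0
means[2, 20:30] = 4.0
X = np.vstack([m + rng.normal(size=(100, 50)) for m in means])

scores = pca_project(X, n_components=3)          # 3D projection
labels, centers = kmeans(scores, k=3)            # cluster in the projected space
print(scores.shape, np.bincount(labels, minlength=3))
```

With random starting centres, k-means can settle on a poor local solution, so in practice it is usually run from several starting points and the solution with the smallest within-cluster sum of squares is kept.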
Exercise sheets
1. sheet6.pdf
2. sheet7.pdf
3. sheet8.pdf
The Paper III specimen papers now include questions on the material in these 6 lectures. These can be found here.
This course leads on to several more advanced courses in future years that students should consider if they wish to learn more about Statistical Data Analysis, Machine Learning, Big Data and Artificial Intelligence.
Part A: Probability; Statistics; Simulation and Statistical Programming
Part B: Foundations of Statistical Inference
Part C: Statistical Data Mining and Machine Learning; Advanced Simulation Methods