Course lecturer : Professor Jonathan Marchini firstname.lastname@example.org
The aim of the course
is to introduce students to the theory and practice of unsupervised learning.
Unsupervised learning can be described as finding structure in datasets, and has applications in many areas such as finance, retail, medical imaging, sports performance analysis, genetics, medicine, studies of the environment and social networks.
Unsupervised learning methods are important parts of Computational
Intelligence and Big Data.
Raw dataset : 300 x 8686 matrix of gene expression measurements from
Pollen et al (2014) Nature Biotechnology 32, 1053-1058
When viewing the raw data, it is very difficult to see any clear structure or similarity between the samples.
Projection and clustering : Principal Components Analysis (PCA)
has been applied to the dataset to uncover structure. A
clustering method (k-means) has then been applied to the
projected data to assign the observations to distinct clusters.
Students will learn the theory and practical skills to reproduce this analysis.
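As a rough illustration of the two steps described above (projection with PCA, then clustering with k-means), here is a minimal sketch in Python using only NumPy. It does not use the Pollen et al data: the matrix `X` is a small synthetic stand-in, and the function names `pca_scores` and `kmeans`, the group sizes, and the choice of two components and three clusters are all assumptions made for illustration. The k-means routine is a plain version of Lloyd's algorithm with random initialisation, so it may settle in a local optimum.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                       # centre each column (variable)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # shape (n_samples, n_components)

def assign(Z, centres):
    """Label each row of Z with the index of its nearest centre."""
    d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    # initialise the centres at k distinct, randomly chosen observations
    centres = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centres)
        # move each centre to the mean of its cluster (keep it if the cluster is empty)
        new_centres = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return assign(Z, centres), centres

# Synthetic stand-in for an expression matrix (NOT the real dataset):
# 3 hypothetical groups of 30 samples, 200 variables each.
rng = np.random.default_rng(1)
n_per_group, n_genes, k = 30, 200, 3
group_means = rng.normal(0.0, 3.0, size=(k, n_genes))
X = np.vstack([m + rng.normal(size=(n_per_group, n_genes)) for m in group_means])

Z = pca_scores(X, n_components=2)    # 2-dimensional projection of the samples
labels, centres = kmeans(Z, k=k)     # cluster the projected samples
print(Z.shape, np.bincount(labels, minlength=k))
```

Plotting the two columns of `Z` against each other, coloured by `labels`, would give a picture of the same kind as the projection and clustering figure described above.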
The Paper III specimen papers now include questions on the material in these 6 lectures. These can be found here
This course leads on to
several more advanced courses in future years that students
should consider if they wish to learn more about Statistical
Data Analysis, Machine Learning, Big Data and Artificial Intelligence:
Simulation and Statistical Programming
of Statistical Inference
Data Mining and Machine Learning
Advanced Simulation Methods