Course lecturer: Prof Dino Sejdinovic. The course and all course materials were designed by Prof Jonathan Marchini.
The course synopsis
The aim of the Data Analysis course is to introduce students to the theory and practice of unsupervised learning.
Unsupervised learning can be described as finding structure in datasets, and has applications in many areas such as finance, retail, medical imaging, sports performance analysis, genetics, medicine, studies of the environment and social networks.
Unsupervised learning methods are important parts of
Artificial Intelligence, and
Raw dataset: 300 x 8686 matrix of gene expression measurements from
Pollen et al. (2014), Nature Biotechnology 32, 1053-1058.
Viewing the raw data, it is very difficult to see any clear structure or similarity between the samples.
Projection and clustering: Principal Components Analysis (PCA) has been applied to the dataset to uncover structure. A clustering method (k-means) has then been applied to group the observations into distinct clusters.
Students will learn the theory and practical skills to reproduce this analysis.
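As a rough illustration of the projection-and-clustering steps described above, the following is a minimal sketch in Python with NumPy. It runs on small synthetic data, not the Pollen et al. dataset, and the function names pca_project and kmeans are hypothetical helpers chosen for this example. The course does not state which software it uses, so this should be read as one possible way to carry out the two steps rather than the course's own implementation.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # centre each variable (column)
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # one row of scores per observation


def kmeans(Z, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids at k distinct, randomly chosen observations.
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update each centroid to the mean of its points (keep it if empty).
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Synthetic stand-in for an expression matrix: 60 samples x 50 variables,
# generated from 3 groups with different mean profiles (an assumption made
# purely for illustration).
rng = np.random.default_rng(1)
true_groups = np.repeat(np.arange(3), 20)
group_means = rng.normal(scale=5.0, size=(3, 50))
X = group_means[true_groups] + rng.normal(size=(60, 50))

scores = pca_project(X, n_components=2)  # 60 x 2 matrix of PC scores
labels, centroids = kmeans(scores, k=3)  # cluster in the projected space
```

Plotting the two columns of scores against each other, coloured by labels, gives the kind of picture the figure describes. The number of components and the number of clusters are both set by hand here; k-means can also converge to a poor local solution, so in practice it is usually rerun from several starting points.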
This course leads on to several more advanced courses in future years that students should consider if they wish to learn more about Statistical Data Analysis, Machine Learning, Big Data and Artificial Intelligence:
Simulation and Statistical Programming
Foundations of Statistical Inference
Statistical Machine Learning
Advanced Topics in Statistical Machine Learning
Advanced Simulation Methods
Algorithmic Foundations of Learning