**Course lecturer**
: Professor Jonathan Marchini marchini@stats.ox.ac.uk

The aim of the course is to introduce students to the theory and practice of **unsupervised learning**.

Unsupervised learning can be described as finding structure in
datasets, and has applications in many areas such as finance,
retail, medical imaging, sports performance analysis, genetics,
medicine, studies of the environment and social networks.

**Unsupervised learning** methods are an important part of **Computational Statistics, Machine Learning, Artificial Intelligence and Big Data**.

**Raw dataset**: a 300 × 8686 matrix of gene expression measurements from Pollen et al. (2014), *Nature Biotechnology* 32, 1053-1058. Viewing the raw data, it is very difficult to see any clear structure or similarity between the samples.

**3D projection and clustering**: Principal Components Analysis (PCA) has been applied to the dataset to uncover structure, and a clustering method (k-means) has then been applied to group the observations into distinct clusters. Students will learn the theory and practical skills needed to reproduce this analysis.
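The two-step pipeline described above (project with PCA, then cluster with k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used for the figure: it is written in Python rather than the R used on the course, and it runs on a small synthetic dataset (three made-up groups of 30 points in 8 dimensions), not on the Pollen et al. data. The principal directions are found by power iteration with deflation on the sample covariance matrix, which is only practical for a handful of variables (R's `prcomp()` works from an SVD of the centred data matrix instead). The clustering uses Lloyd's algorithm with k-means++ seeding and several restarts, keeping the run with the smallest within-cluster sum of squares. The helper names `pca_scores`, `kmeans_pp_init`, `lloyd` and `kmeans` are hypothetical, not a library API.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pca_scores(X, k, iters=500, seed=0):
    """Scores of the rows of X (n rows, p columns) on the first k principal components.

    Directions come from power iteration with deflation on the p x p sample
    covariance matrix: fine for a toy example, far too slow for p = 8686.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centre each column
    S = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    directions = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):  # repeated multiplication by S converges to the top eigenvector
            w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * S[i][j] * v[j] for i in range(p) for j in range(p))  # eigenvalue
        directions.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next direction.
        S = [[S[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return [[sum(r[j] * v[j] for j in range(p)) for v in directions] for r in Xc]

def kmeans_pp_init(Z, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D^2."""
    centres = [list(rng.choice(Z))]
    while len(centres) < k:
        d2 = [min(sqdist(z, c) for c in centres) for z in Z]
        r, acc = rng.random() * sum(d2), 0.0
        for z, d in zip(Z, d2):
            acc += d
            if acc >= r:
                centres.append(list(z))
                break
    return centres

def lloyd(Z, centres, iters=100):
    """Alternate assignment and centre-update steps until the labels stop changing."""
    k, labels = len(centres), None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sqdist(z, centres[c])) for z in Z]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:  # leave an empty cluster's centre where it was
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sqdist(z, centres[lab]) for z, lab in zip(Z, labels))
    return labels, centres, sse

def kmeans(Z, k, n_start=10, seed=0):
    """Best of n_start k-means runs, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    best = min((lloyd(Z, kmeans_pp_init(Z, k, rng)) for _ in range(n_start)),
               key=lambda res: res[2])
    return best[0], best[1]

# Synthetic stand-in data: three made-up groups of 30 points in 8 dimensions.
rng = random.Random(1)
group_means = [[0.0] * 8, [10.0] * 8, [10.0] * 4 + [-10.0] * 4]
X, truth = [], []
for g, mu in enumerate(group_means):
    for _ in range(30):
        X.append([m + rng.gauss(0, 1) for m in mu])
        truth.append(g)

scores = pca_scores(X, 3)            # 90 x 3 matrix of PC scores (the "3D projection")
labels, centres = kmeans(scores, 3)  # one cluster label per observation
print(labels)
```

With groups this well separated, each synthetic group should end up in its own cluster. On real data the number of clusters k has to be chosen by the analyst, and the numbering of the cluster labels is arbitrary.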

The course synopsis is here https://www0.maths.ox.ac.uk/courses/course/29077/synopsis

The Paper III specimen papers now include questions on the material in these 6 lectures. These can be found here

It is up to each college tutor to decide whether students should attempt these questions, but it is

Modern statistics is pervasive in the era of "Big Data". The majority of Maths graduates will go on to careers that involve some use of data, so a firm practical grounding in statistical analysis is highly valuable. One aim of this course is to get students started on carrying out statistical data analysis independently.

As many students will not have worked with R, here is a short tutorial document that introduces R, shows students how to install it and helps them get started with some basics.

R_intro.pdf

This course leads on to several more advanced courses in future years that students should consider if they wish to learn more about Statistical Data Analysis, Machine Learning, Big Data and Artificial Intelligence.

| Part A | Part B | Part C |
|---|---|---|
| Probability; Statistics; Simulation and Statistical Programming | Foundations of Statistical Inference | Statistical Data Mining and Machine Learning; Advanced Simulation Methods |

The book *An Introduction to Statistical Learning* by G. James, D. Witten, T. Hastie and R. Tibshirani is freely available online at http://www-bcf.usc.edu/~gareth/ISL/

Chapter 10 covers unsupervised learning.