Algorithmic Foundations of Learning

Prof. Patrick Rebeschini, University of Oxford, Michaelmas Term 2018

Course offered to: Part C, MSc in Mathematical Sciences (OMMS), and MSc in Statistical Science

Revisions (Trinity Term)

Revision Class 1 Monday Week 2, 9:00-10:30, LG.01.
We will cover the solutions to the mock exam dated 04-04-2019.
Revision Class 2 Monday Week 3, 9:00-10:30, LG.01.
We will cover the solutions to the mock exam dated 15-04-2019.
Consultation Session Monday Week 4, 9:00-10:00, LG.05.
Only questions on the course material and the course problem sheets will be taken.
Let the course instructor know which questions you would like to ask by completing this anonymous form: click here
Mock exams can be found on WebLearn by following this link (Part C -> SC10). After the revision classes, solutions will be uploaded to the same folder on WebLearn.

General Information

Course Team Patrick Rebeschini, Tomas Vaškevičius, Fan Wu
Email patrick.rebeschini AT stats.ox.ac.uk
Lectures Tuesdays 9:00-10:00 & Thursdays 9:00-10:00, weeks 1-8, LG.01
Tutorials Weeks 2, 5, and 7 of Michaelmas Term, and Week 1 of Hilary Term; details below

Tutorials

All tutorials take place in Weeks 2, 5, and 7 of Michaelmas Term and in Week 1 of Hilary Term.

Part C & OMMS, group 1
Fridays 9:00-10:30. Location: LG.04 in Michaelmas Term, LG.05 in Hilary Term.
Class Tutor / Teaching Assistant: Patrick Rebeschini / Tomas Vaškevičius

Part C & OMMS, group 2
Fridays 10:30-12:00. Location: LG.04 in Michaelmas Term, LG.05 in Hilary Term.
Class Tutor / Teaching Assistant: Fan Wu

Part C & OMMS, group 3
Fridays 12:00-13:30. Location: LG.04.
Class Tutor / Teaching Assistant: Tomas Vaškevičius / Fan Wu

MSc in Statistical Science
Fridays 13:30-14:30. Location: LG.01.
Class Tutor / Teaching Assistant: Patrick Rebeschini

Syllabus

The course provides a rigorous theoretical account of the main ideas underlying machine learning, and offers a principled framework for understanding the algorithmic paradigms in use, together with non-asymptotic methods for the study of random structures in high-dimensional probability, statistics, and optimisation.
  • Statistical learning frameworks for prediction, estimation, and online learning.
  • Probability:
    • Maximal inequalities.
    • Rademacher and Gaussian complexities.
    • Elements of VC theory.
    • Covering and packing numbers.
    • Chaining.
    • Concentration inequalities.
  • Statistics:
    • Bayes decision rules.
    • Empirical risk minimisation. Error decomposition: generalisation, optimisation, and approximation.
    • Learning via uniform convergence, margin bounds, and algorithmic stability.
    • Regularisation: explicit (constraints and penalisation) and implicit (algorithmic).
    • Convex loss surrogates.
    • Slow and fast rates.
    • Minimax lower bounds and hypothesis testing.
  • Optimisation:
    • Elements of convex theory.
    • Oracle model. Gradient descent. Mirror descent.
    • Stochastic oracle model. Stochastic gradient descent. Stochastic mirror descent. Variance reduction techniques.
    • Approximate Message Passing.
    • Online optimisation.
  • Examples:
    • Linear predictors, including Boosting.
    • Non-linear predictors, including Support Vector Machines and Neural Networks.
    • High-dimensional estimators for sparse and low-rank problems, including Lasso.
    • Online learning, including the multi-armed bandit problem and algorithms.
Readings:
  • Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
  • Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
  • Ramon van Handel. Probability in High Dimension. Lecture notes available online, 2016.
  • Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Book available online (draft).
Prerequisites:
  • The course requires a good level of mathematical maturity. Students are expected to have a working knowledge of core concepts in probability theory (basic properties of probabilities, such as union bounds, and of conditional expectations, such as the tower property; basic inequalities, such as Markov's and Jensen's), statistics (confidence intervals, hypothesis testing), and linear algebra (matrix-vector operations, eigenvalues and eigenvectors; basic inequalities, such as Cauchy-Schwarz's and Hölder's). Previous exposure to machine learning (empirical risk minimisation, overfitting, regularisation) is recommended.
  • Students would benefit from being familiar with the material covered in SB2.1 (formerly SB2a) Foundations of Statistical Inference (in particular, Decision Theory) and in SB2.2 (formerly SB2b) Statistical Machine Learning.

Lecture Material

0 Preliminaries Notes »
1 Introduction Notes »  Slides »
2 Maximal Inequalities and Rademacher Complexity Notes »  Slides »
3 Rademacher Complexity. Examples Notes »  Slides »
4 VC Dimension. Covering and Packing Numbers Notes »  Slides »
5 Covering Numbers Bounds for Rademacher Complexity. Chaining Notes »  Slides »
6 Sub-Gaussian Concentration Inequalities. Bounds in Probability Notes »  Slides »
7 Bernstein’s Concentration Inequalities. Fast Rates Notes »  Slides »
8 Convex Loss Surrogates. Elements of Convex Theory Notes »  Slides »
9 Oracle Model. Gradient Descent Methods Notes »  Slides »
10 Non-Euclidean Settings. Mirror Descent Methods Notes »  Slides »
11 Stochastic Oracle Model. Algorithmic Stability and Implicit Regularization Notes »  Slides »
12 High-Dimensional Statistics. Gaussian Complexity Notes »  Slides »
13 The Lasso Estimator. Proximal Gradient Methods Notes »  Slides »
14 Approximate Message Passing Notes »  Slides »
15 Stochastic Multi-Armed Bandit Problem and Algorithms Notes »  Slides »
16 Minimax Lower Bounds and Hypothesis Testing Notes »  Slides »
The lecture notes are self-contained and give full details of the results and proofs covered in this course.
The slides provide the high-level narrative and highlight the main results.
Proofs and graphical illustrations are given on the whiteboard during lectures.

Problem Sheets

Week 2 Problems on material covered up to Lecture 2 Sheet »  due on October 17th at 13:00
Week 5 Problems on material covered up to Lecture 7 Sheet »  due on November 7th at 13:00
Week 7 Problems on material covered up to Lecture 11 Sheet »  due on November 22nd at 9:00
Week 1 (Hilary Term) Problems on material covered up to Lecture 16 Sheet »  (scripts will not be collected)
For Part C and OMMS students: hand in your solutions at the Algorithmic Foundations of Learning tray in the Statistics Department. Indicate your class group (1, 2, or 3) on the front page.

Solutions will be provided during tutorials. For Part C and OMMS students: class allocation details are on Minerva (accessible from Oxford network).


Feedback

Leave your feedback for the instructors, anonymously: click here