# Algorithmic Foundations of Learning

#### Prof. Patrick Rebeschini, University of Oxford, Michaelmas Term 2019

Course offered to: Part C, MSc in Mathematical Sciences (OMMS), and MSc in Statistical Science

## Revision Sessions (Trinity Term)

Consultation class 1 | Wednesday 6th May (Week 2), 11:00-12:00, Microsoft Teams. Only questions on the course material and the course problem sheets will be taken. Let the course instructor know which questions you would like to ask by completing this anonymous form: click here |
---|---|
Consultation class 2 | Wednesday 20th May (Week 4), 11:00-12:00, Microsoft Teams. Only questions on the course material and the course problem sheets will be taken. Let the course instructor know which questions you would like to ask by completing this anonymous form: click here |

**Recorded revision classes are available on Canvas**, covering the solutions of the mock exam papers dated 04-04-2019 and 15-04-2019 (3 questions each), the solutions of the 2019 Part C/OMMS exam paper (3 questions), and the 2019 MSc exam paper (2 questions).

Part C/OMMS exam papers and solutions can be found on WebLearn, following this link (searching for "Algorithmic Foundations of Learning") and this link (Part C -> SC10), respectively. MSc exam papers and solutions can be found on WebLearn, following this link (searching for "Principles of Statistical Analysis") and this link, respectively.

## General Information

Instructor | Patrick Rebeschini (patrick.rebeschini AT stats.ox.ac.uk) |
---|---|
Class Tutors and Teaching Assistants | Lorenzo Pacchiardi (lorenzo.pacchiardi AT spc.ox.ac.uk), Tomas Vaškevičius (tomas.vaskevicius AT spc.ox.ac.uk) |
Lectures | Tuesdays 9:00-10:00 & Thursdays 9:00-10:00, Weeks 1-8, LG.01 |
Tutorials | Weeks 3, 5, 7 of Michaelmas Term; meetings in Hilary Term to be decided. Details below |

## Tutorials

Group | Time | Location | Class Tutor / Teaching Assistant |
---|---|---|---|
Part C & OMMS, group 1 | Thursdays 14:00-15:30, Weeks 3, 5, 7 of Michaelmas Term; Week 1 of Hilary Term | LG.04 (Michaelmas), LG.04 (Hilary) | Lorenzo Pacchiardi |
Part C & OMMS, group 2 | Fridays 11:30-13:00, Weeks 3, 5, 7 of Michaelmas Term; Week 1 of Hilary Term | LG.04 (Michaelmas), LG.03 (Hilary) | Patrick Rebeschini / Tomas Vaškevičius |
Part C & OMMS, group 3 | Fridays 13:00-14:30, Weeks 3, 5, 7 of Michaelmas Term; Week 1 of Hilary Term | LG.04 (Michaelmas), LG.03 (Hilary) | Tomas Vaškevičius |
MSc in Statistical Science | Fridays 9:00-10:00, Weeks 3, 5, 7 of Michaelmas Term; Week 0 of Hilary Term | LG.01 | Patrick Rebeschini |

## Syllabus

The course is meant to provide a rigorous theoretical account of the main ideas underlying machine learning, and to offer a principled framework to understand the algorithmic paradigms being used, along with non-asymptotic methods for the study of random structures in high-dimensional probability, statistics, and optimisation.

- Statistical learning frameworks for prediction, estimation, and online learning.
- Probability:
  - Maximal inequalities.
  - Rademacher and Gaussian complexities.
  - Elements of VC theory.
  - Covering and packing numbers.
  - Chaining.
  - Concentration inequalities.
- Statistics:
  - Bayes decision rules.
  - Empirical risk minimisation. Error decomposition: generalisation, optimisation, and approximation.
  - Learning via uniform convergence, margin bounds, and algorithmic stability.
  - Regularisation: explicit (constraints and penalisation) and implicit (algorithmic).
  - Convex loss surrogates.
  - Slow and fast rates.
  - Minimax lower bounds and hypothesis testing.
- Optimisation:
  - Elements of convex theory.
  - Oracle model. Gradient descent. Mirror descent.
  - Stochastic oracle model. Stochastic gradient descent. Stochastic mirror descent. Variance reduction techniques.
  - Approximate Message Passing.
  - Online optimisation.
- Examples:
  - Linear predictors, including Boosting.
  - Non-linear predictors, including Support Vector Machines and Neural Networks.
  - High-dimensional estimators for sparse and low-rank problems, including Lasso.
  - Online learning, including multi-armed bandit problems and reinforcement learning algorithms.
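As a taste of the optimisation topics above, here is a minimal, purely illustrative sketch (not part of the course materials) of fixed-step gradient descent on a convex least-squares objective; the matrix, step size, and iteration count are assumptions chosen for the example.

```python
# Gradient descent on the convex quadratic f(x) = 0.5 * ||A x - b||^2,
# whose gradient is A^T (A x - b).
import numpy as np

def gradient_descent(A, b, step, iters):
    """Run plain gradient descent from the origin and return the final iterate."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)  # gradient of the least-squares objective
        x = x - step * grad       # fixed-step gradient update
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
x_hat = gradient_descent(A, b, step=0.1, iters=500)
# x_hat approaches the minimiser [1, 3] of f
```

With a step size below 2 divided by the largest eigenvalue of A^T A, the iterates converge to the least-squares solution; the course develops the non-asymptotic rates behind this behaviour.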

## Reading List

- Shai Shalev-Shwartz and Shai Ben-David. *Understanding Machine Learning: From Theory to Algorithms.* Cambridge University Press, 2014.
- Sébastien Bubeck. *Convex Optimization: Algorithms and Complexity.* Foundations and Trends in Machine Learning, 2015.
- Ramon van Handel. *Probability in High Dimension.* Lecture notes available online, 2016.
- Tor Lattimore and Csaba Szepesvári. *Bandit Algorithms.* Book available online (draft).
- Martin J. Wainwright. *High-Dimensional Statistics: A Non-Asymptotic Viewpoint.* Cambridge University Press, 2019.

## Prerequisites

- The course requires a good level of mathematical maturity. Students are expected to have a working knowledge of core concepts in probability theory (basic properties of probabilities, such as union bounds, and of conditional expectations, such as the tower property; basic inequalities, such as Markov's and Jensen's), statistics (confidence intervals, hypothesis testing), and linear algebra (matrix-vector operations, eigenvalues and eigenvectors; basic inequalities, such as Cauchy-Schwarz's and Hölder's). Previous exposure to machine learning (empirical risk minimisation, overfitting, regularisation) is recommended.
- Students would benefit from being familiar with the material covered in SB2.1 (formerly SB2a) Foundations of Statistical Inference (in particular, Decision Theory) and in SB2.2 (formerly SB2b) Statistical Machine Learning.

## Lecture Material

The lecture notes are self-contained. They give full details of the results and proofs covered in this course, and include all the material needed to complete the problem sheets and the exams. **Students are expected to study all the proofs presented in the lecture notes, even those not covered during lectures.**

The slides used during lectures provide the high-level narrative and highlight the main ideas of the results covered in the lecture notes.

Proof ideas and graphical illustrations are given on the whiteboard during lectures.

0 | Preliminaries | Notes » |
---|---|---|
1 | Introduction | Notes » Slides » |
2 | Maximal Inequalities and Rademacher Complexity | Notes » Slides » |
3 | Rademacher Complexity. Examples | Notes » Slides » |
4 | VC Dimension. Covering and Packing Numbers | Notes » Slides » |
5 | Covering Numbers Bounds for Rademacher Complexity. Chaining | Notes » Slides » |
6 | Sub-Gaussian Concentration Inequalities. Bounds in Probability | Notes » Slides » |
7 | Bernstein's Concentration Inequalities. Fast Rates | Notes » Slides » |
8 | Convex Loss Surrogates. Elements of Convex Theory | Notes » Slides » |
9 | Oracle Model. Gradient Descent Methods | Notes » Slides » |
10 | Non-Euclidean Settings. Mirror Descent Methods | Notes » Slides » |
11 | Stochastic Oracle Model. Algorithmic Stability | Notes » Slides » |
12 | High-Dimensional Statistics. Gaussian Complexity | Notes » Slides » |
13 | The Lasso Estimator. Proximal Gradient Methods | Notes » Slides » |
14 | Least Squares Regression. Implicit Bias and Implicit Regularization | Notes » Slides » |
15 | Stochastic Multi-Armed Bandit Problem and Algorithms | Notes » Slides » |
16 | Minimax Lower Bounds and Hypothesis Testing | Notes » Slides » |
## Problem Sheets

Week 3 | Problems on material covered up to Lecture 2 | Sheet » due on October 30th at 13:00 |
---|---|---|
Week 5 | Problems on material covered up to Lecture 7 | Sheet » due on November 13th at 13:00 |
Week 7 | Problems on material covered up to Lecture 11 | Sheet » due on November 27th at 13:00 |
Week 1 (Hilary Term) | Problems on material covered up to Lecture 16 | Sheet » due on January 22nd at 13:00 |

Printed copies of the solutions will be provided during tutorials. Solutions will **not** be distributed in electronic format. If you miss a tutorial, please send an email to your teaching assistant to receive a printed copy of the solutions.

## Feedback

Leave your feedback for the instructors anonymously: click here