The Interplay between Statistics
and Optimization in Learning

Workshop at the Alan Turing Institute, London

Monday, June 11th 2018

The workshop aims to bring together researchers working on the theoretical foundations of learning, with an emphasis on methods at the intersection of statistics, probability and optimization.

Update (15/06/2018): The recordings of the talks can be found here.

General Information

Organizers: Quentin Berthet (Cambridge, Stats Lab)
Varun Kanade (Oxford, Computer Science)
Patrick Rebeschini (Oxford, Statistics)
When: Monday, June 11th 2018
Where: Enigma Room, Alan Turing Institute, London (British Library, 96 Euston Road, London NW1 2DB).
Enter through the main British Library entrance, which opens at 9.30.

Schedule

Time Speaker Title/Abstract
 9.30 - 10.15 Registration
10.15 - 11.00 Marco Cuturi
(ENSAE)
Regularization for Optimal Transport and Dynamic Time Warping Distances
Machine learning deals with mathematical objects that have structure. Two common structures arising in applications are point clouds / histograms, as well as time series. Early progress in optimization (linear and dynamic programming) has provided powerful families of distances between these structures, namely Wasserstein distances and dynamic time warping scores. Because they both rely on the minimization of a linear functional, over a continuous set of couplings and over a (discrete) space of alignments respectively, both result in non-differentiable quantities. We show how two distinct smoothing strategies result in quantities that are better behaved and more suitable for machine learning applications, with applications to the computation of Fréchet means.
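The abstract does not spell out the smoothing schemes, but entropic regularization of optimal transport, solved with Sinkhorn's fixed-point iterations, is the standard example of this kind of smoothing; the sketch below is only illustrative (the cost matrix, marginals and regularization strength eps are made up).

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    Smoothing the linear OT objective with an entropy term makes the
    resulting cost differentiable in a, b and C.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):              # alternating marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan with marginals a, b

# Toy example: two histograms on a one-dimensional grid.
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost
P = sinkhorn(a, b, C)
print("regularized OT cost:", np.sum(P * C))
```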
11.00 - 11.45 Lorenzo Rosasco
(MIT / IIT)
Building efficient learning algorithms: a computational regularization perspective
Classical algorithm design in machine learning is based on minimizing an empirical objective over a class of models. The latter imposes a bias on the kind of solutions sought and further allows one to prevent possible instability. While this approach provides a useful abstraction, in practice it requires a number of further design choices. The impact of these choices is typically assessed separately, and often on an empirical basis via trial and error. In this talk I will consider a least squares learning scenario and show how a number of different algorithmic tricks can indeed be understood within a unifying regularization framework. These observations provide a new perspective on how to build algorithms that are accurate as well as resource efficient.
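One concrete instance of this viewpoint, given here only as an illustration rather than as the construction used in the talk, is early stopping of gradient descent on a least squares objective: the number of iterations plays the role of the regularization parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 100                              # over-parametrized least squares
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Gradient descent on 0.5 * ||Xw - y||^2 starting from zero; stopping the
# iteration early acts as (implicit) regularization.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / largest squared singular value
for t in range(1, 501):
    w -= step * X.T @ (X @ w - y)
    if t in (10, 50, 500):
        print(f"iter {t:3d}  distance to w_true: {np.linalg.norm(w - w_true):.3f}")
```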
11.45 - 12.30 Ya-Ping Hsieh
(EPFL)
Mirrored Langevin Dynamics
We consider the posterior sampling problem for constrained distributions, such as the Latent Dirichlet Allocation (LDA), which has posed significant challenges to both non-asymptotic analysis and algorithmic design. We propose a unified framework, inspired by the classical mirror descent, to derive novel deterministic and stochastic first-order sampling schemes. We prove that, for a general target distribution with strongly convex potential, our framework implies the existence of a first-order algorithm achieving O(\epsilon^{-2} d) convergence, suggesting that the state-of-the-art O(\epsilon^{-6} d^5) can be vastly improved. With the important LDA application in mind, we specialize our algorithm to sample from Dirichlet posteriors, and derive the first non-asymptotic O(\epsilon^{-2} d^2 R_0) rate for first-order sampling, where R_0 is the initial distance. We further extend our deterministic framework to the mini-batch setting and prove convergence rates when only stochastic gradients are available. Finally, we report state-of-the-art experimental results for LDA on real datasets.
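Schematically, and deferring all assumptions and constants to the paper, the construction parallels mirror descent: push the constrained target through the gradient of a mirror map h, run a discretized Langevin step on the unconstrained dual variable, and map back.

```latex
% Target \propto e^{-V} on a constrained domain; h a strongly convex mirror map.
% Write e^{-W} for the push-forward of the target under \nabla h (the dual target).
\begin{align*}
  y_{t+1} &= y_t - \beta\, \nabla W(y_t) + \sqrt{2\beta}\,\xi_t,
            \qquad \xi_t \sim \mathcal{N}(0, I_d),\\
  x_{t+1} &= \nabla h^{*}(y_{t+1}).
\end{align*}
```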
12.30 - 13.30 Lunch
13.30 - 14.15 Mark Herbster
(UCL)
Who needs kernels anyway?
We discuss the problem of predicting the entries of a binary matrix in the online learning setting. We will give an algorithm that scales with the _margin complexity_ of the underlying matrix. If each row of the matrix represents a prediction task over a finite set of objects (the columns), our per-task bound is nearly the same as the number of mistakes made by the Kernel Perceptron with an optimal kernel matrix in hindsight (on the most difficult task). An application to biclustering highlights the fact that our bound is optimal up to a logarithmic factor.
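For context, the Kernel Perceptron baseline that the per-task bound is compared against (not the matrix-prediction algorithm of the talk) runs online and updates only on mistakes; the kernel and data below are placeholders.

```python
import numpy as np

def kernel_perceptron(xs, ys, kernel):
    """Online Kernel Perceptron: predict with the current dual expansion,
    then update only when a mistake is made. Returns the mistake count."""
    alphas, support, mistakes = [], [], 0
    for x, y in zip(xs, ys):
        score = sum(a * kernel(s, x) for a, s in zip(alphas, support))
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:                    # mistake-driven update
            mistakes += 1
            alphas.append(y)
            support.append(x)
    return mistakes

# Toy run with an RBF kernel on synthetic labelled points.
rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 2))
ys = np.sign(xs[:, 0] * xs[:, 1])         # labels from a simple nonlinear rule
rbf = lambda u, v: np.exp(-np.sum((u - v) ** 2))
print("mistakes:", kernel_perceptron(xs, ys, rbf))
```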
14.15 - 15.00 András György
(DeepMind)
Adaptive Sampling via Sequential Decision Making
Sampling algorithms are widely used in machine learning, and their success often depends on how well the algorithms are adapted to the problem at hand. This talk considers sampling methods that run several samplers in parallel and try to allocate more resources to the ones that produce better-quality samples. First, we show how multi-armed bandit algorithms can be used to choose sequentially from a set of unbiased Monte Carlo samplers so that the mean-squared error of the final combined estimate will be close to that of the best sampler in retrospect. Next, we generalize this approach to Markov-chain Monte Carlo (MCMC) samplers, which requires overcoming several challenges, for example how to measure sample quality properly or how to handle slowly mixing chains and multimodal target distributions. The resulting adaptive MCMC algorithm is asymptotically consistent and may lead to significant speedups, especially for multimodal target distributions, as demonstrated by our experiments.
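As a toy caricature of the first part of the talk (the actual loss estimates and bandit strategy differ, and every quantity below is made up), one can allocate draws across several unbiased estimators of the same mean using a confidence-bound rule that favours samplers with low empirical variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three unbiased estimators of the same mean, with different variances.
samplers = [lambda s=s: rng.normal(1.0, s) for s in (0.5, 1.0, 3.0)]

counts, sums, sumsq = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 2001):
    if t <= 3:
        k = t - 1                          # pull each sampler once
    else:
        var = sumsq / counts - (sums / counts) ** 2
        # Lower confidence bound on the variance: prefer low-variance samplers
        # while still exploring the ones that have few samples.
        k = int(np.argmin(var - np.sqrt(np.log(t) / counts)))
    x = samplers[k]()
    counts[k] += 1; sums[k] += x; sumsq[k] += x ** 2

# Pooled estimate: every draw is unbiased, so the overall mean is unbiased too.
print("estimate:", np.sum(sums) / np.sum(counts), "allocation:", counts)
```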
15.00 - 15.45 Tor Lattimore
(DeepMind)
A full classification of finite adversarial partial monitoring
Partial monitoring is a generalisation of the multi-armed bandit framework that models bandit games, full-information games and variants that lie between and beyond these two extremes. One of the core questions is how the regret depends on the structure of the game. In this talk I will present the complete classification of the finite adversarial version of this problem, which characterises all games as belonging to one of four categories.
This is joint work with Csaba Szepesvári (https://arxiv.org/pdf/1805.09247.pdf).
15.45 - 17.00 Small reception

Attendance

Attendance is by invitation only. If you would like to participate, please get in touch with the organizers by filling in this form:

click here