Professor of Statistics and Machine Learning, Department of Statistics, University of Oxford. I work on developing methodologies and theoretical foundations for large-scale learning problems.

Before joining the University of Oxford, I was a Lecturer in the Computer Science department at Yale University and a Postdoctoral Associate at the Yale Institute for Network Science, hosted by Sekhar Tatikonda.
I have a Ph.D. in Operations Research and Financial Engineering from Princeton University, where I worked in probability theory under the supervision of Ramon van Handel.

Here is my Curriculum Vitae.

I am interested in the investigation of fundamental principles in **high-dimensional probability**, **statistics** and **optimization** to design computationally efficient and statistically optimal algorithms for machine learning.

Since 2022, I have served as Area Chair for COLT (COLT 2022, COLT 2023, COLT 2024).

In 2022, I chaired the session on Advanced Theoretical Statistics at the IMS Annual Meeting in London.

As a Fellow at the Alan Turing Institute in London, I organized the following meetings:

From June 28 to July 2, 2021, I taught the week-long summer school Mathematics of Machine Learning.

On January 13-14, 2020, I co-organized the two-day workshop Statistics and Computation.

On June 11, 2018, I co-organized the one-day workshop The Interplay between Statistics and Optimization in Learning.

I am a Co-Investigator for the Imperial-Oxford StatML Centre for Doctoral Training (CDT). I am a member of the Bernoulli Society, Institute of Mathematical Statistics (IMS), and European Laboratory for Learning and Intelligent Systems (ELLIS). I am an alumnus of the Yale Institute for Network Science and Princeton Statistical Laboratory.

**Meta-learning the mirror map in policy mirror descent** (with C. Alfano, S. Towers, S. Sapora, and C. Lu). [arXiv]

**Generalization bounds for label noise stochastic gradient descent** (with J. E. Huh), International Conference on Artificial Intelligence and Statistics (AISTATS) 2024. [arXiv]

**Sample-efficiency in multi-batch reinforcement learning: The need for dimension-dependent adaptivity** (with E. Johnson and C. Pike-Burke), International Conference on Learning Representations (ICLR) 2024. [arXiv]

**Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes** (with E. Johnson and C. Pike-Burke), Conference on Neural Information Processing Systems (NeurIPS) 2023. Presented at the 16th European Workshop on Reinforcement Learning (EWRL 2023). [arXiv]

**A novel framework for policy mirror descent with general parametrization and linear convergence** (with C. Alfano and R. Yuan), Conference on Neural Information Processing Systems (NeurIPS) 2023. Presented at the 16th European Workshop on Reinforcement Learning (EWRL 2023). [arXiv]

**Exponential tail local Rademacher complexity risk bounds without the Bernstein condition** (with V. Kanade and T. Vaškevičius). [arXiv]

**Comparing classes of estimators: When does gradient descent beat ridge regression in linear models?** (with D. Richards and E. Dobriban). [arXiv]

**The statistical complexity of early-stopped mirror descent** (with V. Kanade and T. Vaškevičius), Information and Inference: A Journal of the IMA, to appear.

**Nearly minimax-optimal rates for noisy sparse phase retrieval via early-stopped mirror descent** (with F. Wu), Information and Inference: A Journal of the IMA, vol. 12, no. 2, pp. 633-713, 2023. [journal] [arXiv]

**Implicit regularization in matrix sensing via mirror descent** (with F. Wu), Conference on Neural Information Processing Systems (NeurIPS), vol. 34, pp. 20558-20570, 2021. [proceedings] [arXiv] [code]

**Distributed machine learning with sparse heterogeneous data** (with D. Richards and S. Negahban), Conference on Neural Information Processing Systems (NeurIPS), vol. 34, pp. 18008-18020, 2021. [proceedings] [arXiv]

**On optimal interpolation in linear regression** (with E. Oravkin), Conference on Neural Information Processing Systems (NeurIPS), vol. 34, pp. 29116-29128, 2021. [proceedings] [arXiv] [code]

**Time-independent generalization bounds for SGLD in non-convex settings** (with T. Farghly), Conference on Neural Information Processing Systems (NeurIPS), vol. 34, pp. 19836-19846, 2021. [proceedings] [arXiv]

**Hadamard Wirtinger flow for sparse phase retrieval** (with F. Wu), International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research (PMLR), vol. 130, pp. 982-990, 2021. Oral presentation. [proceedings] [arXiv] [code]

**A continuous-time mirror descent approach to sparse phase retrieval** (with F. Wu), Conference on Neural Information Processing Systems (NeurIPS), vol. 33, pp. 20192-20203, 2020. Spotlight presentation. [proceedings] [arXiv] [code]

**The statistical complexity of early stopped mirror descent** (with V. Kanade and T. Vaškevičius), Conference on Neural Information Processing Systems (NeurIPS), vol. 33, pp. 253-264, 2020. Spotlight presentation. [proceedings] [arXiv] [code]

**Decentralised learning with random features and distributed gradient descent** (with D. Richards and L. Rosasco), International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research (PMLR), vol. 119, pp. 8105-8115, 2020. [proceedings] [arXiv] [code]

**Graph-dependent implicit regularisation for distributed stochastic subgradient descent** (with D. Richards), Journal of Machine Learning Research (JMLR), vol. 21, no. 34, pp. 1-44, 2020. [journal] [arXiv] [code]

**Implicit regularization for optimal sparse recovery** (with V. Kanade and T. Vaškevičius), Conference on Neural Information Processing Systems (NeurIPS), vol. 32, pp. 2972-2983, 2019. [proceedings] [arXiv] [code]

**Optimal statistical rates for decentralised non-parametric regression with linear speed-up** (with D. Richards), Conference on Neural Information Processing Systems (NeurIPS), vol. 32, pp. 1216-1227, 2019. [proceedings] [arXiv]

**Decentralized cooperative stochastic bandits** (with D. Martínez-Rubio and V. Kanade), Conference on Neural Information Processing Systems (NeurIPS), vol. 32, pp. 4529-4540, 2019. [proceedings] [arXiv] [code]

**Locality in network optimization** (with S. Tatikonda), IEEE Transactions on Control of Network Systems, vol. 6, no. 2, pp. 487-500, 2019. [journal] [arXiv]

**A new approach to Laplacian solvers and flow problems** (with S. Tatikonda), Journal of Machine Learning Research (JMLR), vol. 20, no. 36, pp. 1-37, 2019. [journal] [arXiv]

**Accelerated consensus via Min-Sum Splitting** (with S. Tatikonda), Conference on Neural Information Processing Systems (NIPS), vol. 30, pp. 1374-1384, 2017. [proceedings] [arXiv] [poster]

**Decay of correlation in network flow problems** (with S. Tatikonda), 50th Conference on Information Sciences and Systems (CISS), pp. 169-174, 2016. [proceedings] [pdf]

**Fast mixing for discrete point processes** (with A. Karbasi), 28th Conference on Learning Theory (COLT), pp. 1480-1500, 2015. [proceedings] [arXiv] [poster]

**Can local particle filters beat the curse of dimensionality?** (with R. van Handel), Annals of Applied Probability, vol. 25, no. 5, pp. 2809-2866, 2015. [journal] [arXiv]

**Phase transitions in nonlinear filtering** (with R. van Handel), Electronic Journal of Probability, vol. 20, no. 7, pp. 1-46, 2015. [journal] [arXiv]

**Comparison theorems for Gibbs measures** (with R. van Handel), Journal of Statistical Physics, vol. 157, pp. 234-281, 2014. [journal] [arXiv]

**Nonlinear filtering in high dimension**, Ph.D. thesis, Princeton University, 2014. [pdf]

**Upcoming:**

**@** Gatsby Computational Neuroscience Unit, UCL (March 6, 2024)

**@** Laboratory of Mathematics in Orsay, Paris (March 14, 2024)

**@** Department of Statistics, University of Warwick (May 13, 2024)

**Algorithmic stability, generalization, and privacy for diffusion models**, Statistics Seminar, Collegio Carlo Alberto, University of Turin, October 2023.

**Learning with mirror descent**, Statistics and Learning Theory summer school in Tsaghkadzor, Armenia (Yerevan State University, CREST/ENSAE, École Polytechnique Paris), July 2023.

**Implicit regularization via uniform convergence**, GRAMSIA (Graphical Models, Statistical Inference, and Algorithms) Workshop, Cambridge (US), May 2023.

**Implicit regularization in statistical learning: An overview and some recent results**, Berlin-Bielefeld-Paris Workshop on Early Stopping, Berlin, April 2023.

**Implicit regularization in statistical learning: An overview and some recent results**, Physics of Machine Learning Workshop, Università degli Studi di Padova, Asiago, September 2022.

**Concentration without Bernstein**, Advanced Theoretical Statistics session, IMS Annual Meeting, London, June 2022.

**Sharp Excess Risk Bounds without the Bernstein Condition: An Algorithmic Viewpoint**, BIDSA Seminar Series, Department of Decision Sciences, Bocconi University, April 2022.

**Sharp Excess Risk Bounds without the Bernstein Condition: An Algorithmic Viewpoint**, CDSML Seminar Series, Department of Mathematics, National University of Singapore, March 2022.

**The Statistical Complexity of Early-Stopped Mirror Descent**, Statistical Methods in Machine Learning, Bernoulli-IMS One World Symposium 2020, August 2020. [video]

**The Statistical Complexity of Early-Stopped Mirror Descent**, Probability Seminar, Division of Applied Mathematics, Brown University, May 2020.

**Statistically and Computationally Optimal Estimators for Sparse Recovery and Decentralized Regression**, Adobe Research, San Jose, December 2019.

**Implicit Regularization for Optimal Sparse Recovery**, Information Systems Lab (ISL) Colloquium, Stanford University, December 2019.

**On the Interplay between Statistics, Computation and Communication in Decentralised Learning**, Decision and Control Systems, KTH, October 2019.

**Implicit Regularization for Optimal Sparse Recovery**, Probability and Mathematical Statistics seminar, Department of Mathematics, KTH, October 2019.

**Implicit Regularization for Optimal Sparse Recovery**, London Machine Learning Meetup, September 2019.

**Implicit Regularization for Optimal Sparse Recovery**, Theory, Algorithms and Computations of Modern Learning Systems workshop, DALI/ELLIS, September 2019.

**On the Interplay between Statistics, Computation and Communication in Decentralised Learning**, Optimization and Statistical Learning workshop (OSL 2019), Les Houches School of Physics. [slides]

**On the Interplay between Statistics, Computation and Communication in Decentralised Learning**, School of Mathematics, University of Bristol, March 2019.

**On the Interplay between Statistics, Computation and Communication in Decentralised Learning**, Algorithms & Computationally Intensive Inference Seminar, University of Warwick, February 2019.

**Multi-Agent Learning: Implicit Regularization and Order-Optimal Gossip**, Theory and Algorithms in Data Science, The Alan Turing Institute, August 2018.

**Multi-Agent Learning: Implicit Regularization and Order-Optimal Gossip**, Statistical Scalability Programme, Isaac Newton Institute, June 2018.

**Multi-Agent Learning: Implicit Regularization and Order-Optimal Gossip**, Statistics Seminar Series, Department of Decision Sciences, Bocconi University, May 2018.

**Distributed and Decentralised Learning: Generalisation and Order-Optimal Gossip**, Amazon Berlin, April 2018.

**Locality and Message Passing in Network Optimization**, Workshop on Optimization vs Sampling, The Alan Turing Institute, February 2018.

**Accelerated Consensus via Min-Sum Splitting**, Statistics Seminar, University of Cambridge, November 2017.

**Accelerating message-passing using global information**, OxWaSP Workshop, University of Warwick, October 2017.

**Accelerating message-passing using global information**, StatMathAppli 2017, Statistics Mathematics and Applications, Fréjus, September 2017.

**Accelerated Min-Sum for consensus**, Large-Scale and Distributed Optimization, LCCC Workshop, Lund University, June 2017.

**Message-passing in convex optimization**, WINRS conference, Brown University, March 2017.

**Min-Sum and network flows**, Workshop on Optimization and Inference for Physical Flows on Networks, Banff International Research Station, March 2017.

**Locality and message-passing in network optimization**, DISMA, Politecnico di Torino, January 2017.

**Locality and message-passing in network optimization**, LIDS Seminar Series, MIT, November 2016.

**Locality and message-passing in network optimization**, Probability Seminar, Division of Applied Mathematics, Brown University, November 2016.

**Message-passing in network optimization**, YINS Seminar Series, Yale University, November 2016.

**Tractable Bayesian computation in high-dimensional graphical models**, Mathematical Sciences Department, IBM Thomas J. Watson Research Center, June 2016.

**From sampling to learning submodular functions**, 2016 New England Statistics Symposium (NESS), Yale University, April 2016.

**Scale-free sequential Monte Carlo**, Seminar on particle methods in Statistics, Statistics Department, Harvard University, April 2016.

**Decay of correlation in network flow problems**, 50th Annual Conference on Information Sciences and Systems (CISS 2016), Princeton University, March 2016.

**Locality in network optimization**, INFORMS, Philadelphia, November 2015.

**Local algorithms in high-dimensional models**, Statistics Department, University of Oxford, September 2015.

**Killed random walks and graph Laplacians: local sensitivity in network flow problems**, Yale Probabilistic Networks Group seminar, Statistics Department, Yale University, September 2015.

**Decay of correlation in graphical models; algorithmic perspectives**, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, August 2015.

**Fast mixing for discrete point processes**, 28th Annual Conference on Learning Theory (COLT 2015), Université Pierre et Marie Curie, July 2015. [poster] [video]

**Filtering compressed signal dynamics in high dimension**, 45th Annual John H. Barrett Memorial Lectures, University of Tennessee, May 2015.

**On the role of the Hessian of submodular functions**, Yale Probabilistic Networks Group seminar, Statistics Department, Yale University, April 2015.

**Submodular functions, from optimization to probability**, Probability Theory and Combinatorial Optimization, The Fuqua School of Business, Duke University, March 2015.

**Estimating conditional distributions in high dimension**, Applied Mathematics seminar, Yale University, October 2014.

**Nonlinear filtering in high dimension**, Yale Probabilistic Networks Group seminar, Statistics Department, Yale University, September 2014.

**Particle filters and curse of dimensionality**, Monte Carlo Inference for Complex Statistical Models workshop, Isaac Newton Institute for Mathematical Sciences, University of Cambridge, April 2014. [slides] [video]

**Particle filters and curse of dimensionality**, Cambridge Machine Learning Group, University of Cambridge, February 2014.

**New phenomena in nonlinear filtering**, Yale Probabilistic Networks Group seminar, Statistics Department, Yale University, February 2014.

**Filtering in high dimension**, Cornell Probability Summer School, Cornell University, July 2013.

At the **University of Oxford**, I regularly organize reading groups on learning theory and statistical optimization.

In March 2023, together with Ciara Pike-Burke, I served as Module Leader for a doctoral course on online learning, bandits, and reinforcement learning. Invited speakers: Nicolò Cesa-Bianchi (University of Milan), Tor Lattimore (DeepMind), Gergely Neu (Universitat Pompeu Fabra).

In 2022, I served as a research supervisor for UNIQ+ DeepMind summer interns.

Since 2018, I have been designing and teaching Algorithmic Foundations of Learning, for which I received the 2019 Oxford MPLS Teaching Award.

In Spring 2021, I taught Simulation and Statistical Programming. In Spring 2018, I taught Advanced Simulation Methods.

Since 2017, I have been teaching probability theory, statistics, and graph theory as part of my tutorial duties at University College, Oxford.

I regularly contribute to Maths/Stats Open Days at Oxford. Here is a 2021 video to introduce statistics to high-school students via machine learning and the multi-armed bandit problem.

At **Yale University**, in Fall 2016 I served as the Head Instructor for CS50 — Introduction to Computing and Programming — taught jointly with Harvard University. This is one of the largest classes offered at Yale and Harvard.
Here is an article in the Yale Daily News. Here is the intro class in Machine Learning and Python, or its VR version.
I was a member of the Yale Postdoctoral Association, and for three years in a row, from 2015 to 2017, I organized the Julia Robinson Mathematics Festival at Yale, a celebration of ideas and problems in mathematics that enable junior high and high school students to explore fun math in a non-competitive setting.

At **Princeton University**, in 2013 I received the Excellence in Teaching Award from the Princeton Engineering Council while serving as Head Teaching Assistant for ORF 309 (Probability and Stochastic Systems). I was also a fellow of the McGraw Center for Teaching and Learning.