Trinity term 2016


Week 1:      Friday 29th April, 3.30 p.m. 

Speaker:    Professor Sayan Mukherjee, Department of Statistical Science, Duke University
Title:           Stochastic topology and inference
Abstract:    We will discuss three examples where stochastic topology is relevant to statistical inference or probability theory.

Modeling surfaces: Given morphological data in the form of meshes, we introduce two related statistical summaries, the Euler Characteristic Transform (ECT) and the Persistent Homology Transform (PHT). We use the PHT and ECT to represent shapes and execute operations such as computing distances between shapes. We prove the transforms are injective maps and satisfy a formal definition of statistical sufficiency. In addition, the ECT provides for a simple exponential family formulation which allows for likelihood-based modeling of surfaces without the need for landmarks. We present results on a set of heel bones from 106 extinct and extant primates.

Percolation on manifolds: Given n points drawn from a point process on a manifold, consider the random set which consists of the union of balls of radius r around the points. As n goes to infinity, r is sent to zero at varying rates. For this stochastic process we provide scaling limits and phase transitions on the counts of Betti numbers and critical points. This study falls into the category of higher-dimensional notions of percolation.
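As a rough illustration of the random set described above, the sketch below counts only the zeroth Betti number (the number of connected components) of a union of balls of radius r. It is not taken from the talk: it assumes points in flat Euclidean space rather than on a general manifold, and the function name betti0_union_of_balls is hypothetical. It relies on the fact that two closed balls of radius r overlap exactly when their centres are at distance at most 2r.

```python
import math
import random


def betti0_union_of_balls(points, r):
    """Count connected components of the union of closed balls of radius r.

    Assumption: points are coordinate tuples in Euclidean space, so two
    balls intersect exactly when their centres are at distance <= 2 * r.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= 2 * r:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1
    return components


if __name__ == "__main__":
    # Uniform points in the unit square stand in for the point process.
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    for radius in (0.01, 0.03, 0.1):
        print(radius, betti0_union_of_balls(pts, radius))
```

The pairwise loop is quadratic in n, which is adequate for small experiments. Higher Betti numbers would need a simplicial complex construction and are outside this sketch.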

Random hypergraph models: It may be of interest to model conditional independence structure beyond graphs with the goal of capturing higher-order interactions. We develop a framework for posterior inference and prior specification for random hypergraphs using ideas from computational geometry and spatial point processes. We illustrate the utility of this approach on simulated data.

Joint work with: Katharine Turner and Doug Boyer, Omer Bobrowski, Simon Lunagomez, Robert Wolpert, Edo Airoldi

Week 4:      Wednesday 18th May, 3.30 p.m.

Speaker:    Professor Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York
Title:           The Statistical Crisis in Science
Abstract:    Top journals in psychology routinely publish ridiculous, scientifically implausible claims, justified based on “p < 0.05.”  And this in turn calls into question all sorts of more plausible, but not necessarily true, claims, that are supported by this same sort of evidence.  To put it another way:  we can all laugh at studies of ESP, or ovulation and voting, but what about MRI studies of political attitudes, or embodied cognition, or stereotype threat, or, for that matter, the latest potential cancer cure?  If we can’t trust p-values, does experimental science involving human variation just have to start over?  And what do we do in fields such as political science and economics, where preregistered replication can be difficult or impossible?  Can Bayesian inference supply a solution?  Maybe.  These are not easy problems, but they’re important problems.

Week 7:      Friday 10th June, 3.30 p.m.

Speaker:    Professor Gabor Lugosi, Barcelona Graduate School of Economics
Title:            How to estimate the mean of a random variable?
Abstract:   Given n independent, identically distributed copies of a random variable, one is interested in estimating the expected value.  Perhaps surprisingly, there are still open questions concerning this very basic problem in statistics.

In this talk we are primarily interested in non-asymptotic sub-Gaussian estimates for potentially heavy-tailed random variables. We discuss various estimates and extensions to high dimensions, empirical risk minimization, and multivariate problems. This talk is based on joint work with Emilien Joly, Luc Devroye, Matthieu Lerasle, and Roberto Imbuzeiro Oliveira.
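
The abstract does not name a specific estimator. As one classical example of the kind of estimate described, the sketch below implements median-of-means, which is known to give sub-Gaussian-type deviation bounds under a finite-variance assumption when the number of blocks is matched to the desired confidence level. The function name median_of_means and its parameters are illustrative assumptions, not taken from the talk.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Estimate the mean by taking the median of block-wise sample means.

    Assumption: samples are i.i.d. real numbers. They are shuffled,
    split into num_blocks equal-sized blocks (leftover samples are
    discarded), and the median of the block means is returned.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    block_size = len(shuffled) // num_blocks
    block_means = [
        statistics.fmean(shuffled[b * block_size:(b + 1) * block_size])
        for b in range(num_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # Heavy-tailed example: Pareto samples with shape 2.1 (true mean 2.1 / 1.1).
    rng = random.Random(1)
    data = [rng.paretovariate(2.1) for _ in range(10_000)]
    print("empirical mean :", statistics.fmean(data))
    print("median of means:", median_of_means(data, num_blocks=20))
```

With num_blocks set to 1 the estimator reduces to the ordinary empirical mean. More blocks trade some efficiency for robustness to a few extreme observations.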


Week 8:      Friday 17th June, 4 p.m. (Please note change of time)

Speaker:    Professor Andrea Montanari, Stanford University
Title:      Phase transitions in semi-definite relaxations

Abstract:  Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large scale datasets.
Semidefinite programming (SDP) relaxations are among the most powerful methods in this family, and are surprisingly well-suited for a broad range of problems where data take the form of matrices or graphs. It has been observed several times that, when the ‘statistical noise’ is small enough, SDP relaxations correctly detect the underlying combinatorial structures.
I will present a few asymptotically exact predictions for the ‘detection thresholds’ of SDP relaxations, with applications to synchronization and community detection. Apart from being successful in theory, SDP-based methods can be implemented on large instances: I will discuss an implementation that can be used to cluster graphs of size 10^5 in a matter of minutes.
[Based on joint work with Adel Javanmard, Federico Ricci-Tersenghi and Subhabrata Sen]