
Previous Distinguished Speaker Seminars




Speaker: Yingying Fan, University of Southern California, USA

Title: RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs

Abstract: Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-free knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2016) in the high-dimensional setting when the covariate distribution is characterized by a Gaussian graphical model. We establish that under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from the ideal case, we suggest a modified model-free knockoffs method called graphical nonlinear knockoffs (RANK) to accommodate the unknown covariate distribution. We provide theoretical justifications on the robustness of our modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure.


Speaker: Patricia Bouret, Centre national de la recherche scientifique (CNRS), France

Title: Family-wise separation rates for multiple testing

Abstract: Starting from a parallel between some minimax adaptive tests of a single null hypothesis, based on aggregation approaches, and some tests of multiple hypotheses, we propose a new second kind error-related evaluation criterion, as the core of an emergent minimax theory for multiple tests. Aggregation-based tests are justified through their first kind error rate, which is controlled by the prescribed level on the one hand, and through their separation rates over various classes of alternatives, rates that are minimax on the other hand. We show that these tests can be viewed as the first steps of classical step-down multiple testing procedures, and accordingly be evaluated from the multiple testing point of view also, through a control of their Family-Wise Error Rate (FWER). Conversely, many multiple testing procedures, from the historical ones of Bonferroni and Holm, to more recent ones like min-p procedures or randomized procedures, can be investigated from the minimax adaptive testing point of view. To this end, we extend the notion of separation rate to the multiple testing field, by defining the weak Family-Wise Separation Rate and its stronger counterpart, the Family-Wise Separation Rate (FWSR). As for non-parametric tests of a single null hypothesis, we prove that these new concepts allow an accurate analysis of the second kind error of a multiple testing procedure, leading to clear definitions of minimax and minimax adaptive multiple tests. Some illustrations in a classical Gaussian framework corroborate several expected results under particular conditions on the tested hypotheses, but also lead to more surprising results. This is joint work with M. Fromont and M. Lerasle.



Speaker: David Swofford, Senior Research Scientist, Biological Sciences, Duke University

Title: Generality and Robustness of the SVDQuartets Method for Phylogenetic Species Tree Estimation

Abstract: Methods for inferring evolutionary trees based on phylogenetic invariants were first proposed nearly three decades ago, but have been virtually ignored by biologists. A new invariants-based method for estimating species trees under the multispecies coalescent model was recently developed by Julia Chifman and Laura Kubatko, building on earlier work by Elizabeth Allman, John Rhodes, and Nicholas Eriksson. This method comes from algebraic statistics and uses singular value decomposition to estimate the rank of matrices of site pattern frequencies. Although the approach shows great promise, its performance on empirical and simulated data sets has not been adequately evaluated.

I will give a general introduction to the SVDQuartets method and present some results from a simulation study currently in progress (collaboration with Laura Kubatko and Colby Long) that demonstrate that SVDQuartets is potentially highly robust to deviations from the standard evolutionary models assumed by other species-tree estimation methods.


Speaker: Wim Hordijk, The KLI Institute, Klosterneuburg, Austria

Title: Autocatalytic Sets and the Origin of Life

Abstract: The main paradigm in origin of life research is that of an RNA world, where the idea is that life started with one or a few self-replicating RNA molecules. However, so far nobody has been able to show that RNA can catalyze its own template-directed replication. What has been shown experimentally, though, is that certain sets of RNA molecules can mutually catalyze each other’s formation from shorter RNA fragments. In other words, rather than having each RNA molecule replicate itself, they all help each other’s formation from basic building blocks, in a self-sustaining network of molecular cooperation.

Such a cooperative molecular network is an instance of an autocatalytic set, a concept that was formalized and studied mathematically and computationally as RAF theory. This theory has shown that autocatalytic sets are highly likely to exist in simple polymer models of chemical reaction networks, and that such sets can, in principle, be evolvable due to their hierarchical structure of many autocatalytic subsets. Furthermore, the framework has been applied successfully to study real chemical and biological examples of autocatalytic sets.

In this talk I will give a general (and gentle) introduction to RAF theory, present its main results and how they could be relevant to the origin of life, and argue that the framework could possibly also be useful beyond chemistry, such as in analyzing ecosystems or even economic systems.


Speaker: Professor Peter Hoff, Department of Statistics, Duke University

Title: Adaptive FAB confidence intervals with constant coverage

Abstract: Confidence intervals for the means of multiple normal populations are often based on a hierarchical normal model. While commonly used interval procedures based on such a model have the nominal coverage rate on average across a population of groups, their actual coverage rate for a given group will be above or below the nominal rate, depending on the value of the group mean.

In this talk I present confidence interval procedures that have constant frequentist coverage rates and that make use of information about across-group heterogeneity, resulting in constant-coverage intervals that are narrower than standard t-intervals on average across groups. These intervals are obtained by inverting Bayes-optimal frequentist tests, and so are “frequentist, assisted by Bayes” (FAB). I present some asymptotic optimality results and some extensions to other scenarios, such as linear regression and tensor analysis.



Speaker: Professor Wendelin Werner, ETH Zürich, Switzerland

Title: Random cracks in space.

Abstract: We will describe in non-technical terms some old and new ideas about what basic natural random objects and fields one can define in a given space with some geometric structure, and what one can do with them. This will probably include various joint recent and ongoing work with Jason Miller, Scott Sheffield, Qian Wei and Titus Lupu.


Speaker: Professor Jean-Philippe Vert, Mines ParisTech, France

Title: Machine learning for patient stratification from genomic data

Abstract: As the cost and throughput of genomic technologies reach a point where DNA sequencing is close to becoming a routine clinical exam, there is a lot of hope that treatments of diseases like cancer can dramatically improve through a digital revolution in medicine, where smart algorithms analyze “big medical data” to help doctors make the best decisions for each patient. The application of machine learning-based techniques to genomic data, however, raises numerous computational and mathematical challenges that I will illustrate with a few examples of cancer patient stratification from gene expression or somatic mutation profiles.


Speaker: Professor Martin Hairer, University of Warwick

Title: A BPHZ theorem for stochastic PDEs

Abstract: A classical result obtained in the 1950s and 1960s by Bogoliubov, Parasiuk, Hepp and Zimmermann provides a prescription on how to renormalise amplitudes of Feynman diagrams arising in perturbative quantum field theory in a consistent way. We will discuss an analogue of this theorem which has both an analytic and a probabilistic interpretation. In particular, we will see that it implies that the solutions to a large class of nonlinear stochastic PDEs depend on their driving noise in a surprisingly rigid way. This rigidity is a mathematical manifestation of the “universality” taken for granted when building our intuition on the large-scale behaviour of probabilistic models.



Speaker: Professor Susan Holmes, Stanford University

Title: Statistical Challenges posed by the Human Microbiome

Abstract: We propose a new statistical workflow for the analysis of bacterial strains in longitudinal data from the human microbiome. This includes using hierarchical mixtures for abundance modeling, hierarchical testing strategies and the propagation of uncertainty through the analyses to the ordination plots. We use a combination of normalization and Bayesian methods that incorporate estimates of uncertainty due to sample library size differences and differences in precision for different types of samples. We will show applications to the study of the vaginal microbiome and prediction of preterm birth.

This includes joint work with Ben Callahan, Kris Sankaran, Lan Nguyen, Julia Fukuyama, Sergio Bacallado, Stefano Favaro, Lorenzo Trippa, Boyu Ren and the Relman Lab at Stanford.


Speaker: Professor Matthew Stephens, Department of Human Genetics, University of Chicago

Title: “Come join the multiple testing party!”

Abstract: Multiple testing is often described as a “burden”. My goal is to convince you that multiple testing is better viewed as an opportunity, and that instead of laboring under this burden you should be looking for ways to exploit this opportunity. I invite you to a multiple testing party.


Speaker: Professor Richard Durbin, Wellcome Trust Sanger Institute, Cambridge

Title: Inferring population history from whole genome sequences

Abstract: Genome sequences carry the genetic information needed to make an organism, but they are also products of evolution and as such carry information about the genetic history of individuals and species. In recent years, analysis of genome sequence data has told us much about the origins of human populations across the world, their migrations and intermixing with other populations, including with archaic hominins such as Neanderthals and Denisovans. However, we are still at the beginning of the process of interpreting genetic history from genome sequences. Extracting this information requires statistical analysis methods that make use of efficient approximations to population genetic models. I will discuss a series of methods to infer population history from whole genome sequences, with a particular emphasis on cases where there is gene flow or introgression between ancestral populations. I will present a new method based on hidden Markov models to infer ancestral introgression from deeply diverged populations, illustrated with an application to recently obtained genome sequences of Papuans and Aboriginal Australians (Malaspinas et al., 2016).