# Part C Synopses 2018-2019

**Recommended Prerequisites:** A8 Probability. SB3.1 (formerly SB3a) Applied Probability would be helpful.

**Course Term:** Michaelmas

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Aims & Objectives**

The aim of the lectures is to introduce modern stochastic models in mathematical population genetics and give examples of real world applications of these models. Stochastic and graph theoretic properties of coalescent and genealogical trees are studied in the first eight lectures. Diffusion processes and extensions to model additional key biological phenomena are studied in the second eight lectures.

**Synopsis**

Evolutionary models in Mathematical Genetics: The Wright-Fisher model. The genealogical Markov chain describing the number of ancestors, back in time, of a collection of DNA sequences.
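
As a concrete orientation point (an illustrative sketch, not part of the syllabus): in the two-allele Wright-Fisher model with population size N, each individual in the next generation picks its parent uniformly at random, so the number of copies of an allele is Binomial(N, p), where p is its current frequency.

```python
import random

def wright_fisher(n_pop, freq0, n_gens, seed=0):
    """Simulate allele counts in a two-allele Wright-Fisher population.

    Each of the n_pop individuals in generation t+1 chooses its parent
    uniformly at random from generation t, so the allele count in the
    next generation is Binomial(n_pop, current frequency).
    """
    rng = random.Random(seed)
    count = int(freq0 * n_pop)
    trajectory = [count]
    for _ in range(n_gens):
        p = count / n_pop
        count = sum(1 for _ in range(n_pop) if rng.random() < p)
        trajectory.append(count)
        if count in (0, n_pop):   # allele lost or fixed: absorbing states
            break
    return trajectory

traj = wright_fisher(n_pop=100, freq0=0.5, n_gens=1000)
```

Running the trajectory until absorption illustrates genetic drift: with no mutation or selection, one allele is eventually lost or fixed purely by chance.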

The Coalescent process describing the stochastic behaviour of the ancestral tree of a collection of DNA sequences. Mutations on ancestral lineages in a coalescent tree. Models with a variable population size.

The frequency spectrum and age of a mutation. Ewens’ sampling formula for the probability distribution of the allele configuration of DNA sequences in a sample in the infinitely-many-alleles model. Hoppe’s urn model for the infinitely-many-alleles model.
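
Hoppe's urn is easy to simulate; the sketch below (illustrative only) generates the allele configuration of a sample of n genes under the infinitely-many-alleles model with mutation parameter theta.

```python
import random

def hoppe_urn(n, theta, seed=1):
    """Sample an allele configuration for n genes via Hoppe's urn.

    The urn starts with a single black ball of weight theta. At each
    draw, the black ball is chosen with probability theta/(theta + k)
    (k = coloured balls so far) and founds a new allelic type; otherwise
    a uniformly chosen coloured ball is duplicated (same type).
    Returns the list of allele (type) sizes.
    """
    rng = random.Random(seed)
    sizes = []                     # sizes[i] = number of genes of type i
    for k in range(n):
        if rng.random() < theta / (theta + k):
            sizes.append(1)        # black ball drawn: new allele
        else:
            j = rng.randrange(k)   # pick an existing ball uniformly
            cum = 0
            for i, s in enumerate(sizes):
                cum += s
                if j < cum:
                    sizes[i] += 1
                    break
    return sizes

config = hoppe_urn(20, theta=2.0)
```

Tabulating many such configurations recovers the probabilities given by Ewens' sampling formula.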

The infinitely-many-sites model of mutations on DNA sequences. Gene trees as perfect phylogenies describing the mutation history of a sample of DNA sequences. Graph theoretic constructions and characterizations of gene trees from DNA sequence variation. Gusfield’s construction algorithm of a tree from DNA sequences. Examples of gene trees from data.

Modelling biological forces in Population Genetics: Recombination. The effect of recombination on genealogies. Detecting recombination events under the infinitely-many-sites model. Hudson’s algorithm. Haplotype bounds on recombination events. Modelling recombination in the Wright-Fisher model. The coalescent process with recombination: the ancestral recombination graph. Properties of the ancestral recombination graph.

Introduction to diffusion theory. Tracking mutations forward in time in the Wright-Fisher model. Modelling the frequency of a neutral mutation in the population via a diffusion process limit. The generator of a diffusion process with two allelic types. The probability of fixation of a mutation. Genic selection. Extension of results from neutral to selection case. Behaviour of selected mutations.

**Reading**

- R. Durrett, Probability Models for DNA Sequence Evolution, Springer, 2008
- A. Etheridge, Some Mathematical Models from Population Genetics, École d'Été de Probabilités de Saint-Flour XXXIX-2009, Lecture Notes in Mathematics, Springer, 2012
- W. J. Ewens, Mathematical Population Genetics, 2nd Ed, Springer, 2004
- J. R. Norris, Markov Chains, Cambridge University Press, 1999
- M. Slatkin and M. Veuille, Modern Developments in Theoretical Population Genetics, Oxford Biology, 2002
- S. Tavaré and O. Zeitouni, Lectures on Probability Theory and Statistics, École d'Été de Probabilités de Saint-Flour XXXI-2001, Lecture Notes in Mathematics 1837, Springer, 2004

**Recommended Prerequisites:** A8 Probability and A9 Statistics

**Course Term:** Michaelmas

**Number of Lectures:** 14. For this course, 2 lectures and 2 intercollegiate classes are replaced by 2 practical classes. (The total time for this course is the same as for other Part C courses.)

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Aims and Objectives**

Many data come in the form of networks, for example friendship data and protein-protein interaction data. As the data usually cannot be modelled using simple independence assumptions, their statistical analysis provides many challenges. The course will give an introduction to the main problems and the main statistical techniques used in this field. The techniques are applicable to a wide range of complex problems. The statistical analysis benefits from insights which stem from probabilistic modelling, and the course will combine both aspects.

**Synopsis**

Exploratory analysis of networks. The need for network summaries. Degree distribution, clustering coefficient, shortest path length. Motifs.
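
These summaries are all computable by elementary graph algorithms; the following sketch (illustrative only, on a toy graph) computes the degree sequence, the global clustering coefficient, and the mean shortest path length via breadth-first search.

```python
from collections import deque

def network_summaries(adj):
    """Basic summaries for a connected undirected graph given as an
    adjacency dict {vertex: set of neighbours}."""
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}
    # each triangle is counted below once per corner, i.e. three times
    tri3 = sum(1 for v in adj for u in adj[v] for w in adj[v]
               if u < w and w in adj[u])
    triples = sum(d * (d - 1) // 2 for d in degrees.values())
    clustering = tri3 / triples    # global clustering = 3 * triangles / triples

    def bfs_distances(src):        # breadth-first search gives shortest paths
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        return dist

    n = len(adj)
    total = sum(d for v in adj for d in bfs_distances(v).values())
    mean_path = total / (n * (n - 1))
    return degrees, clustering, mean_path

# toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
degrees, clustering, mean_path = network_summaries(adj)
```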

Probabilistic models: Bernoulli random graphs, geometric random graphs, preferential attachment models, small world networks, inhomogeneous random graphs, exponential random graphs.

Small subgraphs: Stein’s method for normal and Poisson approximation. Branching process approximations, threshold behaviour, shortest path between two vertices.

Statistical analysis of networks: Sampling from networks. Parameter estimation for models. Inference from networks: vertex characteristics and missing edges. Nonparametric graph comparison: subgraph counts, subsampling schemes, MCMC methods. A brief look at community detection. Examples: protein interaction networks, social ego-networks.

**Reading**

- R. Durrett, Random Graph Dynamics, Cambridge University Press, 2007
- P. Grindrod, Mathematical Underpinnings of Analytics, Oxford University Press, 2015
- R.v.d. Hofstad, Random Graphs and Complex Networks, Manuscript available at http://www.win.tue.nl/~rhofstad/
- E. D. Kolaczyk and G. Csárdi, Statistical Analysis of Network Data with R, Springer, 2014
- M. Newman, Networks: An Introduction. Oxford University Press, 2010

**Recommended Prerequisites:** The course requires a good level of mathematical maturity. Students are expected to be familiar with core concepts in statistics (regression models, bias-variance tradeoff, Bayesian inference), probability (multivariate distributions, conditioning) and linear algebra (matrix-vector operations, eigenvalues and eigenvectors). Previous exposure to machine learning (empirical risk minimisation, dimensionality reduction, overfitting, regularisation) is highly recommended.

Students would also benefit from being familiar with the material covered in the following courses offered in the Statistics department: SB2.1 (formerly SB2a) Foundations of Statistical Inference and in SB2.2 (formerly SB2b) Statistical Machine Learning.

**Course Term:** Hilary

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Methods of Assessment:** Written examination.

**Aims and Objectives**

Machine learning is widely used across many scientific and engineering disciplines to construct methods for finding interesting patterns and for making accurate predictions in large datasets.

This course introduces several widely used machine learning techniques and describes their underpinning statistical principles and properties. The course studies both unsupervised and supervised learning, and several advanced topics are covered in detail, including some state-of-the-art machine learning techniques. The course will also cover computational considerations of machine learning algorithms and how they can scale to large datasets.

**Synopsis**

Convex optimisation and support vector machines. Loss functions. Empirical risk minimisation.

Kernel methods and reproducing kernel Hilbert spaces. Representer theorem. Representation of probabilities in RKHS.

Nonlinear dimensionality reduction: kernel PCA, spectral clustering.
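
A minimal kernel PCA sketch (illustrative only; the RBF kernel and its bandwidth `gamma` are a user's choice, not fixed by the course): form the Gram matrix, double-centre it in feature space, eigendecompose, and project onto the leading eigenvectors.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF (Gaussian) kernel.

    Returns the projections of the training points onto the leading
    n_components principal directions in the induced feature space.
    """
    sq = np.sum(X**2, axis=1)
    # Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                      # double centring in feature space
    vals, vecs = np.linalg.eigh(Kc)     # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    alphas = vecs[:, idx] / np.sqrt(vals[idx])   # normalised coefficients
    return Kc @ alphas                  # projections of the training points

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Z = kernel_pca(X, n_components=2)
```

Because the Gram matrix is centred, the embedded coordinates have zero mean, mirroring ordinary PCA.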

Probabilistic and Bayesian machine learning: mixture modelling, information theoretic fundamentals, EM algorithm, Probabilistic PCA. Variational Bayes. Laplace Approximation.

Collaborative filtering models, probabilistic matrix factorisation.

Gaussian processes for regression and classification. Bayesian optimisation.

Latent Dirichlet allocation (if time allows).

**Software**

Knowledge of Python is not required for this course, but some examples may be done in Python. Students interested in learning Python are referred to the following free University IT online courses, which should ideally be taken before the beginning of this course: https://help.it.ox.ac.uk/courses/index

**Reading**

- C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007
- K. Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012

Further Reading

- T. Hastie, R. Tibshirani, J Friedman, Elements of Statistical Learning, Springer, 2009
- Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011, http://scikit-learn.org/stable/tutorial/

**Recommended Prerequisites:** The course requires a good level of mathematical maturity, as well as some statistical intuition and background knowledge to motivate the material. Students are expected to be familiar with core concepts from probability (conditional probability, conditional densities, properties of conditional expectations, basic inequalities such as Markov's, Chebyshev's and the Cauchy-Schwarz inequality, and modes of convergence), with the basic limit theorems of probability (in particular the strong law of large numbers and the central limit theorem), and with Markov chains (aperiodicity, irreducibility, stationary distributions, reversibility and convergence). Most of these concepts are covered in courses offered in the Statistics department, in particular Prelims Probability, A8 Probability and SB3.1 (formerly SB3a) Applied Probability.

Familiarity with basic Monte Carlo methods will be helpful, as for example covered in A12 Simulation and Statistical Programming.

Some familiarity with concepts from Bayesian inference such as posterior distributions will be useful in order to understand the motivation behind the material of the course.

**Course Term:** Hilary

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Methods of Assessment:** Written examination

**Aims and Objectives**

The aim of the lectures is to introduce modern simulation methods.

This course concentrates on Markov chain Monte Carlo (MCMC) methods and Sequential Monte Carlo (SMC) methods. Examples of applications of these methods to complex inference problems will be given.

**Synopsis**

Classical methods: inversion, rejection, composition.
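
The classical methods each take only a few lines; this sketch (illustrative only) uses inversion for the exponential distribution and rejection, with an Exp(1) proposal, for the half-normal.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inversion: if U ~ Uniform(0,1) then -log(U)/rate ~ Exp(rate),
    since the inverse CDF of Exp(rate) is F^{-1}(u) = -log(1-u)/rate."""
    return -math.log(rng.random()) / rate

def sample_halfnormal(rng):
    """Rejection: sample |Z| for Z ~ N(0,1) with an Exp(1) proposal.
    With envelope constant M = sqrt(2e/pi), the acceptance probability
    f(x) / (M g(x)) simplifies to exp(-(x - 1)^2 / 2)."""
    while True:
        x = sample_exponential(1.0, rng)
        if rng.random() < math.exp(-0.5 * (x - 1.0) ** 2):
            return x

rng = random.Random(42)
xs = [sample_halfnormal(rng) for _ in range(20000)]
mean = sum(xs) / len(xs)   # should be close to E|Z| = sqrt(2/pi)
```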

Importance sampling.

MCMC methods: elements of discrete-time general state-space Markov chains theory, Metropolis-Hastings algorithm.
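
A random-walk Metropolis-Hastings sampler is a useful reference point for the above; the sketch below (illustrative only) targets a standard normal known only up to its normalising constant, and because the Gaussian proposal is symmetric the Hastings correction cancels.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose x' = x + step * N(0,1)
    and accept with probability min(1, pi(x') / pi(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        prop = x + step * rng.gauss(0.0, 1.0)
        lp_prop = log_target(prop)
        # accept/reject on the log scale for numerical stability
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# target: standard normal, specified only up to its normalising constant
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=5.0, n_steps=50000)
post = chain[5000:]            # discard burn-in
mean = sum(post) / len(post)   # should be near 0
```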

Advanced MCMC methods: Gibbs sampling, slice sampling, tempering/annealing, Hamiltonian (or Hybrid) Monte Carlo, pseudo-marginal MCMC.

Sequential importance sampling.

SMC methods: nonlinear filtering.

**Reading**

- C.P. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd edition, Springer-Verlag, 2004

Further reading

- J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001

**Recommended Prerequisites:** The basics of Markov chains (in particular, conditional independence) from Part A Probability are assumed. Likelihood theory, contingency tables, and likelihood-ratio tests are also important; these are covered in Part A Statistics. Knowledge of exponential families and linear models, as covered in SB2.1 (formerly SB2a) Foundations of Statistical Inference and SB1.1 (formerly SB1a) Applied Statistics, would be useful, but is not essential.

**Course Term:** Michaelmas

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Aims and Objectives**

This course will give an overview of the use of graphical models as a tool for statistical inference. Graphical models relate the structure of a graph to the structure of a multivariate probability distribution, usually via a factorisation of the distribution or conditional independence constraints.

This has two broad uses: first, conditional independence can provide vast savings in computational effort, both in terms of the representation of large multivariate models and in performing inference with them; this makes graphical models very popular for dealing with big data problems. Second, conditional independence can be used as a tool to discover hidden structure in data, such as that relating to the direction of causality or to unobserved processes. As such, graphical models are widely used in genetics, medicine, epidemiology, statistical physics, economics, the social sciences and elsewhere.

Students will develop an understanding of the use of conditional independence and graphical structures for dealing with multivariate statistical models. They will appreciate how this is applied to causal modelling, and to computation in large-scale statistical problems.

**Synopsis**

Independence, conditional independence, graphoid axioms.

Exponential families, mean and canonical parameterisations, moment matching; contingency tables, log-linear models.

Undirected graphs, cliques, paths; factorisation and Markov properties, Hammersley-Clifford Theorem (statement only).

Trees, cycles, chords, decomposability, triangulation. Maximum likelihood in decomposable models, iterative proportional fitting.

The multivariate Gaussian distribution and Gaussian graphical models.

Directed acyclic graphs, factorisation. Paths, d-separation, moralisation. Ancestral sets and sub-models. Decomposable models as intersection of directed and undirected models.

Running intersection property, junction trees; message passing, computation of marginal and conditional probabilities, introduction of evidence. Gibbs sampling.
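
As a small illustration of Gibbs sampling (a sketch, not examinable material): for a bivariate standard normal with correlation rho, each full conditional is univariate normal, so the sampler simply alternates two one-dimensional draws.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation
    rho. Each full conditional is x | y ~ N(rho * y, 1 - rho^2), so the
    sampler alternates draws from the two univariate conditionals."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    samples = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, sd)   # draw x | y
        y = rng.gauss(rho * x, sd)   # draw y | x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_steps=40000)
```

The empirical correlation of the output converges to rho, and the same alternating-conditionals idea extends to any graphical model whose full conditionals are tractable.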

Causal models, structural equations, interventions, constraint-based learning, faithfulness. The trek rule and back-door adjustment.

**Reading**

- S.L. Lauritzen, Graphical Models, Oxford University Press, 1996
- T. Hastie, R. Tibshirani and J. Friedman, Elements of Statistical Learning, 2nd edition, Springer, 2009 (available for free at https://statweb.stanford.edu/~tibs/ElemStatLearn/)
- D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009
- J. Pearl, Causality, 3rd edition, Cambridge University Press, 2013
- M.J. Wainwright and M.I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, 2008 (available for free at https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf)

**Recommended Prerequisites:** SB2.1 (formerly SB2a) Foundations of Statistical Inference is desirable; its 6 lectures on Bayesian inference, decision theory and hypothesis testing with loss functions are assumed knowledge. A12 Simulation and Statistical Programming is also desirable.

**Course Term:** Hilary

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Synopsis**

Theory: Decision-theoretic foundations, Savage axioms. Prior elicitation, exchangeability. Bayesian Non-Parametric (BNP) methods, the Dirichlet process and the Chinese Restaurant Process. Asymptotics, information criteria and the Bernstein-von Mises approximation.

Computational methods: Bayesian inference via MCMC; Estimation of marginal likelihood; Approximate Bayesian Computation and intractable likelihoods; reversible jump MCMC.

Case Studies: extend understanding of prior elicitation, BNP methods and asymptotics through a small number of substantial examples. Examples to further illustrate building statistical models, model choice, model averaging and model assessment, and the use of Monte Carlo methods for inference.
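
As a small illustration of Approximate Bayesian Computation (a rejection-ABC sketch on a toy normal-mean problem; the prior, summary statistic, and tolerance here are all illustrative choices, not prescribed by the course):

```python
import random

def abc_rejection(observed_mean, n_obs, prior_sample, simulate,
                  n_sims, tol, seed=0):
    """Rejection ABC: draw theta from the prior, simulate data, and keep
    theta when the summary statistic (here a sample mean) falls within
    tol of the observed summary. No likelihood evaluation is needed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        sim_mean = simulate(theta, n_obs, rng)
        if abs(sim_mean - observed_mean) < tol:
            accepted.append(theta)
    return accepted

# toy model: data from N(theta, 1), prior theta ~ Uniform(-5, 5)
post = abc_rejection(
    observed_mean=1.3, n_obs=50,
    prior_sample=lambda rng: rng.uniform(-5, 5),
    simulate=lambda th, n, rng: sum(rng.gauss(th, 1) for _ in range(n)) / n,
    n_sims=20000, tol=0.2)
```

The accepted values approximate the posterior; shrinking `tol` sharpens the approximation at the cost of a lower acceptance rate.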

**Reading**

- C. P. Robert, The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd edition, Springer, 2001

Further Reading

- A. Gelman et al., Bayesian Data Analysis, 3rd edition, CRC Press, 2014
- P. Hoff, A First Course in Bayesian Statistical Methods, Springer, 2010
- M. H. DeGroot, Optimal Statistical Decisions, Wiley Classics Library, 2004

**Recommended Prerequisites:** The lectures attempt to be self-contained, but knowledge of algorithms, combinatorics and probability theory (A8 Probability and SB3.1 (formerly SB3a) Applied Probability) would help. The course requires a good level of mathematical maturity.

**Course Term:** Hilary

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Mini-project

**Aims & Objectives**

Modern molecular biology generates large amounts of data, such as sequences, structures and expression data, that need different forms of statistical analysis and modelling to be properly interpreted. This course focuses on four topics within this vast area: Molecular Dynamics, Molecule Enumeration, Comparative Biology, and an overview of Computational Biology and Computational Neuroscience.

**Synopsis**

Overview of Computational Biology and Computational Neuroscience – Computational Biology is a very large and diverse field, spanning essentially all areas of biology where computation has become essential. Computational Neuroscience is growing rapidly, but has a history going back to the 1940s, with publications such as McCulloch and Pitts' (1943) paper on neural networks. Present progress is driven by advances on three fronts: (i) experimental data on brains, nervous systems and individual neurons; (ii) increased success in designing artificial neural networks, with an increasing variety of architectures and applications in Deep Learning/AI; and (iii) the ability to simulate very complex models of biological neural networks.

Molecular Dynamics (MD) – MD is another huge application area, describing the dynamics of molecules with a few to thousands of atoms over very short time periods, from nanoseconds to microseconds. MD is bound to continue to grow for decades, and stochastic methods are central to exploring its large configuration spaces. The lectures cover Hamiltonian dynamics, the canonical distribution, stochastic differential equations, the Langevin model for Brownian motion, and the comparison of MD trajectories.

Molecule Enumeration – How many molecules are possible with a given number of atoms drawn from the set carbon/nitrogen/phosphorus/oxygen/sulphur (CNPOS)? This question is central in drug design and has many statistical problems embedded in it. Exhaustive enumeration is at present limited to molecules with 18 CNPOS atoms, and including one more atom expands the count by a factor of roughly 10-100 at this point. But there are many other possible avenues, such as sampling, or exploring a subspace generated by an initial set of molecules and a set of reactions. There are many advanced issues in counting molecules, such as Pólya counting and imposing constraints that make molecular graphs embeddable in 3D.

Comparative Biology – Phylogenetics and comparative genomics have been important areas over the last 15-20 years as a consequence of the growth of sequence data. However, there are other levels of data and biological organisation that are at least as interesting: protein structures, networks, shapes, movements. The lectures include models of evolution for these data types, the so-called comparative model, and the simultaneous modelling of several levels.

**Reading**

The teaching material from 2018 would be useful to browse, but the 2019 course will have some changes to the syllabus and improvements:

Topics in Computational Biology Course 18

Further Reading

- M. Steel, Phylogeny: Discrete and Random Processes in Evolution, Chapters 1-2, SIAM Press, 2003.
- T. Schlick, Molecular Modeling and Simulation, Chapters 13-14, Springer, 2010.
- M. Meringer, “Structure Enumeration and Sampling”, Chapter 8 in Handbook of Chemoinformatics Algorithms (ed. Faulon), Chapman and Hall, 2010.
- B. C. O’Meara, “Evolutionary Inferences from Phylogenies: A Review of Methods”, Annu. Rev. Ecol. Evol. Syst. 43:267-285, 2012.

**Recommended Prerequisites:** Discrete and continuous time Markov processes on a countable state space, as covered for example in A8 Probability and SB3.1 (formerly SB3a) Applied Probability.

**Course Term:** Michaelmas

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Aims and Objectives**

The aim is to introduce fundamental probabilistic and combinatorial tools, as well as key models, in the theory of discrete disordered systems. We will examine the large-scale behaviour of systems containing many interacting components, subject to some random noise. Models of this type have a wealth of applications in statistical physics, biology and beyond, and we will see several key examples in the course. Many of the tools we will discuss are also of independent theoretical interest, and have far-reaching applications. For example, we will study the amount of time it takes a Markov chain to reach its stationary distribution (the mixing time). This concept is also important in many statistical applications, such as studying the running time of MCMC methods.

**Synopsis**

- Percolation and phase transitions.
- Uniform spanning trees, loop-erased random walks, and Wilson’s algorithm.
- Random walks on graphs and electrical networks (discrete potential theory).
- Important models: Stochastic Ising model (Glauber dynamics), Random-cluster model, Contact process, Exclusion process, Hard-core model.
- Important tools: Monotonicity, coupling, duality, FKG inequality.
- Gibbs measures and a discussion of phase transitions in this context.
- Mixing times and the spectral gap.
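
Bond percolation is straightforward to simulate; the sketch below (illustrative only) opens each nearest-neighbour edge of an n × n grid independently with probability p and measures the largest open cluster with a union-find structure, exhibiting the subcritical/supercritical contrast around p_c = 1/2.

```python
import random

def largest_cluster(n, p, seed=0):
    """Bond percolation on an n x n square grid: each nearest-neighbour
    edge is open independently with probability p; returns the size of
    the largest open cluster, computed via union-find over open edges."""
    rng = random.Random(seed)
    parent = list(range(n * n))

    def find(a):                       # root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i in range(n):
        for j in range(n):
            v = i * n + j
            if j + 1 < n and rng.random() < p:   # open edge to the right
                union(v, v + 1)
            if i + 1 < n and rng.random() < p:   # open edge downwards
                union(v, v + n)
    sizes = {}
    for v in range(n * n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

# above the critical point p_c = 1/2 a giant cluster appears
big = largest_cluster(60, 0.7)
small = largest_cluster(60, 0.3)
```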

**Reading**

- G. Grimmett, Probability on Graphs: random processes on graphs and lattices, Cambridge University Press, 2010; 2017 (2nd edition).
- B. Bollobás, O. Riordan, Percolation, Cambridge University Press, 2006.
- T. Liggett, Continuous time Markov Processes: an introduction, American Mathematical Society, 2010.
- D. A. Levin, Y. Peres, E. L. Wilmer, Markov Chains and Mixing Times, American Mathematical Society, 2009.
- P. G. Doyle, J. L. Snell, Random Walks and Electric Networks, Mathematical Association of America, 1984.

**Recommended Prerequisites:** The course requires a good level of mathematical maturity. Students are expected to be familiar with core concepts in probability (basic properties of probabilities, such as union bounds, and of conditional expectations, such as the tower property; basic inequalities, such as Markov’s and Jensen’s), statistics (confidence intervals, hypothesis testing), and linear algebra (matrix-vector operations, eigenvalues and eigenvectors; basic inequalities, such as Cauchy-Schwarz’s and Hölder’s). Previous exposure to machine learning (empirical risk minimisation, overfitting, regularisation) is recommended.

Students would benefit from being familiar with the material covered in SB2.1 (formerly SB2a) Foundations of Statistical Inference (in particular, Decision Theory) and in SB2.2 (formerly SB2b) Statistical Machine Learning.

**Course Term:** Michaelmas

**Number of Lectures:** 16

**Level:** M-level

**Weight:** One Unit

**Method of Assessment:** Written examination

**Aims and Objectives**

This course is meant to provide a rigorous theoretical account of the main ideas underlying machine learning, and to offer a principled framework to understand the algorithmic paradigms being used, involving tools from probability, statistics, and optimisation in high-dimension.

**Synopsis**

- Statistical learning frameworks for prediction, estimation and online learning.
- Probability
- Maximal inequalities.
- Rademacher and Gaussian complexities.
- Elements of VC theory.
- Covering and packing numbers.
- Chaining.
- Concentration inequalities.

- Statistics
- Bayes decision rules.
- Empirical risk minimisation. Error decomposition: generalisation, optimisation, and approximation.
- Learning via uniform convergence, margin bounds, and algorithmic stability.
- Structural regularisation: constraints and penalisation.
- Implicit/algorithmic regularisation.
- Convex loss surrogates.
- Slow and fast rates.
- Minimax lower bounds and hypothesis testing.

- Optimisation
- Elements of convex theory.
- Oracle model. Gradient descent. Mirror descent.
- Stochastic oracle model. Stochastic gradient descent. Stochastic mirror descent. Variance reduction techniques.
- Approximate Message Passing.
- Online optimisation.

- Examples
- Linear predictors, including Boosting.
- Non-linear predictors, including Support Vector Machines and Neural Networks.
- High-dimensional estimators for sparse and low-rank problems, including Lasso.
- Online learning, including multi-armed bandit problems and algorithms.
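
As a minimal illustration of the oracle model and gradient descent from the optimisation section (a sketch on a least-squares objective; the step size is an assumption justified by the smoothness of this particular objective):

```python
def gradient_descent(grad, x0, step, n_iters):
    """Plain gradient descent x_{t+1} = x_t - step * grad(x_t); for an
    L-smooth convex objective, a step size at most 1/L guarantees
    convergence to a minimiser."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# least squares f(x) = 0.5 * ||A x - b||^2, gradient A^T (A x - b)
A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, 1.0, 2.0]

def grad(x):
    # residual r = A x - b
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
         for row, bi in zip(A, b)]
    # gradient components (A^T r)_j
    return [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]

x_hat = gradient_descent(grad, [0.0, 0.0], step=0.1, n_iters=500)
```

For this A and b the normal equations give the minimiser (1, 1), and the iterates converge to it geometrically.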

**Reading**

- Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014
- Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015
- Ramon van Handel. Probability in High Dimension. Lecture notes available online (http://www.princeton.edu/~rvan/APC550.pdf), 2016
- Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Book available online (http://downloads.tor-lattimore.com/book.pdf), 2018.
