# Part B Synopses 2018-2019

SB1.1 and SB1.2 Applied and Computational Statistics

Prerequisites: A8 Probability and A9 Statistics

Course Term: Michaelmas and Hilary

Number of Lectures: 26

Level: H-level

Weight: Double unit

Method of Assessment: Written examination plus assessed practical assignments. The practical assignments contribute 1/3 of the marks for SB1. Please see below for the hand-in deadlines for practical assignments.

Learning Outcomes

The course aims to develop the theory of statistical methods, and also to introduce students to the analysis of data using a statistical package. The main topics are: simulation-based inference, practical aspects of linear models, logistic regression and generalized linear models, and computer-intensive methods.

SB1.1 Applied Statistics – 13 lectures MT

Synopsis

The normal linear model: use of matrices, least squares and maximum likelihood estimation, normal equations, distribution theory for the normal model, hypothesis tests and confidence intervals.
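To make the normal equations concrete, here is a minimal sketch in plain Python with no libraries assumed. It forms X^T X and X^T y, solves X^T X beta = X^T y by Gaussian elimination, and estimates sigma^2 by RSS/(n - p); the maximum likelihood estimate would divide by n instead. The function names are illustrative only. In practice one would use a QR-based fit (R's `lm` works this way) rather than forming X^T X explicitly.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the caller's data is untouched.
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """Least squares estimate of beta from the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def sigma2_hat(X, y, beta):
    """Unbiased estimate of sigma^2, RSS / (n - p); the MLE divides by n instead."""
    n, p = len(X), len(X[0])
    rss = sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return rss / (n - p)
```

For example, with an intercept column and covariate values 0, 1, 2, 3, `least_squares([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 2, 2, 4])` returns approximately `[0.9, 0.9]`.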

Practical aspects of linear models and analysis of variance: multiple regression, categorical variables and interactions, blocks and treatments, orthogonality, model selection (including AIC, but not the derivation of AIC), fit criteria, use of residuals, outliers, leverage, model interpretation.

Normal linear mixed models, hierarchical models.

Generalised Linear Models: logistic regression, linear exponential families and generalized linear models, scale parameter, link functions, canonical link. Maximum likelihood fitting. Iteratively reweighted least squares. Asymptotic theory: statement and applications to inference, analysis of deviance, model checking, residuals.
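A minimal sketch of iteratively reweighted least squares for logistic regression (the canonical-link case), written in plain Python under the assumption of a small dense design matrix and data without complete separation. Each iteration forms the working response z = eta + (y - mu)/(mu(1 - mu)) and the weights w = mu(1 - mu), then solves a weighted least squares problem; for the canonical link this coincides with Newton-Raphson on the log-likelihood. The function names are hypothetical.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fitted_probabilities(X, beta):
    """Inverse of the canonical (logit) link applied to the linear predictor."""
    return [1.0 / (1.0 + math.exp(-sum(b * xij for b, xij in zip(beta, row))))
            for row in X]


def logistic_irls(X, y, tol=1e-10, max_iter=50):
    """Fit a logistic regression by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [sum(b * xij for b, xij in zip(beta, row)) for row in X]
        mu = fitted_probabilities(X, beta)
        # For the logit link the IRLS weights are Var(Y_i) = mu_i (1 - mu_i).
        w = [m * (1.0 - m) for m in mu]
        # Working response z = eta + (y - mu) * d(eta)/d(mu), with d(eta)/d(mu) = 1/w.
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        # Weighted least squares step: solve X^T W X beta = X^T W z.
        XtWX = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(p)]
                for i in range(p)]
        XtWz = [sum(wi * row[i] * zi for wi, row, zi in zip(w, X, z)) for i in range(p)]
        new_beta = solve(XtWX, XtWz)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

At convergence the score equations X^T (y - mu_hat) = 0 hold up to the tolerance, which is a convenient check on the fit.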

• A. C. Davison, Statistical Models, CUP, 2003
• J.J. Faraway, Linear Models with R, Chapman and Hall, 2005
• A. J. Dobson and A. G. Barnett, An Introduction to Generalized Linear Models, Chapman and Hall, 2008
• J.J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Chapman and Hall, 2006

• F. L. Ramsey and D. W. Schafer, The Statistical Sleuth: A Course in Methods of Data Analysis, 2nd edition, Duxbury, 2002.

SB1.2 Computational Statistics – 13 lectures HT

Synopsis

Smoothing methods (local polynomials). Nonparametric inference (bandwidth and Generalised Cross Validation).
Multivariate smoothers and Generalised Additive Models.

Inference using simulation methods. Monte-Carlo Tests. Permutation tests. Rank statistics.
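A sketch of a Monte Carlo permutation test for a two-sample difference in means, in plain Python. The absolute difference in means as the test statistic and the (1 + count)/(n_perm + 1) p-value convention are assumptions made for illustration; an exact test would instead enumerate all C(n, n_x) relabellings. Replacing the observations by their ranks before calling the same function gives a Monte Carlo version of the two-sided Wilcoxon rank-sum test.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(x, y, n_perm=9999, seed=0):
    """Monte Carlo two-sample permutation test for a difference in means.

    Returns the p-value (1 + #{T* >= t_obs}) / (n_perm + 1), where
    T = |mean(x) - mean(y)| and T* is recomputed on random relabellings.
    """
    rng = random.Random(seed)
    t_obs = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels under H0
        t_star = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if t_star >= t_obs - 1e-12:  # small tolerance for floating-point ties
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Counting the observed statistic as one of the replicates keeps the p-value strictly positive and gives a valid test for any number of permutations.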

Bootstrapping.
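A minimal nonparametric bootstrap in plain Python: resample the data with replacement, recompute the statistic, and use the spread of the replicates as a standard error and their empirical quantiles as a percentile interval. The defaults (2000 resamples, a 95% interval) and the function names are illustrative assumptions; Davison and Hinkley discuss better-calibrated intervals.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap: recompute `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def bootstrap_se(replicates):
    """Bootstrap estimate of the standard error: the spread of the replicates."""
    return statistics.stdev(replicates)


def percentile_interval(replicates, alpha=0.05):
    """Percentile interval from the empirical quantiles of the replicates."""
    s = sorted(replicates)
    last = len(s) - 1
    return s[int(alpha / 2 * last)], s[int((1 - alpha / 2) * last)]
```

For the mean of 1, ..., 20 the bootstrap standard error should come out close to the plug-in value sqrt(33.25/20), about 1.29, up to Monte Carlo error.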

Hidden Markov Models: specification. Forward-backward algorithm. Kalman filter.
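A sketch of the forward-backward recursions for a hidden Markov model with finitely many hidden states and discrete observations, in plain Python. The forward pass gives the likelihood of the observations, and combining it with the backward pass gives the posterior state probabilities; a brute-force sum over all hidden paths is included only as a check. The recursions are deliberately unscaled, an assumption that is fine for short sequences but underflows for long ones. The Kalman filter plays the role of the forward pass when the hidden state and observations are linear and Gaussian.

```python
from itertools import product


def forward_backward(init, trans, emit, obs):
    """Forward-backward recursions for a discrete hidden Markov model.

    init[k]     = P(X_1 = k)
    trans[j][k] = P(X_{t+1} = k | X_t = j)
    emit[k][o]  = P(Y_t = o | X_t = k)
    Returns P(Y_1..Y_T) and the posterior marginals P(X_t = k | Y_1..Y_T).
    """
    K, T = len(init), len(obs)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    # Forward pass: alpha[t][k] = P(Y_1..Y_t, X_t = k).
    for k in range(K):
        alpha[0][k] = init[k] * emit[k][obs[0]]
    for t in range(1, T):
        for k in range(K):
            alpha[t][k] = sum(alpha[t - 1][j] * trans[j][k] for j in range(K)) * emit[k][obs[t]]
    # Backward pass: beta[t][j] = P(Y_{t+1}..Y_T | X_t = j), with beta[T-1][j] = 1.
    for t in range(T - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(trans[j][k] * emit[k][obs[t + 1]] * beta[t + 1][k] for k in range(K))
    likelihood = sum(alpha[T - 1])
    posterior = [[alpha[t][k] * beta[t][k] / likelihood for k in range(K)] for t in range(T)]
    return likelihood, posterior


def brute_force_likelihood(init, trans, emit, obs):
    """Sum over all K^T hidden paths; exponential cost, only for checking small cases."""
    K, total = len(init), 0.0
    for path in product(range(K), repeat=len(obs)):
        pr = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            pr *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += pr
    return total
```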

• J. D. Gibbons, Nonparametric Statistical Inference, Marcel Dekker, 1985, pp. 1-193, 273-290.
• G.H. Givens and J.A. Hoeting, Computational Statistics, 2nd edition, Wiley, 2012.
• G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013. This book is freely available online: http://www-bcf.usc.edu/~gareth/ISL/
• R. H. Randles and D. A. Wolfe, Introduction to the Theory of Nonparametric Statistics, Wiley, 1979, pp. 1-322.
• L. Wasserman, All of Nonparametric Statistics, Springer, 2005.
• L. Wasserman, All of Statistics, Springer, 2004.

• A.C. Davison and D.V. Hinkley, Bootstrap Methods and their Application, CUP, 1997.
• C.R. Shalizi, Advanced Data Analysis from an Elementary Point of View, http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/.

Practicals

In addition to the lectures there will be five supervised practicals. Four of these contain problems whose written solutions will be assessed as part of the unit examination.

The hand-in deadlines for the four assessed practicals are:
1st practical: 12 noon Monday week 8, Michaelmas Term 2018
2nd practical: 12 noon Monday week 2, Hilary Term 2019
3rd practical: 12 noon Monday week 8, Hilary Term 2019
4th practical: 12 noon Monday week 2, Trinity Term 2019.

Candidates who miss the above deadlines may ask their college to apply to the Head of the Department of Statistics for permission to submit late. Where there is a valid reason, the Head of Department would normally approve the late submission without penalty. Where it is deemed that there is no valid reason, the Head of Department will advise the Examiners to apply a penalty in accordance with the late penalty tariff found in the Mathematics and Statistics Examination Conventions.

SB2.1 Foundations of Statistical Inference

Prerequisites: A9 Statistics, A8 Probability

Course Term: Michaelmas

Number of Lectures: 16

Level: H-level

Weight: One unit

Method of Assessment: Written examination

Learning outcomes

Understanding how data can be interpreted in the context of a statistical model. Working knowledge and understanding of key elements of model-based statistical inference, including awareness of similarities, relationships and differences between Bayesian and frequentist approaches.

Synopsis

Exponential families: Curved and linear exponential families; canonical parametrization; likelihood equations. Sufficiency: Factorization theorem; sufficiency in exponential families.

Frequentist estimation: unbiasedness; method of moments; the Cramér-Rao information inequality; Rao-Blackwell theorem; Lehmann-Scheffé theorem and Rao-Blackwellization; statement of complete sufficiency for exponential families.

The Bayesian paradigm: likelihood principle; subjective probability; prior to posterior analysis; asymptotic normality; conjugacy; examples from exponential families. Choice of prior distribution: proper and improper priors; Jeffreys’ and maximum entropy priors. Hierarchical Bayes models.

Decision theory: risk function; minimax rules, Bayes rules. Point estimators and admissibility of Bayes rules. The James-Stein estimator, shrinkage estimators and empirical Bayes. Hypothesis testing as a decision problem.
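The James-Stein estimator can be illustrated with a short simulation, assuming X ~ N_p(theta, sigma^2 I) with p >= 3 and sigma^2 known. The sketch shrinks X towards the origin by the factor 1 - (p - 2) sigma^2 / ||X||^2 (optionally truncated at zero, the positive-part version) and compares its average squared-error loss with that of the MLE X. The function names, seed and number of replications are illustrative choices. At theta = 0 with p = 10 the exact risks are 10 for the MLE and 2 for James-Stein, so the simulated averages should be near those values.

```python
import random


def james_stein(x, sigma2=1.0, positive_part=False):
    """Shrink the observation vector x towards 0; requires dimension p >= 3."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs p >= 3")
    norm2 = sum(xi * xi for xi in x)
    factor = 1.0 - (p - 2) * sigma2 / norm2
    if positive_part:
        factor = max(factor, 0.0)  # never shrink past the origin
    return [factor * xi for xi in x]


def average_loss(theta, n_rep=2000, seed=1):
    """Monte Carlo estimate of the squared-error risk of the MLE and of James-Stein."""
    rng = random.Random(seed)
    mle_loss = js_loss = 0.0
    for _ in range(n_rep):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X ~ N_p(theta, I)
        js = james_stein(x)
        mle_loss += sum((xi - t) ** 2 for xi, t in zip(x, theta))
        js_loss += sum((ji - t) ** 2 for ji, t in zip(js, theta))
    return mle_loss / n_rep, js_loss / n_rep
```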

• P. H. Garthwaite, I. T. Jolliffe and Byron Jones, Statistical Inference, 2nd edition, Oxford University Press, 2002.
• G. A. Young and R. L. Smith, Essentials of Statistical Inference, Cambridge University Press, 2005.
• T. Leonard and J.S.J. Hsu, Bayesian Methods, Cambridge University Press, 2005.

• D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
• D. R. Cox, Principles of Statistical Inference, Cambridge University Press, 2006.
• H. Liero and S. Zwanzig, Introduction to the Theory of Statistical Inference, CRC Press, 2012.

SB2.2 Statistical Machine Learning

Prerequisites: A9 Statistics, A8 Probability. SB2.1 Foundations of Statistical Inference useful but not essential.

Course Term: Hilary

Number of Lectures: 16

Level: H-level

Weight: One unit

Method of Assessment: Written examination

Learning Outcomes

Machine learning studies methods that can automatically detect patterns in data, and then use these patterns to predict future data or other outcomes of interest. It is widely used across many scientific and engineering disciplines.

This course covers statistical fundamentals of machine learning, with a focus on supervised learning and empirical risk minimisation. Both generative and discriminative learning frameworks are discussed, and a variety of widely used classification algorithms are surveyed.

Synopsis

Visualisation and dimensionality reduction: principal components analysis, biplots and singular value decomposition. Multidimensional scaling. K-means clustering.
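A sketch of K-means clustering (Lloyd's algorithm) in plain Python: alternately assign each point to its nearest centre and move each centre to the mean of its assigned points, stopping when the assignments no longer change. The farthest-first initialisation used here is a simple assumption, not the only choice (k-means++ and random restarts are common), and Lloyd's algorithm only reaches a local minimum of the within-cluster sum of squares.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centres, labels)."""
    rng = random.Random(seed)
    # Farthest-first initialisation: a random first centre, then repeatedly
    # the point farthest from all centres chosen so far.
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centre in squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so centres are stable too
        labels = new_labels
        # Update step: each centre moves to the mean of its points (kept if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centres, labels
```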

Introduction to supervised learning. Evaluating learning methods with training/test sets. Bias/variance trade-off, generalisation and overfitting. Cross-validation. Regularisation. Performance measures, ROC curves. K-nearest neighbours as an example classifier.

Linear models for classification. Discriminant analysis. Logistic regression. Generative vs Discriminative learning. Naive Bayes models.

Decision trees, bagging, random forests, boosting.

Neural networks and deep learning.

• C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
• T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, 2009.
• K. Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012.

• B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
• G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

SB3.1 Applied Probability

Prerequisite: A8 Probability

Course Term: Hilary Term

Number of Lectures: 16

Level: H-level

Weight: One unit

Method of Assessment: Written examination

Learning Outcomes

This course is intended to show the power and range of probability by considering real examples in which probabilistic modelling is inescapable and useful. Theory will be developed as required to deal with the examples.

Synopsis

Poisson processes and birth processes. Continuous-time Markov chains. Transition rates, jump chains and holding times. Forward and backward equations. Class structure, hitting times and absorption probabilities. Recurrence and transience. Invariant distributions and limiting behaviour. Time reversal.
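For a finite irreducible continuous-time Markov chain the invariant distribution solves pi Q = 0 with sum_k pi_k = 1. A minimal plain-Python sketch, assuming Q is given as a list of rows that each sum to zero: one equation of Q^T pi^T = 0 is redundant, so it is replaced by the normalisation and the resulting system is solved. The function names are illustrative.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def invariant_distribution(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an irreducible finite generator matrix Q."""
    n = len(Q)
    if any(abs(sum(row)) > 1e-9 for row in Q):
        raise ValueError("rows of a generator matrix must sum to zero")
    # pi Q = 0 is the linear system Q^T pi^T = 0; replace its last
    # (redundant) equation by the normalisation sum_k pi_k = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)
```

For the two-state chain with rate a = 2 from state 0 to 1 and rate b = 3 from 1 to 0, this gives pi = (0.6, 0.4), matching b/(a + b) and a/(a + b).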

Renewal theory. Limit theorems: strong law of large numbers, strong law and central limit theorem of renewal theory, elementary renewal theorem, renewal theorem, key renewal theorem. Excess life, inspection paradox.

Applications in areas such as: queues and queueing networks – M/M/s queue, Erlang’s formula, queues in tandem and networks of queues, M/G/1 and G/M/1 queues; insurance ruin models; applications in applied sciences.
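We assume here that "Erlang's formula" refers to the Erlang loss formula: for an M/M/s/s system with offered load a = lambda/mu, the probability that an arriving customer finds all s servers busy is B(s, a) = (a^s/s!) / sum_{k=0..s} a^k/k!. The sketch computes it with the standard stable recursion B(k) = a B(k-1) / (k + a B(k-1)), B(0) = 1, checks it against the direct formula, and also evaluates the related Erlang delay formula C = s B / (s - a(1 - B)), the probability of having to wait in an M/M/s queue (valid for a < s). The function names are illustrative.

```python
import math


def erlang_b(a, s):
    """Erlang loss probability for s servers and offered load a, via the recursion."""
    b = 1.0  # B(0) = 1
    for k in range(1, s + 1):
        b = a * b / (k + a * b)
    return b


def erlang_b_direct(a, s):
    """Same quantity from the closed form (a^s/s!) / sum_{k<=s} a^k/k!."""
    terms = [a ** k / math.factorial(k) for k in range(s + 1)]
    return terms[-1] / sum(terms)


def erlang_c(a, s):
    """Probability that an arrival must wait in an M/M/s queue (needs a < s)."""
    if a >= s:
        raise ValueError("the M/M/s queue is unstable unless a < s")
    b = erlang_b(a, s)
    return s * b / (s - a * (1.0 - b))
```

For s = 1 the delay formula reduces to C = a, the familiar M/M/1 probability that an arrival finds the server busy.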

• J.R. Norris: Markov Chains. Cambridge University Press, 1997.
• G.R. Grimmett and D.R. Stirzaker: Probability and Random Processes, 3rd edition, Oxford University Press, 2001.
• G.R. Grimmett and D.R. Stirzaker: One Thousand Exercises in Probability. Oxford University Press, 2001.
• S.M. Ross: Introduction to Probability Models, 4th edition, Academic Press, 1989.
• D.R. Stirzaker: Elementary Probability, 2nd edition, Cambridge University Press, 2003.

Statistical Lifetime Models

Prerequisites: A9 Statistics

Course Term: Hilary

Number of Lectures: 16

Level: H-level

Weight: One unit

Method of Assessment: Written examination

Learning Outcomes

Event times and event counts appear in many social and medical data contexts, and require a specialised suite of techniques to handle properly, broadly known as survival analysis. This course covers the basic definitions of hazard rates and survival functions, techniques for creating and interpreting life tables, nonparametric estimation and comparison of event-time distributions, and evaluating the goodness of fit of various semiparametric models. The focus is on understanding when and why particular models ought to be chosen, and on using the standard software tools in R to carry out data analysis.

Synopsis

1. Introduction to survival data: hazard rates, survival curves, life tables.
2. Censoring and truncation, introduction through the census approximation.
3. Parametric survival models.
4. Nonparametric estimation of survival curves.
5. Nonparametric model tests (log-rank test and relatives).
6. Semiparametric models:
a. Proportional hazards;
b. Accelerated failure models.
7. Model-fit diagnostics.
8. Dynamic prediction and model information quality.
9. Repeated events.

Topics:

Life tables: Basic notation, life expectancy and remaining life expectancy, curtate lifetimes. Survival models: general lifetime distributions, force of mortality (hazard rate), survival function. Periods and cohorts. Lexis diagrams. Census and vital statistics. Multiple decrements model.

Censoring and truncation. Maximum likelihood estimation for parametric models. Kaplan-Meier and Nelson-Aalen estimators with variance estimation (including Greenwood’s formula). Applications in epidemiology. Parametric models and generalised linear regression. Nonparametric comparison of survival distributions, including log-rank test and serial-correlations test. Using the survival package in R.
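The Kaplan-Meier and Nelson-Aalen estimators, with Greenwood's variance formula, can be sketched in plain Python as follows. At each distinct event time t_j with d_j events among n_j individuals at risk, S_hat(t) = prod (1 - d_j/n_j), Var(S_hat(t)) is approximately S_hat(t)^2 sum d_j / (n_j (n_j - d_j)), and H_hat(t) = sum d_j/n_j. We assume the usual convention that an observation censored at t_j is still at risk at t_j. In R one would use the survival package (for example `survfit`); this sketch only shows the arithmetic, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier, Greenwood variance and Nelson-Aalen at each distinct event time.

    times[i]  : observed time (event or censoring) for individual i
    events[i] : 1 if an event was observed at times[i], 0 if censored there
    Returns rows (t_j, n_j, d_j, S(t_j), Var S(t_j), H(t_j)).
    """
    data = list(zip(times, events))
    event_times = sorted({t for t, e in data if e})
    surv, greenwood_sum, cum_hazard = 1.0, 0.0, 0.0
    rows = []
    for t in event_times:
        # At risk: everyone whose observed time is >= t (censored at t still counts).
        n_risk = sum(1 for u, _ in data if u >= t)
        d = sum(1 for u, e in data if u == t and e)
        surv *= 1.0 - d / n_risk
        # Greenwood's term is undefined once everyone at risk has the event.
        greenwood_sum += d / (n_risk * (n_risk - d)) if n_risk > d else float("nan")
        cum_hazard += d / n_risk
        rows.append((t, n_risk, d, surv, surv ** 2 * greenwood_sum, cum_hazard))
    return rows
```

For times 1, 2, 2, 3, 4 with event indicators 1, 1, 0, 1, 0 this gives S_hat = 0.8, 0.6, 0.3 at times 1, 2, 3, and H_hat(3) = 0.95.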

Relative risk (proportional hazards) including the Cox model, additive hazards model, accelerated failure models. Partial likelihood. Efron’s estimator for survival distributions.

Residual tests, including Cox-Snell residuals, martingale residuals, Schoenfeld residuals. Dynamic prediction and predictive power of models: cross-validation.

Andersen-Gill model, Poisson regression, negative binomial model. Multistate models and Markov processes.

• Statistical Lifetime Models lecture notes, revised 2019.
• Kenneth W. Wachter. Essential Demographic Methods. Harvard University Press, 2014.
• J.P. Klein and M.L. Moeschberger, Survival Analysis, Springer, 1997.

• Farhat Yusuf, David Swanson, Jo Martins. Methods of Demographic Analysis. Springer, 2013.
• Subject CT4 Models Core Reading, Faculty & Institute of Actuaries.
• Odd O. Aalen et al., Survival and Event History Analysis, Springer, 2008.
• D. F. Moore, Applied Survival Analysis Using R, Springer, 2016.
• H. C. van Houwelingen and T. Stijnen, “Cox Regression Model”, in J. P. Klein et al. (ed.) Handbook of Survival Analysis, pp. 5-26, CRC Press, 2014.
• T. Martinussen and L. Peng, “Alternatives to the Cox Model”, in J. P. Klein et al. (ed.) Handbook of Survival Analysis, pp. 49-76, CRC Press, 2014.
• H. C. van Houwelingen and H. Putter, Dynamic Prediction in Clinical Survival Analysis, CRC Press, 2011.
• J. F. Lawless and Y. Yuan, “Estimation of prediction error for survival models”, Statistics in Medicine, 29(2), 262-274, 2010.

SB4.1 Actuarial Science

Prerequisites: A8 Probability is useful, but not essential. If you have not done A8 Probability, make sure that you are familiar with Prelims work on Probability.

Course Term: Michaelmas

Number of Lectures: 16

Level: H-level

Weight: One unit

Method of Assessment: Written examination

Synopsis

Fundamental nature of actuarial work. Use of generalised cash flow model to describe financial transactions. Time value of money using the concepts of compound interest and discounting.

Interest rate models. Present values and accumulated values of a stream of equal or unequal payments using specified rates of interest. Interest rates in terms of different time periods.
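A minimal sketch of present and accumulated values under a constant effective annual rate i, with discount factor v = 1/(1 + i), together with conversions to the nominal rate i^(p) convertible p-thly and to the force of interest delta = ln(1 + i). Cash flows are represented as (time, amount) pairs; this representation and the function names are assumptions for illustration.

```python
import math


def present_value(cash_flows, i):
    """Sum of c_t v^t over (t, c_t) pairs, with v = 1/(1 + i)."""
    v = 1.0 / (1.0 + i)
    return sum(c * v ** t for t, c in cash_flows)


def accumulated_value(cash_flows, i, horizon):
    """Value at time `horizon` of the same payments, accumulated at rate i."""
    return sum(c * (1.0 + i) ** (horizon - t) for t, c in cash_flows)


def nominal_rate(i, p):
    """Nominal rate i^(p) convertible p times a year, equivalent to effective i."""
    return p * ((1.0 + i) ** (1.0 / p) - 1.0)


def force_of_interest(i):
    """Constant force of interest delta = ln(1 + i)."""
    return math.log1p(i)
```

For example, the present value of 1 paid at the end of each of 10 years at 5% should equal the annuity-certain value a_10 = (1 - v^10)/i, about 7.72, and its accumulated value at time 10 is that amount times 1.05^10.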

Equation of value, rate of return of a cash flow, existence criteria.
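The yield (rate of return) of a cash flow is a root i > -1 of the equation of value sum_t c_t (1 + i)^(-t) = 0. One sufficient existence criterion: if the net cash flows change sign exactly once, Descartes' rule of signs (applied to the polynomial in v) gives exactly one such root. The sketch below locates a root by bisection; the bracket [-0.99, 10], the tolerance and the function names are assumptions, and it raises an error when the net present value does not change sign on the bracket.

```python
def npv(cash_flows, i):
    """Net present value of (t, c_t) pairs at effective rate i."""
    return sum(c * (1.0 + i) ** (-t) for t, c in cash_flows)


def yield_rate(cash_flows, lo=-0.99, hi=10.0, tol=1e-12, max_iter=200):
    """Solve the equation of value npv(i) = 0 for i in [lo, hi] by bisection."""
    f_lo, f_hi = npv(cash_flows, lo), npv(cash_flows, hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of NPV on [lo, hi]; a yield may not exist there")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = npv(cash_flows, mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        # Keep the half-interval on which the NPV still changes sign.
        if f_lo * f_mid < 0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

For instance, paying 100 now and receiving 110 in one year gives a yield of 10%.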

Single decrement model. Present values and accumulated values of a stream of payments taking into account the probability of the payments being made according to a single decrement model. Annuity functions and assurance functions for a single decrement model. Risk and premium calculation.

Liabilities under a simple assurance contract or annuity contract.

Theories of value, St Petersburg Paradox, statement of Expected Utility Theory (EUT) and Subjective Expected Utility (SEU) representation theorems.

Risk aversion, the Arrow-Pratt approximation, comparative risk aversion, classical utility functions.

First and second order stochastic dominance, the Rothschild-Stiglitz Proposition.

Mossin’s Theorem, static portfolio choice. Consumption and saving. Felicity Function and Prudence.

Time consistency. Desynchronisation.