My research explores the potential of computational statistics and statistical machine learning to assist the medical and health sciences. To this end, I lead a small research group working on probabilistic models and Bayesian decision analysis in complex biomedical data environments. This work spans theoretical foundations, novel methodology, and hands-on, study-driven data science.
I hold a joint Statutory Professorship (Oxford-speak for a Chair) in Biostatistics, shared between the Department of Statistics and the Nuffield Department of Medicine. Within the Nuffield Department of Medicine I am an Affiliate Member of the Li Ka Shing Centre for Health Information and Discovery. My research is partly funded through a Programme Leaders award in Statistical Genomics from the UK's Medical Research Council. I am Scientific Director for the Health Programme at the Alan Turing Institute, London, and Director of the Oxford-Warwick EPSRC-MRC Centre for Doctoral Training in modern statistical science.
This project aims to develop and apply computational statistics and machine learning methods that enhance interpretation of data from the International Mouse Phenotyping Consortium (IMPC) and facilitate their use in identifying models of human disease. The IMPC is a multicentre collaboration that aims to measure the phenotypic consequences of knocking out each gene in the mouse genome in turn. Several hundred measurements are taken on each animal, in procedures ranging from clinical blood chemistry, through calorimetry and body composition, to behavioural phenotypes. Our current focus is on using sparse hierarchical factor models to identify and interpret multivariate phenotype perturbations and to impute unmeasured phenotypes.
Hepatitis C virus (HCV) infects around 200,000 people in the UK and 200 million people worldwide (2% of the world population). STOP-HCV is a flexible, dynamic, UK-wide consortium that will use patient stratification to optimise the treatment of infected patients. The consortium builds on existing cutting-edge clinical and scientific expertise, in partnership with industry. Our overarching aim is to define patient strata, develop a deeper understanding of them, and build prognostic models so that rational treatment strategies can be deployed. In the new era of novel Directly Acting Antiviral (DAA) therapies, treating only a subset of patients with DAAs will still cost the NHS an estimated £96 million per year, so refined patient stratification will be of enormous clinical and economic benefit. A focus of our programme will be HCV genotype 3 infection, which is highly prevalent in the UK, has a characteristic clinical phenotype, and shows a higher relapse rate under DAA therapy. We will also focus on difficult-to-treat patient groups, such as those with cirrhosis and those co-infected with HIV, for whom optimised management pathways will be of particular benefit.
Our consortium is underpinned by HCV Research UK, a network of 18 UK centres biobanking samples from 10,000 HCV-infected patients, linked to a state-of-the-art clinical database. A unique aspect of the STOP-HCV consortium is the availability of complementary datasets on a common set of samples. This enables integrative analysis that uses all of the data to improve scientific understanding of the mechanisms underlying heterogeneity in disease susceptibility and progression, as well as of host-viral interactions and their implications for therapeutic response. The joint data will also enable an integrative approach to constructing biomarker panels from genetic, genomic, and serum marker data, informed by the in vitro immunity experiments.
Integrative analysis will proceed in phases, beginning with pairwise analyses of datasets before integrating the heterogeneous data sources as a whole.
The combined effort of the SCORT consortium aims to improve the diagnosis of colorectal cancer (CRC) so as to increase the likelihood that patients are prescribed the treatment with the highest chance of success. It also aims to minimise the negative side effects associated with various therapies. Part of this work is developing novel methods in biostatistics and statistical genomics, drawing on computational statistics and machine learning, to integrate the multi-omics data generated by the consortium (DNA sequence, methylation, and transcriptome) with patient records. The goal is a deeper biological understanding of CRC and of how that biology underpins the prediction of outcome.
New technologies are providing opportunities to measure health and disease in many novel ways. This project focuses on two such technologies: genome-wide measurement of gene expression (RNA-seq transcriptomics) and measurement of the concentrations of a broad range of small molecules involved in metabolism (metabolomics). One of the next big steps in precision medicine promises to be the integration of genetic data with such longitudinally varying molecular phenotypes, to enhance disease prediction and to stratify treatments across patient groups, thereby improving health outcomes. This project gathers RNA-seq and metabolomic data longitudinally from ~700 members of the TwinsUK cohort, accompanied by genotypes and extensive clinical and lifestyle information. We will explore how molecular traits track and vary over time, determine how this variation relates to underlying genetic variation, and explore the joint contribution of genetic and genomic data to disease risk, with a particular focus on type II diabetes. We will develop bespoke statistical and machine learning approaches to infer the longitudinal multivariate relationships amongst genotype, molecular traits, and environmental/lifestyle factors, and apply them to identify robust, reproducible signatures associated with disease susceptibility and onset.
We are members of the Computational Statistics and Machine Learning group within the Department of Statistics.
