Dr George Nicholson

Postdoctoral Researcher

About Me

I studied mathematics as an undergraduate before focusing my doctoral research on population genetics, where probabilistic models help us understand how population movements and selective pressures gave rise to modern-day human genetic variation.

Since then, I’ve developed a broad passion for the process of discovery in biomedical science. How can we best design scientific experiments, and update our beliefs in light of the resulting data, to help improve public health? Our shared goal is to investigate and refine scientific hypotheses about the mechanisms of disease through careful experimentation and observation, followed by robust, reproducible analyses.

Research Interests

As statisticians, we contribute to science by developing statistical models that probabilistically relate data to the underlying mechanisms of interest. Methodological tools such as Bayesian networks and Markov chain Monte Carlo (MCMC) allow us to work with arbitrarily complex statistical models. We strive to design and fit models that faithfully represent the mechanisms through which the data arise.
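
As a toy illustration of the MCMC machinery mentioned above, here is a minimal random-walk Metropolis sampler in Python; the standard-normal target, step size, and iteration count are illustrative choices, not settings from any particular project.

```python
import numpy as np

def log_target(theta):
    # Illustrative target: log-density of a standard normal (up to a constant).
    return -0.5 * theta**2

def random_walk_metropolis(log_target, theta0=0.0, n_iter=10_000, step=1.0, seed=0):
    """Random-walk Metropolis: a basic MCMC sampler for a one-dimensional target."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_iter)
    theta, log_p = theta0, log_target(theta0)
    for i in range(n_iter):
        proposal = theta + step * rng.normal()          # symmetric proposal
        log_p_prop = log_target(proposal)
        if np.log(rng.uniform()) < log_p_prop - log_p:  # Metropolis accept/reject
            theta, log_p = proposal, log_p_prop
        samples[i] = theta
    return samples

draws = random_walk_metropolis(log_target)
print(draws.mean(), draws.std())  # close to 0 and 1 for the standard-normal target
```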

Modern scientific datasets are often large, highly structured, and multifaceted. They span multiple high-dimensional data types (such as genetic, molecular, clinical, image, or audio data), are gathered sequentially and/or spread spatially, can be affected by selection bias, and harbour missing data. Such large, multimodal, complex datasets present challenges as well as opportunities. While we could in principle model all data types and their generating mechanisms in one all-encompassing model, the computational cost may make it infeasible in practice to fit such a model in reasonable time. I’m interested in statistical methods that help us perform inference in this setting in ways that are computationally efficient yet still probabilistically coherent. Here are some example themes and applications of current focus:

Multivariate methods. Effective modelling of multivariate relationships in high-dimensional data can often provide transformative insights. We are developing composable multivariate models, based on sparse factor representations, to extract information from high-dimensional phenotypic measurements with missing data.
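
As a minimal sketch of the factor-representation idea, the following Python snippet imputes missing phenotype values through an iterative rank-k SVD reconstruction. This is a deliberate simplification: it drops the sparsity priors and full probabilistic treatment, and the simulated data and function name are invented for illustration.

```python
import numpy as np

def low_rank_impute(Y, k=2, n_iter=50):
    """Impute NaNs in Y via an iterative rank-k (k-factor) reconstruction."""
    mask = np.isnan(Y)
    X = np.where(mask, np.nanmean(Y, axis=0), Y)   # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :k] * s[:k]) @ Vt[:k]        # rank-k reconstruction
        X[mask] = X_hat[mask]                      # update the missing cells only
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X, U[:, :k] * s[:k], Vt[:k]             # imputed data, scores, loadings

# Simulated phenotypes: 100 individuals, 20 traits driven by 2 latent factors.
rng = np.random.default_rng(1)
Y = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(100, 20))
Y[rng.uniform(size=Y.shape) < 0.2] = np.nan        # 20% missing at random
Y_imputed, scores, loadings = low_rank_impute(Y)
```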

Longitudinal data analysis. We are developing methods for multivariate longitudinal analysis of clinical trial data, harnessing information both across clinical endpoints and across time points within an individual patient. We are also interested in inferring longitudinal trajectories from infrequently measured phenotypes in UK Biobank data. We recently implemented a susceptible-infectious-recovered (SIR) model to track changes in local Covid prevalence over time.
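
For concreteness, here is a textbook SIR model integrated with SciPy; the transmission and recovery rates below are toy values rather than those estimated in the Covid prevalence work.

```python
import numpy as np
from scipy.integrate import odeint

def sir_deriv(y, t, beta, gamma):
    """SIR dynamics for population proportions, with S + I + R = 1."""
    S, I, R = y
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

beta, gamma = 0.3, 0.1        # illustrative transmission/recovery rates (per day)
y0 = [0.99, 0.01, 0.0]        # initial susceptible, infectious, recovered
t = np.linspace(0, 160, 161)  # time grid in days
S, I, R = odeint(sir_deriv, y0, t, args=(beta, gamma)).T
print(f"peak prevalence {I.max():.3f} on day {t[I.argmax()]:.0f}")
```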

Composable inference. We’re interested in developing composable statistical methods that allow us to extract information from diverse datasets separately and conveniently, and then to synthesise that information coherently and pragmatically (e.g., Markov melding). We have applied composable statistical inference in areas including UK Biobank analyses, randomised clinical trials, multivariate phenotyping, and Covid testing data.
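
As a toy illustration of the synthesis step, the sketch below pools independent Gaussian sub-posteriors for a shared quantity under a flat prior. This is only the simplest special case; Markov melding itself handles general non-Gaussian submodels and corrects for multiply counted priors.

```python
import numpy as np

def pool_gaussian_subposteriors(means, variances):
    """Combine independent Gaussian sub-posteriors for one shared parameter.

    Under a flat prior, the pooled posterior is proportional to the product of
    the sub-posterior densities: again Gaussian, with precision-weighted mean.
    """
    precisions = 1.0 / np.asarray(variances)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * np.asarray(means)).sum()
    return mean, var

# Two separate analyses of the same prevalence (illustrative numbers):
mean, var = pool_gaussian_subposteriors([0.02, 0.03], [0.0001, 0.0004])
print(f"pooled estimate: {mean:.4f} (sd {var**0.5:.4f})")
```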

Inference under model misspecification. When combining information from multiple data sources, we may want to control the influence of less reliable or poorly modelled sources (e.g., via generalised Bayesian inference). I’m interested in computationally efficient ways of doing this. We used this form of inference to obtain unbiased local Covid prevalence estimates, combining highly accurate randomised surveillance surveys (REACT) with precise but biased symptom-ascertained testing data (Test and Trace).
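
Below is a minimal sketch of the downweighting idea, assuming a conjugate normal-normal model in which each source's likelihood is raised to a power in [0, 1] (a power posterior, one common form of generalised Bayes). The data, variances, and hand-picked weights are invented for illustration and do not reflect the actual REACT/Test and Trace model.

```python
import numpy as np

def tempered_posterior(mu0, tau0_sq, sources, sigma_sq, weights):
    """Generalised-Bayes update for a normal mean with known variance sigma_sq:
    each source's likelihood contribution is tempered by a weight w in [0, 1]."""
    precision, weighted_sum = 1.0 / tau0_sq, mu0 / tau0_sq
    for y, w in zip(sources, weights):
        y = np.asarray(y)
        precision += w * y.size / sigma_sq      # tempered precision contribution
        weighted_sum += w * y.sum() / sigma_sq  # tempered sufficient statistic
    post_var = 1.0 / precision
    return weighted_sum * post_var, post_var

rng = np.random.default_rng(2)
unbiased = rng.normal(0.02, 0.01, size=50)      # small survey centred on the truth
biased = rng.normal(0.05, 0.01, size=5000)      # large but systematically shifted source
mean, var = tempered_posterior(0.0, 1.0, [unbiased, biased],
                               sigma_sq=0.01**2, weights=[1.0, 0.001])
print(f"posterior mean {mean:.4f}, sd {var**0.5:.4f}")
```

In practice, choosing the tempering weights is itself a statistical question, which is part of what makes computationally efficient approaches to this problem interesting.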

Publications

Venkatesh, S., Ganjgahi, H., Palmer, D., Nicholson, G., Nellaker, C., Holmes, C. and Lindgren, C. (2023) “Genetic architecture of longitudinal obesity trajectories in primary care electronic health records”, European Journal of Human Genetics, p. 39.