In statistical modelling, extreme outliers are often written off as 'noise'. But a new study by researchers from Oxford's Department of Statistics and Big Data Institute, published this week in The American Journal of Human Genetics, reverses that principle, using these outliers as the basis of a targeting system for locating rare, high-impact genetic mutations.
This type of mutation can help reveal the biological mechanisms that drive disease. By showing which genes have the greatest impact on a condition, these mutations can help identify promising targets for future therapies.
The study compares each individual's actual physical measurements or disease status with what their polygenic score, a summary of the combined effect of many common genetic variants, predicts. This isolates extreme statistical outliers – people whose bodies appear to defy predictions based on their common genetic variants. Such an outlier might be someone who is very short despite carrying genetic markers associated with above-average height, or someone who develops heart disease despite having a low predicted genetic risk.
Finding these mutations has traditionally been a brute-force exercise. Researchers sequence the genomes of large cohorts, assessing millions of genetic markers before asking whether any are associated with a particular trait or disease. Because the variants can occur in fewer than 0.1% of the population, finding meaningful signals can be expensive and computationally intensive – a needle in a genomic haystack.
The phenotypic misalignment framework reverses this process by starting with the phenotype. The framework is built on the idea that when an individual's phenotype dramatically diverges from their polygenic prediction, that mismatch may be driven by a rare, high-impact mutation. The researchers hypothesised that these statistical outliers would therefore be enriched for such mutations.
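The idea of a mismatch between phenotype and polygenic prediction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple linear regression of the trait on the polygenic score, standardised residuals as the measure of misalignment, and a hypothetical cut-off of 2 standard deviations. The function names and the toy height data are invented for illustration only.

```python
from statistics import mean, pstdev


def misalignment_z_scores(phenotype, pgs):
    """Standardised residuals of a trait after regressing it on the polygenic score.

    Assumption: ordinary least squares with a single predictor; the study may
    use a different model and additional covariates.
    """
    mx, my = mean(pgs), mean(phenotype)
    sxx = sum((x - mx) ** 2 for x in pgs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pgs, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(pgs, phenotype)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]


def flag_outliers(z_scores, threshold=2.0):
    """Indices of individuals whose trait deviates strongly from prediction.

    The threshold is a hypothetical choice, not a value taken from the study.
    """
    return [i for i, z in enumerate(z_scores) if abs(z) >= threshold]


# Toy data: height (cm) follows the polygenic score closely, except for
# individual 7, who is much shorter than an above-average score predicts.
pgs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
height = [162.0, 164.0, 166.0, 168.0, 170.0, 172.0, 174.0, 156.0, 178.0]

flagged = flag_outliers(misalignment_z_scores(height, pgs))
print(flagged)  # [7]
```

In this toy example only the individual with a high polygenic score and unexpectedly short stature is flagged. Under the framework's hypothesis, such an individual would be a candidate for carrying a rare, high-impact variant.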
‘By first applying our framework to well-characterised traits like height and type 2 diabetes, we were able to validate that individuals who drastically deviate from their polygenic predictions are indeed highly enriched for rare, high-impact mutations,’ said lead author Dr Nikolas Baya, postdoctoral fellow in the Department of Statistics. ‘The model successfully identified known rare mutations linked to height and type 2 diabetes, demonstrating that it could reliably pinpoint individuals more likely to carry high-impact genetic variants without first examining sequencing data.’
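As a rough illustration of what 'enriched' means here, the sketch below asks whether carriers of rare variants are over-represented among outliers compared with the rest of a cohort. It uses a one-sided hypergeometric (Fisher's exact) test as a stand-in. The study's actual statistical test is not described in this article, and the function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(carriers_in_outliers, n_outliers, carriers_total, n_total):
    """One-sided hypergeometric p-value for carrier enrichment among outliers.

    Probability of seeing at least `carriers_in_outliers` carriers when
    `n_outliers` individuals are drawn at random from a cohort of `n_total`
    that contains `carriers_total` carriers.
    """
    max_k = min(n_outliers, carriers_total)
    favourable = sum(
        comb(carriers_total, k) * comb(n_total - carriers_total, n_outliers - k)
        for k in range(carriers_in_outliers, max_k + 1)
    )
    return favourable / comb(n_total, n_outliers)


# Hypothetical counts: 1,000 people, 10 rare-variant carriers overall,
# 50 flagged outliers, 4 of whom are carriers.
p = enrichment_p_value(carriers_in_outliers=4, n_outliers=50,
                       carriers_total=10, n_total=1000)
print(f"one-sided p-value: {p:.3g}")
```

A small p-value would indicate that carriers cluster among the outliers more than chance alone would suggest, which is the pattern the researchers report for height and type 2 diabetes.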
Once this proof-of-concept was established, the researchers expanded the analysis across thousands of genes. Rather than looking for genes associated with a specific disease or physical trait, they searched for genes associated with the mismatch between a person's observed characteristics and their polygenic prediction. This identified 74 statistically significant gene-trait associations, highlighting several genes that had not previously been prioritised for further investigation.
Among the newly identified signals were associations between rare variants in ACSL6 and lower-than-expected body mass index, and between variants in KANK1 and age at menopause. These findings may offer new insights into the biology underlying these traits.
‘What is exciting about this approach is that it allows us to move beyond simply validating known biology and begin identifying entirely new areas for investigation,’ said Dr Baya. ‘By focusing on the people who most strongly deviate from genetic expectations, we can more efficiently identify genes that may be playing an important role in human health and disease.’
In the future, the researchers believe the framework could help make the search for rare disease-causing mutations more targeted. Rather than sequencing entire populations to find rare disease links, clinicians may eventually be able to use polygenic screening to identify individuals most likely to benefit from further genetic investigation.
Senior co-author Professor Cecilia Lindgren from the Department of Statistics said: ‘This application of statistical genetics, which focuses on individuals who deviate from genetic expectations, offers a new route to discovering rare, high-impact genetic variants. The approach has the potential to accelerate both biological discovery and therapeutic target identification.’
The study ‘Individuals who deviate from polygenic expectation are enriched for damaging variants in genes linked to rare disease’ has been published in The American Journal of Human Genetics.