Oxford Statistics at the forefront of AI-driven drug discovery | Department of Statistics



10 Jun 2025

Professor Charlotte Deane from the Department of Statistics has been announced as a senior principal investigator on an £8 million government-backed consortium that will create the world's largest dataset for AI-driven drug discovery.

OpenBind's open dataset of drug-protein interactions will comprise more than 500,000 experimentally validated protein-ligand complex structures generated over the next five years, a 20-fold increase on all the publicly available data collected in the past half-century.

‘OpenBind realises a major gear-shift for AI in drug discovery by investing in the data that powers it,’ said Professor Deane. ‘This funding will mean we can begin generating a catalogue that not only dwarfs in quantity everything messily accumulated over half a century, but transcends it in quality and is geared towards powering the AI algorithms.’

Most medicines work by binding to specific proteins – the building blocks that make our bodies function – but researchers have historically lacked sufficient high-quality data about these interactions to train AI systems effectively. This data shortage has been a major barrier to using artificial intelligence to predict which new compounds might work as drugs, leaving pharmaceutical companies reliant on empirical testing methods that can take decades and cost billions. OpenBind promises to bridge that gap by creating structured, comprehensive data specifically designed for machine learning applications.

The consortium will deploy automated chemistry and high-throughput X-ray crystallography at Diamond Light Source, the UK's national synchrotron facility in Oxfordshire, to generate unprecedented volumes of precise molecular interaction data structured for AI training.

Professor Deane is working alongside an international team of researchers, including colleagues Professor Frank von Delft (who also holds a position at Diamond Light Source) and Professor Paul Brennan, both from Oxford’s Nuffield Department of Medicine. The consortium also includes Nobel Prize winner Professor David Baker from the University of Washington, and leading computational scientists from institutions including Memorial Sloan Kettering Cancer Center, MIT, and Columbia University.

The OpenBind dataset is designed to support several areas of computational innovation, including structure prediction, generative molecular design, docking algorithms, and active learning workflows, illustrating how statistical methods developed for one domain can have far-reaching impact across other fields of scientific inquiry. The project also has potential applications beyond healthcare, supporting engineering biology research into challenges such as developing new enzymes to tackle plastic waste.

OpenBind is backed by the UK government's newly established Sovereign AI Unit and positions the UK at the forefront of AI-driven scientific discovery. The project will help train the next generation of AI models for drug discovery while establishing new standards for open scientific data sharing. The announcement forms part of the government's broader Plan for Change and highlights how statistical and computational expertise developed at Oxford is contributing directly to national economic growth and international scientific leadership.

More broadly, OpenBind shows how statistical expertise developed in the department is being applied to accelerate medical breakthroughs that could benefit patients worldwide and underpin decades of future innovation in computational biology and pharmaceutical research.
