IMPUTE, developed in the Department of Statistics, has changed the field of human genetics by enabling the accurate prediction of ‘missing’ genetic data. This allows much easier identification of genes that are may be associated with any given disease.

In genetic studies of human disease it is now routine to collect genetic data on thousands of individuals. A typical study will measure up to a million variable positions across the genome (single nucleotide polymorphisms, or SNPs) in thousands of subjects, and look for significant differences between individuals with and without a particular disease. The identification of these ‘disease genes’ can help understanding of the disease mechanisms. However, the genetic data collected is incomplete, with many millions of sites of the genome unmeasured. Any method of predicting, or imputing, the unobserved genetic data would be of enormous use in genetic studies.

The first model to be able do this was developed by Professor Jonathan Marchini and Professor Peter Donnelly as part of their involvement in the Wellcome Trust Case Control Consortium (WTCCC) from 2006-2007. They realised that existing reference databases such as the 1000 Genomes Project (which contains millions of SNPs) could be used to help predict unobserved genotypes, and that recently developed Hidden Markov models developed in the area of population genetics could be adapted to carry out this task.

Professor Marchini wrote IMPUTE to predict missing data using patterns of haplotypes (a set of SNPs that are associated statistically) that are shared between two datasets: a reference database and a genetic study. IMPUTE was applied successfully to all 7 disease studies carried out by the WTCCC. For common genetic variants of interest, the accuracy of imputation is over 95%. Further refinements to IMPUTE enabled the method to scale better as reference databases increased in size, allowing the selection of relevant subsets of the reference database for each individual. This also allowed predictions to be matched to an individual’s ancestry (e.g. European or African).

One key benefit of the method is that once unobserved genotypes have been predicted in several different studies, they can then be combined, via meta-analysis, to produce much more powerful studies. This approach has changed the field of human genetics and groups now routinely share data via this approach. One of the earliest examples of this was in the study of type 2 diabetes; a meta-analysis of three studies involving over 10,000 individuals and 2.2 million SNPs led to the discovery of 6 new genes that were strongly associated with the inheritance of type 2 diabetes.

IMPUTE has had a significant impact on two major companies working in the field of genetics and pharmaceuticals: Affymetrix and Roche. Affymetrix uses IMPUTE as a central part of the process of designing all its genotyping products including SNP arrays, which are used to study slight variations between genomes and thus determine susceptibility to disease. The design of these arrays has helped the company to win a genotyping contract worth around £25m.

Roche saved around $1m by using IMPUTE in a study of drug response. Many medications exhibit a variable response rate that is thought to be partly genetic. Roche used IMPUTE to analyse the genetics of response to the drug tocilizumab (used to treat rheumatoid arthritis). The study was able to implicate the involvement of 8 loci in the patient response to tocilizumab treatment, and show that patients carrying the specific genetic markers had a higher remission compared to those who did not.

“The impute software that we licenced from Oxford University has been used extensively at Affymetrix […] This has made a significant impact in the way we design arrays and could not have happened without using IMPUTE2, which has been shown to be the most accurate method of imputation in the literature”
Vice President for Informatics, Affymetrix







Genome-wide scan for seven common diseases from the Wellcome Trust Case Control Consortium, using IMPUTE

Research funded by the Wellcome Trust