Predicting proteins in three dimensions

Researchers in the Department of Statistics have developed novel rapid methods for predicting protein structure, which are now used by major pharmaceutical companies in the development of new drugs.

Proteins perform crucial functions in all biological processes. What they do is generally determined by their three-dimensional structure, which makes information about protein structure critical to processes such as drug discovery. Experimental methods such as X-ray crystallography, however, cannot always give all the necessary information, especially when trying to establish the loop structures of proteins. Loops are the most variable regions of protein structure and tend to be the aspect most closely related to protein function. Computational methods for loop prediction can therefore offer a powerful addition to the data derived from experiments.

As there are a vast number of different proteins, it had been assumed that the most powerful approach was to predict protein structure from first principles. However, research led by Professor Charlotte Deane identified that the power of database search methods had been underestimated. Deane and her team re-evaluated FREAD, a database search program and loop-modelling algorithm, and developed this into a new program, pyFREAD.

pyFREAD’s approach was based on the fact that, given fixed anchor structures, the loop's structure is independent of that of the rest of the protein and is solely determined by its amino acid sequence. Thus, accurate loop modelling can be achieved by searching the database of known structures for a stretch of amino acids with similar sequence and similar anchor structures. The revised program incorporated a completely new scoring system which, combined with bigger databases of protein structures and faster computers, resulted in a significant improvement in the ability to model loops.
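The database search idea described above can be illustrated with a minimal Python sketch. This is not pyFREAD's code or its scoring system: the `Fragment` record, the `search_loops` function, the use of plain sequence identity and the single anchor-separation distance are all simplifying assumptions made for illustration, and the thresholds are arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical database entry: a loop taken from a known structure."""
    source: str          # label of the structure the loop came from
    sequence: str        # one-letter amino acid sequence of the loop
    anchor_start: tuple  # (x, y, z) of the residue just before the loop
    anchor_end: tuple    # (x, y, z) of the residue just after the loop


def sequence_identity(a, b):
    """Fraction of identical positions; 0.0 if lengths differ or are zero."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def anchor_separation(start, end):
    """Distance between the two anchor positions (a crude geometry check)."""
    return math.dist(start, end)


def search_loops(query_seq, query_start, query_end, database,
                 max_anchor_diff=1.0, min_identity=0.5):
    """Return (identity, anchor_diff, fragment) hits, best match first.

    A fragment is kept only if it has the same length as the query, a
    similar anchor separation and a sufficiently similar sequence.
    """
    target = anchor_separation(query_start, query_end)
    hits = []
    for frag in database:
        if len(frag.sequence) != len(query_seq):
            continue
        diff = abs(anchor_separation(frag.anchor_start, frag.anchor_end) - target)
        identity = sequence_identity(query_seq, frag.sequence)
        if diff <= max_anchor_diff and identity >= min_identity:
            hits.append((identity, diff, frag))
    # Highest sequence identity first, then closest anchor geometry.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


# Invented example data, for illustration only.
EXAMPLE_DB = [
    Fragment("struct_A", "GSDGK", (1.0, 1.0, 1.0), (11.0, 1.0, 1.0)),
    Fragment("struct_B", "GSDAK", (0.0, 0.0, 0.0), (10.5, 0.0, 0.0)),
    Fragment("struct_C", "GSDGK", (0.0, 0.0, 0.0), (15.0, 0.0, 0.0)),
    Fragment("struct_D", "AAAAA", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
]

if __name__ == "__main__":
    for identity, diff, frag in search_loops(
            "GSDGK", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0), EXAMPLE_DB):
        print(frag.source, identity, diff)
```

A realistic implementation would compare full anchor coordinates after superposition and use a substitution-based sequence score rather than simple identity; the sketch only shows how sequence and anchor similarity jointly filter and rank candidate loops.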

Subsequent research was undertaken in collaboration with UCB Pharma, a large pharmaceutical company operating in 40 countries worldwide, with a global revenue of €3.4 billion in 2012. This work established that pyFREAD could also be used when multiple segments of data were missing, and thus help to model residues not defined owing to the experimental limitations of X-ray crystallography and molecular dynamics simulations. pyFREAD’s algorithms were also made significantly faster, and the method was generalised to allow modelling of any fragment of the protein, not just loop structures.

UCB Pharma has made extensive use of pyFREAD in its drug discovery work. The company has found the program to be at least 1000 times faster than comparable commercial packages, and to produce more accurate results. Lead compound optimisation is one of the most costly steps in drug discovery and development, requiring on average £6m per campaign, and pyFREAD is expected to save the company over £5m per drug approval.

A version of pyFREAD coded in C is also available as a free download and as a web-based service. In 2013 the web-based version performed an average of over 60 predictions per month and was visited by over 200 unique users per month from around the world. It is used regularly by, among others, the Oxford spin-out computational drug discovery company InhibOx.

 

The predictive power of FREAD: The black loop is the actual protein loop structure. The grey loop shows the prediction made by the original FREAD program. The white loop is the prediction made by Deane’s new version of FREAD, which is very close to the actual structure.

“The research work at Professor Deane’s laboratory has generated significant economic value for UCB Pharma through the acceleration of the drug discovery process. More importantly, faster drug discovery means that patients receive better treatment sooner. While the impact on patients’ quality of life is hard to quantify, it is what matters most.”
Director of Computational Structural Biology, UCB Pharma

Links:
Free, downloadable version of pyFREAD: http://opig.stats.ox.ac.uk/webapps/fread/php/
Web-based computational version of pyFREAD: http://opig.stats.ox.ac.uk/sites/fread/

Research funded by EPSRC, BBSRC and the Wellcome Trust