
The TPS Algorithm & the Evolutionary Path of Protein Structures

Comparing homologous objects is central to biology, and the last decade has been dominated by such comparisons in the field of comparative genomics. However, many biological objects other than sequences are homologous and can be subjected to evolutionary study; networks and structures are examples. Protein structures have been compared for decades, but in a non-statistical and non-evolutionary way. Such analyses would benefit tremendously from methods that explicitly model the evolution of structures over time. The motivation for doing this has increased with the large number of structures now known, but such methods have not materialized for lack of models and computational power. Fortunately, such models have been explored extensively both in statistics and in Molecular Dynamics (MD). Molecular Dynamics can simulate the behaviour of molecular systems with up to 10^6 atoms (typically 10^3-10^4 atoms) for time periods of up to a microsecond (10^-6 s), depending on the details. Applications often involve dynamic paths where both the start configuration and the end configuration of the system are known, for instance the catalysis of a substrate into a product. This is a well-studied problem in statistics, and the natural algorithms are now being used on a large scale in MD under the name Transition Path Sampling (TPS). The modelling problem described here is very similar to the TPS problem and can be explored using the same algorithm.
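To make the idea of sampling paths with known start and end configurations concrete, here is a minimal toy sketch. It is not the TPS algorithm as used in MD (which relies on trajectory moves such as shooting), but a much simpler Metropolis sampler over one-dimensional lattice paths whose two endpoints stay pinned. The potential, the function names and all parameters are assumptions made purely for illustration.

```python
import math
import random


def potential(x):
    """Hypothetical energy of a single state: quadratic cost around x = 2."""
    return 0.5 * (x - 2) ** 2


def initial_path(start, end, n_steps):
    """Build one valid path of unit steps from start to end."""
    dist = abs(end - start)
    if dist > n_steps or (n_steps - dist) % 2:
        raise ValueError("end is not reachable from start in n_steps unit steps")
    direction = 1 if end >= start else -1
    # Walk straight to the target, then pad with up/down pairs.
    steps = [direction] * dist + [1, -1] * ((n_steps - dist) // 2)
    path = [start]
    for step in steps:
        path.append(path[-1] + step)
    return path


def sample_paths(start, end, n_steps, n_iter, rng):
    """Metropolis sampling over paths; both endpoints stay fixed throughout.

    Assumes n_steps >= 2 so that at least one interior point exists.
    """
    path = initial_path(start, end, n_steps)
    samples = []
    for _ in range(n_iter):
        i = rng.randrange(1, n_steps)  # interior index only
        # A point can be flipped only if its two neighbours coincide.
        if path[i - 1] == path[i + 1]:
            new_x = 2 * path[i - 1] - path[i]
            delta = potential(new_x) - potential(path[i])
            if delta <= 0 or rng.random() < math.exp(-delta):
                path[i] = new_x
        samples.append(list(path))
    return samples


samples = sample_paths(start=0, end=4, n_steps=10, n_iter=200, rng=random.Random(0))
```

Each sampled path is a list of 11 positions that begins at 0 and ends at 4; only the interior of the path changes between iterations. In a structural setting the states would be protein structures rather than integers, and the path weight would come from an evolutionary model rather than this toy potential.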

The proposed project is clearly ambitious, but it is worth pursuing because it represents the optimal way of studying protein evolution. It is ambitious because it involves investigating all possible paths between two protein structures and, for each path, predicting the protein structure at every step along it. There are methods to do this, but great care must be taken to keep the computations efficient. Both paths and structures will have great redundancy, allowing calculations to be reused. Additionally, protein structures can be represented at different levels of detail: the coarsest represents secondary structure elements (SSEs) as labelled sticks with relationships to other SSEs, and the (almost) finest is a full atomic representation of the structure. Choosing the right representation is crucial for producing an interesting analysis in finite time without trivializing the problem.
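As an illustration of the coarsest level of representation, the following sketch stores each SSE as a labelled stick and derives pairwise relationships between sticks. The text does not specify which relationships would be recorded; the inter-axis angle and the midpoint distance used here, along with the class and function names, are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stick:
    """A secondary structure element reduced to a labelled line segment."""
    label: str    # hypothetical labels: "H" for helix, "E" for strand
    start: tuple  # (x, y, z) of one end
    end: tuple    # (x, y, z) of the other end

    def axis(self):
        return tuple(e - s for s, e in zip(self.start, self.end))

    def midpoint(self):
        return tuple((s + e) / 2 for s, e in zip(self.start, self.end))


def angle_deg(a, b):
    """Angle between the axes of two sticks, in degrees."""
    ua, ub = a.axis(), b.axis()
    dot = sum(x * y for x, y in zip(ua, ub))
    norm = math.hypot(*ua) * math.hypot(*ub)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def relationships(sticks):
    """Map each pair (i, j), i < j, to (angle in degrees, midpoint distance)."""
    table = {}
    for i in range(len(sticks)):
        for j in range(i + 1, len(sticks)):
            a, b = sticks[i], sticks[j]
            table[(i, j)] = (angle_deg(a, b),
                             math.dist(a.midpoint(), b.midpoint()))
    return table


helix = Stick("H", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand = Stick("E", (0.0, 5.0, 0.0), (0.0, 5.0, 10.0))
table = relationships([helix, strand])
```

A structure with n SSEs is then summarized by n sticks and n(n-1)/2 pairwise entries, which is far smaller than a full atomic model and makes the reuse of calculations across similar structures easier to exploit.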

Thomas Darden, Jotun Hein, Lee Pedersen, Mark Sansom and Willie Taylor have a series of specific projects fitting this framework, but we will formulate a plan for the complete DPhil only after a dialogue with the successful applicant. Thomas Darden is a mathematician whose main expertise is in methodology development within Molecular Dynamics. Lee Pedersen is a theoretical chemist whose main interest is Molecular Dynamics. Jotun Hein is a bioinformatician-geneticist interested in expanding the domain of evolutionary modelling and analysis. Mark Sansom is a biochemist who focuses on the application of Molecular Dynamics, especially to membrane proteins. Willie Taylor is a computational biologist who focuses on proteins.