About Me
I am a student at the University of Oxford, studying for my DPhil in Statistics. My research
is on detecting recombination breakpoints in DNA multiple alignments.
Research
I have developed a method called the Stochastic Topology Hidden Markov Model (ST-HMM) for
inferring recombination breakpoints, rate heterogeneity and the topologies that
generated the data.
The program is written in Java and can be downloaded below. The method is described in a paper to
appear in Bioinformatics. Advance access is available from
http://bioinformatics.oxfordjournals.org/cgi/reprint/btn607.
Various other programs are provided to run the ST-HMM or to help interpret its output. A document
detailing how to install and run the programs is available below.
A parallel tempering version of the ST-HMM is also available. Multiple chains can be run at different
temperatures to improve mixing of the algorithm. The Heated ST-HMM is available below.
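To illustrate the general idea of parallel tempering (this is not the Heated ST-HMM code), the minimal sketch below runs several random-walk Metropolis chains on a toy two-mode target and occasionally proposes swapping the states of neighbouring chains. The class name ParallelTemperingSketch, the methods logTarget, logSwapRatio and run, the toy target and the inverse temperatures are all hypothetical choices made for this example. A chain with inverse temperature beta targets the density raised to the power beta, so hotter chains (smaller beta) move between modes more easily, and accepted swaps pass those moves down to the cold chain.

```java
import java.util.Random;

public class ParallelTemperingSketch {

    // Hypothetical toy target: log-density (up to a constant) of an equal
    // mixture of two unit-variance normals centred at -3 and +3.
    static double logTarget(double x) {
        double a = -0.5 * (x + 3.0) * (x + 3.0);
        double b = -0.5 * (x - 3.0) * (x - 3.0);
        double m = Math.max(a, b);
        // log-sum-exp for numerical stability
        return m + Math.log(Math.exp(a - m) + Math.exp(b - m));
    }

    // Log Metropolis ratio for swapping the states of two chains with
    // inverse temperatures betaI and betaJ, whose current states have
    // log-densities logPi and logPj respectively.
    static double logSwapRatio(double betaI, double betaJ, double logPi, double logPj) {
        return (betaI - betaJ) * (logPj - logPi);
    }

    // Runs the tempered chains and returns the samples of the cold chain,
    // which is assumed to be betas[0] == 1.0.
    static double[] run(double[] betas, int iterations, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[betas.length];
        double[] coldSamples = new double[iterations];
        for (int t = 0; t < iterations; t++) {
            // Within-chain update: random-walk Metropolis on target^beta.
            for (int k = 0; k < betas.length; k++) {
                double proposal = x[k] + rng.nextGaussian();
                double logAccept = betas[k] * (logTarget(proposal) - logTarget(x[k]));
                if (Math.log(rng.nextDouble()) < logAccept) {
                    x[k] = proposal;
                }
            }
            // Between-chain update: propose swapping a random adjacent pair.
            if (betas.length > 1) {
                int k = rng.nextInt(betas.length - 1);
                double logR = logSwapRatio(betas[k], betas[k + 1],
                        logTarget(x[k]), logTarget(x[k + 1]));
                if (Math.log(rng.nextDouble()) < logR) {
                    double tmp = x[k];
                    x[k] = x[k + 1];
                    x[k + 1] = tmp;
                }
            }
            coldSamples[t] = x[0];
        }
        return coldSamples;
    }

    public static void main(String[] args) {
        double[] betas = {1.0, 0.5, 0.25, 0.1}; // assumed temperature ladder
        double[] samples = run(betas, 20000, 42L);
        int positive = 0;
        for (double v : samples) {
            if (v > 0) {
                positive++;
            }
        }
        System.out.println("Fraction of cold-chain samples in the right-hand mode: "
                + (double) positive / samples.length);
    }
}
```

Only the samples from the cold chain are kept for inference; the heated chains exist solely to help it move between modes.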
Downloads
Java code to run the ST-HMM, Heated ST-HMM and analyse the output:
The ST-HMM.jar program.
The HeatedSTHMM.jar program.
The BioInf.jar package required to run the ST-HMM and Heated ST-HMM.
The STHMMPosterior.jar program for summarizing the output of the chain.
The TreeSummary.jar program for summarizing the trees output by the STHMMPosterior program.
The TreePositions.jar program for summarizing the posterior probability of all trees at a given site of the alignment.
The documentation for the programs.
Various datasets analysed by the program can also be downloaded:
4 taxa data simulated using the BARGE program of Husmeier (2005) barge.phy
6 taxa data simulated using the SeqGen program of Rambaut and Grassly (1997) 6taxa.phy
15 taxa data simulated using the SeqGen program of Rambaut and Grassly (1997) 15taxa.phy
KAL153: 4 HIV-1 strains alignment with isolate KAL153 KAL153.phy
15 inbred mouse strains SNPs: 1Mb region (34Mb-35Mb) from Chromosome 4 Chr04_34-35.phy
Example settings file for the barge.phy dataset settings.cmdfile
References
Webb, A., Hancock, J. M. and Holmes, C. C. (2008) Phylogenetic Inference Under Recombination Using Bayesian Stochastic Topology Selection, Bioinformatics.
Husmeier, D. (2005) Discriminating between rate heterogeneity and interspecific recombination in DNA sequence alignments with phylogenetic hidden Markov models, Bioinformatics, 21, 166-172.
Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, Comput. Appl. Biosci., 13, 235-238.