About Me
I am a DPhil student in Statistics at the University of Oxford. My research is on detecting recombination breakpoints in multiple alignments of DNA sequences.

Research
I have developed a method called the Stochastic Topology Hidden Markov Model (ST-HMM) for inferring recombination breakpoints, rate heterogeneity and the topologies that generated the data. The method is described in a paper to appear in Bioinformatics; advance access is available from http://bioinformatics.oxfordjournals.org/cgi/reprint/btn607. The program is written in Java and can be downloaded below, together with several other programs for running the ST-HMM and interpreting its output, and a document describing how to install and run them.

A parallel tempering version, the Heated ST-HMM, is also available below. It runs multiple chains at different temperatures to improve the mixing of the algorithm.
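The swap step that parallel tempering relies on can be sketched as follows. This is a generic illustration, not code from HeatedSTHMM.jar: it assumes each chain i targets the posterior raised to an inverse temperature beta_i (whether the Heated ST-HMM tempers the full posterior or only the likelihood is not stated here), and the class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of the parallel tempering swap move (not the Heated ST-HMM source). */
public class TemperingSwap {

    /**
     * Log of the Metropolis acceptance ratio for swapping the states of chains i and j.
     * Assumes chain i targets pi(x)^beta_i, which gives
     * log r = (beta_i - beta_j) * (logPi_j - logPi_i).
     */
    static double logSwapRatio(double betaI, double betaJ, double logPiI, double logPiJ) {
        return (betaI - betaJ) * (logPiJ - logPiI);
    }

    /** Attempts one swap between chains i and j; returns true if the swap is accepted. */
    static boolean attemptSwap(double[] betas, double[] logPi, int i, int j, Random rng) {
        double logR = logSwapRatio(betas[i], betas[j], logPi[i], logPi[j]);
        if (logR >= 0 || Math.log(rng.nextDouble()) < logR) {
            // Only the log-target values are exchanged here; a real sampler would
            // also exchange the full states (topologies, breakpoints, rates).
            double tmp = logPi[i];
            logPi[i] = logPi[j];
            logPi[j] = tmp;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        double[] betas = {1.0, 0.5, 0.25};            // inverse temperatures; chain 0 is the cold chain
        double[] logPi = {-1200.0, -1190.0, -1250.0}; // hypothetical untempered log posteriors
        Random rng = new Random(42);
        boolean accepted = attemptSwap(betas, logPi, 0, 1, rng);
        System.out.println("swap 0<->1 accepted: " + accepted); // prints: swap 0<->1 accepted: true
    }
}
```

Swaps between neighbouring temperatures let states found by the flatter, heated chains pass down to the cold chain (beta = 1), whose samples are the ones used for inference.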

Downloads
Java code to run the ST-HMM and the Heated ST-HMM, and to analyse the output:

  • The ST-HMM.jar program.
  • The HeatedSTHMM.jar program.
  • The BioInf.jar package required to run the ST-HMM and Heated ST-HMM.
  • The STHMMPosterior.jar program for summarizing the output of the chain.
  • The TreeSummary.jar program for summarizing the trees output by the STHMMPosterior program.
  • The TreePositions.jar program for summarizing the posterior probability of all trees at a given site of the alignment.
  • The documentation for the programs.
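The posterior probability of a tree at a given site can be estimated from MCMC output as the fraction of sampled iterations in which that tree is the hidden state at the site. A minimal sketch, assuming the samples for one site are already available as a list of topology strings in a canonical form; the SitePosterior class is hypothetical and does not reflect the input or output formats of STHMMPosterior.jar or TreePositions.jar.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: Monte Carlo estimate of per-site tree posterior probabilities. */
public class SitePosterior {

    /** Returns, for each distinct topology, the fraction of samples in which it occurs. */
    static Map<String, Double> posteriorAtSite(List<String> sampledTopologies) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : sampledTopologies) {
            counts.merge(t, 1, Integer::sum); // count occurrences of each topology
        }
        int n = sampledTopologies.size();
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (double) n);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical topologies sampled at one site over four MCMC iterations.
        List<String> samples = Arrays.asList(
                "((A,B),(C,D));", "((A,B),(C,D));", "((A,C),(B,D));", "((A,B),(C,D));");
        System.out.println(posteriorAtSite(samples));
        // prints: {((A,B),(C,D));=0.75, ((A,C),(B,D));=0.25}
    }
}
```

Because two different Newick strings can describe the same unrooted topology, the strings would need to be normalised before counting.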

Various datasets analysed by the program can also be downloaded:

  • 4-taxon data simulated using the BARGE program of Husmeier (2005) barge.phy
  • 6-taxon data simulated using the Seq-Gen program of Rambaut and Grassly (1997) 6taxa.phy
  • 15-taxon data simulated using the Seq-Gen program of Rambaut and Grassly (1997) 15taxa.phy
  • KAL153: alignment of 4 HIV-1 strains, including isolate KAL153 KAL153.phy
  • 15 inbred mouse strains: SNPs in a 1 Mb region (34-35 Mb) of chromosome 4 Chr04_34-35.phy
  • Example settings file for the barge.phy dataset settings.cmdfile
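The .phy extension conventionally denotes PHYLIP format. A minimal sketch for loading such a file into memory, assuming the sequential, relaxed PHYLIP layout (a header giving the numbers of taxa and sites, then one taxon name and sequence per line, separated by whitespace); interleaved files, or strict files with 10-character names that run into the sequence, would need a different parser. The PhylipReader class is hypothetical and is not part of the downloads above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for sequential, relaxed PHYLIP alignments. */
public class PhylipReader {

    /** Reads "ntaxa nsites", then one "name sequence" line per taxon; spaces inside a sequence are ignored. */
    static Map<String, String> read(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) throw new IOException("empty input");
        String[] dims = header.trim().split("\\s+");
        int nTaxa = Integer.parseInt(dims[0]);
        int nSites = Integer.parseInt(dims[1]);
        Map<String, String> seqs = new LinkedHashMap<>(); // keeps taxa in file order
        String line;
        while (seqs.size() < nTaxa && (line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue; // skip blank lines
            String[] parts = line.split("\\s+", 2);
            String seq = parts.length > 1 ? parts[1].replaceAll("\\s+", "") : "";
            if (seq.length() != nSites) {
                throw new IOException("taxon " + parts[0] + " has " + seq.length()
                        + " sites, expected " + nSites);
            }
            seqs.put(parts[0], seq);
        }
        if (seqs.size() != nTaxa) {
            throw new IOException("expected " + nTaxa + " taxa, found " + seqs.size());
        }
        return seqs;
    }

    public static void main(String[] args) throws IOException {
        String example = "2 8\nseqA ACGTACGT\nseqB ACGTTCGT\n"; // hypothetical toy alignment
        Map<String, String> aln = read(new BufferedReader(new StringReader(example)));
        System.out.println(aln); // prints: {seqA=ACGTACGT, seqB=ACGTTCGT}
    }
}
```

For a downloaded file, the same method could be called with java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get("barge.phy")), provided the file follows the assumed layout.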

References

  • Webb, A., Hancock, J. M. and Holmes, C. C. (2008) Phylogenetic Inference Under Recombination Using Bayesian Stochastic Topology Selection, Bioinformatics.
  • Husmeier, D. (2005) Discriminating between rate heterogeneity and interspecific recombination in DNA sequence alignments with phylogenetic hidden Markov models, Bioinformatics, 21, 166-172.
  • Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, Comput. Appl. Biosci., 13, 235-238.