RESEARCH INTERESTS
I am a member of the Bioinformatics Group.
I am interested in Molecular Evolution, Molecular Population
Genetics, Bioinformatics and Computational Biology. At present I am
working on the topics sketched below:
Statistical alignment
The approach taken to alignment by Thorne, Kishino and Felsenstein
(TKF) is in my view superior to other approaches, but it needs
serious further development before it is practical for actual data
analysis. My main goals for the near future are:
A tractable time-reversible model allowing for longer
insertions/deletions.
Combining a profile Hidden Markov Model (HMM) with the TKF
model. This leads to interesting problems as the HMM will have
segments added and deleted in the course of evolution.
Hein (2001) gives an algorithm that can analyze a set of
sequences related by a binary tree and evolving by the TKF model. It
is not practical for real data. In collaboration with Jens Ledet
Jensen and Kim Mouridsen, we are working on practical methods based
on MCMC techniques.
Coalescent Theory
Including realistic molecular models of Gene Conversion and
Recombination in the Coalescent Model. An interesting question in
this context is how much population data is needed to distinguish
different molecular mechanisms. It should also be of relevance for
fine scale gene mapping.
Methods of evolutionary analysis of sequences that experience
Recombination and Gene Conversion.
Stochastic Grammars and Molecular Evolution
Stochastic Grammars are very flexible tools to describe structural
relationships in biology, such as secondary structures in proteins
and RNA (and much more). Goldman, Thorne & Jones were the first to
combine Stochastic Grammars with Molecular Evolution.
We have since initiated three new applications of this:
Stochastic Context Free Grammars, Molecular Evolution and RNA
Secondary Structure, where Bjarne Knudsen is the main contributor.
This started in the summer of 98 and has been a major success.
Gene Finding and Molecular Evolution, where Jakob Skou
Pedersen is the main contributor. This started in February 2000 and
is very promising. It will be especially useful now that many closely
related genomes are available.
I hope our next project will be to devise a Viral Gene Finder
that uses alignments of a large number of viral genomes to find the
reading frames.
Other Projects
I am involved in a minor way in a series of other projects, including:
The measurement of absolute evolutionary rates in viruses,
when the viruses have been sequenced at different time points. The
main contributors to this project are Roald Forsberg and Anne-Mette
Hein.
Metrics on trees based on recombination events. This posed a
major problem in RecPars, where a heuristic had to be used, since an
exact algorithm was too slow. An overlooked problem in this context
is that real recombinations operate on rooted trees, while people
often unroot their trees. Thomas Christensen has found an example
where the metric on rooted trees differs from the metric on
unrooted trees. It involves 2 rooted trees with 9 leaves that are 3
recombination events apart. If the trees are unrooted, their distance
is only 2 recombinations.
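As an illustration of the first project above (absolute rates from viruses sequenced at different time points), one common and simple approach is to regress genetic distance from a reference point against sampling time; the slope is then a rate per unit time. The sketch below is only that generic least-squares idea, not necessarily the method used in the project. The function name `estimate_rate` and the numbers in the usage lines are hypothetical.

```python
def estimate_rate(sampling_times, distances):
    """Least-squares fit of distance = intercept + rate * time.

    sampling_times: when each sequence was sampled (e.g. calendar years).
    distances: genetic distance of each sequence from a common reference
               (e.g. substitutions per site). Both inputs are assumptions
               about how the data would be prepared.
    Returns (rate, intercept).
    """
    n = len(sampling_times)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (time, distance) pairs")
    mean_t = sum(sampling_times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_times)
    if sxx == 0:
        raise ValueError("all sequences share one sampling time; rate is not identifiable")
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sampling_times, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, intercept


if __name__ == "__main__":
    # Made-up example data: four samples taken five years apart.
    times = [1990, 1995, 2000, 2005]
    dists = [0.010, 0.020, 0.030, 0.040]
    rate, intercept = estimate_rate(times, dists)
    print("rate per year:", rate)
    print("time at which distance extrapolates to zero:", -intercept / rate)
```

The regression treats the data points as independent, which sequences related by a tree are not, so the result is best read as a rough first estimate.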
Beyond these topics, I supervise students in molecular evolution,
viral evolution and sequence algorithms.
I normally expand into new areas by including new subjects in the
courses offered. At present I wish to put more emphasis on:
Metabolic Pathways. I started teaching this in 98.
Expression Data and Modelling Regulatory Networks. I started teaching
this in 99.
Gene Mapping and Coalescent Theory.
Teaching
I have started most of the courses that are being taught at AAU in
Bioinformatics, Molecular Evolution and Molecular Population
Genetics, but I have continuously expanded my repertoire and others
have continued or taken over, so the total number of courses being
offered by the group is now very large. The courses that I have
taught or initiated can be found on the Bioinformatics Research
Center pages at Aarhus University.
Software
The software described is all written in C.
TreeAlign - was written by me in the period 85-89. It aligns and
finds the phylogeny at the same time for a set of homologous proteins
or DNA/RNA sequences.
GenAl - was written by Jens Støvlbæk (92/93) in collaboration with
me. It can align pairs of DNA sequences with an arbitrary set of
reading frames (including overlapping). GenAl is designed to compare
homologous viral genomes.
Mulal - was written by Jens Støvlbæk (93/94) in collaboration with
me. Made an evolutionary analysis of aligned pairs of viral genomes,
in terms of transition/transversion and nucleotide composition bias
and selective constraints. It could handle overlapping reading frames.
RecPars - by me in 90, but a superior version was programmed by Kim
Fisker in 94. It tries to find the most parsimonious history of a set
of sequences in terms of substitutions and recombinations. It is
designed to analyze aligned viral genomes.
Spatial - by me in 95/96, but substantially improved by Mikkel Nygård
in 98. Does exactly the same as Hudson's 83 algorithm, but by
scanning along sequences instead of going back in time. It will
generate recombination-genealogies for sequences sampled from an
idealized population. It can also generate data if a substitutional
process is specified.
Ancestors - by me in 95/96. Simulates the fate of one sequence going
back in time, tracking the number of ancestors and ancestral
segments. This has been used to estimate the number of genetic
ancestors of the present human population (with Carsten Wiuf).
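The kind of simulation described for Ancestors can be sketched with the standard coalescent with recombination, started from a single sequence. This is not the Ancestors program itself (which is written in C), only a minimal illustration under stated assumptions: the sequence is the interval [0, 1), time is measured in coalescent units, each lineage recombines at rate rho/2 times the span of the ancestral material it carries, and each pair of lineages coalesces at rate 1. All names (`simulate_ancestors`, `rho`, `t_max`) are hypothetical.

```python
import random


def split(segments, x):
    """Split a sorted list of (start, end) segments at breakpoint x."""
    left = [(a, min(b, x)) for a, b in segments if a < x]
    right = [(max(a, x), b) for a, b in segments if b > x]
    return left, right


def merge(s1, s2):
    """Union of two disjoint segment lists, fusing segments that touch."""
    merged = []
    for a, b in sorted(s1 + s2):
        if merged and merged[-1][1] == a:
            merged[-1] = (merged[-1][0], b)
        else:
            merged.append((a, b))
    return merged


def simulate_ancestors(rho, t_max, seed=None):
    """Follow one sequence back in time until t_max.

    Returns (history, lineages). history holds tuples
    (time, number of ancestors, number of ancestral segments);
    lineages holds the segment list carried by each ancestor at the end.
    """
    rng = random.Random(seed)
    lineages = [[(0.0, 1.0)]]          # one ancestor carrying everything
    t = 0.0
    history = [(0.0, 1, 1)]
    while True:
        k = len(lineages)
        # Span from first to last piece of ancestral material per lineage.
        spans = [segs[-1][1] - segs[0][0] for segs in lineages]
        rec_rate = 0.5 * rho * sum(spans)
        coal_rate = k * (k - 1) / 2.0
        total = rec_rate + coal_rate
        if total == 0:
            break                      # nothing can happen (rho == 0)
        t += rng.expovariate(total)
        if t > t_max:
            break
        if k < 2 or rng.random() * total < rec_rate:
            # Recombination: one ancestor is replaced by two.
            i = rng.choices(range(k), weights=spans)[0]
            segs = lineages[i]
            x = rng.uniform(segs[0][0], segs[-1][1])
            left, right = split(segs, x)
            if not left or not right:
                continue               # breakpoint hit an endpoint exactly
            lineages[i] = left
            lineages.append(right)
        else:
            # Coalescence: two ancestors turn out to be the same individual.
            i, j = rng.sample(range(k), 2)
            lineages[i] = merge(lineages[i], lineages[j])
            del lineages[j]
        history.append((t, len(lineages), sum(len(s) for s in lineages)))
    return history, lineages


if __name__ == "__main__":
    history, _ = simulate_ancestors(rho=5.0, t_max=2.0, seed=1)
    for time, n_ancestors, n_segments in history:
        print(round(time, 3), n_ancestors, n_segments)
```

Because only one sequence is followed, every position on it is carried by exactly one ancestor at any time, so the segment lengths always sum to 1; this gives a convenient sanity check on the bookkeeping.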
Some of this software is publicly available
here.