A major computational problem in biology is the reconstruction
of evolutionary trees for species sets, and accuracy is measured
by comparing the topologies of the reconstructed tree and
the model tree. One of the major debates in the field is whether
large evolutionary trees can be even approximately accurately
reconstructed from biomolecular sequences of realistically
bounded lengths (up to about 2000 nucleotides) using
standard techniques (polynomial-time distance
methods, and heuristics for NP-hard optimization problems).
Using both analytical and experimental techniques, we
show that on large trees, the two most popular methods in
systematic biology, Neighbor-Joining and Maximum Parsimony heuristics,
as well as two promising methods introduced by theoretical
computer scientists, are all likely to have significant
errors in the topology reconstruction of the model tree.
We also present a new general technique for combining
outputs of different methods (thus producing hybrid methods),
and show experimentally how one such hybrid method has better performance
than its constituent parts.
Keywords:
CR Categories:
Sources:
The LaTeX version of the article;
this is a Unix file that contains the LaTeX source;
note that you must be a subscriber (institutional or individual) to access this file.
The Postscript
version or PDF
version of the article;
note that you must be a subscriber (institutional or individual) to access
these files.
The HTML version of the article;
note that you must be a subscriber (institutional or individual) to access
this file.
The software suite accompanying the article;
this is a Unix tar file, which includes the source code, a Makefile,
and the test files used in the article.