The Exelixis Lab

NEW: AnA-FiTS: A very fast forward-in-time simulator for polymorphism data

Andre has designed a very fast forward simulator for population genetics simulations that is two to three orders of magnitude faster than current codes.
For details and to obtain the code, please go to the AnA-FiTS page.

NEW: A pipeline for perpetually updating trees

Fernando, Stephen Smith, and John Cazes have developed a pipeline that automatically updates reference trees when new sequences for the clade of interest appear on GenBank. The tool uses RAxML-Light to extend trees and Stephen's PHLAWD pipeline to extend alignments with the new sequences. The code can be run on stand-alone servers and on cluster systems.

The prototype version including documentation is available via Fernando's github repository.


NEW: PTP, a tool for delimiting species on phylogenies

Jiajie Zhang has designed a tool called PTP, based on Poisson Tree Processes, that can delimit species on phylogenies such as those generated by RAxML. Unlike other tools (e.g., GMYC), it does not require a time-calibrated ultrametric tree as input.

Jiajie also developed an integrated pipeline that combines the PTP method with the Evolutionary Placement Algorithm in RAxML to assess the diversity of a phylogenetic placement run by inferring the number of species per placement.

The code and data are available for download here:
An up-to-date version of the code is maintained by Jiajie on his github repository


NEW: GapsMis, a tool for reference-based genome assembly

GapsMis, the successor of GapMis, is a tool for flexible pairwise sequence alignment with a variable, but bounded, number of gaps.

Around 6,000,000 pairwise sequence alignments performed, under realistic conditions based on the properties of real full-length genomes, show
that GapsMis can increase the accuracy of extending short-read alignments by 0.01-0.04% compared to state-of-the-art approaches.

The software is available here
The open source github code repository can be found here


NEW: ExaML Exascale Maximum Likelihood Code


Exascale Maximum Likelihood (ExaML) code for phylogenetic inference using MPI.

This code implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees. It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.

It is up to 3.2 times faster than RAxML-Light [1].

Like RAxML-Light, ExaML also implements checkpointing, SSE3 and AVX vectorization, and memory saving techniques.
The code and some documentation can be downloaded via Alexis' github repository

[1] A. Stamatakis, A.J. Aberer, C. Goll, S.A. Smith, S.A. Berger, F. Izquierdo-Carrasco: "RAxML-Light: A Tool for computing TeraByte Phylogenies", Bioinformatics 2012; doi: 10.1093/bioinformatics/bts309.

SweeD, a faster version of SweepFinder


We developed SweeD, a parallel and checkpointable tool that implements a composite likelihood ratio test for detecting selective sweeps.
SweeD is based on the SweepFinder algorithm (Nielsen et al. 2005).

SweeD can calculate the theoretical SFS of a given demographic model (stepwise changes or with an exponential growth phase + stepwise changes) by using the method by Živković and Stephan (2011).

SweeD is numerically more stable than SweepFinder (in terms of floating-point arithmetic operations and in particular for folded data), and is faster than SweepFinder when the number of sequences is large.
SweeD has been tested on simulated datasets with up to 10,000 sequences and 1,000,000 SNPs.
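
For readers less familiar with the terminology, the following is a minimal sketch of how an observed site frequency spectrum (SFS) can be tabulated from derived-allele counts and then folded. It is not SweeD's code and it does not implement the Živković and Stephan method for theoretical spectra; the function names and the input layout (one derived-allele count per SNP) are assumptions made for this illustration.

```python
def unfolded_sfs(derived_counts, n):
    """Tabulate the unfolded SFS for n sequences.

    sfs[i] is the number of SNPs whose derived allele is carried by
    exactly i of the n sequences (entries 0 and n stay at zero).
    """
    sfs = [0] * (n + 1)
    for count in derived_counts:
        if 0 < count < n:  # skip sites that are not polymorphic
            sfs[count] += 1
    return sfs


def fold_sfs(sfs):
    """Fold an unfolded SFS: classes i and n-i are merged (minor allele count)."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i in range(1, n):
        folded[min(i, n - i)] += sfs[i]
    return folded


if __name__ == "__main__":
    # Hypothetical toy data: 6 SNPs observed in 4 sequences.
    spectrum = unfolded_sfs([1, 1, 2, 3, 3, 3], n=4)
    print("unfolded:", spectrum)
    print("folded  :", fold_sfs(spectrum))
```

Folding is what one falls back to when no outgroup is available to tell ancestral from derived alleles.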

The sequential version of SweeD is up to 21 times faster than SweepFinder, depending on the number of SNPs and the number of sequences.
Performance improves over SweepFinder with an increasing number of sequences.
For few sequences, SweeD is as fast as SweepFinder.

SweeD has also been used to analyze chromosome 1 from the 1000 Genomes Project.
The dataset comprises more than 2000 sequences and about 2,896,000 SNPs. The analysis required 8 hours and 15 minutes.

You can download the source code of version 3.1.1 here
bug history:
  • v3.1    Fixed bug in the VCF file format parser that was associated with handling missing data
  • v3.1.1 Changed a default parameter value
The manual is available here

The experimental data and scripts used in the manuscript are available for download here

OPASM for efficient parallel string matching

Contributors: Christos Hadjinikolis, Costas S. Iliopoulos, Solon P. Pissis, Alexandros Stamatakis

Description: The tool focuses on the efficient parallelisation of approximate string-matching algorithms, which are based on dynamic programming, using the message-passing programming paradigm (MPI).
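
As background, here is a minimal sequential sketch of dynamic-programming approximate string matching (the classic column-by-column recurrence with a free start position in the text). It is not the OPASM code and says nothing about its MPI parallelisation; the function name and the choice of unit edit costs are assumptions for this illustration.

```python
def approximate_matches(pattern, text, k):
    """Return the end positions (0-based) in `text` at which `pattern`
    matches some substring with edit distance at most k."""
    m = len(pattern)
    # column[i] holds the smallest edit distance between pattern[:i] and
    # any substring of the text ending just before the current position.
    column = list(range(m + 1))
    hits = []
    for j, text_char in enumerate(text):
        prev_diag = column[0]  # value of cell (i-1, j-1)
        column[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text_char else 1
            new_value = min(prev_diag + cost,   # match or substitution
                            column[i] + 1,      # extra character in the text
                            column[i - 1] + 1)  # missing pattern character
            prev_diag = column[i]
            column[i] = new_value
        if column[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(approximate_matches("abc", "xxabcxx", 0))
    print(approximate_matches("abc", "xxabcxx", 1))
```

The sketch needs O(m) memory and O(mn) time for a pattern of length m and a text of length n.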

Technical report: C. Hadjinikolis, C.S. Iliopoulos, S.P. Pissis, A. Stamatakis: "Minimising processor communication in parallel approximate string matching", Heidelberg Institute for Theoretical Studies, Exelixis-RRDR-2012-8, August 2012. PDF

Source code and data: provided freely for academic use under the
terms of the GNU General Public License here

An optimized version of DPPDIV

Download the open source code of the optimized (using vector intrinsics) and parallelized (using OpenMP) version of DPPDIV (original code by Tracy Heath), a program for estimating divergence times with a Dirichlet process prior, here

For details about the program please read the paper

When using the optimized DPPDIV code please cite the original paper:

T. A. Heath, M.T. Holder, J.P. Huelsenbeck: "A Dirichlet process prior for estimating lineage-specific substitution rates". Molecular Biology  and Evolution, 2011.

and the technical report below:

T. Flouris, A. Stamatakis: "An Improvement to DPPDIV", Heidelberg Institute for Theoretical Studies, Exelixis-RRDR-2012-7, August 2012. PDF

For support please use the DPPDIV google group.


RAxML-Light Web-Service

The new RAxML-Light tool is now also available as a web service thanks to the efforts of the great colleagues at the San Diego Supercomputer Center and support by the NSF iPlant collaborative.

To use this service you will first need to create an iPlant login here and subsequently log in on the CIPRES portal using your iPlant credentials.

Microbenchmark for Denormalized Floating Point Numbers

Denormalized floating point values can have a dramatic impact on program performance, i.e., operations of O(n) theoretical run-time can have significantly different execution times, depending on the input data.

The microbenchmark has been extracted from RAxML, which exhibited some inexplicable run time variations due to this problem.
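
To make the effect concrete, here is a minimal sketch (not the microbenchmark from the repository) that classifies doubles as denormalized and times the same O(n) loop on normalized and denormalized input. The function names, input sizes, and scaling factor are assumptions for this illustration. In an interpreted language the interpreter overhead may hide most of the difference, so treat the timings as indicative only.

```python
import sys
import time

# Smallest positive normalized double; nonzero values below it are denormalized.
MIN_NORMAL = sys.float_info.min


def is_denormal(x):
    """Return True if x is a nonzero denormalized (subnormal) double."""
    return x != 0.0 and abs(x) < MIN_NORMAL


def time_scaling(values, factor=0.5, repeats=50):
    """Time an O(n) loop that multiplies every value by `factor`."""
    start = time.perf_counter()
    for _ in range(repeats):
        acc = 0.0
        for v in values:
            acc += v * factor
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10_000
    normal_input = [1.0e-300] * n    # ordinary normalized values
    denormal_input = [1.0e-310] * n  # below MIN_NORMAL, hence denormalized
    print("normalized  :", time_scaling(normal_input))
    print("denormalized:", time_scaling(denormal_input))
```

Both loops execute exactly the same number of multiplications and additions; only the values differ.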

The benchmark is available under GNU GPL from Alexis' github repository.

Please read:

Björndalen J., Anshus O. Trusting floating point benchmarks-are your benchmarks really data-independent?
Applied Parallel Computing. State of the art in Scientific Computing 2010; pp 178-188, Springer.

for some background info.

The Gapmis library

Solon, Simon, Tomas, and Nikos have developed the GapMis library:

GapMis is a tool for pairwise sequence alignment with a single gap.

GapMis is a command-driven program implemented in the C programming language and developed under the GNU/Linux operating system. It only requires the standard C libraries and the gcc compiler for compilation.

GAPMIS software page

Rogue Taxon Identification Web-Server (alpha version)

The consensus tree of a set of bootstrap trees can frequently exhibit poor resolution or sub-optimal branch support because of unstable taxa (also referred to as rogue taxa). Analogously, rogue taxa may also affect branch support values for maximum likelihood trees.

Recently, Andre and Denis made available a web service that identifies rogue taxa in a set of bootstrap trees. Optionally, if a single best-known tree (under ML/MP) is also provided, our algorithm can identify rogue taxa with respect to the branch support values drawn onto this tree (for both cases the set of rogue taxa is often similar, but not necessarily identical).

The URL to our server is http://exelixis-lab.org/roguenarok.html

Using the web interface, you can compare the results of various rogue taxon searches (including stability measures such as the taxonomic instability index and the leaf stability index). Finally, the website integrates a tree viewer that can be used to visualize your consensus tree/best-known tree before and after removing various sets of rogue taxa.

Feedback and comments are most welcome, preferably via the RAxML google group


OmegaPlus Population Genetics code

A parallel tool for rapid & scalable detection of selective sweeps in whole-genome datasets

We have developed OmegaPlus, a scalable implementation of the omega-statistic (Kim and Nielsen 2004) to detect selective sweeps in whole-genome data based on linkage disequilibrium patterns.
OmegaPlus has been tested with fully phased data and also with unphased data, where we can determine to which diploid individual a SNP belongs, but cannot determine which of the two chromosomes carries the SNP.
Outgroup information is not required. The program recognizes FASTA, Hudson's ms-like, and MaCS-like (http://www-hsc.usc.edu/~garykche/) formats.
OmegaPlus can scan the DPGP dataset (www.dpgp.org, reference release 1.0 September 2009, 37 sequences and ~340,000 SNPs) for positive selection in 55 seconds.
In addition to the efficient sequential implementation, we provide three parallelized versions that use fine-, coarse-, and multi-grained parallelism.

Right now this is a pure command-line tool, available for Windows and Linux operating systems.
We strongly recommend using the Linux version of OmegaPlus.
Note that only limited support will be provided for the Windows version.

For compiling the code, GCC version 4.4 or greater is recommended. For gcc versions prior to 4.4, please remove the optimization flag (-O3) from the Makefiles before compiling the code.
When OmegaPlus is compiled with older gcc versions and -O3 is activated, it yields different results on identical input data than it does without -O3. This is most probably due to some overly aggressive optimizations under -O3.
Many thanks to Stefan Laurent (LMU Munich) for pointing this out.

Download the most recent GNU GPL Linux version 2.2.2 here; it includes a bug fix in the VCF file parser that was associated with handling missing data.

Previous Linux versions:
  • Linux version 2.2.1 here; it includes a minor bug fix in the ms parser
  • Linux version 2.2 here; it includes a new command line flag -no-singletons to exclude the singletons from the analysis
  • Linux version 2.1 here, which can now also parse the Variant Call Format (.vcf)
  • Linux version 2.0 here

Download the GNU GPL Windows version here

Download the manual here

Examples are provided with the source code (see directory "examples").

If you have questions or you would like to report a bug, please register at the OmegaPlus google group (http://groups.google.com/group/omegaplus)
or send an email to pavlidisp@gmail.com or n.alachiotis@gmail.com

RAxML Memory Requirements calculator by Simon Berger

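Required size for an alignment of n taxa and m distinct patterns:

MEM = (n-2) * m * (x * 8) bytes

where x depends on the data type and the model of rate heterogeneity: 80 for AA+GAMMA, 20 for AA+CAT, 16 for DNA+GAMMA, and 4 for DNA+CAT.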

Code Availability

For up-to-date versions of reconfigurable architectures go to opencores
For up-to-date development versions of RAxML, RAxML-Light, Parsimonator, and TreeCounter go to github


RAxML questions, help & bug reports

Please register at the RAxML google group and ask your question there.

A simple Visual Tree Comparison Tool

Simon Berger has developed a simple graphical tree comparison tool that highlights differences between up to four trees by coloring the branches (bipartitions/splits) that are not shared among those trees.

The JAVA code can be downloaded here

There are three different operating modes:
  1. run it on the command line and output a phyloxml file that can be viewed with any compatible tree viewer: "java -jar vtd.jar tree1 tree2 > out.xml"
  2. run on the command line and start the archeopteryx tree viewer to view the differences between tree1 and tree2: "java -jar vtd.jar tree1 tree2 -v"
  3. GUI mode: just type "java -jar vtd.jar" and a file dialogue will open to select input tree files
In the default mode (when two trees are supplied) branches that occur (are shared) in both trees are colored white, missing branches are red.
If you open more than two trees, the tool will do a multi tree comparison (this works for up to 4 tree files).
In this mode the first tree will be compared to the remaining trees.
The branch coloring is then based on rgb color mixing: If a branch exists in trees 1 and 2 it is colored red. If it exists in trees 1 and 3/4 it will be green/blue.
If a branch exists in more than two trees the colors are mixed (red+green=yellow, red+green+blue=white and so on...).
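
The color mixing in multi-tree mode can be sketched as follows. This is an illustration of the rule described above, not the tool's JAVA code; the function name is hypothetical, and rendering a branch that occurs in none of the other trees as black is an assumption, since the text does not say how such branches are drawn.

```python
def branch_color(in_tree2, in_tree3=False, in_tree4=False):
    """Return an (r, g, b) color for a branch of tree 1 in multi-tree mode.

    Each further tree that also contains the branch switches on one
    color channel: tree 2 -> red, tree 3 -> green, tree 4 -> blue.
    """
    return (255 if in_tree2 else 0,
            255 if in_tree3 else 0,
            255 if in_tree4 else 0)


if __name__ == "__main__":
    print(branch_color(True))              # shared with tree 2 only: red
    print(branch_color(True, True))        # trees 2 and 3: red + green = yellow
    print(branch_color(True, True, True))  # all trees: white
```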

Here is a screen shot where we compare two small 10 taxon trees:



Graphical User Interface for the RAxML Evolutionary Placement Algorithm

Denis Krompass, our Master's student, has put together JAVA-based GUI packages for running and analyzing short read placement runs with the RAxML EPA algorithm described in: S.A. Berger, D. Krompaß, A. Stamatakis: "Performance, Accuracy and Web-Server for Evolutionary Placement of Short Sequence Reads under maximum-likelihood". In Systematic Biology 60(3):291-302, 2011. PDF

This stand-alone GUI is similar in functionality to the EPA web-server  here
It also allows you to build reference trees with RAxML for the original full-length sequence alignment
Windows version
Linux 32-bit version
Linux 64-bit version

Just download the file (this may take some time), then unzip it: "unzip RAxML_Workbench_Linux.zip" then change to the directory: "cd RAxML_Workbench" and then start the GUI by typing: "java -jar RAxML_Worbench.jar"

Here is a screenshot of the GUI:




Reconfigurable Architectures

Also look at opencores for latest updates

Reconfigurable FPGA Pipelined Floating-Point Exponential Unit available here

Source code under GNU GPL version 3 or higher by Nikos Alachiotis. The following restriction to GNU GPL applies: Always cite:

Nikos Alachiotis, Alexandros Stamatakis: "FPGA Optimizations for a Pipelined Floating-Point Exponential Unit", accepted for publication, 7th International Symposium on Applied Reconfigurable Computing (ARC 2011), Belfast, United Kingdom, March 2011.

when using this code.

An IEEE-754 compliant logarithm approximation unit for FPGAs by Nikos Alachiotis

Download an open-source VHDL implementation of a fast space- and resource-efficient logarithm approximation unit for FPGAs.
By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accepted for publication at RAW workshop, held in conjunction with IPDPS 2010. PDF

UDP Transceiver Core by Nikos Alachiotis and Simon A. Berger

Download an open-source VHDL implementation of a component that can be connected to the input port of the Virtex-5 Ethernet MAC Local Link Wrapper and that allows for transceiving IPv4 ethernet packets. The archive contains a JAVA test application and is also available at opencores.org
By using this component, you agree to cite it as:  "Efficient PC-FPGA Communication over Gigabit Ethernet", by Nikos Alachiotis, Simon A. Berger, and Alexandros Stamatakis, Exelixis Rapid Research Dissemination Report, Exelixis-RRDR-2010-4, TU Munich, February 2010.  PDF


TreeCounter

Code by A. Stamatakis to compute the number of possible rooted and unrooted binary trees for n taxa or to compute the number of possible binary trees given a multi-furcating constraint tree. This code needs the GNU GMP library.
Program options:
  • treeCounter -h for help
  • treeCounter -n numTaxa for the number of all possible trees with numTaxa taxa
  • treeCounter -t constraint for the number of all possible trees under the constraint 
TreeCounter download
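
For the unconstrained case, the counts follow the well-known double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted binary trees for n taxa. The sketch below is not the TreeCounter code (which relies on GMP) and does not cover the constraint-tree option; the function names are assumptions, and Python's built-in arbitrary-precision integers stand in for GMP.

```python
def odd_double_factorial(k):
    """Return k * (k-2) * ... * 3 * 1 for odd k (and 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted binary trees for n_taxa >= 3 taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted binary tree")
    return odd_double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted binary trees for n_taxa >= 2 taxa: (2n-3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted binary tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```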


PaPaRa: PArsimony-based Phylogeny-Aware Read alignment program

Code by Simon A. Berger for aligning short reads to reference phylogenies and alignments.

NEW: significantly faster SSE3 vectorized version of PaPaRa 2.0 available for download  here
NEW: SSE3-vectorized and hybrid CPU/GPU-optimized version of PaPaRa available for download here

Also please check for code updates on Simon's github repository


Parsimonator: A fast open-source parsimony program

Parsimonator v1.0.2 source code available here

Parsimonator is a no-frills, light-weight implementation for building starting trees under parsimony for RAxML-Light (see below).
It deploys a randomized stepwise addition order algorithm to build trees and thereafter conducts a couple of SPR
(Subtree Pruning and Re-Grafting) moves to further improve the parsimony score.
Right now, Parsimonator can only compute trees on DNA datasets. It uses 128-bit wide SSE3 and 256-bit wide AVX (new as of v101) vector instructions to significantly accelerate parsimony computations.
Although it is significantly slower than TNT, I think that it is the fastest open-source implementation of the parsimony function, although the search algorithm itself is rather naïve.
It can also extend given trees that do not comprise all taxa of an input alignment by using a randomized stepwise addition order algorithm for those taxa that are not contained in the starting tree. The source archive includes a manual.
The current version has been used to compute a parsimony tree on an alignment with 1481 taxa and 20,000,000 sites (a phylip file with a size of 27GB!).
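
As an illustration of what a parsimony score computation involves, here is a minimal sketch of Fitch's algorithm on a rooted binary tree with DNA states encoded as bit masks. It is not the Parsimonator implementation; the nested-tuple tree layout, the function names, and the treatment of gaps and N as fully ambiguous are assumptions. The inner step consists only of bitwise AND/OR operations, which is the kind of work that maps well onto wide vector instructions.

```python
# One bit per nucleotide; gaps and N are treated as "any state" (an assumption).
DNA_BITS = {"A": 1, "C": 2, "G": 4, "T": 8, "-": 15, "N": 15}


def fitch_site(tree, states):
    """Return (state_mask, score) for one site.

    `tree` is either a taxon name (leaf) or a pair (left, right);
    `states` maps each taxon name to its bit mask at this site.
    """
    if isinstance(tree, str):
        return states[tree], 0
    left, right = tree
    left_mask, left_score = fitch_site(left, states)
    right_mask, right_score = fitch_site(right, states)
    common = left_mask & right_mask
    if common:  # children agree on at least one state: no change needed
        return common, left_score + right_score
    return left_mask | right_mask, left_score + right_score + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch scores over all sites of `alignment` (taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: DNA_BITS[seq[site]] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


if __name__ == "__main__":
    toy_tree = (("a", "b"), ("c", "d"))  # hypothetical four-taxon tree
    toy_alignment = {"a": "AA", "b": "AC", "c": "AC", "d": "AG"}
    print(parsimony_score(toy_tree, toy_alignment))
```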

Old version without OpenMP parallelization and bug fixes for very large datasets available here 1.0.1

Old Version without AVX instructions still available here 1.0.0


RAxML-Light: a stripped-down checkpointable RAxML version for computing huge trees

Get the most up-to-date RAxML-Light version from github

RAxML-Light v1.0.9 source code available here

RAxML-Light is a stripped-down RAxML version for conducting tree searches on very large trees under the CAT approximation and GAMMA model of rate heterogeneity.
Its key features are:
  1. A light-weight efficient checkpointing and restart capability
  2. A highly optimized fine-grain MPI parallelization that allows you to concurrently compute the likelihood of a single tree on hundreds or thousands of processors, provided that you have a low latency interconnect.
  3. new as of v102: memory saving option -S for gappy multi-gene datasets. With this new option the memory consumption could be reduced from 70GB to 19GB for analyzing a dataset with about 10 genes and 120,000 taxa with 90% missing data. 
  4. new as of v102: AUTO protein model option: RAxML will automatically select the best protein substitution model (WAG, JTT, LG, etc) when model parameters are optimized during the tree search.
  5. new as of v103: a little bug fix :-)
  6. new as of v104: 
    • a little bug fix for the restart from checkpoint option. This will not affect previous results. 
    • Also, the so called search convergence criterion (-D option) from the standard RAxML 728 version has been re-introduced (including restart capability) for tree searches on extremely large trees.
  7. new as of v105: 
    • Bug fixes to compute TeraByte trees with MPI, i.e., trees with 1TB memory requirements for the likelihood vectors of a single tree on more than 600 cores
    • Implementation of GAMMA models of rate heterogeneity
    • Parsing option to parse and compress large alignments into a binary file that can be read much faster
  8. new as of v106:
    • -r option to save memory by recomputing ancestral probability vectors instead of storing them
    • -Q option for improved load balance on partitioned datasets
  9. new as of v108:
    • some minor bug fixes
    • improved manual
  10. new as of v109:
    • major bug fix for CAT likelihood computations on protein models
A usage manual is also included in the archive.

Older version RAxML-Light 1.0.8 available here
Older version RAxML-Light 1.0.6 available here
Older version RAxML-Light 1.0.5 available here
Older version RAxML-Light 1.0.4 available here
Older version RAxML-Light 1.0.3 available here
Older version RAxML-Light 1.0.2 available here
Older version RAxML-Light 1.0.1 available here


RAxML

Get the most up-to-date RAxML version from github

RAxML v7.2.8 alpha release source code available here

new features:
  • several bug fixes
  • added some new protein substitution models: MTART, MTZOA, PMB, HIVB, HIVW, JTTDCMUT, FLU
Documentation:
Read this before running a RAxML analysis! compute RAxML memory requirements.
Since datasets are getting larger here is a formula to estimate RAxML memory requirements:
Given an alignment of n taxa and m distinct patterns the memory consumption is approximately:
  • MEM(AA+GAMMA)    = (n-2) * m * (80 * 8) bytes
  • MEM(AA+CAT)           = (n-2) * m * (20 * 8) bytes
  • MEM(DNA+GAMMA) = (n-2) * m * (16 * 8) bytes
  • MEM(DNA+CAT)        = (n-2) * m * (4  * 8)  bytes
To convert bytes to MB or GB you can use this online converter
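
The four formulas can also be evaluated with a few lines of code. The sketch below simply encodes the multipliers listed above; the function and dictionary names are assumptions, and the result is an approximation of the memory needed for the likelihood vectors, as stated in the text.

```python
# Multiplier x in (n-2) * m * (x * 8) bytes, taken from the formulas above.
X_FACTOR = {
    ("AA", "GAMMA"): 80,
    ("AA", "CAT"): 20,
    ("DNA", "GAMMA"): 16,
    ("DNA", "CAT"): 4,
}


def raxml_memory_bytes(n_taxa, n_patterns, data_type="DNA", rate_model="GAMMA"):
    """Approximate memory requirement in bytes for n taxa and m distinct patterns."""
    x = X_FACTOR[(data_type, rate_model)]
    return (n_taxa - 2) * n_patterns * x * 8


def bytes_to_gib(n_bytes):
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical dataset: 1,002 taxa and 1,000 distinct patterns.
    for key in X_FACTOR:
        mem = raxml_memory_bytes(1002, 1000, *key)
        print(key, mem, "bytes =", round(bytes_to_gib(mem), 3), "GiB")
```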

WEB-Servers for evolutionary placement of short reads

Web-Servers for phylogenetic placement of short sequence reads (including alignment and visualization tools)

Web-Servers for tree building

co-maintained by the Exelixis Lab

Vital IT unit of the Swiss Institute of Bioinformatics



CIPRES portal at San Diego Supercomputer Center

New beta-version of the CIPRES portal that provides a full workbench




not maintained by the Exelixis Lab:
Bioportal in Norway (University of Oslo)


Trex on-line Web-Server at  Université du Québec à Montréal

Graphical User Interfaces

RAxML Graphical User Interfaces

  • Daniele Silvestro and Ingo Michalak at the Senckenberg Museum and Research Center have started developing a GUI for RAxML that runs under MACs, Windows, and Linux. The code for the GUI is available here. Please send suggestions and comments to Daniele Silvestro at senckenberg de
  • Jacek Kominek from the University of Gdansk in Poland has developed this nice GUI here

Older Versions

RAxML v7.2.7 (alpha) available for download here
RAxML v7.2.6 available for download here and here is a windows executable
RAxML v7.2.5 (alpha) available for download here and here is a windows executable
RAxML v7.2.4 (alpha) available for download here
RAxML v7.2.3 (alpha) available for download here
RAxML v7.2.2 available for download here and  download windows executable
RAxML v7.2.1 (alpha) available for download here windows executable here
RAxML v7.2.0 (alpha) available for download here
RAxML v7.1.0 (alpha) available for download here
RAxML v7.0.4 available for download here
RAxML v7.0.3 available for download here
Ancient versions:

RAxML-VI-HPC (version 2.2.3) and a comprehensive Manual (v2.2.3)
Windows Executable for RAxML-VI-HPC (version 2.2.3)  I am extremely grateful to Graham Jones who is a free-lance Computer Scientist in the U.K. He ported and compiled this Windows version of RAxML.
MAC Executable for RAxML-VI-HPC (version 2.2.3). Dave Carmean (carmean_at_sfu_ca) at Simon Fraser University has kindly put together this RAxML executable for MACs. He has also set up a web-page Installing and running RAxML on a Mac in less than a minute...
RAxML-VI-HPC (version 2.0.2) and a comprehensive Manual (v2.0)
RAxML-VI-HPC (version 1.0) and a comprehensive Manual (v1.0)
RAxML-VI: Sequential program with significantly accelerated hill-climbing search algorithm for huge alignment data.
PERL script for non-parametric bootstrapping with RAxML-VI. Note that—depending on your installation—you might have to replace “./raxml” by “raxml” in this script.
RAxML-VI DOS executables for Windows. Those executables have kindly been provided by Jarno Tuimala (jtuimala_at_csc.fi, CSC Finland), whom I would like to thank  for his help and valuable comments.
RAxML-III:  Sequential program, includes more models of nucleotide substitution than RAxML-II.
RAxML-II: Sequential, Parallel, and Distributed implementation of RAxML with less model functionality.

Helper Scripts and Tools

Phylogenetic Binning tool

Phylogenetic binning tool for paper on "Morphology-based phylogenetic binning of the lichen genera Allographa and Graphis via molecular site weight calibration" by Simon Berger available for download here tech report PDF

File Conversion scripts

Wrapper Scripts

Apurva Narechania at the American Museum of Natural History has kindly put together a couple of wrapper scripts for RAxML :-)
  • raxml_launch_serially.sh: A simple shell script that launches one job after the other, waiting for each job to complete.
  • raxml_nexusPartConvert.pl: A Perl script that parses a partitioned alignment in Nexus format with charsets and produces a partition guide file to be fed to RAxML with -q. Preliminary - works with DNA or AA, but not the two together yet, so not suitable for mixed-molecule data. Unless the output gets redirected to a file with ">", it will appear on screen.
  • raxml_wrapper.pl: A Perl script that reads a raxml.config file with common run parameters and executes a directory of Phylip alignment files in batch, then outputs the results in another directory. See the documentation with "perldoc ./raxml_wrapper.pl".

Guy Leonard at Exeter has updated his wrapper environment called easyRax

Alexis has developed a couple of Perl scripts

perl script for computing bootstrap branch lengths with RAxML. This script can be used to perform the following task with RAxML:
  • Given a best-known ML tree, generate a number of Bootstrap replicates and just re-estimate the branch lengths for that given fixed tree topology on each Bootstrap replicate.
  • To invoke the script call it as follows: "perl bsBranchLengths.pl alignmentFileName treeFileName numberOfReplicates".  The script assumes that the RAxML executable is located in the directory where you execute it. Otherwise, if RAxML is located in your Linux/Unix path just replace every occurrence of "./raxmlHPC" by "raxmlHPC" in the script. The bootstrapped trees with branch lengths will be written into a file called "bsTrees"
  • This script is intended for use with programs that infer divergence time estimates.
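
A bootstrap replicate of an alignment is obtained by drawing alignment columns with replacement. The sketch below illustrates only this resampling step; it is not the Perl script, it does not call RAxML, and the function name and the dictionary-based alignment layout are assumptions.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample the columns of `alignment` (taxon -> sequence) with replacement.

    The replicate has the same number of sites as the original, and every
    column is a copy of some original column for all taxa at once.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy_alignment = {"t1": "ACGT", "t2": "AGGT", "t3": "ACGA"}  # hypothetical data
    rng = random.Random(12345)  # fixed seed so that runs are reproducible
    for _ in range(3):
        print(bootstrap_replicate(toy_alignment, rng))
```

For the task described above, the branch lengths of the fixed topology would then be re-estimated on each replicate.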
A perl script for finding the best protein substitution model
  • Here is a little perl-script that will automatically determine the best-scoring AA substitution model on a fixed starting tree.  Note that raxmlHPC must be in your $PATH for this to work.
  • For unpartitioned datasets execute it like this: "perl ProteinModelSelection.pl alignmentFile.phylip > outfile". The outfile will then contain the best-scoring AA model to use with RAxML.
  • For partitioned datasets execute it like this: "perl ProteinModelSelection.pl alignmentFile.phylip partitionData.txt > outfile". The outfile will then contain the best-scoring AA model for every partition.
James Munro has written a Guide to install RAxML on MACs

Olaf Bininda-Emonds has written batchRAxML.pl. This nice script by my good colleague from Munich times provides a wrapper around RAxML to easily analyze a set of data files according to a common set of search criteria. It also organizes the RAxML output into a set of subdirectories.

Frank Kauff has written PYRAXML2. Frank, at the University of Kaiserslautern (formerly at Duke University), has written this cool script that reads NEXUS-style data files and prepares the necessary input files and command-line options for RAxML-VI-HPC. You can download the BETA version here: PYRAXML2. It requires PYTHON and BIOPYTHON to be installed on your computer.

On-Line Material for some old papers

Material (alignments) for 2008 Systematic Biology paper on the rapid bootstrap algorithm
  • test datasets available here

Material (test datasets) for 2007 Supercomputing paper on parallelizing RAxML on the IBM BlueGene/L
  • test datasets available here
Material for  HICOMB2006 paper: “Phylogenetic Models of Rate Heterogeneity: A High Performance Computing Perspective"
  • Click here for a table with the experimental raw data

Material for HPCC05 paper: “Parallel Divide-and-Conquer Phylogeny Reconstruction by Maximum Likelihood”
Material on RAxML-VI performance:
  • 1,000 taxa plot alignment Alignment of 1,000 sequences from the ARB database containing Eucarya, Bacteria, Archaea by Harald Meier, TU München
  • 1,497 taxa plot Alignment of 1,497 Bacteria by Josh Wilcox, Pace Lab, University of Colorado at Boulder, for more information on this alignment please contact the Pace Lab
  • 1,663 taxa plot alignment Alignment of 1,663 sequences from the ARB database containing Eucarya, Bacteria, Archaea by Harald Meier, TU München
  • 1,728 taxa plot alignment Alignment of 1,728 Archaea by Chuck Robertson, Pace Lab, University of Colorado at Boulder
  • 2,000 taxa plot alignment Ribosomal RNA sequences by Gutell Lab, University of Texas at Austin, for more information on this alignment please contact Robin Gutell
  • 2,560 taxa plot alignment upon request via email Kallersjo, M., et al., Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants and flowering plants. Pl. Syst. Evol., 1998. 213: p. 259-287.
  • 4,114 taxa plot alignment 16S ribosomal Actinobacteria RNA sequences, by Usman Roshan, New Jersey Institute of Technology
  • 6,722 taxa plot alignment Ribosomal RNA sequences by Gutell Lab, University of Texas at Austin, for more information on this alignment please contact Robin Gutell
  • 7,769 taxa plot alignment Ribosomal RNA sequences by Gutell Lab, University of Texas at Austin, for more information on this alignment please contact Robin Gutell
  • 8,780 taxa plot alignment Alignment of 8,780 sequences from the ARB database containing Eucarya, Bacteria, Archaea. Original alignment by Harald Meier, TU München, modified by Usman Roshan, New Jersey Institute of Technology
  • 25,057 taxa plot alignment Alignment of 25,057 Proteobacteria, by Usman Roshan, New Jersey Institute of Technology
Old Alignment Benchmark Set

The old Alignment Benchmark set: includes some large real-world alignments and best-known trees for those alignments


ChromatoGate 1.2

A code for analyzing/editing chromatogram data by Nikos Alachiotis: Windows code available for download here
A manual with step-by-step instructions is available for download here PDF

ChromatoGate (CG) accelerates the process of detecting potential errors in DNA sequences that have been introduced/generated by Sanger sequencing.
To detect errors, CG starts from a multiple sequence alignment instead of inspecting every sequence and chromatogram separately prior to alignment.
CG neither aligns nor changes anything in the sequences, that is, it does not automatically remove potential sequencing errors. It implements a series of user-controlled steps that are required in the multiple sequence alignment generation and correction process. During the alignment generation procedure (relying on any external MSA tool), the tool gathers information about alignment gaps, trimmed sequence edges, forward/reversed/consensus sequences, and corrections that have already been applied to the sequences by the user. Using this collected information, CG detects and reports chromatogram peaks to the user for those bases in the sequence alignment that have been identified as "problematic" based on a user-defined threshold.


AxParafit: Highly optimized and parallelized version of Parafit


What do the Programs do?

AxParafit and AxPcoords are highly optimized versions of Pierre Legendre's Parafit and DistPCoA programs for statistical analysis of host-parasite coevolution. AxParafit has also been parallelized with MPI (Message Passing Interface) for compute clusters. We have used parallel AxParafit to carry out the largest co-evolutionary analysis to date for the paper describing the software.

Citing  AxParafit & AxPcoords: When publishing results using AxParafit or AxPcoords please cite the following papers:
If you also used the CopyCat tool in your analyses, please cite:

Manual, Source Code (under GPL), and Binaries
Link to CopyCat version with AxParafit/AxPcoords

Some pre-compiled Binaries:
Libraries required for compiling fast version:
Results and data from the paper: An empirical Study of Smut Fungi and their Hosts


Tree Visualization Tool (pretty old)

Phylogenetic Visualization tool using treemaps & taxonomic information. The screenshot below was taken from the visualization of a phylogenetic tree containing 2415 mammalian sequences.



MrBayes

A hybrid MPI/OpenMP version of MrBayes v3.1.2 by Alexis Stamatakis and Wayne Pfeiffer

Download a hybrid MPI/OpenMP parallelization of MrBayes. DNA and protein models work correctly. You will probably need an Intel compiler (icc) to produce fast code. By using this component you agree to cite it as:
F. Pratas, P. Trancoso, A. Stamatakis, L. Sousa:  "Fine-grain parallelism using Multi-core, Cell/BE, and GPU systems: Accelerating the Phylogenetic Likelihood Function". Proceedings of ICPP 2009, accepted for publication, Vienna, Austria, September 2009.  PDF
and
F. Ronquist, J.P. Huelsenbeck "MrBayes  3: Bayesian Phylogenetic Inference  under mixed models",  Bioinformatics 19(12):1572-1574,  2003.

Some performance data: PDF