The Exelixis Lab


Enabling Research in Evolutionary Biology

PaPaRa: PArsimony-based Phylogeny-Aware Read alignment program

A program for aligning short reads to reference phylogenies and alignments, by Simon A. Berger.

Last update: 2016-06-10.

Download

The easiest way is to use our precompiled binaries of PaPaRa 2.5:

  • For most Unix/Linux systems, use this static binary
  • For Mac, you can try this binary, compiled on OS X 10.11.5. Because Mac does not properly support static linking, it might not work on other OS X versions.

However, we recommend building PaPaRa yourself for speed, because the compiler may be able to optimize better for your specific hardware. You also need to do this if the binaries do not work on your machine. See the build instructions below.

Usage

Invoke PaPaRa using

./papara -t <ref tree> -s <phylip RA> -q <fasta QS>

(or papara_static_x86_64, if you use the pre-compiled binary).

The phylip file (option -s) must contain the reference alignment (RA), consistent with the reference tree (option -t). The FASTA file (option -q) contains the unaligned query sequences (QS). Optionally, all sequences that are in <phylip RA> but do not occur in <ref tree> are also interpreted as QS.

The alignment parameters can be modified with the optional -p <user options> setting. <user options> is a string of the form <gap_open>:<gap_extend>:<mismatch>:<match_cgap>, so the default parameters given in the paper correspond to -p -3:-1:2:-3.
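As an illustration of the -p string format, the following minimal sketch assembles the four scoring values into a single colon-separated argument and prints the resulting command line instead of running it. The input file names (reference.tre, reference.phy, queries.fasta) are hypothetical placeholders, and the values shown are simply the defaults quoted above.

```shell
#!/bin/sh
# Scoring values, in the order expected by -p.
# These are the defaults quoted in the text above.
gap_open=-3
gap_extend=-1
mismatch=2
match_cgap=-3

# Join them as <gap_open>:<gap_extend>:<mismatch>:<match_cgap>.
user_options="${gap_open}:${gap_extend}:${mismatch}:${match_cgap}"

# Hypothetical input file names; replace them with your own files.
ref_tree="reference.tre"
ref_aln="reference.phy"
queries="queries.fasta"

# Dry run: build and print the command line without executing PaPaRa.
papara_cmd="./papara -t ${ref_tree} -s ${ref_aln} -q ${queries} -p ${user_options}"
printf '%s\n' "${papara_cmd}"
```

Once the printed command looks right, it can be copied to the shell and run from the directory that contains the papara binary.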

The output alignment is written to papara_alignment.default. You can change the file suffix (i.e., “default”) by supplying a run name with the option -n. To run the multithreaded version, add the option -j <num threads>.
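The relationship between the run name and the output file can be sketched as follows. The run name (test1), the thread count, and the input file names are hypothetical; the script only prints the command and the expected output file name, and does not execute PaPaRa.

```shell
#!/bin/sh
# Hypothetical run name and thread count; adjust to your setup.
run_name="test1"
num_threads=4

# The output file is papara_alignment.<run name>
# ("default" is used when -n is not given).
output_file="papara_alignment.${run_name}"

# Dry run: build and print the command line (input file names are placeholders).
papara_cmd="./papara -t reference.tre -s reference.phy -q queries.fasta -n ${run_name} -j ${num_threads}"
printf '%s\n' "${papara_cmd}"
printf 'Expected output file: %s\n' "${output_file}"
```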

The latest source code and Readme are available at the PaPaRa GitHub repository.

Build Instructions

Get the source

If the provided binaries (see above) do not work, you need to compile PaPaRa yourself. On Unix/Linux systems, you first need the build tools. For example, on Debian-based systems, use

sudo apt-get install build-essential

Then, get the PaPaRa 2.5 source from here or download directly from the repository at

https://github.com/sim82/papara_nt

Unpack into papara_nt-master/.

If the sub-directory papara_nt-master/ivy_mike/ is empty, also download

https://github.com/sim82/ivy_mike/tree/3269b7b39dc6c129cfe72708d9086f1e8f8c2c98

and unpack its contents into papara_nt-master/ivy_mike/.
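To check whether the ivy_mike sub-directory still needs to be filled, a small helper like the one below can be used. This is only a sketch: the function name dir_status is made up for illustration, and it assumes the script is run from the directory that contains papara_nt-master/.

```shell
#!/bin/sh
# Hypothetical helper: prints "present" if the directory exists and
# contains at least one entry, otherwise "missing".
dir_status() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Assumes the current directory contains papara_nt-master/.
status=$(dir_status "papara_nt-master/ivy_mike")
echo "ivy_mike: ${status}"
if [ "${status}" = "missing" ]; then
    echo "Download ivy_mike and unpack its contents into papara_nt-master/ivy_mike/"
fi
```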

Get Boost

If you do not have the C++ Boost libraries installed on your system, you need to install them first.

For example, on Debian-based systems, use

sudo apt-get install libboost-all-dev

For Mac systems, call

brew install boost

which uses the package manager Homebrew.

Build

After that, compile PaPaRa by calling

sh build_papara2.sh

in papara_nt-master/. If you want a static binary, use sh build_papara2_static.sh instead. The latter only works on Unix/Linux systems, as Mac does not support static linking.

(Tested on a clean install of Ubuntu 14.04 LTS in a virtual machine.)
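The choice between the two build scripts can be sketched as a small helper. The function name choose_build_script is hypothetical, and detecting Mac via the "Darwin" value reported by uname -s is an assumption of this sketch; the script names are the ones given above. The sketch only prints the command to run.

```shell
#!/bin/sh
# Hypothetical helper: pick the build script from the OS name and
# whether a static binary is wanted ("yes" or "no").
choose_build_script() {
    os_name="$1"
    want_static="$2"
    # The static build only works on Unix/Linux, not on Mac (Darwin).
    if [ "${want_static}" = "yes" ] && [ "${os_name}" != "Darwin" ]; then
        echo "build_papara2_static.sh"
    else
        echo "build_papara2.sh"
    fi
}

# Example: regular (non-static) build on the current system.
script=$(choose_build_script "$(uname -s)" "no")
echo "Run in papara_nt-master/: sh ${script}"
```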

Citation

When using the program or code, please cite:

S.A. Berger, A. Stamatakis
"Aligning short reads to reference alignments and trees"
Bioinformatics (2011) 27 (15): 2068-2075 first published online June 2, 2011. doi:10.1093/bioinformatics/btr320 

which is available here.

The faster and much improved version of PaPaRa 2.4/2.5 is described in the following technical report:

S.A. Berger, A. Stamatakis:
"PaPaRa 2.0: A Vectorized Algorithm for Probabilistic Phylogeny-Aware Alignment Extension",
Heidelberg, Institute for Theoretical Studies, Exelixis-RRDR-2012-5, March 2012. 

which is also available as PDF.

Previous Versions

We recommend using the current version (see above). If you need backwards compatibility, however, see here:

  • PaPaRa 2.4, used mainly between June 2014 and June 2016, is available here.
  • Before that, the significantly faster SSE3-vectorized version of PaPaRa 2.0 was introduced. It is available for download here.
  • The even older SSE3-vectorized and hybrid CPU/GPU-optimized version of PaPaRa is available for download here.

Also, see the PaPaRa GitHub repository.