PEAR - Paired-End reAd mergeR

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.

PEAR evaluates all possible paired-end read overlaps without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, this allows it to merge millions of paired-end reads within a couple of minutes on a standard desktop computer.

Introduction

What is PEAR?

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory. PEAR is distributed under the Creative Commons license, and it runs on the command line under Linux and UNIX-based operating systems.

Installing PEAR

PEAR is available as source code and also in the form of precompiled binaries.

Obtaining PEAR

PEAR sources and binaries are available on the official website. Binaries are available for Intel-based architectures (i386 and x86_64) running Linux. If you intend to compile PEAR from source, make sure to get the source package.

Building from source

You will need to install GNU Autotools and GNU Libtool to compile the source code. If you are using a Debian-based Linux distribution (e.g. Ubuntu), you can install all dependencies by running:
apt-get install build-essential autoconf automake libtool

Building PEAR from source requires a C compiler such as GCC, a Makefile parser such as GNU Make, and the POSIX Threads library installed on the deployment platform.

git clone https://github.com/xflouris/PEAR.git
cd PEAR
./autogen.sh
./configure
make
sudo make install
Alternatively, if you do not have superuser access on the machine where you are installing PEAR, you may change the installation directory using the --prefix switch when running configure:
mkdir $HOME/pear
git clone https://github.com/xflouris/PEAR.git
cd PEAR
./autogen.sh
./configure --prefix=$HOME/pear
make
make install
The above sequence of commands will install PEAR in the directory $HOME/pear.
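
After a prefix install, the shell usually needs to be told where the program lives. The following is a minimal sketch, assuming that the executable is named pear and that make install places it in the bin subdirectory of the prefix (the usual Autotools layout, not something stated above):

```shell
# Assumption: the prefix install put the executable in $HOME/pear/bin.
PEAR_PREFIX="$HOME/pear"
export PATH="$PEAR_PREFIX/bin:$PATH"

# Report whether the shell can now locate the program.
if command -v pear >/dev/null 2>&1; then
    echo "pear found at: $(command -v pear)"
else
    echo "pear not found on PATH; check the contents of $PEAR_PREFIX/bin"
fi
```

To make the change permanent, the export line can be added to your shell start-up file.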

Command-line

Usage

Currently, there is no graphical user interface (GUI) for PEAR. PEAR runs in console mode and takes a number of mandatory and optional arguments, which are explained in the following sections. The optional arguments affect the assembly process.

Main arguments

-f <str>

Specify the name of the file that contains the forward paired-end reads.

-r <str>

Specify the name of the file that contains the reverse paired-end reads.

-o <str>

Specify the name to be used as the base for the output files. PEAR outputs four files: a file containing the assembled reads with an assembled.fastq extension, two files containing the forward and reverse unassembled reads with extensions unassembled.forward.fastq and unassembled.reverse.fastq, respectively, and a file containing the discarded reads with a discarded.fastq extension.
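
A basic run needs only the three main arguments. The sketch below is illustrative: the input file names and the base name sample are placeholders, the executable name pear is an assumption, and the helper pear_outputs is hypothetical. It assumes that the base name and each extension are joined with a dot.

```shell
# Placeholder input files and output base name; replace with your own.
FORWARD="reads_R1.fastq"
REVERSE="reads_R2.fastq"
BASE="sample"

# Hypothetical helper: list the four output files expected for a base name,
# assuming base name and extension are joined with a dot.
pear_outputs() {
    for ext in assembled.fastq unassembled.forward.fastq \
               unassembled.reverse.fastq discarded.fastq; do
        printf '%s.%s\n' "$1" "$ext"
    done
}

# Run the merge only if pear is installed and both input files exist.
if command -v pear >/dev/null 2>&1 && [ -f "$FORWARD" ] && [ -f "$REVERSE" ]; then
    pear -f "$FORWARD" -r "$REVERSE" -o "$BASE"
fi

pear_outputs "$BASE"
```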

Optional arguments

-p <str>

Specify a p-value for the statistical test. If the computed p-value of a possible assembly exceeds the specified p-value then the paired-end read will not be assembled. Valid options are: 0.0001, 0.001, 0.01, 0.05 and 1.0. Setting 1.0 disables the test. (default: 0.01)
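
Because only five cut-off values are accepted, a wrapper script may want to check the value before starting a long run. The function below is a hypothetical helper, not part of PEAR, and it compares the value as text exactly as listed above:

```shell
# Hypothetical helper: succeed only for the p-value cut-offs listed above.
is_valid_pear_pvalue() {
    case "$1" in
        0.0001|0.001|0.01|0.05|1.0) return 0 ;;
        *) return 1 ;;
    esac
}

PVALUE="0.001"
if is_valid_pear_pvalue "$PVALUE"; then
    echo "p-value cut-off $PVALUE is accepted"
else
    echo "p-value cut-off $PVALUE is not one of the listed options" >&2
fi
```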

-v <int>

Specify the minimum overlap size. The minimum overlap may be set to 1 when the statistical test is used. However, further restricting the minimum overlap size to a proper value may reduce false-positive assemblies. (default: 10)

-m <int>

Specify the maximum possible length of the assembled sequences. Setting this value to 0 disables the restriction, and assembled sequences may be arbitrarily long. (default: 0)

-n <int>

Specify the minimum possible length of the assembled sequences. Setting this value to 0 disables the restriction, and assembled sequences may be arbitrarily short. (default: 50)

-t <int>

Specify the minimum length of reads after trimming the low quality part (see option -q). (default: 1)

-q <int>

Specify the quality score threshold for trimming the low quality part of a read. If the quality scores of two consecutive bases are strictly less than the specified threshold, the rest of the read will be trimmed. (default: 0)
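
The trigger condition can be illustrated with a small sketch. The function first_low_pair is hypothetical and is not PEAR's implementation. It decodes a FASTQ quality string with a given base (33 by default) and reports the 1-based position of the first pair of consecutive bases whose scores are both strictly below the threshold, or 0 if there is none. Exactly where PEAR cuts relative to that pair is not specified here, so the sketch only locates the pair.

```shell
# Hypothetical helper: first_low_pair QUAL THRESHOLD [BASE]
# Prints the 1-based position of the first base of the first pair of
# consecutive bases with quality strictly below THRESHOLD, or 0 if none.
first_low_pair() {
    qual=$1
    threshold=$2
    base=${3:-33}
    n=${#qual}
    i=1
    prev_low=0
    while [ "$i" -le "$n" ]; do
        c=$(printf '%s' "$qual" | cut -c "$i")
        # "'$c" makes printf print the character code of $c.
        q=$(( $(printf '%d' "'$c") - base ))
        if [ "$q" -lt "$threshold" ]; then
            if [ "$prev_low" -eq 1 ]; then
                echo $((i - 1))
                return 0
            fi
            prev_low=1
        else
            prev_low=0
        fi
        i=$((i + 1))
    done
    echo 0
}

# 'I' encodes score 40 and '#' encodes score 2 when the base is 33.
first_low_pair 'IIII##II' 10
```

An isolated low-quality base does not trigger trimming under this rule, so a string such as 'II#I#II' yields 0 with the same threshold.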

-u <float>

Specify the maximal proportion of uncalled bases in a read. Setting this value to 0 will cause PEAR to discard all reads containing uncalled bases. The other extreme setting is 1, which causes PEAR to process all reads regardless of the number of uncalled bases. (default: 1)
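
To get a feeling for suitable values, the proportion can be computed for a single read. This is a rough sketch with a hypothetical helper, assuming that uncalled bases are written as N (or n), which is the usual FASTQ convention but is not stated above:

```shell
# Hypothetical helper: print the proportion of uncalled bases in a read,
# assuming uncalled bases are written as N or n.
uncalled_fraction() {
    seq=$1
    n=${#seq}
    only_n=$(printf '%s' "$seq" | tr -cd 'Nn')
    awk -v u="${#only_n}" -v n="$n" \
        'BEGIN { if (n == 0) print 0; else printf "%.4f\n", u / n }'
}

# Two uncalled bases out of eight.
uncalled_fraction 'ACGTNNAC'
```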

-g <int>

Specify the type of statistical test. Two options are available. (default: 1)

  1. Given the minimum allowed overlap, test using the highest OES. Note that, due to its discrete nature, this test usually yields a lower p-value for the assembled read than the cut-off (specified by -p). For example, with the cut-off set to 0.05, the assembled reads might have an actual p-value of 0.02.
  2. Use the acceptance probability (m.a.p). This test method computes the same probability as test method 1. However, it assumes that the minimal overlap is the observed overlap with the highest OES, instead of the one specified by -v. Therefore, this is not a valid statistical test, and the 'p-value' is in fact the maximal probability for accepting the assembly. Nevertheless, we observed in practice that when the actual overlap sizes are relatively small, test 2 can correctly assemble more reads with only a slightly higher false-positive rate.

-e

Disable empirical base frequencies. (default: use empirical base frequencies)

-s <int>

Specify the scoring method. (default: 2)

  1. OES with +1 for match and -1 for mismatch.
  2. Assembly score (AS). Use +1 for match and -1 for mismatch multiplied by base quality scores.
  3. Ignore quality scores and use +1 for a match and -1 for a mismatch.

-b <int>

Base PHRED quality score. (default: 33)

-y <str>

Specify the amount of memory to be used. The number may be followed by one of the letters K, M, or G denoting Kilobytes, Megabytes and Gigabytes, respectively. Bytes are assumed in case no letter is specified.
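
The suffix notation can be illustrated with a small conversion sketch. The function mem_to_bytes is hypothetical, and the use of binary multiples (1K = 1024 bytes) is an assumption, since the text above does not say whether PEAR uses factors of 1024 or 1000:

```shell
# Hypothetical helper: convert a memory specification such as 200M to bytes.
# Assumption: K, M and G are binary multiples (1K = 1024 bytes).
mem_to_bytes() {
    spec=$1
    num=${spec%[KMG]}   # strip one trailing K, M or G, if present
    case "$spec" in
        *K) echo $((num * 1024)) ;;
        *M) echo $((num * 1024 * 1024)) ;;
        *G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)  echo "$spec" ;;   # no letter: the value is already in bytes
    esac
}

mem_to_bytes 200M
```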

-j <int>

Specify the number of threads to use.
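
Several options are typically combined in one call. The sketch below builds such a command and runs it only if the program and the input files are present. The file names, the quality threshold of 20 and the memory limit of 1G are placeholder values chosen for illustration, and the executable name pear is an assumption. The -p and -v values shown are the documented defaults.

```shell
# Use all online processors; fall back to 1 if the count is unavailable.
THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Placeholder inputs and illustrative option values.
set -- pear -f reads_R1.fastq -r reads_R2.fastq -o sample \
       -p 0.01 -v 10 -q 20 -j "$THREADS" -y 1G
PEAR_CMD="$*"
echo "$PEAR_CMD"

# Execute only when pear is installed and both input files exist.
if command -v pear >/dev/null 2>&1 \
   && [ -f reads_R1.fastq ] && [ -f reads_R2.fastq ]; then
    "$@"
fi
```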

Getting support

Mailing list

A mailing list has been set up for handling bug reports, feature requests and user support. You may subscribe here.

Bug reports and inquiries

Found a bug, need something implemented in PEAR or need some help? Let us know about it! However, do not contact us directly. Instead, use the mailing list so that we avoid duplicate inquiries.

Important notice

The PEAR Creative Commons license prohibits commercial use of the code. For testing and using PEAR on a commercial basis, you need to purchase a commercial software license. If you wish to purchase such a license, please contact:
Prof. Alexandros Stamatakis

To download PEAR please go to the new download page