Journal Article

Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies.

Yong Kong
01 Aug 2011 - Genomics (Academic Press) - Vol. 98, Iss. 2, pp. 152-153
TL;DR: Btrim is a fast and lightweight software to trim adapters and low quality regions in reads from ultra high-throughput next-generation sequencing machines and can reliably identify barcodes and assign the reads to the original samples.
About: This article was published in Genomics on 2011-08-01 and is currently open access. It has received 480 citations to date. The article focuses on the topic: Adapter (computing).
Citations
Journal Article
TL;DR: SOAPnuke is demonstrated as a tool with abundant functions for a “QC-Preprocess-QC” workflow and MapReduce acceleration framework that enables large scalability to distribute all the processing works to an entire compute cluster.
Abstract: Quality control (QC) and preprocessing are essential steps for sequencing data analysis to ensure the accuracy of results. However, existing tools cannot provide a satisfying solution with integrated comprehensive functions, proper architectures, and highly scalable acceleration. In this article, we demonstrate SOAPnuke as a tool with abundant functions for a "QC-Preprocess-QC" workflow and MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into 1 executable and predefines their order to avoid the necessity of reformatting different files when switching tools. Furthermore, the MapReduce framework enables large scalability to distribute all the processing works to an entire compute cluster. We conducted a benchmarking where SOAPnuke and other tools are used to preprocess a ∼30× NA12878 dataset published by GIAB. The standalone operation of SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke achieved ∼5.7 times the fastest speed of other tools.

1,043 citations

Journal Article
TL;DR: A novel algorithm, the bit-masked k-difference matching algorithm, is devised; it runs in O(kn) expected time with O(m) space, where k is the maximum number of differences allowed, n is the read length, and m is the adapter length, and it achieves as yet unmatched accuracies for adapter trimming with a low time bound.
Abstract: Adapter trimming is a prerequisite step for analyzing next-generation sequencing (NGS) data when the reads are longer than the target DNA/RNA fragments. Although typically used in small RNA sequencing, adapter trimming is also used widely in other applications, such as genome DNA sequencing and transcriptome RNA/cDNA sequencing, where fragments shorter than a read are sometimes obtained because of the limitations of NGS protocols. For the newly emerged Nextera long mate-pair (LMP) protocol, junction adapters are located in the middle of all properly constructed fragments; hence, adapter trimming is essential to gain the correct paired reads. However, our investigations have shown that few adapter trimming tools meet both efficiency and accuracy requirements simultaneously. The performances of these tools can be even worse for paired-end and/or mate-pair sequencing. To improve the efficiency of adapter trimming, we devised a novel algorithm, the bit-masked k-difference matching algorithm, which has O(kn) expected time with O(m) space, where k is the maximum number of differences allowed, n is the read length, and m is the adapter length. This algorithm makes it possible to fully enumerate all candidates that meet a specified threshold, e.g. error ratio, within a short period of time. To improve the accuracy of this algorithm, we designed a simple and easy-to-explain statistical scoring scheme to evaluate candidates in the pattern matching step. We also devised scoring schemes to fully exploit the paired-end/mate-pair information when it is applicable. All these features have been implemented in an industry-standard tool named Skewer ( https://sourceforge.net/projects/skewer ). Experiments on simulated data, real data of small RNA sequencing, paired-end RNA sequencing, and Nextera LMP sequencing showed that Skewer outperforms all other similar tools that have the same utility. Further, Skewer is considerably faster than other tools that have comparative accuracies; namely, one times faster for single-end sequencing, more than 12 times faster for paired-end sequencing, and 49% faster for LMP sequencing. Skewer achieved as yet unmatched accuracies for adapter trimming with low time bound.

1,028 citations


Cites background or methods from "Btrim: a fast, lightweight adapter ..."

  • ...Further improvements in Ukkonen’s algorithm by bitwise parallelism were proposed by Myers [5] and implemented in Btrim [6], which has a time complexity of O(mn/w), where w is the word length of the computer; e.g. w equals 64 for a 64-bit machine....

    [...]

  • ...Other adapter trimmers showed advantages on a specific metric; e.g. AdapterRemoval, Flexbar, and EATools were the most sensitive, while TagCleaner, Btrim, and Scythe were the most conservative....

    [...]

  • ...%) for processing SE reads and was orders of magnitude faster (13X ∼ 400X) than the slow trimmers; Cutadapt, the most widely accepted adapter trimmer, exhibited a good compromise between sensitivity and specificity (96.27% vs. 96.93%), and had the highest mCC (0.9286) among the existing tools for processing SE reads; TrimGalore, a wrapper for Cutadapt, had a performance that was equivalent to EA-tools with default settings, but it was considerably slower than EAtools (28.2% ∼ 31.6% of the speed); SeqPrep, a dedicated PE reads adapter trimmer and merger, had the highest mCC (0.9975) among the existing tools for processing PE reads, but it was slow (0.64Mbp/s); Btrim had the highest speed (23.63Mbp/s) for adapter trimming, but it had low sensitivity (53.44%); Scythe had an mCC similar to that of Cutadapt for SE reads adapter trimming, but was more conservative; Flexbar had slightly lower metrics and about 20% lower processing speed than TrimGalore; Trimmomatic was among the most conservative ones, but it had an acceptable sensitivity (72.31...

    [...]

  • ...%) and a relatively high speed (16.73Mbp/s); AlienTrimmer had similar metrics to Btrim, but was much slower (1.64Mbp/s); and AdapterRemoval had a similar overall performance as SeqPrep for PE reads processing, but unlike SeqPrep it can also handle SE reads....

    [...]

  • ...After investigating the processed data, we found that Btrim could recognize only the occurrence of the whole adapter sequence with a limited tolerance for insertions and deletions....

    [...]

Journal Article
TL;DR: In this article, the authors characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas and found that sun-exposed melanomas had markedly more ultraviolet-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas.
Abstract: We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations compared to sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature, an activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1P29S) in the highly conserved switch I domain. Crystal structures, and biochemical and functional studies of RAC1P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, causes an increased binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.

1,024 citations

Journal Article
TL;DR: This study unveils general principles underlying NSC activation and lineage priming and opens potential avenues for regenerative medicine in the brain.

605 citations

Journal Article
TL;DR: AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data, and it exhibits good performance both in terms of sensitivity and specificity.
Abstract: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses. We present a tool called AdapterRemoval which is able to pre-process both single and paired-end data. The program locates and removes adapter residues from the reads, it is able to combine paired reads if they overlap, and it can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence in both the 5’ and 3’ ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to be good at trimming adapters from both single-end and paired-end data. AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.

533 citations


Cites background from "Btrim: a fast, lightweight adapter ..."

  • ...Btrim Yes Yes Yes Yes No No No Yes Yes [7,8]...

    [...]

References
Journal Article
Gene Myers
TL;DR: An algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the approximate string matching problem; its performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and small m.
Abstract: The approximate string matching problem is to find all locations at which a query of length m matches a substring of a text of length n with k-or-fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state-set of the k-difference automaton for the query, and asymptotically run in either O(nm/w) or O(nm log s/w) time where w is the word size of the machine (e.g., 32 or 64 in practice), and s is the size of the pattern alphabet. Here we present an algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus, the algorithm's performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and small m. Moreover, because the algorithm is not dependent on k, it can be used to rapidly compute blocks of the dynamic programming matrix as in the 4-Russians algorithm of Wu et al. (1996). This gives rise to an O(kn/w) expected-time algorithm for the case where m may be arbitrarily large. In practice this new algorithm, that computes a region of the dynamic programming (d.p.) matrix w entries at a time using the basic algorithm as a subroutine, is significantly faster than our previous 4-Russians algorithm, that computes the same region 4 or 5 entries at a time using table lookup. This performance improvement yields a code that is either superior or competitive with all existing algorithms except for some filtration algorithms that are superior when k/m is sufficiently small.

483 citations
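
The bit-vector recurrence summarized in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Myers's or Btrim's actual code: the function name `myers_search` and its return format are hypothetical, and Python's unbounded integers are masked to m bits to emulate one machine word, so the sketch covers only the basic case where the query fits in a single word (m ≤ w).

```python
def myers_search(pattern, text, k):
    """Report (end_index, distance) for every position in `text` where some
    substring ending there matches `pattern` with edit distance <= k.

    Illustrative sketch of the single-word bit-vector recurrence; the name
    and the return format are assumptions made for this example.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1        # emulate an m-bit machine word
    high = 1 << (m - 1)        # bit for the last row of the d.p. column

    # Pre-processing: one bit mask per symbol, bit i set if pattern[i] == symbol.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0           # vertical +1 / -1 deltas of the current column
    score = m                  # value in the last row of the current column
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask           # shift in 0: a match may start anywhere
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append((j, score))
    return hits


if __name__ == "__main__":
    print(myers_search("ACG", "TACGT", 0))  # [(3, 0)]
```

Each text symbol costs a constant number of word operations regardless of k, which is where the linear scaling in n for short queries comes from.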


"Btrim: a fast, lightweight adapter ..." refers background or methods in this paper

  • ...For adapter trimming, the program is based on modified Myers’s bit-vector dynamic programming algorithm Myers (1999). For quality trimming, a simple moving window algorithm is used and the reads are trimmed at the point where the average quality score within the window drops below a threshold....

    [...]

  • ...As one of the fastest dynamic programming algorithms available with edit distance as the error model (each mismatch, insertion, or deletion counts as one error), the Myers’s bit-vector dynamic programming algorithm finds all locations at which the query matches a substring of the target sequence of length n with k or fewer errors. The algorithm scales linearly with the length of the target sequence (n) when the length of the query is less than the machine word size w (typically, w = 32 for 32-bit machines and w = 64 for 64-bit machines), regardless of k or query length Myers (1999). Before the search starts, the algorithm pre-processes the query sequences....

    [...]
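
The moving-window quality trimming mentioned in the excerpts above can be sketched as follows. This is an illustrative sketch only, not Btrim's implementation: the function names, the default window size and threshold, the Phred+33 encoding, the 5'-to-3' scan direction, and the choice to cut at the start of the first failing window are all assumptions made for the example.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def quality_trim_length(quals, window=5, threshold=20.0):
    """Return how many leading bases to keep.

    Slides a window from the 5' end; the read is cut at the start of the
    first window whose mean quality falls below `threshold`. Reads shorter
    than `window` are judged as a single window (an assumption).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(quals)
    if n == 0:
        return 0
    w = min(window, n)
    total = sum(quals[:w])                 # running sum of the current window
    for start in range(n - w + 1):
        if start > 0:
            # Slide by one base: add the new score, drop the one left behind.
            total += quals[start + w - 1] - quals[start - 1]
        if total < threshold * w:          # mean quality below threshold
            return start
    return n


if __name__ == "__main__":
    seq, qual = "ACGTACGTAC", "IIIII#####"
    keep = quality_trim_length(phred33(qual))
    print(seq[:keep])
```

Keeping a running sum makes the scan linear in the read length; other cut-point conventions (for example, cutting at the first low-quality base inside the failing window) would retain slightly more sequence.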