Journal Article

fastp: an ultra-fast all-in-one FASTQ preprocessor.

01 Sep 2018-Bioinformatics (Oxford University Press)-Vol. 34, Iss: 17
TL;DR: Fastp is developed as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features that can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data.
Abstract: Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
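The "single scan" idea in the abstract can be illustrated with a minimal Python sketch: each record is read once, and the quality-control counters and the filtering decision are both updated from that one read. This is not fastp's implementation (fastp is written in C++ and is multi-threaded). The function names, the Phred+33 offset and the thresholds below are assumptions chosen for illustration only.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.strip()[1:], seq, qual


def single_pass_filter(
    handle: TextIO, min_qual: int = 15, max_low_fraction: float = 0.4
) -> Tuple[List[Record], Dict[str, int]]:
    """Collect QC counters and filter reads in the same scan.

    min_qual and max_low_fraction are hypothetical, illustrative thresholds:
    a read is dropped when more than max_low_fraction of its bases have a
    quality score below min_qual.
    """
    stats = {"reads_in": 0, "reads_out": 0, "bases_in": 0, "q20_bases": 0}
    kept: List[Record] = []
    for name, seq, qual in read_fastq(handle):
        scores = [ord(c) - PHRED_OFFSET for c in qual]
        # Quality-control statistics, updated from the record in hand.
        stats["reads_in"] += 1
        stats["bases_in"] += len(scores)
        stats["q20_bases"] += sum(s >= 20 for s in scores)
        # Filtering decision, made from the same decoded scores.
        low = sum(s < min_qual for s in scores)
        if scores and low / len(scores) <= max_low_fraction:
            kept.append((name, seq, qual))
            stats["reads_out"] += 1
    return kept, stats


# Tiny in-memory example: r1 is high quality, r2 is mostly low quality.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!I\n")
kept, stats = single_pass_filter(demo)
print([name for name, _, _ in kept], stats)
```

Because the counters and the filter share one loop, the input is read and decoded only once, which is the I/O saving the abstract describes; separate tools would each reopen and reparse the file.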

Citations
Journal Article
TL;DR: The expression of proinflammatory genes, especially chemokines, was markedly elevated in COVID-19 cases compared to community-acquired pneumonia patients and healthy controls, suggesting that SARS-CoV-2 infection causes hypercytokinemia.

767 citations

Journal Article
TL;DR: In this article, using monoclonal antibodies (mAbs), animal immune sera, human convalescent sera and human sera from recipients of the BNT162b2 mRNA vaccine, the authors report the impact on antibody neutralization of a panel of authentic SARS-CoV-2 variants including a B.1.1.7 isolate, chimeric strains with South African or Brazilian spike genes and isogenic recombinant viral variants.
Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the global COVID-19 pandemic. Rapidly spreading SARS-CoV-2 variants may jeopardize newly introduced antibody and vaccine countermeasures. Here, using monoclonal antibodies (mAbs), animal immune sera, human convalescent sera and human sera from recipients of the BNT162b2 mRNA vaccine, we report the impact on antibody neutralization of a panel of authentic SARS-CoV-2 variants including a B.1.1.7 isolate, chimeric strains with South African or Brazilian spike genes and isogenic recombinant viral variants. Many highly neutralizing mAbs engaging the receptor-binding domain or N-terminal domain and most convalescent sera and mRNA vaccine-induced immune sera showed reduced inhibitory activity against viruses containing an E484K spike mutation. As antibodies binding to spike receptor-binding domain and N-terminal domain demonstrate diminished neutralization potency in vitro against some emerging variants, updated mAb cocktails targeting highly conserved regions, enhancement of mAb potency or adjustments to the spike sequences of vaccines may be needed to prevent loss of protection in vivo.

716 citations

Journal Article
25 Jan 2021-Science
TL;DR: In this article, the authors map how all mutations to the receptor binding domain (RBD) of SARS-CoV-2 affect binding by the antibodies in the REGN-COV2 cocktail and the antibody LY-CoV016.
Abstract: Antibodies are a potential therapy for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), but the risk of the virus evolving to escape them remains unclear. Here we map how all mutations to the receptor binding domain (RBD) of SARS-CoV-2 affect binding by the antibodies in the REGN-COV2 cocktail and the antibody LY-CoV016. These complete maps uncover a single amino acid mutation that fully escapes the REGN-COV2 cocktail, which consists of two antibodies, REGN10933 and REGN10987, targeting distinct structural epitopes. The maps also identify viral mutations that are selected in a persistently infected patient treated with REGN-COV2 and during in vitro viral escape selections. Finally, the maps reveal that mutations escaping the individual antibodies are already present in circulating SARS-CoV-2 strains. These complete escape maps enable interpretation of the consequences of mutations observed during viral surveillance.

620 citations

Journal Article
07 May 2020-Nature
TL;DR: It is shown that a coronavirus isolated from a Malayan pangolin has 100%, 98.6%, 97.8% and 90.7% amino acid identity with SARS-CoV-2 in the E, M, N and S proteins, respectively, which suggests that the latter may have originated from a recombination event involving SARS-related coronaviruses from bats and pangolins.
Abstract: The current outbreak of coronavirus disease-2019 (COVID-19) poses unprecedented challenges to global health [1]. The new coronavirus responsible for this outbreak, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), shares high sequence identity to SARS-CoV and a bat coronavirus, RaTG13 [2]. Although bats may be the reservoir host for a variety of coronaviruses [3,4], it remains unknown whether SARS-CoV-2 has additional host species. Here we show that a coronavirus, which we name pangolin-CoV, isolated from a Malayan pangolin has 100%, 98.6%, 97.8% and 90.7% amino acid identity with SARS-CoV-2 in the E, M, N and S proteins, respectively. In particular, the receptor-binding domain of the S protein of pangolin-CoV is almost identical to that of SARS-CoV-2, with one difference in a noncritical amino acid. Our comparative genomic analysis suggests that SARS-CoV-2 may have originated in the recombination of a virus similar to pangolin-CoV with one similar to RaTG13. Pangolin-CoV was detected in 17 out of the 25 Malayan pangolins that we analysed. Infected pangolins showed clinical signs and histological changes, and circulating antibodies against pangolin-CoV reacted with the S protein of SARS-CoV-2. The isolation of a coronavirus from pangolins that is closely related to SARS-CoV-2 suggests that these animals have the potential to act as an intermediate host of SARS-CoV-2. This newly identified coronavirus from pangolins, the most-trafficked mammal in the illegal wildlife trade, could represent a future threat to public health if wildlife trade is not effectively controlled.

570 citations

Journal Article
TL;DR: Critically, and in a similar manner to SARS-CoV-2, RmYN02 was characterized by the insertion of multiple amino acids at the junction site of the S1 and S2 subunits of the spike (S) protein, providing strong evidence that such insertion events can occur naturally in animal betacoronaviruses.

503 citations


Cites methods from "fastp: an ultra-fast all-in-one FAS..."

  • ...Raw reads were obtained from the 56 pools and were then adaptor- and quality-trimmed with the Fastp program [22]....

    [...]

References
Journal Article
TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: [email protected]

45,957 citations


"fastp: an ultra-fast all-in-one FAS..." refers background in this paper

  • ...Cutadapt (Martin, 2011) is a commonly used adapter trimmer, which also provides some read-filtering features....

    [...]

Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations

Journal Article
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: ed.nehcaa-htwr.1oib@ledasu Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations

Journal Article
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Abstract: When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/
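To make "found and removed error-tolerantly" concrete, here is a simplified Python sketch of 3' adapter trimming. It is not one of cutadapt's own algorithms: it tolerates substitutions only (no insertions or deletions), and the function name, the error-rate and minimum-overlap parameters, and the adapter string used in the example are all assumptions made for illustration.

```python
def trim_3prime_adapter(
    read: str, adapter: str, max_error_rate: float = 0.1, min_overlap: int = 3
) -> str:
    """Return the read with a full or partial 3' adapter removed.

    Scans start positions from left to right and cuts at the first position
    where the adapter (or the prefix of it that still fits in the read)
    matches with at most max_error_rate * overlap mismatches.
    Substitutions only; indels are not modelled in this sketch.
    """
    if not adapter:
        return read
    for start in range(len(read) - min_overlap + 1):
        # The adapter may run off the end of the read, so compare only
        # the part that overlaps.
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter) if a != b
        )
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read  # no adapter found


# Illustrative adapter string (an assumption, not taken from the abstract).
ADAPTER = "AGATCGGAAGAGC"
insert = "ACGTACGTAC"

exact = trim_3prime_adapter(insert + ADAPTER, ADAPTER)
one_error = trim_3prime_adapter(insert + "AGATCGGTAGAGC", ADAPTER)
partial = trim_3prime_adapter(insert + "AGATC", ADAPTER)
untouched = trim_3prime_adapter(insert, ADAPTER)
print(exact, one_error, partial, untouched)
```

Scaling the allowed mismatches with the overlap length means a short partial match at the very end of a read must be exact, while a full-length adapter match may contain a sequencing error and still be removed.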

20,255 citations


"fastp: an ultra-fast all-in-one FAS..." refers background in this paper

  • ...Cutadapt (Martin, 2011) is a commonly used adapter trimmer, which also provides some read-filtering features....

    [...]

Trending Questions (1)
Why do FASTQ files need quality control?

FASTQ files need quality control and preprocessing, such as adapter trimming and quality filtering, so that the data passed to downstream analysis is clean and reliable.