FASTQ format

About: FASTQ format is a research topic. Over its lifetime, 369 publications have been published within this topic, receiving 61,256 citations. The topic is also known as: FASTQ.


Open access - Journal Article - DOI: 10.1093/BIOINFORMATICS/BTP352
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, +5 more - Institutions (4)
01 Aug 2009 - Bioinformatics
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.

Topics: Variant Call Format (62%), Stockholm format (61%), FASTQ format (56%)

35,747 Citations

Open access - Journal Article - DOI: 10.1002/0471250953.BI1110S43
Abstract: This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.

Topics: FASTQ format (52%)

4,001 Citations

Open access - Journal Article - DOI: 10.7717/PEERJ.2584
Torbjørn Rognes, Tomas Flouri, +4 more - Institutions (7)
18 Oct 2016 - PeerJ
Abstract: Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences, a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. 
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.

Topics: FASTQ format (53%)

3,673 Citations

Open access - Journal Article - DOI: 10.1093/BIOINFORMATICS/BTR026
Robert Schmieder, Robert Edwards - Institutions (1)
01 Mar 2011 - Bioinformatics
Abstract: Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis. Availability and Implementation: This open-source application was implemented in Perl and can be used as a stand alone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available online.

Topics: FASTQ format (55%), Perl (52%)

3,347 Citations

Open access - Journal Article - DOI: 10.1093/BIOINFORMATICS/BTY560
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu - Institutions (1)
01 Sep 2018 - Bioinformatics
Abstract: Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available online.

Topics: FASTQ format (65%), Preprocessor (50%)

2,501 Citations

Topic's top 5 most impactful authors

Tsachy Weissman: 6 papers, 181 citations

Zexuan Zhu: 5 papers, 77 citations

Idoia Ochoa: 5 papers, 105 citations

Jorge González-Domínguez: 4 papers, 42 citations

Szymon Grabowski: 4 papers, 224 citations

Network Information
Related Topics (5)

Reference genome: 5.5K papers, 367.5K citations, 83% related

Sequence assembly: 4.3K papers, 322.5K citations, 82% related

Biological network: 5.1K papers, 244.7K citations, 80% related

Multiple sequence alignment: 4.4K papers, 637.7K citations, 80% related

Sequence database: 1.3K papers, 280.7K citations, 79% related