SURVEY AND SUMMARY The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Open Access

TLDR

FASTQ has emerged as a common file format for sharing sequencing read data, combining each sequence with a per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants.

Abstract:

FASTQ has emerged as a common file format for sharing sequencing read data, combining both the sequence and an associated per-base quality score, despite lacking any formal definition to date and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Because this is an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
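The variants named in the abstract differ in how each quality score is mapped to a printable character, which is why conversion between them matters. The Python sketch below illustrates one way such a conversion might look. The ASCII offsets (33 for Sanger, 64 for Solexa and Illumina 1.3+) and the two score formulas are not stated in the abstract; they are supplied here as assumptions based on the commonly documented conventions, and every function name is hypothetical.

```python
import math
from typing import Iterable, List

# Assumed ASCII offsets for the three variants (not given in the abstract).
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED scores
SOLEXA_OFFSET = 64    # Solexa FASTQ: Solexa scores
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED scores


def solexa_to_phred(q_solexa: float) -> float:
    """Map a Solexa score onto the PHRED scale (assumed formula)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Map a PHRED score onto the Solexa scale (assumed formula).

    The logarithm is undefined for q_phred <= 0; this sketch simply
    rejects such input rather than choosing a floor value.
    """
    if q_phred <= 0:
        raise ValueError("PHRED score must be positive for this mapping")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def decode_quality(quality: str, offset: int) -> List[int]:
    """Turn a quality string into integer scores for the given offset."""
    return [ord(ch) - offset for ch in quality]


def encode_sanger(phred_scores: Iterable[float]) -> str:
    """Encode PHRED scores as a Sanger quality string, clamped to 0..93."""
    return "".join(
        chr(min(max(round(q), 0), 93) + SANGER_OFFSET) for q in phred_scores
    )


def solexa_string_to_sanger(quality: str) -> str:
    """Convert a Solexa-encoded quality string to Sanger encoding."""
    solexa_scores = decode_quality(quality, SOLEXA_OFFSET)
    return encode_sanger(solexa_to_phred(q) for q in solexa_scores)
```

Under these assumptions, `solexa_string_to_sanger("h@;")` returns `'I$"'`: Solexa scores 40, 0 and -5 map to PHRED 40, 3 and 1 after rounding. An Illumina 1.3+ string already holds PHRED scores, so converting it to Sanger would only require shifting the offset from 64 to 33.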

Citations

Journal Article

VSEARCH: a versatile open source tool for metagenomics

Torbjørn Rognes, +7 more

- 18 Oct 2016 -

PeerJ

TL;DR: VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging and dereplication.

Journal Article

PEAR: a fast and accurate Illumina Paired-End reAd mergeR

Jiajie Zhang, +3 more

- 01 Mar 2014 -

Bioinformatics

TL;DR: The PEAR software for merging raw Illumina paired-end reads from target fragments of varying length evaluates all possible paired-end read overlaps and does not require the target fragment size as input, and implements a statistical test for minimizing false-positive results.

Journal Article

NGS QC Toolkit: a toolkit for quality control of next generation sequencing data.

Ravi K. Patel, +1 more

- 01 Feb 2012 -

PLOS ONE

TL;DR: The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, plus additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools).

Journal Article

The sequence read archive.

Rasko Leinonen, +2 more

- 01 Jan 2011 -

Nucleic Acids Research

TL;DR: The content and structure of the SRA are presented, support for sequencing platforms and recommended data submission levels and formats are described, and the response to the challenge of data growth is outlined.

Journal Article

PANDAseq: paired-end assembler for illumina sequences

Andre P. Masella, +4 more

- 14 Feb 2012 -

BMC Bioinformatics

TL;DR: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads and shows significant improvements over naïve assembly with negligible loss of "good" sequence.
