Open Access
SURVEY AND SUMMARY The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
Abstract:
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
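To make "conversion between them" concrete, the following is a minimal Python sketch of how quality strings could be translated between the three variants. It relies on details that are not stated in the abstract above but are the commonly documented conventions: Sanger FASTQ stores PHRED scores at ASCII offset 33, Solexa FASTQ stores Solexa scores at offset 64, and Illumina 1.3+ FASTQ stores PHRED scores at offset 64, with Q_PHRED = 10 log10(10^(Q_Solexa/10) + 1). All function names are hypothetical and chosen for illustration only.

```python
import math

# Assumed conventions (not stated in the abstract itself):
SANGER_OFFSET = 33  # Sanger FASTQ: PHRED scores, ASCII offset 33
SOLEXA_OFFSET = 64  # Solexa and Illumina 1.3+ FASTQ: ASCII offset 64


def solexa_to_phred(q_solexa: float) -> float:
    """Map a Solexa score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """Map a PHRED score to the equivalent Solexa score.

    PHRED 0 has no finite Solexa equivalent, so it is rejected here.
    """
    if q_phred <= 0:
        raise ValueError("PHRED score must be positive for this mapping")
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def solexa_fastq_to_sanger(qual: str) -> str:
    """Convert a Solexa quality string to a Sanger quality string."""
    out = []
    for ch in qual:
        q_solexa = ord(ch) - SOLEXA_OFFSET
        q_phred = round(solexa_to_phred(q_solexa))  # nearest integer PHRED
        out.append(chr(q_phred + SANGER_OFFSET))
    return "".join(out)


def illumina13_fastq_to_sanger(qual: str) -> str:
    """Convert an Illumina 1.3+ quality string to Sanger (offset shift only)."""
    return "".join(chr(ord(ch) - SOLEXA_OFFSET + SANGER_OFFSET) for ch in qual)


if __name__ == "__main__":
    # Solexa scores -5, 0 and 40 are encoded as ';', '@' and 'h'.
    print(solexa_fastq_to_sanger(";@h"))
    # Illumina 1.3+ PHRED scores 0 and 40 are encoded as '@' and 'h'.
    print(illumina13_fastq_to_sanger("@h"))
```

The Solexa conversion needs the score mapping because the two scales diverge at low quality, whereas the Illumina 1.3+ conversion is a pure change of ASCII offset. Production code would normally use an established parser such as those in the Open Bioinformatics Foundation projects rather than this sketch.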
Citations
Journal Article
VSEARCH: a versatile open source tool for metagenomics
Torbjørn Rognes, Tomas Flouri, Ben Nichols, Christopher Quince, Frédéric Mahé
TL;DR: VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging and dereplication.
Journal Article
PEAR: a fast and accurate Illumina Paired-End reAd mergeR
TL;DR: The PEAR software for merging raw Illumina paired-end reads from target fragments of varying length evaluates all possible paired-end read overlaps, does not require the target fragment size as input, and implements a statistical test for minimizing false-positive results.
Journal Article
NGS QC Toolkit: a toolkit for quality control of next generation sequencing data.
Ravi K. Patel, Mukesh K. Jain
TL;DR: The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools).
Journal Article
The sequence read archive.
TL;DR: The content and structure of the SRA are presented, support for sequencing platforms and recommended data submission levels and formats are described, and the response to the challenge of data growth is outlined.
Journal Article
PANDAseq: paired-end assembler for illumina sequences
TL;DR: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads and shows significant improvements over naïve assembly with negligible loss of "good" sequence.
References
Journal Article
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li, Richard Durbin
TL;DR: The Burrows–Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT); it efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Journal Article
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TL;DR: Bowtie extends previous Burrows–Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Journal Article
Improved tools for biological sequence comparison.
TL;DR: Three computer programs for comparisons of protein and DNA sequences can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity.
Journal Article
EMBOSS: The European Molecular Biology Open Software Suite
TL;DR: The European Molecular Biology Open Software Suite is a mature package of software tools developed for the molecular biology community that includes a comprehensive set of applications for molecular sequence analysis and other tasks and integrates popular third-party software packages under a consistent interface.
Journal Article
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
Daniel R. Zerbino, Ewan Birney
TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; without read-pair information, its results on real data are in close agreement with simulated results.