Journal Article

Aligning Short Sequencing Reads with Bowtie

01 Dec 2010-Current protocols in human genetics (NIH Public Access)-Vol. 32, Iss: 1
TL;DR: This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second‐generation sequencing instruments, and includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools.
Abstract: This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second-generation sequencing instruments. It also includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools.
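
As a rough sketch of the workflow the abstract describes (build an index, align reads, hand the alignments to SAMtools), the following Python helper only assembles the command lines and prints them; it does not run anything. The file names (`genome.fa`, `my_index`, `reads.fq`, `out`) are hypothetical placeholders. The `bowtie-build <reference> <index_prefix>` form and the `-S` (SAM output) option follow the Bowtie 1 manual. The consensus-calling step is left out on purpose, because the relevant SAMtools subcommands differ between SAMtools versions.

```python
import shlex


def bowtie_build_cmd(reference_fasta, index_prefix):
    """Command that builds a Bowtie index from a FASTA reference."""
    return ["bowtie-build", reference_fasta, index_prefix]


def bowtie_align_cmd(index_prefix, reads_fastq, sam_out):
    """Command that aligns FASTQ reads; -S asks Bowtie for SAM output."""
    return ["bowtie", "-S", index_prefix, reads_fastq, sam_out]


def sam_to_bam_cmd(sam_in, bam_out):
    """Command that converts SAM to BAM with samtools view."""
    return ["samtools", "view", "-bS", "-o", bam_out, sam_in]


def plan(reference_fasta, index_prefix, reads_fastq, out_stem):
    """Return the three commands in the order they would be run."""
    sam = out_stem + ".sam"
    bam = out_stem + ".bam"
    return [
        bowtie_build_cmd(reference_fasta, index_prefix),
        bowtie_align_cmd(index_prefix, reads_fastq, sam),
        sam_to_bam_cmd(sam, bam),
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    for argv in plan("genome.fa", "my_index", "reads.fq", "out"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later without shell quoting problems.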

Citations
Journal Article
TL;DR: miRDeep2 identifies canonical and non-canonical miRNAs, such as those derived from transposable elements, and reports high-confidence candidates that are detected in multiple independent samples.
Abstract: microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.

2,252 citations

Journal Article
TL;DR: The findings provide direct evidence that the m(6)A reader YTHDC1 regulates mRNA splicing by recruiting pre-mRNA splicing factors and modulating their access to the binding regions of targeted mRNAs.

1,244 citations

Journal Article
03 May 2012-Nature
TL;DR: mTORC1 is shown to regulate a translational program that requires the rapamycin-resistant 4E-BP family of translational repressors and consists almost entirely of mRNAs containing 5′ terminal oligopyrimidine or related motifs.
Abstract: mTORC1 is shown to regulate a translational program that requires the rapamycin-resistant 4E-BP family of translational repressors and consists almost entirely of mRNAs containing 5′ terminal oligopyrimidine or related motifs.

1,193 citations

Journal Article
TL;DR: The Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains is presented, demonstrating that the approach exhibits unrivaled speed while maintaining the accuracy of existing methods.
Abstract: Whole-genome sequences are now available for many microbial species and clades, however existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

1,186 citations


Additional excerpts

  • ...Thus, comparative genomics has turned to highly efficient and accurate read mapping algorithms to carry out assembly-free analyses, spawning many mapping tools [49-52] and variant callers [53-55] for detecting SNPs and short Indels....

Journal Article
06 Jun 2013-Cell
TL;DR: It is concluded that cell-type-specific chromatin organization occurs at the submegabase scale and that architectural proteins shape the genome in hierarchical length scales.

1,092 citations

References
Journal Article
TL;DR: SAMtools implements utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
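
To make the SAM format a little more concrete, here is a minimal Python sketch that splits one tab-separated alignment line into the eleven mandatory SAM fields and checks two standard flag bits (0x4 for unmapped, 0x10 for reverse strand). The record type `SamRecord`, the helper names, and the example line are illustrative assumptions; this is neither SAMtools code nor a full parser (header lines and typed optional fields are not handled).

```python
from typing import Dict, NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities
    tags: Dict[str, str]  # optional TAG:TYPE:VALUE fields, kept as strings


def parse_sam_line(line):
    """Parse one non-header SAM alignment line into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    tags = {}
    for item in fields[11:]:
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


def is_unmapped(record):
    """True when flag bit 0x4 (segment unmapped) is set."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record):
    """True when flag bit 0x10 (reverse complemented) is set."""
    return bool(record.flag & 0x10)


# Made-up example line, not taken from any real dataset.
example = "read1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(example)
```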

45,957 citations


"Aligning Short Sequencing Reads wit..." cites this paper for methods

  • ...Figure 11.7.5 Output of the SAMtools consensus caller when calling SNPs from a simulated E. coli example dataset....

  • ...The Sequence Alignment/Map format and SAMtools....

  • ...More information about the SAM format is available in the MANUAL file included with the Bowtie package and on the SAMtools Web site at http://samtools.sourceforge.net/....

  • ...See the SAMtools Web site at http://samtools.sourceforge.net for details about SAMtools output and command-line options....

  • ...This protocol outlines how to accomplish this using the E. coli index and simulated E. coli reads that come with the Bowtie package, together with the SAMtools package....

Journal Article
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches, and multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations


"Aligning Short Sequencing Reads wit..." cites this paper for background

  • ...The Bowtie (Langmead et al., 2009) package enables ultrafast and memory-efficient alignment of large sets of sequencing reads to a reference sequence, such as the human genome....

  • ...Key Reference Langmead et al., 2009....

01 Jan 1994
TL;DR: Describes a block-sorting, lossless data compression algorithm and an implementation of it, and compares the implementation's performance with widely available data compressors running on the same hardware.
Abstract: We describe a block-sorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block of input …
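
The reversible transformation mentioned at the end of the abstract is the Burrows-Wheeler Transform. A naive Python sketch of the transform and its inverse follows. It sorts all rotations explicitly, which is only practical for short strings, and it assumes a sentinel character `$` that does not occur in the input. The function names are illustrative; this is not the report's implementation and not Bowtie's.

```python
def bwt(text, sentinel="$"):
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Recover the original text from its transform using the LF mapping."""
    n = len(last)
    # A stable sort of positions by character gives the first column;
    # order[j] is the position in `last` of the j-th first-column character.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for first_pos, last_pos in enumerate(order):
        lf[last_pos] = first_pos
    # Start at the row that begins with the sentinel and walk backwards
    # through the text, one character per step.
    row = lf[last.index(sentinel)]
    out = []
    for _ in range(n - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

The transform tends to group identical characters together, which is what makes the output easier to compress and also what the FM Index builds on.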

2,753 citations


"Aligning Short Sequencing Reads wit..." cites this paper for methods

  • ...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994) to achieve both speed and space efficiency....

  • ...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which in turn uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994)....

Journal Article
TL;DR: The FASTQ format is defined, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS.
Abstract: FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
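
The differences between the FASTQ variants come down to how quality scores are mapped to ASCII characters. The Python sketch below illustrates the usual conventions: Sanger FASTQ stores Phred scores with an ASCII offset of 33, Illumina 1.3+ stores Phred scores with an offset of 64, and the older Solexa variant stores Solexa-scaled scores with an offset of 64. The function names are illustrative and the sketch does no range checking.

```python
import math


def phred_to_error_probability(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_sanger(quality_string):
    """Phred scores from a Sanger FASTQ quality string (offset 33)."""
    return [ord(ch) - 33 for ch in quality_string]


def decode_illumina_1_3(quality_string):
    """Phred scores from an Illumina 1.3+ quality string (offset 64)."""
    return [ord(ch) - 64 for ch in quality_string]


def decode_solexa(quality_string):
    """Solexa-scaled scores from a Solexa quality string (offset 64)."""
    return [ord(ch) - 64 for ch in quality_string]


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def illumina_1_3_to_sanger(quality_string):
    """Re-encode an Illumina 1.3+ quality string with the Sanger offset."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in quality_string)
```

Because the two scales diverge mainly at low quality values, the Solexa-to-Phred conversion matters most for poor-quality bases.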

1,289 citations


"Aligning Short Sequencing Reads wit..." cites this paper for methods

  • ...Reads can be in the FASTA format (see APPENDIX 1B for information on FASTA), FASTQ format (Cock et al., 2010), or in a raw one-sequence-per-line format....

Proceedings Article
12 Nov 2000
TL;DR: Devises a data structure whose space occupancy is a function of the entropy of the underlying data set, achieving sublinear space and sublinear query time, and shows how to plug it into the Glimpse tool.
Abstract: We address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because text T[1,u] is stored using $O(H_k(T)) + o(1)$ bits per input symbol in the worst case, where $H_k(T)$ is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P[1,p], the opportunistic data structure allows one to search for the occurrences of P in T in $O(p + \mathrm{occ} \log^{\epsilon} u)$ time (for any fixed $\epsilon > 0$). If data are uncompressible we achieve the best space bound currently known (Grossi and Vitter, 2000); on compressible data our solution improves the succinct suffix array of (Grossi and Vitter, 2000) and the classical suffix tree and suffix array data structures either in space or in query time or both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool (Manber and Wu, 1994). The result is an indexing tool which achieves sublinear space and sublinear query time complexity.
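
The pattern search described in the abstract can be illustrated with backward search over the Burrows-Wheeler Transform. The Python sketch below is deliberately uncompressed: it keeps a full suffix array and full occurrence tables, so it shows only the search logic and none of the space savings of the actual data structure, and it is not Bowtie's index. The function names and the `$` sentinel are assumptions made for the example.

```python
def build_fm_index(text, sentinel="$"):
    """Build a toy index: suffix array, C table, and occurrence counts."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Naive suffix array construction; fine for short example strings.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps for i == 0).
    last = "".join(s[i - 1] for i in suffix_array)
    alphabet = sorted(set(s))
    # c_table[ch] = number of characters in s that sort before ch.
    c_table = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += s.count(ch)
    # occ[ch][i] = number of occurrences of ch in last[:i].
    occ = {ch: [0] * (len(s) + 1) for ch in alphabet}
    for i, last_ch in enumerate(last):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == last_ch else 0)
    return suffix_array, c_table, occ


def locate(pattern, index):
    """Return sorted start positions of pattern in the indexed text."""
    suffix_array, c_table, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(suffix_array)
    # Process the pattern right to left, narrowing the suffix array range.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))


index = build_fm_index("banana")
hits = locate("ana", index)
```

Each step of the loop costs a constant number of table lookups, so counting matches takes time proportional to the pattern length regardless of the text length.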

1,188 citations


"Aligning Short Sequencing Reads wit..." cites this paper for methods

  • ...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which in turn uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994)....

  • ...Index files can then be used by Bowtie to align reads to the reference genome....

  • ...Indexes are compressed in the zip format....

  • ...Indexes for commonly used reference genomes are also available for download from the Bowtie Web site at http://bowtie-bio.sf.net....

  • ...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994) to achieve both speed and space efficiency....
