Aligning Short Sequencing Reads with Bowtie

doi:10.1002/0471250953.BI1107S32

Journal Article

Ben Langmead¹

Johns Hopkins University¹

01 Dec 2010-Current protocols in human genetics (NIH Public Access)-Vol. 32, Iss: 1

TL;DR: This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second‐generation sequencing instruments, and includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools.

Abstract: This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second-generation sequencing instruments. It also includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools.

Citations

Journal Article

miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades.

Marc R. Friedländer¹, Sebastian D. Mackowiak¹, Na Li¹, Wei Chen¹, Nikolaus Rajewsky¹

Max Delbrück Center for Molecular Medicine¹

01 Jan 2012-Nucleic Acids Research

TL;DR: miRDeep2 identifies canonical and non-canonical miRNAs, such as those derived from transposable elements, and reports high-confidence candidates that are detected in multiple independent samples.

Abstract: microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.

2,252 citations

Journal Article

Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing.

Wen Xiao¹, Samir Adhikari¹,², Ujwal Dahal¹,², Yu-Sheng Chen¹,², Ya-Juan Hao¹,², Bao-Fa Sun², Hui-Ying Sun¹,², Ang Li¹,², Xiao-Li Ping², Weiyi Lai¹, Xing Wang¹,², Hai-Li Ma¹,², Chun-Min Huang², Ying Yang², Niu Huang, Guibin Jiang¹, Hailin Wang¹, Qi Zhou¹, Xiu-Jie Wang¹, Yong-Liang Zhao², Yun-Gui Yang¹,²

Chinese Academy of Sciences¹, Beijing Institute of Genomics²

18 Feb 2016-Molecular Cell

TL;DR: The findings provide the direct evidence that m(6)A reader YTHDC1 regulates mRNA splicing through recruiting and modulating pre-mRNA splicing factors for their access to the binding regions of targeted mRNAs.

1,244 citations

Journal Article

A unifying model for mTORC1-mediated regulation of mRNA translation

Carson C. Thoreen¹, Lynne Chantranupong²,³, Heather R. Keys²,³, Timothy C. Wang², Nathanael S. Gray¹, David M. Sabatini²,³

Harvard University¹, Massachusetts Institute of Technology², Broad Institute³

03 May 2012-Nature

TL;DR: mTORC1 is shown to regulate a translational program that requires the rapamycin-resistant 4E-BP family of translational repressors and consists almost entirely of mRNAs containing 5′ terminal oligopyrimidine or related motifs.

Abstract: mTORC1 is shown to regulate a translational program that requires the rapamycin-resistant 4E-BP family of translational repressors and consists almost entirely of mRNAs containing 5′ terminal oligopyrimidine or related motifs.

1,193 citations

Journal Article

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

Todd J. Treangen, Brian D. Ondov, Sergey Koren, Adam M. Phillippy

19 Nov 2014-Genome Biology

TL;DR: The Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains is presented, demonstrating that the approach exhibits unrivaled speed while maintaining the accuracy of existing methods.

Abstract: Whole-genome sequences are now available for many microbial species and clades, however existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

1,186 citations

Additional excerpts

...Thus, comparative genomics has turned to highly efficient and accurate read mapping algorithms to carry out assembly-free analyses, spawning many mapping tools [49-52] and variant callers [53-55] for detecting SNPs and short Indels....
[...]

Journal Article

Architectural Protein Subclasses Shape 3D Organization of Genomes during Lineage Commitment

Jennifer E. Phillips-Cremins¹, Michael E.G. Sauria¹, Amartya Sanyal², Tatiana I. Gerasimova³, Bryan R. Lajoie², Joshua S.K. Bell¹, Chin-Tong Ong¹, Tracy A. Hookway⁴, Changying Guo³, Yuhua Sun⁵, Michael Bland¹, William Wagstaff¹, Stephen Dalton⁵, Todd C. McDevitt⁴, Ranjan Sen³, Job Dekker², James Taylor¹, Victor G. Corces¹

Emory University¹, University of Massachusetts Medical School², Laboratory of Molecular Biology³, Georgia Institute of Technology⁴, University of Georgia⁵

06 Jun 2013-Cell

TL;DR: It is concluded that cell-type-specific chromatin organization occurs at the submegabase scale and that architectural proteins shape the genome in hierarchical length scales.

1,092 citations

References

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009-Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

"Aligning Short Sequencing Reads wit..." refers methods in this paper

...Figure 11.7.5 Output of the SAMtools consensus caller when calling SNPs from a simulated E. coli example dataset....
[...]
...The Sequence Alignment/Map format and SAMtools....
[...]
...More information about the SAM format is available in the MANUAL file included with the Bowtie package and on the SAMtools Web site at http://samtools.sourceforge.net/....
[...]
...See the SAMtools Web site at http://samtools.sourceforge.net for details about SAMtools output and command-line options....
[...]
...This protocol outlines how to accomplish this using the E. coli index and simulated E. coli reads that come with the Bowtie package, together with the SAMtools package....
[...]
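
The SAM format named in the excerpts above stores each alignment as one tab-separated line with eleven mandatory fields. The following is a minimal Python sketch of reading such a line, not code from the Bowtie or SAMtools packages. The names `SamRecord`, `parse_sam_line`, `is_unmapped`, and `is_reverse` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM alignment fields (optional tags are ignored)."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines (starting with '@') are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]), fields[9], fields[10],
    )


def is_unmapped(record: SamRecord) -> bool:
    """FLAG bit 0x4 marks a read that did not align."""
    return bool(record.flag & 0x4)


def is_reverse(record: SamRecord) -> bool:
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)
```

A caller would typically skip header lines, pass each remaining line to `parse_sam_line`, and filter on the flag helpers before any downstream consensus calling.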

Journal Article

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead¹, Cole Trapnell¹, Mihai Pop¹, Steven L. Salzberg¹

University of Maryland, College Park¹

04 Mar 2009-Genome Biology

TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.

Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations

"Aligning Short Sequencing Reads wit..." refers background in this paper

...The Bowtie (Langmead et al., 2009) package enables ultrafast and memory-efficient alignment of large sets of sequencing reads to a reference sequence, such as the human genome....
[...]
...Key Reference Langmead et al., 2009....
[...]

A Block-sorting Lossless Data Compression Algorithm

Michael Burrows, David Wheeler

01 Jan 1994

TL;DR: A block-sorting, lossless data compression algorithm and its implementation are described, and the performance of the implementation is compared with that of widely available data compressors running on the same hardware.

Abstract: We describe a block-sorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block of input …

2,753 citations

"Aligning Short Sequencing Reads wit..." refers methods in this paper

...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994) to achieve both speed and space efficiency....
[...]
...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which in turn uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994)....
[...]
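
The reversible transformation that these excerpts refer to can be illustrated with a short Python sketch. It uses the textbook sorted-rotations construction, which is quadratic and meant only for small examples; it is not how Bowtie builds its index. The function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (for illustration only)."""
    if sentinel in text:
        raise ValueError("text must not already contain the sentinel character")
    s = text + sentinel
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

For the input `banana`, `bwt` returns `annb$aa`, and `inverse_bwt` recovers `banana` from that string. The transformed string tends to group identical characters together, which is what makes it useful for compression and indexing.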

Journal Article

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Peter J. A. Cock¹, Christopher J. Fields², Naohisa Goto², Michael Heuer², Peter M. Rice²

University of Dundee¹, University of Illinois at Urbana–Champaign²

01 Apr 2010-Nucleic Acids Research

TL;DR: The FASTQ format is defined, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS.

Abstract: FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.

1,289 citations

"Aligning Short Sequencing Reads wit..." refers methods in this paper

...Reads can be in the FASTA format (see APPENDIX 1B for information on FASTA), FASTQ format (Cock et al., 2010), or in a raw one-sequence-per-line format....
[...]
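
As a rough illustration of the FASTQ input mentioned in this excerpt, the Python sketch below reads records and decodes their quality strings. It assumes the simple four-line layout (no wrapped sequence lines) and the Sanger encoding, in which each quality character is the Phred score plus an ASCII offset of 33; the Solexa/Illumina variants described by Cock et al. use different encodings and are not handled here. The function name `read_fastq` is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, Phred scores) from four-line Sanger FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Sanger FASTQ: Phred score = ASCII code - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]
```

Because the function accepts any iterable of lines, it can be given an open file handle directly or a list of strings in a test.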

Proceedings Article

Opportunistic data structures with applications

Paolo Ferragina, Giovanni Manzini

12 Nov 2000

TL;DR: A data structure whose space occupancy is a function of the entropy of the underlying data set is devised; plugged into the Glimpse tool, it yields an indexing tool that achieves sublinear space and sublinear query time complexity.

Abstract: We address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because text $T[1,u]$ is stored using $O(H_k(T)) + o(1)$ bits per input symbol in the worst case, where $H_k(T)$ is the $k$th order empirical entropy of $T$ (the bound holds for any fixed $k$). Given an arbitrary string $P[1,p]$, the opportunistic data structure allows searching for the occurrences of $P$ in $T$ in $O(p + \mathrm{occ} \log^{\epsilon} u)$ time (for any fixed $\epsilon > 0$). If data are uncompressible we achieve the best space bound currently known (Grossi and Vitter, 2000); on compressible data our solution improves the succinct suffix array of (Grossi and Vitter, 2000) and the classical suffix tree and suffix array data structures either in space or in query time or both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool (Manber and Wu, 1994). The result is an indexing tool which achieves sublinear space and sublinear query time complexity.

1,188 citations

"Aligning Short Sequencing Reads wit..." refers methods in this paper

...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which in turn uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994)....
[...]
...Index files can then be used by Bowtie to align reads to the reference genome....
[...]
...Indexes are compressed in the zip format....
[...]
...Indexes for commonly used reference genomes are also available for download from the Bowtie Web site at http://bowtie-bio.sf.net....
[...]
...The Bowtie index is a refinement of the FM Index (Ferragina and Manzini, 2000), which uses the Burrows-Wheeler Transform (Burrows and Wheeler, 1994) to achieve both speed and space efficiency....
[...]
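
To make the relationship between the FM Index and the Burrows-Wheeler Transform more concrete, here is a small Python sketch of exact-match counting by backward search. It is an uncompressed teaching version under simplifying assumptions: it stores full occurrence tables, builds the suffix order naively, and does not model Bowtie's refinements or its mismatch-tolerant backtracking. The names `build_fm` and `count_occurrences` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_fm(text: str, sentinel: str = "$") -> Tuple[Dict[str, int], Dict[str, List[int]], int]:
    """Build the C array and occurrence tables over the BWT of text + sentinel."""
    s = text + sentinel
    # Naive suffix array; fine for short illustrative inputs.
    suffix_order = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (wraps to the sentinel).
    last = "".join(s[i - 1] for i in suffix_order)
    counts = Counter(s)
    # first[c] = number of characters in s that sort strictly before c.
    first: Dict[str, int] = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in last[:i].
    occ: Dict[str, List[int]] = {c: [0] * (len(s) + 1) for c in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(s)


def count_occurrences(pattern: str, first: Dict[str, int],
                      occ: Dict[str, List[int]], n: int) -> int:
    """Count exact occurrences of pattern by processing it right to left."""
    lo, hi = 0, n  # half-open range of matching rows in the sorted suffix list
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

Each pattern character narrows the row range with two table lookups, so the counting time depends on the pattern length rather than on the text length. A practical index would sample the occurrence tables to save space instead of storing them in full.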