Author

Jon M. Sorenson

Other affiliations: Lawrence Berkeley National Laboratory, University of California, Berkeley, Applied Biosystems

Bio: Jon M. Sorenson is a researcher at Pacific Biosciences. The author has contributed to research on topics including protein folding and nucleic acids. The author has an h-index of 25 and has co-authored 41 publications that have received 7,094 citations. Previous affiliations of Jon M. Sorenson include Lawrence Berkeley National Laboratory and the University of California, Berkeley.

Papers

Journal Article

Real-Time DNA Sequencing from Single Polymerase Molecules

John Eid¹, Adrian Fehr¹, Jeremy Gray¹, Khai Luong¹, John Lyle¹, Geoff Otto¹, Paul Peluso¹, David R. Rank¹, Primo Baybayan¹, Brad Bettman¹, Arkadiusz Bibillo¹, Keith Bjornson¹, Bidhan Chaudhuri¹, Fred Christians¹, Ronald L. Cicero¹, Sonya Clark¹, Ravindra V. Dalal¹, Alex DeWinter¹, John Dixon¹, Mathieu Foquet¹, Alfred Gaertner¹, Paul Hardenbol¹, Cheryl Heiner¹, Kevin Hester¹, David P. Holden¹, Gregory J. Kearns¹, Xiangxu Kong¹, Ronald Kuse¹, Yves Lacroix¹, Steven Lin¹, Paul Lundquist¹, Congcong Ma¹, Patrick Marks¹, Mark Maxham¹, Devon Murphy¹, Insil Park¹, Thang Pham¹, Michael Phillips¹, Joy Roy¹, Robert Sebra¹, Gene Shen¹, Jon M. Sorenson¹, Austin B. Tomaney¹, Kevin Travers¹, Mark Trulson¹, John Vieceli¹, Jeffrey Wegener¹, Dawn Wu¹, Alicia Yang¹, Denis Zaccarin¹, Peter Zhao¹, Frank Zhong¹, Jonas Korlach¹, Stephen Turner¹

Pacific Biosciences¹

02 Jan 2009-Science

TL;DR: Single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs) are presented.

Abstract: We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.

3,346 citations

Journal Article

The origin of the Haitian cholera outbreak strain.

Chen-Shan Chin¹, Jon M. Sorenson¹, Jason B. Harris², William P. Robins², Richelle C. Charles², Roger R. Jean-Charles, James H. Bullard¹, Dale R. Webster¹, Andrew Kasarskis¹, Paul Peluso¹, Ellen E. Paxinos¹, Yoshiharu Yamaichi³, Stephen B. Calderwood², John J. Mekalanos², Eric E. Schadt¹, Matthew K. Waldor³,⁴

Pacific Biosciences¹, Harvard University², Brigham and Women's Hospital³, Howard Hughes Medical Institute⁴

06 Jan 2011-The New England Journal of Medicine

TL;DR: The Haitian epidemic is probably the result of the introduction, through human activity, of a V. cholerae strain from a distant geographic source, and analysis of genomic variation of the Haitian isolates reveals a more distant relationship with circulating South American isolates.

Abstract: Background: Although cholera has been present in Latin America since 1991, it had not been epidemic in Haiti for at least 100 years. Recently, however, there has been a severe outbreak of cholera in Haiti. Methods: We used third-generation single-molecule real-time DNA sequencing to determine the genome sequences of 2 clinical Vibrio cholerae isolates from the current outbreak in Haiti, 1 strain that caused cholera in Latin America in 1991, and 2 strains isolated in South Asia in 2002 and 2008. Using primary sequence data, we compared the genomes of these 5 strains and a set of previously obtained partial genomic sequences of 23 diverse strains of V. cholerae to assess the likely origin of the cholera outbreak in Haiti. Results: Both single-nucleotide variations and the presence and structure of hypervariable chromosomal elements indicate that there is a close relationship between the Haitian isolates and variant V. cholerae El Tor O1 strains isolated in Bangladesh in 2002 and 2008. In contrast, analysis of ...

686 citations

Journal Article

Computational solutions to large-scale data management and analysis

Eric E. Schadt¹, Michael D. Linderman², Jon M. Sorenson¹, Lawrence Lee¹, Garry P. Nolan²

Pacific Biosciences¹, Stanford University²

01 Sep 2010-Nature Reviews Genetics

TL;DR: The authors discuss how to master the different types of computational environments that exist, such as cloud and heterogeneous computing, to successfully tackle big data problems in the life sciences.

Abstract: Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems.

612 citations

Journal Article

An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge.

Catherine A. Brownstein¹, Alan H. Beggs¹, Nils Homer, Barry Merriman², and 207 more authors (53 institutions in total)

25 Mar 2014-Genome Biology

TL;DR: The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases and reveals a general convergence of practices on most elements of the analysis and interpretation process.

Abstract: Background: There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.

429 citations

Journal Article

What can x-ray scattering tell us about the radial distribution functions of water?

Jon M. Sorenson, Greg L. Hura, Robert M. Glaeser, Teresa Head-Gordon

09 Nov 2000-Journal of Chemical Physics

TL;DR: In this paper, an analysis of the Advanced Light Source (ALS) x-ray scattering experiment on pure liquid water at ambient temperature and pressure described in the preceding article is presented.

Abstract: We present an analysis of the Advanced Light Source (ALS) x-ray scattering experiment on pure liquid water at ambient temperature and pressure described in the preceding article. The present study discusses the extraction of radial distribution functions from the x-ray scattering of molecular fluids. It is proposed that the atomic scattering factors used to model water be modified to include the changes in the intramolecular electron distribution caused by chemical bonding effects. Based on this analysis we present a g_OO(r) for water consistent with our recent experimental data gathered at the ALS, which differs in some aspects from the g_OO(r) reported by other x-ray and neutron scattering experiments. Our g_OO(r) exhibits a taller and sharper first peak, and systematic shifts in all peak positions to smaller r. Based on experimental uncertainties, we discuss what features of g_OO(r) should be reproduced by classical simulations of nonpolarizable and polarizable water models, as well as ab initio simulations of water, at ambient conditions. We directly compare many water models and simulations to the present data, and discuss possible improvements in both classical and ab initio simulation approaches in the future.

402 citations

Cited by

Journal Article

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li¹, Richard Durbin¹

Wellcome Trust Sanger Institute¹

01 Jul 2009-Bioinformatics

TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.

Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations

Journal Article

Sequencing technologies - the next generation

Michael L. Metzker¹

Baylor College of Medicine¹

01 Jan 2010-Nature Reviews Genetics

TL;DR: A technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments is presented.

Abstract: Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.

7,023 citations

Journal Article

De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis

Brian J. Haas¹, Alexie Papanicolaou², Moran Yassour³,⁴, Manfred Grabherr⁵, Philip D. Blood⁶, Joshua C. Bowden², M. B. Couger⁷, David Eccles⁸, Bo Li⁹, Matthias Lieber¹⁰, Matthew D. MacManes¹¹, Michael Ott², Joshua Orvis, Nathalie Pochet³,¹², Francesco Strozzi¹³, Nathan T. Weeks¹⁴, Rick Westerman¹⁵, Thomas William, Colin N. Dewey⁹, Robert Henschel¹⁶, Richard D. LeDuc¹⁶, Nir Friedman⁴, Aviv Regev³

Broad Institute¹, Commonwealth Scientific and Industrial Research Organisation², Massachusetts Institute of Technology³, Hebrew University of Jerusalem⁴, Science for Life Laboratory⁵, Pittsburgh Supercomputing Center⁶, Oklahoma State University–Stillwater⁷, Griffith University⁸, University of Wisconsin-Madison⁹, Dresden University of Technology¹⁰, California Institute for Quantitative Biosciences¹¹, Flanders Institute for Biotechnology¹², Parco Tecnologico Padano¹³, United States Department of Agriculture¹⁴, Purdue University¹⁵, Indiana University¹⁶

01 Aug 2013-Nature Protocols

TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.

Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

6,369 citations

Journal Article

Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies

Anna Klindworth¹, Elmar Pruesse², Timmy Schweer², Jörg Peplies², Christian Quast², Matthias Horn², Frank Oliver Glöckner²

Max Planck Society¹, Jacobs University Bremen²

01 Jan 2013-Nucleic Acids Research

TL;DR: The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, therefore reducing the bias in PCR-based microbial diversity studies.

Abstract: 16S ribosomal RNA gene (rDNA) amplicon analysis remains the standard approach for the cultivation-independent investigation of microbial diversity. The accuracy of these analyses depends strongly on the choice of primers. The overall coverage and phylum spectrum of 175 primers and 512 primer pairs were evaluated in silico with respect to the SILVA 16S/18S rDNA non-redundant reference dataset (SSURef 108 NR). Based on this evaluation a selection of 'best available' primer pairs for Bacteria and Archaea for three amplicon size classes (100-400, 400-1000, ≥ 1000 bp) is provided. The most promising bacterial primer pair (S-D-Bact-0341-b-S-17/S-D-Bact-0785-a-A-21), with an amplicon size of 464 bp, was experimentally evaluated by comparing the taxonomic distribution of the 16S rDNA amplicons with 16S rDNA fragments from directly sequenced metagenomes. The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, therefore reducing the bias in PCR-based microbial diversity studies.

5,346 citations

Journal Article

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Sergey Koren¹, Brian P. Walenz¹, Konstantin Berlin², Jason R. Miller³, Nicholas H. Bergman, Adam M. Phillippy¹

National Institutes of Health¹, Invincea², J. Craig Venter Institute³

15 Mar 2017-Genome Research

TL;DR: Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences, is presented, demonstrating that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences or Oxford Nanopore technologies.

Abstract: Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

4,806 citations
