Author

Andrey D. Prjibelski

Other affiliations: University of California, San Diego, Saint Petersburg Academic University

Bio: Andrey D. Prjibelski is an academic researcher from Saint Petersburg State University. The author has contributed to research on topics including sequence assembly and medicine. The author has an h-index of 12 and has co-authored 24 publications that have received 14154 citations. Previous affiliations of Andrey D. Prjibelski include University of California, San Diego and Saint Petersburg Academic University.

Topics: Sequence assembly, Medicine, Biology, Nanopore sequencing, Computational biology

Papers

Journal Article

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Anton Bankevich¹, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner

Saint Petersburg Academic University¹

07 May 2012-Journal of Computational Biology

TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

16,859 citations

Journal Article

Assembling Single-Cell Genomes and Mini-Metagenomes From Chimeric MDA Products

Sergey Nurk¹, Anton Bankevich, Dmitry Antipov, Alexey Gurevich, Anton Korobeynikov, Alla Lapidus, Andrey D. Prjibelski, Alex Pyshkin, Alexander Sirotkin, Yakov Sirotkin, Ramunas Stepanauskas, Scott Clingenpeel, Tanja Woyke, Jeffrey S. McLean, Roger S. Lasken, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner

Saint Petersburg Academic University¹

04 Oct 2013-Journal of Computational Biology

TL;DR: Presents algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, and applies the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (a mini-metagenome) and sequences all genomes from the pool at once.

Abstract: Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing “microbial dark matter” that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently deve...

1,067 citations

Journal Article

Using SPAdes De Novo Assembler

Andrey D. Prjibelski¹, Dmitry Antipov¹, Dmitry Meleshko¹, Alla Lapidus¹, Anton Korobeynikov¹

Saint Petersburg State University¹

01 Jun 2020-Current protocols in human genetics

TL;DR: Presents protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes, as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets.

Abstract: SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC.

Basic Protocol 1: Assembling isolate bacterial datasets
Basic Protocol 2: Assembling metagenomic datasets
Basic Protocol 3: Assembling sets of putative plasmids
Basic Protocol 4: Assembling transcriptomes
Basic Protocol 5: Assembling putative biosynthetic gene clusters
Support Protocol 1: Installing SPAdes
Support Protocol 2: Providing input via command line
Support Protocol 3: Providing input data via YAML format
Support Protocol 4: Restarting previous run
Support Protocol 5: Determining strand-specificity of RNA-seq data

663 citations

Journal Article

Versatile genome assembly evaluation with QUAST-LG.

Alla Mikheenko¹, Andrey D. Prjibelski¹, Vladislav Saveliev¹, Dmitry Antipov¹, Alexey Gurevich¹

Saint Petersburg State University¹

01 Jul 2018-Bioinformatics

TL;DR: This manuscript demonstrates the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies, introduces the concept of an upper bound assembly for a given genome and set of reads, and computes theoretical limits on assembly correctness and completeness.

Abstract: Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes. Results: In this manuscript, we demonstrate performance of the state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference. Availability and implementation: http://cab.spbu.ru/software/quast-lg. Supplementary information: Supplementary data are available at Bioinformatics online.

562 citations

Journal Article

rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data.

Elena Bushmanova¹, Dmitry Antipov¹, Alla Lapidus¹, Andrey D. Prjibelski¹

Saint Petersburg State University¹

01 Sep 2019-GigaScience

TL;DR: The novel transcriptome assembler rnaSPAdes, developed on top of the SPAdes genome assembler, typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average than its closest competitors.

Abstract: Background: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. Results: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. Conclusions: Based on the performed comparison between different assembly methods, we infer that it is not possible to detect the absolute leader according to all quality metrics and all used datasets. However, rnaSPAdes typically outperforms other assemblers on such important properties as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average compared to the closest competitors.

297 citations

Cited by

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

QUAST: quality assessment tool for genome assemblies

Alexey Gurevich¹, Vladislav Saveliev¹, Nikolay Vyahhi¹, Glenn Tesler¹

University of California, San Diego¹

15 Apr 2013-Bioinformatics

TL;DR: This tool improves on leading assembly comparison software with new ideas and quality metrics, and can evaluate assemblies both with a reference genome, as well as without a reference.

Abstract: Summary: Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST—a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with a reference genome, as well as without a reference. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website.

5,757 citations

Journal Article

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹

National Institute of Informatics¹

15 May 2015-Bioinformatics

TL;DR: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner; on a soil dataset it generated an assembly three times larger than previous methods, with longer contig N50 and average contig length.

Abstract: Summary: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., no pre-processing like partitioning and normalization was needed. When compared with previous methods (Chikhi and Rizk, 2012; Howe et al., 2014) on assembling the soil data, MEGAHIT generated a 3 times larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a 4-fold improvement. Availability: The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under GPLv3 license. Contact: rb@l3-bioinfo.com, twlam@cs.hku.hk

3,634 citations

Posted Content

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Dinghua Li¹, Chi-Man Liu¹, Ruibang Luo¹, Kunihiko Sadakane¹, Tak-Wah Lam¹

National Institute of Informatics¹

25 Sep 2014-arXiv: Genomics

TL;DR: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner; it avoids pre-processing such as partitioning and normalization, which might compromise result integrity.

Abstract: MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing like partitioning and normalization, which might compromise result integrity. MEGAHIT generates a 3 times larger assembly, with longer contig N50 and average contig length, than the previous assembly. 55.8% of the reads were aligned to the assembly, which is 4 times higher than for the previous assembly. The source code of MEGAHIT is freely available at this https URL under GPLv3 license.

2,673 citations

Journal Article

MetaSPAdes: A new versatile metagenomic assembler

Sergey Nurk¹, Dmitry Meleshko¹, Anton Korobeynikov¹, Pavel A. Pevzner¹,²

Saint Petersburg State University¹, University of California, San Diego²

01 May 2017-Genome Research

TL;DR: metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes.

Abstract: While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.

2,295 citations
