The impact of third generation genomic technologies on plant genome assembly.

doi:10.1016/J.PBI.2017.02.002

Journal Article•DOI•

Wen-Biao Jiao¹, Korbinian Schneeberger¹•Institutions (1)

Max Planck Society¹

01 Apr 2017-Current Opinion in Plant Biology (Elsevier Current Trends)-Vol. 36, pp 64-70

TL;DR: Since the introduction of next generation sequencing, plant genome assembly projects no longer need to rely on dedicated research facilities or community-wide consortia; even individual research groups can sequence and assemble the genomes they are interested in.

About: This article is published in Current Opinion in Plant Biology. The article was published on 2017-04-01 and is currently open access. It has received 170 citations to date. The article focuses on the topics: Hybrid genome assembly & Sequence assembly.

Citations

Journal Article•DOI•

Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps

Caroline Belser¹, Benjamin Istace¹, Erwan Denis¹, Marion Dubarry¹, Franc-Christophe Baurens², Cyril Falentin³, Mathieu Genete⁴, Wahiba Berrabah¹, Anne-Marie Chèvre³, Régine Delourme³, Gwenaëlle Deniot³, Philippe Duffé³, Stefan Engelen¹, Arnaud Lemainque¹, Maria J. Manzanares-Dauleux³, Guillaume Martin², Jérôme Morice³, Benjamin Noel¹, Xavier Vekemans⁴, Angélique D'Hont², Mathieu Rousseau-Gueutin³, Valérie Barbe¹, Corinne Cruaud¹, Patrick Wincker⁵, Jean-Marc Aury¹•Institutions (5)

French Alternative Energies and Atomic Energy Commission¹, University of Montpellier², University of Rennes³, University of Lille⁴, Université Paris-Saclay⁵

01 Nov 2018-Nature plants

TL;DR: A strategy based on long reads (MinION or PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies is described, and its applicability is demonstrated by generating high-quality genome sequences for two new dicotyledon morphotypes.

Abstract: Plant genomes are often characterized by a high level of repetitiveness and polyploid nature. Consequently, creating genome assemblies for plant genomes is challenging. The introduction of short-read technologies 10 years ago substantially increased the number of available plant genomes. Generally, these assemblies are incomplete and fragmented, and only a few are at the chromosome scale. Recently, Pacific Biosciences and Oxford Nanopore commercialized sequencing technologies that can sequence long DNA fragments (kilobases to megabases) and, using efficient algorithms, provide high-quality assemblies in terms of contiguity and completeness of repetitive regions [1–4]. However, even though genome assemblies based on long reads exhibit high contig N50s (>1 Mb), these methods are still insufficient to decipher genome organization at the chromosome level. Here, we describe a strategy based on long reads (MinION or PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies and demonstrate applicability by generating high-quality genome sequences for two new dicotyledon morphotypes, Brassica rapa Z1 (yellow sarson) and Brassica oleracea HDEM (broccoli), and one new monocotyledon, Musa schizocarpa (banana). All three assemblies show contig N50s of >5 Mb and contain scaffolds that represent entire chromosomes or chromosome arms. Assembling genomes to chromosome scale remains a challenge. Now, a study reports a strategy based on nanopore long reads and optical maps and uses it to produce high-quality chromosome-scale assemblies for the genomes of yellow sarson, broccoli and banana.
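
Note: the contig N50 quoted in this abstract is the standard contiguity statistic: the length of the contig at which, going from longest to shortest, the running total first reaches half of the assembly size. The sketch below is only an illustration of that definition; the function name `n50` and the toy contig lengths are hypothetical and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up lengths, not data from any cited assembly).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

With these toy lengths the total is 32, and the two longest contigs (8 + 8) already reach half of it, so the N50 is 8. A higher N50 therefore means that fewer, longer contigs cover half of the assembly.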

276 citations

Journal Article•DOI•

RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms.

Zhaodong Hao¹,², Dekang Lv³, Ying Ge³, Jisen Shi², Dolf Weijers¹, Guangchuang Yu⁴, Jinhui Chen²•Institutions (4)

Wageningen University and Research Centre¹, Nanjing Forestry University², Dalian Medical University³, Southern Medical University⁴

20 Jan 2020-PeerJ

TL;DR: The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researchers working with omics.

Abstract: Background: Owing to the rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, genomic synteny, etc. However, most available software programs and web servers are available only for a few model species, such as human, mouse and fly, or have limited application scenarios. As more and more non-model species are sequenced with chromosome-level assemblies being available, tools that can generate idiograms for a broad range of species and are capable of visualizing more data types are needed to help better understand fundamental genome characteristics. Results: The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them in a heat map and track labels, respectively. Conclusion: The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researchers working with omics.

197 citations

Journal Article•DOI•

SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies.

Manish Goel¹, Han Sun¹, Wen-Biao Jiao¹, Korbinian Schneeberger¹,²•Institutions (2)

Max Planck Society¹, Ludwig Maximilian University of Munich²

16 Dec 2019-Genome Biology

TL;DR: SyRI is presented, a pairwise whole-genome comparison tool for chromosome-level assemblies that starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions.

Abstract: Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequence changes in location, orientation, or copy number. Here, we present SyRI, a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions. This distinction is important as rearranged regions are inherited differently compared to syntenic regions.

188 citations

Cites background from "The impact of third generation geno..."

...However, despite recent technological improvements to simplify the generation of whole-genome de novo assemblies [8], there are so far only a few tools which use whole-genome assemblies as the basis for the identification of genomic differences [9]....

Journal Article•DOI•

De Novo Assembly of a New Solanum pennellii Accession Using Nanopore Sequencing.

Maximilian Schmidt¹, Alexander Vogel¹, Alisandra K. Denton¹, Benjamin Istace², Alexandra Wormit¹, Henri van de Geest, Marie E. Bolger³, Saleh Alseekh⁴, Janina Maß³, Christian Pfaff³, Ulrich Schurr³, Roger T. Chetelat⁵, Florian Maumus⁶, Jean-Marc Aury², Sergey Koren⁷, Alisdair R. Fernie⁴, Daniel Zamir⁸, Anthony Bolger¹, Bjoern Usadel¹,³•Institutions (8)

RWTH Aachen University¹, Commissariat à l'énergie atomique et aux énergies alternatives², Forschungszentrum Jülich³, Max Planck Society⁴, University of California, Davis⁵, Université Paris-Saclay⁶, National Institutes of Health⁷, Hebrew University of Jerusalem⁸

01 Oct 2017-The Plant Cell

TL;DR: The generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii indicates that such long read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.

Abstract: Updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing technology was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome is structurally highly similar to that of the reference S. pennellii LA716 accession but had a high error rate and was rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed versus the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.

179 citations

Journal Article•DOI•

Plant Phenotyping: Past, Present, and Future

Roland Pieruschka¹, U. Schurr¹•Institutions (1)

Forschungszentrum Jülich¹

26 Mar 2019

TL;DR: The technological advancement that laid the foundation for the development of phenotyping centers is sketched, and the upcoming challenges for further advancement of plant phenotyping, specifically with respect to standardization of data acquisition and reusability, are evaluated.

Abstract: A plant develops its dynamic phenotype through its interaction with the environment. Understanding these processes, which span a plant's lifetime in a permanently changing environment, is essential for the advancement of basic plant science and its translation into applications, including breeding and crop management. The plant research community was thus confronted with the need to accurately measure diverse traits of an increasingly large number of plants to help plants adapt to resource-limiting environments and low-input agriculture. In this overview, we outline the development of plant phenotyping as a multidisciplinary field. We sketch the technological advancement that laid the foundation for the development of phenotyping centers and evaluate the upcoming challenges for further advancement of plant phenotyping specifically with respect to standardization of data acquisition and reusability. Finally, we describe the development of the plant phenotyping community as an essential step to integrate the community and effectively use the emerging synergies.

170 citations

Cites background from "The impact of third generation geno..."

...The sequencing technology combined with data analysis of genome has revolutionized our understanding of biology in the last two decades [2, 3] and—in many cases today—may allow us to predict the performance based on the genetic constitution of an organism [4]....

References

Journal Article•DOI•

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Arabidopsis Genome Initiative¹•Institutions (1)

J. Craig Venter Institute¹

14 Dec 2000-Nature

TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article•DOI•

Comprehensive mapping of long-range interactions reveals folding principles of the human genome.

Erez Lieberman Aiden¹, Nynke L. van Berkum², Louise Williams¹, Maxim Imakaev¹, Tobias Ragoczy³,⁴, Agnes Telling³,⁴, Ido Amit¹, Bryan R. Lajoie², Peter J. Sabo⁴, Michael O. Dorschner⁴, Richard Sandstrom⁴, Bradley E. Bernstein¹,⁵, Michaël Bender⁴, Mark Groudine³,⁴, Andreas Gnirke¹, John A. Stamatoyannopoulos⁴, Leonid A. Mirny¹, Eric S. Lander¹,⁵, Job Dekker²•Institutions (5)

Massachusetts Institute of Technology¹, University of Massachusetts Medical School², Fred Hutchinson Cancer Research Center³, University of Washington⁴, Harvard University⁵

09 Oct 2009-Science

TL;DR: Hi-C is described, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing and demonstrates the power of Hi-C to map the dynamic conformations of entire genomes.

Abstract: We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.

7,180 citations

"The impact of third generation geno..." refers methods in this paper

...Another elegant solution to the challenges of chromosome-scale assembly is based on chromosome conformation capture sequencing (Hi-C), a method originally developed to study the three-dimensional folding of the genome [54]....

Journal Article•DOI•

The B73 Maize Genome: Complexity, Diversity, and Dynamics

Patrick S. Schnable¹, Doreen Ware², Robert S. Fulton³, Joshua C. Stein² +156 more•Institutions (18)

20 Nov 2009-Science

TL;DR: The sequence of the maize genome reveals it to be the most complex genome known to date; the paper also reports the correlation of methylation-poor regions with Mu transposon insertions and recombination, and how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state.

Abstract: We report an improved draft nucleotide sequence of the 2.3-gigabase genome of maize, an important crop plant and model for biological research. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome. These were responsible for the capture and amplification of numerous gene fragments and affect the composition, sizes, and positions of centromeres. We also report on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and/or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state. These analyses inform and set the stage for further investigations to improve our understanding of the domestication and agricultural improvements of maize.

3,761 citations

Journal Article•DOI•

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

Chen-Shan Chin¹, David Alexander¹, Patrick Marks¹, Aaron Klammer¹, James P Drake¹, Cheryl Heiner¹, Alicia Clum², Alex Copeland², John Huddleston³, Evan E. Eichler³, Stephen Turner¹, Jonas Korlach¹•Institutions (3)

Pacific Biosciences¹, Joint Genome Institute², University of Washington³

01 Jun 2013-Nature Methods

TL;DR: This work presents a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing.

Abstract: We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

3,647 citations

"The impact of third generation geno..." refers background in this paper

...Although raw reads can have sequencing error rates of up to 15%, correction with short sequencing reads [23,24] or self-correction with sufficient sequencing data [25] enables genome assemblies with a sequence accuracy of over 99....

Journal Article•DOI•

The map-based sequence of the rice genome

Takashi Matsumoto¹, Jianzhong Wu¹, Hiroyuki Kanamori¹, Yuichi Katayose¹ +262 more•Institutions (25)

11 Aug 2005-Nature

TL;DR: A map-based, finished-quality sequence covering 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, is presented, along with evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.

Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations