Journal Article

The genome of the cucumber, Cucumis sativus L.

01 Dec 2009, Nature Genetics (Nature Publishing Group), Vol. 41, Iss. 12, pp. 1275-1281
TL;DR: This study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo, and identifies 686 gene clusters related to phloem function.
Abstract: Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage. The absence of recent whole-genome duplication, along with the presence of few tandem duplications, explains the small number of genes in the cucumber. Our study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo. The sequenced cucumber genome affords insight into traits such as its sex expression, disease resistance, biosynthesis of cucurbitacin and 'fresh green' odor. We also identify 686 gene clusters related to phloem function. The cucumber genome provides a valuable resource for developing elite cultivars and for studying the evolution and function of the plant vascular system.
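Fold coverage, as in the 72.2-fold figure above, is commonly computed as the total number of sequenced bases divided by the (estimated) genome size. The sketch below assumes that definition. The read counts, read lengths and genome size are hypothetical placeholders, not the values used for the cucumber assembly, and the function name `fold_coverage` is illustrative.

```python
def fold_coverage(libraries, genome_size_bp):
    """Return total sequenced bases divided by the genome size.

    `libraries` is a list of (number_of_reads, mean_read_length_bp) tuples,
    for example one entry per sequencing platform or library.
    """
    total_bases = sum(n_reads * read_len for n_reads, read_len in libraries)
    return total_bases / genome_size_bp


# Hypothetical inputs (not the cucumber project's numbers): one Sanger-like
# library and one Illumina-like library against a 100-Mb genome.
sanger = (1_000_000, 800)       # 0.8 Gb of long reads
illumina = (100_000_000, 75)    # 7.5 Gb of short reads
print(fold_coverage([sanger, illumina], 100_000_000))  # 83.0
```

Passing a single-element list gives the coverage contributed by one platform on its own.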

Citations
Journal Article
TL;DR: The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Abstract: Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads are very short, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and a scaffold N50 of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

2,760 citations


Cites methods from "The genome of the cucumber, Cucumis..."

  • ...Currently, using the Illumina GA sequencing technology and our short read assembler presented here, we have sequenced and assembled nearly a dozen plant and animal genomes, including the panda (Li et al. 2009d), duck, potato, cucumber (Huang et al. 2009), watermelon, and others....

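N50, the statistic used in the SOAPdenovo abstract above, is the contig (or scaffold) length at which contigs of that length or longer together account for at least half of the total assembled bases. A minimal sketch under that definition; the function name and the contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the length of the contig
    at which the running total first reaches half of the total assembly
    size is returned.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe "running >= total / 2"
            return length


# Hypothetical contig lengths in bp (not from any real assembly).
print(n50([100, 200, 300, 400, 500]))  # 400
```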

Journal Article
TL;DR: The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR.
Abstract: We present an in silico approach for the reconstruction of complete mitochondrial genomes of nonmodel organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasite species Gyrodactylus thymalli and Gyrodactylus derjavinoides, including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy-to-use tool even for biologists with modest bioinformatics experience. The software is made available as an open-source pipeline under the MIT license at https://github.com/chrishah/MITObim.
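The "baiting" half of mitochondrial baiting and iterative mapping can be illustrated with a toy read-recruitment loop: reads that share a k-mer with the seed are pulled in, their k-mers are added to the bait, and the process repeats until no new reads are found. This is a simplified sketch, not MITObim's implementation, which also maps reads and extends the reference in each iteration; the function names, `k`, `max_rounds`, and the example reads are assumptions for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_baiting(seed, reads, k=5, max_rounds=10):
    """Recruit reads sharing a k-mer with the seed, then with recruited reads.

    Each round adds the k-mers of newly recruited reads to the bait set, so
    recruitment spreads outward from the seed. Stops when a round recruits
    nothing new or after max_rounds. Returns sorted indices into `reads`.
    """
    bait = kmers(seed, k)
    recruited = set()
    for _ in range(max_rounds):
        new = {i for i, read in enumerate(reads)
               if i not in recruited and kmers(read, k) & bait}
        if not new:
            break
        recruited |= new
        for i in new:
            bait |= kmers(reads[i], k)
    return sorted(recruited)


# Hypothetical reads: read 0 overlaps the seed, read 1 overlaps only read 0,
# and read 2 shares no 4-mer with either.
reads = ["CCCCGGGG", "GGGGTTTT", "ACGTACGT"]
print(iterative_baiting("AAAACCCC", reads, k=4))  # [0, 1]
```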

1,604 citations


Cites methods from "The genome of the cucumber, Cucumis..."

  • ...The parasites’ nuclear genome size was estimated based on the k-mer frequency distributions of the entire data sets, respectively, as demonstrated in previous studies (44,45)....

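k-mer based genome size estimation, as referenced in the quote above, typically divides the total number of (non-error) k-mers by the depth at which the k-mer frequency histogram peaks. A minimal sketch assuming that approach; the `min_depth` cutoff for discarding likely sequencing-error k-mers, the toy reads, and the function names are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each k-mer depth (occurrence count) to how many distinct k-mers have it."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as total k-mers divided by the peak k-mer depth.

    k-mers seen fewer than `min_depth` times are treated as sequencing
    errors and excluded from both the peak search and the total.
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy data: three error-free copies of a 12-bp "genome" plus one error read.
reads = ["AACGTTGCATCC"] * 3 + ["GGG"]
print(estimate_genome_size(kmer_histogram(reads, 3)))  # 10.0
```

The toy estimate (10) counts k-mer positions, which is the genome length minus k plus 1; for real genomes that difference is negligible, while the error cutoff and peak detection usually need more care than a single threshold.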

Journal Article
TL;DR: Describes a new generation of single-molecule sequencing technologies (third-generation sequencing) that is emerging to address sequencing applications beyond the reach of current technologies, with the potential for dramatically longer read lengths, shorter time to result and lower overall cost.
Abstract: First- and second-generation sequencing technologies have led the way in revolutionizing the field of genomics and beyond, motivating an astonishing number of scientific advances, including enabling a more complete understanding of whole genome sequences and the information encoded therein, a more complete characterization of the methylome and transcriptome and a better understanding of interactions between proteins and DNA. Nevertheless, there are sequencing applications and aspects of genome biology that are presently beyond the reach of current sequencing technologies, leaving fertile ground for additional innovation in this space. In this review, we describe a new generation of single-molecule sequencing technologies (third-generation sequencing) that is emerging to fill this space, with the potential for dramatically longer read lengths, shorter time to result and lower overall cost.

882 citations

Journal Article
TL;DR: Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis.
Abstract: Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.

801 citations

Journal Article
TL;DR: The genome size of the hot pepper was approximately fourfold larger than that of its close relative tomato, and the genome showed an accumulation of Gypsy and Caulimoviridae family elements.
Abstract: Doil Choi and colleagues report the genome sequence of the hot pepper, Capsicum annuum, as well as the resequencing of two cultivated peppers and a wild species, Capsicum chinense. Comparative genomic analysis across Solanaceae provides insights into genome expansion, pungency, ripening and disease resistance in hot peppers.

780 citations

References
Journal Article
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations


"The genome of the cucumber, Cucumis..." refers methods in this paper

  • ...After the identification of syntenic blocks, the pairwise protein alignments for each gene pair were first constructed with MUSCL...

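MUSCLE's first stage estimates distances from k-mer counts rather than from alignments. The sketch below uses a fractional common k-mer count: for every k-mer, take the smaller of its counts in the two sequences, sum these, divide by the number of k-mers in the shorter sequence, and subtract from 1. This is an assumption modeled on the abstract's description; MUSCLE's exact k-mer length, alphabet, and distance correction may differ, and the function names and example sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """Return 1 - F, where F is the fraction of k-mers shared by x and y.

    Counter intersection (&) keeps the smaller count of each k-mer; the sum
    is divided by the number of k-mers in the shorter sequence.
    """
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())
    return 1 - shared / denom


print(round(kmer_distance("MKVLA", "MKVIA"), 3))  # 0.667
```

A matrix of such distances can then be clustered into the guide tree used for progressive alignment; no alignment is needed to compute it, which is what makes this stage fast.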

Journal Article
TL;DR: A program is described, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases.
Abstract: We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
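The throughput and false-positive figures in the tRNAscan-SE abstract translate directly into per-genome estimates. The sketch below assumes the quoted rate of about 30,000 bp/s and an upper bound of one false positive per 15 Gb; actual speed depends on hardware and software version, and the 108-Mb genome and function name are hypothetical.

```python
SCAN_RATE_BP_PER_S = 30_000         # approximate search speed quoted above
FALSE_POSITIVE_INTERVAL_BP = 15e9   # fewer than 1 false positive per 15 Gb


def scan_estimates(genome_size_bp):
    """Return (hours to scan, upper bound on expected false positives)."""
    hours = genome_size_bp / SCAN_RATE_BP_PER_S / 3600
    max_false_positives = genome_size_bp / FALSE_POSITIVE_INTERVAL_BP
    return hours, max_false_positives


print(scan_estimates(108_000_000))  # (1.0, 0.0072)
```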

9,629 citations

Journal Article
14 Dec 2000, Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article
TL;DR: A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed, and this dating may pose a problem for the widely believed hypothesis that the bipedal creature Australopithecus afarensis, which lived some 3.7 million years ago, was ancestral to man and evolved after the human-ape splitting.
Abstract: A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed. This method takes into account effectively the information contained in a set of DNA sequence data. The molecular clock of mitochondrial DNA (mtDNA) was calibrated by setting the date of divergence between primates and ungulates at the Cretaceous-Tertiary boundary (65 million years ago), when the extinction of dinosaurs occurred. A generalized least-squares method was applied in fitting a model to mtDNA sequence data, and the clock gave dates of 92.3 +/- 11.7, 13.3 +/- 1.5, 10.9 +/- 1.2, 3.7 +/- 0.6, and 2.7 +/- 0.6 million years ago (where the second of each pair of numbers is the standard deviation) for the separation of mouse, gibbon, orangutan, gorilla, and chimpanzee, respectively, from the line leading to humans. Although there is some uncertainty in the clock, this dating may pose a problem for the widely believed hypothesis that the bipedal creature Australopithecus afarensis, which lived some 3.7 million years ago at Laetoli in Tanzania and at Hadar in Ethiopia, was ancestral to man and evolved after the human-ape splitting. Another likelier possibility is that mtDNA was transferred through hybridization between a proto-human and a proto-chimpanzee after the former had developed bipedalism.
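The calibration step described above can be illustrated with a strict, single-rate clock: a rate is derived from a divergence of known age (such as the 65-million-year primate-ungulate split) and then used to convert other pairwise distances into dates. The distances here are hypothetical, and unlike the paper this sketch uses a single calibration point rather than a generalized least-squares fit; the function names are illustrative.

```python
def calibrated_rate(distance, divergence_time_myr):
    """Substitutions per site per lineage per million years.

    A pairwise distance accumulates along both lineages since the split,
    hence the factor of 2.
    """
    return distance / (2 * divergence_time_myr)


def divergence_time(distance, rate):
    """Divergence time (million years) implied by a distance under a strict clock."""
    return distance / (2 * rate)


# Hypothetical calibration: a distance of 0.65 for a split dated at 65 Myr.
rate = calibrated_rate(0.65, 65)
print(round(divergence_time(0.1, rate), 6))  # 10.0
```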

8,124 citations


"The genome of the cucumber, Cucumis..." refers methods in this paper

  • ...4DTv was then calculated on concatenated nucleotide alignments with HKY substitution model...

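4DTv, mentioned in the quote above, is the proportion of transversions at fourfold degenerate third-codon positions between two aligned coding sequences. The sketch below computes only the raw, uncorrected proportion; the cucumber paper applied the HKY substitution model, which this does not attempt. It assumes gap-free, in-frame alignments of unambiguous A/C/G/T bases and the standard genetic code; the function names are hypothetical.

```python
# Codon prefixes whose third position is fourfold degenerate in the standard
# genetic code: Ala GCN, Arg CGN, Gly GGN, Leu CTN, Pro CCN, Ser TCN,
# Thr ACN, Val GTN.
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}
PURINES = {"A", "G"}


def is_transversion(a, b):
    """True if a and b differ by a purine <-> pyrimidine change."""
    return (a in PURINES) != (b in PURINES)


def raw_4dtv(seq1, seq2):
    """Uncorrected proportion of transversions at fourfold degenerate sites.

    A third codon position counts as a fourfold site only if both codons
    share the same fourfold-degenerate two-base prefix.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected aligned, in-frame sequences")
    sites = transversions = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1[:2] == c2[:2] and c1[:2] in FOURFOLD_PREFIXES:
            sites += 1
            transversions += is_transversion(c1[2], c2[2])
    return transversions / sites if sites else float("nan")


# GCA/GCC: transversion; GGT/GGC: transition; AAA/AAG: not a fourfold site.
print(raw_4dtv("GCAGGTAAA", "GCCGGCAAG"))  # 0.5
```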

Journal Article
TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Abstract: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.

4,886 citations


"The genome of the cucumber, Cucumis..." refers methods in this paper

  • ...Cucumber genome sequences were aligned with melon BAC sequences using NUCmer, a program in the MUMmer packag...

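The MUMmer package is built around maximal unique matches (MUMs): exact matches that occur exactly once in each sequence and cannot be extended in either direction. The quadratic-time sketch below finds MUMs by brute force to illustrate the definition only; MUMmer itself uses a suffix tree, NUCmer also offers less strict anchor options, and `min_len`, the function names, and the example sequences are assumptions.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def naive_mums(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, length) for maximal unique matches.

    A match is kept if it cannot be extended left or right and its
    sequence occurs exactly once in a and exactly once in b.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and positions that could be extended to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1  # extend right until a mismatch or sequence end
            match = a[i:i + length]
            if (length >= min_len and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, length))
    return mums


print(naive_mums("GATTACAGG", "CCGATTACT"))  # [(0, 2, 6)]
```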

Related Papers
15 Sep 2006, Science
Gerald A. Tuskan, Stephen P. DiFazio, Stefan Jansson, Joerg Bohlmann, Igor V. Grigoriev, Uffe Hellsten, Nicholas H. Putnam, Steven G. Ralph, Stephane Rombauts, Asaf Salamov, Jacquie Schein, Lieven Sterck, Andrea Aerts, Rishikesh P. Bhalerao, Damien Blaudez, Wout Boerjan, Annick Brun, Amy M. Brunner, Victor Busov, Malcolm M. Campbell, John E. Carlson, Michel Chalot, Jarrod Chapman, G.-L. Chen, Dawn Cooper, Pedro M. Coutinho, Jérémy Couturier, Sarah F. Covert, Quentin C. B. Cronk, R. Cunningham, John M. Davis, Sven Degroeve, Annabelle Déjardin, Claude W. dePamphilis, John C. Detter, Bill Dirks, Inna Dubchak, Sébastien Duplessis, Jürgen Ehlting, Brian E. Ellis, Karla C Gendler, David Goodstein, Michael Gribskov, Jane Grimwood, Andrew Groover, Lee E. Gunter, Björn Hamberger, Berthold Heinze, Yrjö Helariutta, Bernard Henrissat, D. Holligan, Robert A. Holt, Wenyu Huang, N. Islam-Faridi, Steven J.M. Jones, M. Jones-Rhoades, Richard A. Jorgensen, Chandrashekhar P. Joshi, Jaakko Kangasjärvi, Jan Karlsson, Colin T. Kelleher, Robert Kirkpatrick, Matias Kirst, Annegret Kohler, Udaya C. Kalluri, Frank W. Larimer, Jim Leebens-Mack, Jean-Charles Leplé, Philip F. LoCascio, Y. Lou, Susan Lucas, Francis Martin, Barbara Montanini, Carolyn A. Napoli, David R. Nelson, C D Nelson, Kaisa Nieminen, Ove Nilsson, V. Pereda, Gary F. Peter, Ryan N. Philippe, Gilles Pilate, Alexander Poliakov, J. Razumovskaya, Paul G. Richardson, Cécile Rinaldi, Kermit Ritland, Pierre Rouzé, D. Ryaboy, Jeremy Schmutz, J. Schrader, Bo Segerman, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry, Chung-Jui Tsai, Edward C. Uberbacher, Per Unneberg, Jorma Vahala, Kerr Wall, Susan R. Wessler, Guojun Yang, T. Yin, Carl J. Douglas, Marco A. Marra, Göran Sandberg, Y. Van de Peer, Daniel S. Rokhsar