Journal ArticleDOI

A draft sequence of the rice genome (Oryza sativa L. ssp indica)

TL;DR: A draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, was produced by whole-genome shotgun sequencing. The large proportion of rice genes with no recognizable homologs is attributed to a gradient in the GC content of rice coding sequences.
Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.
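The "exact 20-nucleotide oligomer repeats" figure can be read as the share of bases covered by a 20-mer that occurs more than once in the genome. The sketch below illustrates that reading on a toy string; it is not the authors' procedure, which the abstract does not describe. The function name `repeat_fraction` is hypothetical, and the sketch assumes a single forward strand (no reverse complements) and a sequence small enough to hold in memory.

```python
from collections import Counter


def repeat_fraction(seq: str, k: int = 20) -> float:
    """Fraction of bases covered by at least one exact k-mer seen more than once.

    Assumption: only the forward strand is considered; reverse-complement
    matches are ignored in this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        return 0.0

    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))

    # Mark each base that lies inside a k-mer occurring at least twice.
    covered = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] > 1:
            for j in range(i, i + k):
                covered[j] = True

    return sum(covered) / n


if __name__ == "__main__":
    # Toy example with k=3: "ACG" occurs twice, so 6 of 10 bases are covered.
    print(repeat_fraction("ACGTTACGGG", k=3))
```

On a real genome the same idea would need a memory-efficient k-mer counter; the nested loop here is kept for readability only.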


Citations
Journal ArticleDOI
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal ArticleDOI
Robert H. Waterston1, Kerstin Lindblad-Toh2, Ewan Birney, Jane Rogers3  +219 moreInstitutions (26)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal ArticleDOI
Takashi Matsumoto1, Jianzhong Wu1, Hiroyuki Kanamori1, Yuichi Katayose1  +262 moreInstitutions (25)
11 Aug 2005-Nature
TL;DR: A map-based, finished quality sequence that covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and finds evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.
Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations

Journal ArticleDOI
TL;DR: Attention is drawn to the perception and signalling processes (chemical and hydraulic) of water deficits, which are essential for a holistic understanding of plant resistance to stress, which is needed to improve crop management and breeding techniques.
Abstract: In the last decade, our understanding of the processes underlying plant response to drought, at the molecular and whole-plant levels, has rapidly progressed. Here, we review that progress. We draw attention to the perception and signalling processes (chemical and hydraulic) of water deficits. Knowledge of these processes is essential for a holistic understanding of plant resistance to stress, which is needed to improve crop management and breeding techniques. Hundreds of genes that are induced under drought have been identified. A range of tools, from gene expression patterns to the use of transgenic plants, is being used to study the specific function of these genes and their role in plant acclimation or adaptation to water deficit. However, because plant responses to stress are complex, the functions of many of the genes are still unknown. Many of the traits that explain plant adaptation to drought - such as phenology, root size and depth, hydraulic conductivity and the storage of reserves - are those associated with plant development and structure, and are constitutive rather than stress induced. But a large part of plant resistance to drought is the ability to get rid of excess radiation, a concomitant stress under natural conditions. The nature of the mechanisms responsible for leaf photoprotection, especially those related to thermal dissipation, and oxidative stress are being actively researched. The new tools that operate at molecular, plant and ecosystem levels are revolutionising our understanding of plant response to drought, and our ability to monitor it. Techniques such as genome-wide tools, proteomics, stable isotopes and thermal or fluorescence imaging may allow the genotype-phenotype gap to be bridged, which is essential for faster progress in stress biology research.

3,287 citations


Cites background from "A draft sequence of the rice genome..."

  • ...…of model plant genome-sequencing projects (Arabidopsis was the first, and very recently draft sequences of the rice genome for both indica (Yu et al. 2002) and japonica (Goff et al. 2002) subspecies were published) has enabled investigations into global transcriptome responses to drought,…...


Journal ArticleDOI
TL;DR: WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results, designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results.
Abstract: Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike other commercial software for creating charts, WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two available mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. Any suggestions are welcome at wego@genomics.org.cn.

2,460 citations


Cites background from "A draft sequence of the rice genome..."

  • ...WEGO has been applied in many important biological research studies, such as the comparative genomics study between the rice genome and the Arabidopsis genome (14, 15) and the silkworm genome analysis (16)....


References
Journal ArticleDOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations

Journal ArticleDOI
J. Craig Venter1, Mark Raymond Adams1, Eugene W. Myers1, Peter W. Li1  +269 moreInstitutions (12)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal ArticleDOI
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal ArticleDOI
TL;DR: A base-calling program for automated sequencer traces, phred, with improved accuracy is described. It appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined, independent of position in read, machine running conditions, or sequencing chemistry.
Abstract: The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.

7,627 citations
