Journal Article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

14 Dec 2000-Nature (Nature Publishing Group)-Vol. 408, Iss: 6814, pp 796-815
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
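The abstract's headline figures support two quick ratios: the share of the genome covered by the sequenced regions, and the average number of genes per protein family. A minimal Python check using only the numbers quoted above; "11,000 families" is a rounded figure, so the per-family average is approximate:

```python
# Figures quoted in the abstract above.
SEQUENCED_MB = 115.4   # megabases covered by the sequenced regions
GENOME_MB = 125.0      # estimated total genome size, in megabases
GENES = 25_498         # predicted protein-coding genes
FAMILIES = 11_000      # protein families (rounded in the abstract)


def coverage_percent(sequenced_mb: float, genome_mb: float) -> float:
    """Share of the genome represented in the sequenced regions, in percent."""
    return 100.0 * sequenced_mb / genome_mb


def genes_per_family(genes: int, families: int) -> float:
    """Average number of genes per protein family."""
    return genes / families


if __name__ == "__main__":
    print(f"coverage: {coverage_percent(SEQUENCED_MB, GENOME_MB):.1f}%")   # coverage: 92.3%
    print(f"not covered: {GENOME_MB - SEQUENCED_MB:.1f} Mb")             # not covered: 9.6 Mb
    print(f"genes per family: {genes_per_family(GENES, FAMILIES):.2f}")  # genes per family: 2.32
```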


Citations
Journal Article
16 Jan 2003-Nature
TL;DR: It is found that genes of similar functions are clustered in distinct, multi-megabase regions of individual chromosomes; genes in these regions tend to share transcriptional profiles.
Abstract: A principal challenge currently facing biologists is how to connect the complete DNA sequence of an organism to its development and behaviour. Large-scale targeted-deletions have been successful in defining gene functions in the single-celled yeast Saccharomyces cerevisiae, but comparable analyses have yet to be performed in an animal. Here we describe the use of RNA interference to inhibit the function of ∼86% of the 19,427 predicted genes of C. elegans. We identified mutant phenotypes for 1,722 genes, about two-thirds of which were not previously associated with a phenotype. We find that genes of similar functions are clustered in distinct, multi-megabase regions of individual chromosomes; genes in these regions tend to share transcriptional profiles. Our resulting data set and reusable RNAi library of 16,757 bacterial clones will facilitate systematic analyses of the connections among gene sequence, chromosomal location and gene function in C. elegans.

3,529 citations

Journal Article
Takashi Matsumoto, Jianzhong Wu, Hiroyuki Kanamori, Yuichi Katayose + 262 more (25 institutions)
11 Aug 2005-Nature
TL;DR: Presents a map-based, finished-quality sequence that covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and finds evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.
Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations

Journal Article
26 Aug 2007-Nature
TL;DR: A high-quality draft of the genome sequence of grapevine is obtained from a highly homozygous genotype, revealing the contribution of three ancestral genomes to the grapevine haploid content and explaining the chronology of previously described whole-genome duplication events in the evolution of flowering plants.
Abstract: The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities. Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period. Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.

3,311 citations

Journal Article
TL;DR: Reveals widespread changes in the expression of genes encoding receptor kinases, transcription factors, components of signalling pathways, proteins involved in post-translational modification and turnover, and proteins involved in the synthesis and sensing of cytokinins, abscisic acid and ethylene, indicating that large-scale rewiring of the regulatory network is an early response to sugar depletion.
Abstract: MAPMAN is a user-driven tool that displays large data sets onto diagrams of metabolic pathways or other processes. SCAVENGER modules assign the measured parameters to hierarchical categories (termed 'BINs' and 'subBINs'). A first build of TRANSCRIPTSCAVENGER groups genes on the Arabidopsis Affymetrix 22K array into >200 hierarchical categories, providing a breakdown of central metabolism (for several pathways, down to the single enzyme level), and an overview of secondary metabolism and cellular processes. METABOLITESCAVENGER groups hundreds of metabolites into pathways or groups of structurally related compounds. An IMAGEANNOTATOR module uses these groupings to organise and display experimental data sets onto diagrams of the users' choice. A modular structure allows users to edit existing categories, add new categories and develop SCAVENGER modules for other sorts of data. MAPMAN is used to analyse two sets of 22K Affymetrix arrays that investigate the response of Arabidopsis rosettes to low sugar: one investigates the response to a 6-h extension of the night, and the other compares wild-type Columbia-0 (Col-0) and the starchless pgm mutant (plastid phosphoglucomutase) at the end of the night. Both treatments produced qualitatively similar responses. Many genes involved in photosynthesis, nutrient acquisition, amino acid, nucleotide, lipid and cell wall synthesis, cell wall modification, and RNA and protein synthesis were repressed. Many genes assigned to amino acid, nucleotide, lipid and cell wall breakdown were induced. Changed expression of genes for trehalose metabolism points to a role for trehalose-6-phosphate (Tre6P) as a starvation signal. Widespread changes in the expression of genes encoding receptor kinases, transcription factors, components of signalling pathways, proteins involved in post-translational modification and turnover, and proteins involved in the synthesis and sensing of cytokinins, abscisic acid (ABA) and ethylene reveal that large-scale rewiring of the regulatory network is an early response to sugar depletion.

3,067 citations
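MAPMAN's SCAVENGER modules assign each measured parameter to hierarchical BINs and subBINs, so results can be read at any level of the hierarchy. A minimal Python sketch of that idea, assuming dotted BIN codes (a code such as "1.1.2" nested under "1.1" and "1") and using hypothetical gene IDs and values; this is not MAPMAN's own code or file format:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical gene -> BIN assignments (illustrative IDs and codes, not MAPMAN's).
# A dotted code such as "1.1.2" is treated as a subBIN of "1.1", itself under "1".
GENE_TO_BIN = {
    "gene_a": "1.1.2",
    "gene_b": "1.1.3",
    "gene_c": "1.3",
    "gene_d": "2.1",
}

# Hypothetical measurements, e.g. log2 expression ratios (treatment / control).
LOG2_RATIO = {"gene_a": -1.5, "gene_b": -0.5, "gene_c": -2.0, "gene_d": 1.0}


def lineage(bin_code: str) -> list[str]:
    """Return a BIN code and all its parents: '1.1.2' -> ['1', '1.1', '1.1.2']."""
    parts = bin_code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def summarize_by_bin(
    gene_to_bin: dict[str, str], values: dict[str, float]
) -> dict[str, float]:
    """Mean value per BIN; each gene counts toward its own BIN and every parent BIN."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for gene, bin_code in gene_to_bin.items():
        if gene not in values:
            continue  # genes without a measurement are skipped
        for code in lineage(bin_code):
            grouped[code].append(values[gene])
    return {code: mean(vals) for code, vals in sorted(grouped.items())}


if __name__ == "__main__":
    for code, avg in summarize_by_bin(GENE_TO_BIN, LOG2_RATIO).items():
        print(code, round(avg, 2))
```

With these toy values, BIN "1" averages all three of its genes (-1.33), while subBIN "1.1" averages only gene_a and gene_b (-1.0), which is how a coarse category can summarise a response that its subBINs resolve in more detail.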

Journal Article
TL;DR: This work suggests that processes managing Na+ movements within the plant are equally important under a wide range of conditions; understanding them requires more knowledge of cell-specific transport processes and of the consequences of manipulating transporters and signalling elements in specific cell types.

2,998 citations

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations
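BLAST approximates the maximal segment pair (MSP) score heuristically; the quantity itself is easy to compute exactly for short sequences. A brute-force Python sketch, assuming a simple +1/-1 match/mismatch score in place of a substitution matrix such as BLOSUM62; this illustrates the measure, not BLAST's own search algorithm:

```python
def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Exact maximal segment pair (MSP) score for two sequences.

    An MSP is the highest-scoring pair of equal-length segments, one from each
    sequence, aligned without gaps. Every such pair lies on one diagonal of the
    comparison matrix, so scanning each diagonal with Kadane's maximum-subarray
    algorithm finds it in O(len(a) * len(b)) time.
    """
    def score(x: str, y: str) -> int:
        return match if x == y else mismatch

    best = 0
    # A diagonal is identified by the offset i - j between positions in a and b.
    for offset in range(-(len(b) - 1), len(a)):
        running = 0
        i, j = max(offset, 0), max(-offset, 0)
        while i < len(a) and j < len(b):
            # Restart the segment whenever its running score would go negative.
            running = max(0, running + score(a[i], b[j]))
            best = max(best, running)
            i += 1
            j += 1
    return best
```

For example, `msp_score("GATTACA", "TTAC")` returns 4, because TTAC occurs ungapped in GATTACA. BLAST avoids this exhaustive scan by seeding on short, high-scoring word hits and extending only around them, which is what makes it fast enough for database searches.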

Journal Article
TL;DR: A program is described, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases.
Abstract: We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.

9,629 citations
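The search strategy described above, fast prefilters that pass only candidate regions to a slower, highly selective model, is a filter cascade. A minimal Python sketch of that control flow; the motif check and hairpin score below are hypothetical toy stand-ins, not tRNAscan-SE's prefilters or its covariance model:

```python
from typing import Callable

# Watson-Crick pairs used by the toy verifier below.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def cascade_scan(
    sequence: str,
    window: int,
    prefilter: Callable[[str], bool],
    verifier: Callable[[str], float],
    threshold: float,
) -> tuple[list[tuple[int, float]], int]:
    """Slide a window along `sequence`; run the costly verifier only on prefilter hits.

    Returns (hits, verifier_calls): hits are (start, score) pairs that pass both
    stages, and verifier_calls counts how often the expensive stage actually ran.
    """
    hits: list[tuple[int, float]] = []
    verifier_calls = 0
    for start in range(len(sequence) - window + 1):
        chunk = sequence[start : start + window]
        if not prefilter(chunk):  # cheap first pass discards most windows
            continue
        verifier_calls += 1
        score = verifier(chunk)  # selective, slower model on candidates only
        if score >= threshold:
            hits.append((start, score))
    return hits, verifier_calls


def toy_prefilter(chunk: str) -> bool:
    """Hypothetical stand-in for a fast detector: require a short motif."""
    return "GTTC" in chunk


def toy_verifier(chunk: str, stem: int = 4) -> float:
    """Hypothetical stand-in for a structural model: the fraction of the outer
    `stem` positions that can base-pair across the window (a crude hairpin check)."""
    paired = sum((chunk[i], chunk[-1 - i]) in PAIRS for i in range(stem))
    return paired / stem


if __name__ == "__main__":
    seq = "A" * 30 + "GCGCGTTCGCGC" + "A" * 30  # hypothetical test sequence
    hits, calls = cascade_scan(seq, 12, toy_prefilter, toy_verifier, threshold=1.0)
    print(hits, calls)  # [(30, 1.0)] 9
```

Of the 61 windows in that toy sequence, only 9 reach the verifier. The same trade-off motivates the design described in the abstract: the prefilters keep the search fast, and the selective second stage keeps false positives rare.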

Journal Article
05 Sep 1997-Science
TL;DR: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented and reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident.
Abstract: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

7,723 citations
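The abstract notes that guanines are oriented with respect to the local direction of replication. A standard way to visualise such a strand bias is GC skew, (G - C)/(G + C), computed in windows along the chromosome. A minimal Python sketch, assuming non-overlapping windows; this is a common analysis, not necessarily the one used in the paper:

```python
from itertools import accumulate


def gc_skew(sequence: str, window: int) -> list[float]:
    """GC skew (G - C) / (G + C) for consecutive, non-overlapping windows.

    A trailing partial window is ignored; windows with no G or C score 0.0.
    """
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start : start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running total of the window skews. In many bacterial chromosomes its
    minimum and maximum fall near the replication origin and terminus."""
    return list(accumulate(skews))


if __name__ == "__main__":
    toy = "CCCC" * 3 + "GGGG" * 3  # hypothetical: C-rich half, then G-rich half
    skews = gc_skew(toy, 4)
    print(skews)                   # [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
    print(cumulative_skew(skews))  # [-1.0, -2.0, -3.0, -2.0, -1.0, 0.0]
```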

Journal Article
TL;DR: This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure and provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references.

6,603 citations

Journal Article
24 Mar 2000-Science
TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

6,180 citations


"Analysis of the genome sequence of ..." refers background or methods in this paper

  • ...Gene finding involved three steps: (1) analysis of BAC sequences using a computational gene finder; (2) alignment of the sequence to the protein and EST databases; (3) assignment of functions to each of the genes....

  • ...The Arabidopsis genome has a wealth of class I (2,109) and II (2,203) elements, including several new groups (1,209 elements; Supplementary Information Table 4)....

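The three gene-finding steps quoted above can be illustrated as a small pipeline. This is a minimal Python sketch under loud assumptions, not the annotation pipeline used for Arabidopsis: a naive forward-strand ORF scan stands in for the computational gene finder, shared k-mers against a nucleotide (EST-like) database stand in for alignment, and function is copied from the first supporting entry. All sequences, IDs and annotations below are hypothetical.

```python
from dataclasses import dataclass, field

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class GeneModel:
    start: int                      # 0-based position of the ATG
    end: int                        # exclusive end, stop codon included
    sequence: str
    evidence: list[str] = field(default_factory=list)
    function: str = "unknown"


def find_orfs(seq: str, min_codons: int = 10) -> list[GeneModel]:
    """Step 1 stand-in: naive forward-strand ORF scan (ATG ... first in-frame stop)."""
    models = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    models.append(GeneModel(start, i + 3, seq[start : i + 3]))
                start = None
    return models


def kmers(s: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {s[i : i + k] for i in range(len(s) - k + 1)}


def add_evidence(
    models: list[GeneModel], database: dict[str, str], k: int = 12, min_shared: int = 3
) -> None:
    """Step 2 stand-in: attach database entries that share >= min_shared k-mers."""
    for model in models:
        model_kmers = kmers(model.sequence, k)
        for entry_id, entry_seq in database.items():
            if len(model_kmers & kmers(entry_seq, k)) >= min_shared:
                model.evidence.append(entry_id)


def assign_functions(models: list[GeneModel], annotations: dict[str, str]) -> None:
    """Step 3 stand-in: copy the annotation of the first annotated supporting entry."""
    for model in models:
        for entry_id in model.evidence:
            if entry_id in annotations:
                model.function = annotations[entry_id]
                break


if __name__ == "__main__":
    bac = "CC" + "ATG" + "GCT" * 10 + "TAA" + "CC"  # hypothetical BAC fragment
    db = {"est_1": "GCT" * 8, "est_2": "A" * 20}     # hypothetical database
    notes = {"est_1": "example annotation"}          # hypothetical annotations
    genes = find_orfs(bac)
    add_evidence(genes, db)
    assign_functions(genes, notes)
    for g in genes:
        print(g.start, g.end, g.evidence, g.function)  # 2 38 ['est_1'] example annotation
```

Plant genes contain introns, which this ORF scan ignores; real gene finders model splice sites and both strands, and real alignment tools score similarity rather than counting exact k-mers.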
