Journal ArticleDOI

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

14 Dec 2000, Nature (Nature Publishing Group), Vol. 408, Iss. 6814, pp. 796-815
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.


Citations
Journal ArticleDOI
TL;DR: GMAP is a standalone program for mapping and aligning cDNA sequences to a genome with minimal startup time and memory requirements; it provides fast batch processing of large sequence sets and demonstrates a several-fold increase in speed over existing programs.
Abstract: Motivation: We introduce gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich DP for splice site detection, and microexon identification with statistical significance testing. Results: On a set of human messenger RNAs with random mutations at a 1 and 3% rate, gmap identified all splice sites accurately in over 99.3% of the sequences, which was one-tenth the error rate of existing programs. On a large set of human expressed sequence tags, gmap provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, gmap performed comparably with GeneSeqer. In these experiments, gmap demonstrated a several-fold increase in speed over existing programs. Availability: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap Contact: [email protected] Supplementary information: http://www.gene.com/share/gmap

2,058 citations

Journal ArticleDOI
TL;DR: Abscisic acid regulates many agronomically important aspects of plant development, including the synthesis of seed storage proteins and lipids, the promotion of seed desiccation tolerance and dormancy, and the inhibition of the phase transitions from embryonic to germinative growth and from.
Abstract: Abscisic acid (ABA) regulates many agronomically important aspects of plant development, including the synthesis of seed storage proteins and lipids, the promotion of seed desiccation tolerance and dormancy, and the inhibition of the phase transitions from embryonic to germinative growth and from

2,039 citations

Journal ArticleDOI
TL;DR: Detailed molecular characterization of individual gene families, computational analysis of genomic sequences and population genetic modeling can all be used to help uncover the mechanisms behind the evolution by gene duplication.
Abstract: The importance of gene duplication in supplying raw genetic material to biological evolution has been recognized since the 1930s. Recent genomic sequence data provide substantial evidence for the abundance of duplicated genes in all organisms surveyed. But how do newly duplicated genes survive and acquire novel functions, and what role does gene duplication play in the evolution of genomes and organisms? Detailed molecular characterization of individual gene families, computational analysis of genomic sequences and population genetic modeling can all be used to help us uncover the mechanisms behind the evolution by gene duplication.

2,030 citations

Journal ArticleDOI
TL;DR: New insights have been gained into how silencing in eukaryotic cells has been co-opted to serve essential functions in 'host' cells, highlighting the importance of TEs in the epigenetic regulation of the genome.
Abstract: Overlapping epigenetic mechanisms have evolved in eukaryotic cells to silence the expression and mobility of transposable elements (TEs). Owing to their ability to recruit the silencing machinery, TEs have served as building blocks for epigenetic phenomena, both at the level of single genes and across larger chromosomal regions. Important progress has been made recently in understanding these silencing mechanisms. In addition, new insights have been gained into how this silencing has been co-opted to serve essential functions in 'host' cells, highlighting the importance of TEs in the epigenetic regulation of the genome.

1,823 citations

Journal ArticleDOI
Xiaowu Wang, Hanzhong Wang, Jun Wang, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai, Jeong-Hwan Mun, Ian Bancroft, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang, Xiyin Wang, Michael Freeling, J. Chris Pires, Andrew H. Paterson, Boulos Chalhoub, Bo Wang, Alice Hayward, Andrew G. Sharpe, Beom-Seok Park, Bernd Weisshaar, Binghang Liu, Bo Li, Bo Liu, Chaobo Tong, Chi Song, Chris Duran, Chunfang Peng, Geng Chunyu, Chushin Koh, Chuyu Lin, David Edwards, Desheng Mu, Di Shen, Eleni Soumpourou, Fei Li, Fiona Fraser, Gavin C. Conant, Gilles Lassalle, Graham J.W. King, Guusje Bonnema, Haibao Tang, Haiping Wang, Harry Belcram, Heling Zhou, Hideki Hirakawa, Hiroshi Abe, Hui Guo, Hui Wang, Huizhe Jin, Isobel A. P. Parkin, Jacqueline Batley, Jeong-Sun Kim, Jérémy Just, Jianwen Li, Jiaohui Xu, Jie Deng, Jin A Kim, Jingping Li, Jingyin Yu, Jinling Meng, Jinpeng Wang, Jiumeng Min, Julie Poulain, Katsunori Hatakeyama, Kui Wu, Li Wang, Lu Fang, Martin Trick, Matthew G. Links, Meixia Zhao, Mina Jin, Nirala Ramchiary, Nizar Drou, Paul J. Berkman, Qingle Cai, Quanfei Huang, Ruiqiang Li, Satoshi Tabata, Shifeng Cheng, Shu Zhang, Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon, Su-Ryun Choi, Tae-Ho Lee, Wei Fan, Xiang Zhao, Xu Tan, Xun Xu, Yan Wang, Yang Qiu, Ye Yin, Yingrui Li, Yongchen Du, Yongcui Liao, Yong Pyo Lim, Yoshihiro Narusaka, Yupeng Wang, Zhenyi Wang, Zhenyu Li, Zhiwen Wang, Zhiyong Xiong, Zhonghua Zhang
TL;DR: This paper reports the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, and uses Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.
Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: A program is described, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases.
Abstract: We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.

9,629 citations

Journal ArticleDOI
05 Sep 1997, Science
TL;DR: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented and reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident.
Abstract: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

7,723 citations

Journal ArticleDOI
TL;DR: This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure and provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references.

6,603 citations

Journal ArticleDOI
24 Mar 2000, Science
TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

6,180 citations


"Analysis of the genome sequence of ..." refers to background or methods in this paper

  • ...Gene finding involved three steps: (1) analysis of BAC sequences using a computational gene finder; (2) alignment of the sequence to the protein and EST databases; (3) assignment of functions to each of the genes....

  • ...The Arabidopsis genome has a wealth of class I (2,109) and II (2,203) elements, including several new groups (1,209 elements; Supplementary Information Table 4)....
