scispace - formally typeset

Journal ArticleDOI

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

14 Dec 2000-Nature (Nature Publishing Group)-Vol. 408, Iss: 6814, pp 796-815

TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Topics: Genome evolution (63%), Genome (61%), Arabidopsis (61%), The Arabidopsis Information Resource (60%), Arabidopsis arenosa (58%)
Citations
More filters

Journal ArticleDOI
J. Craig Venter1, Mark Raymond Adams1, Eugene W. Myers1, Peter W. Li1  +269 moreInstitutions (12)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems are indicated.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

11,645 citations


Journal ArticleDOI
Robert H. Waterston1, Kerstin Lindblad-Toh2, Ewan Birney, Jane Rogers3  +219 moreInstitutions (26)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the Mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,369 citations


Journal ArticleDOI
01 Aug 2003-Science
TL;DR: Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels, and insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.
Abstract: Over 225,000 independent Agrobacterium transferred DNA (T-DNA) insertion events in the genome of the reference plant Arabidopsis thaliana have been created that represent near saturation of the gene space. The precise locations were determined for more than 88,000 T-DNA insertions, which resulted in the identification of mutations in more than 21,700 of the approximately 29,454 predicted Arabidopsis genes. Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels. Insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

4,996 citations


Journal ArticleDOI
Malcolm J. Gardner1, Neil Hall1, Eula Fung1, Owen White1  +41 moreInstitutions (1)
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,082 citations


Journal ArticleDOI
05 Apr 2002-Science
TL;DR: A draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp.indica, by whole-genome shotgun sequencing is produced, with a large proportion of rice genes with no recognizable homologs due to a gradient in the GC content of rice coding sequences.
Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.

3,918 citations


References
More filters

Journal ArticleDOI
Stephen F. Altschul1, Warren Gish1, Webb Miller2, Eugene W. Myers3  +1 moreInstitutions (3)
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Abstract: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straight-forward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

81,150 citations


Journal ArticleDOI
Todd M. Lowe1, Sean R. Eddy1Institutions (1)
TL;DR: A program is described, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases.
Abstract: We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.

8,707 citations


Journal ArticleDOI
05 Sep 1997-Science
TL;DR: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented and reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident.
Abstract: The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.

7,421 citations


Journal ArticleDOI
TL;DR: This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure and provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references.
Abstract: To facilitate understanding of, and access to, the information available for protein structures, we have constructed the Structural Classification of Proteins (scop) database. This database provides a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. It also provides for each entry links to co-ordinates, images of the structure, interactive viewers, sequence data and literature references. Two search facilities are available. The homology search permits users to enter a sequence and obtain a list of any structures to which it has significant levels of sequence similarity. The key word search finds, for a word entered by the user, matches from both the text of the scop database and the headers of Brookhaven Protein Databank structure files. The database is freely accessible on World Wide Web (WWW) with an entry point to URL http://scop.mrc-lmb.cam.ac.uk/scop/ scop: an old English poet or minstrel (Oxford English Dictionary); ckon: pile, accumulation (Russian Dictionary).

6,482 citations


Journal ArticleDOI
24 Mar 2000-Science
TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

5,958 citations


"Analysis of the genome sequence of ..." refers background or methods in this paper

  • ...Gene ®nding involved three steps: (1) analysis of BAC sequences using a computational gene ®nder; (2) alignment of the sequence to the protein and EST databases; (3) assignment of functions to each of the genes....

    [...]

  • ...The Arabidopsis genome has a wealth of class I (2,109) and II (2,203) elements, including several new groups (1,209 elements; Supplementary Information Table 4)....

    [...]


Network Information
Related Papers (5)
05 Apr 2002, Science

Stephen A. Goff, Darrell O. Ricke +53 more

01 Oct 1990, Journal of Molecular Biology

Stephen F. Altschul, Warren Gish +3 more

15 Feb 2001, Nature

Eric S. Lander, Lauren Linton +247 more

Performance
Metrics
No. of citations received by the Paper in previous years
YearCitations
20228
2021257
2020248
2019296
2018266
2017279