Journal Article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

14 Dec 2000, Nature (Nature Publishing Group), Vol. 408, Issue 6814, pp. 796-815
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.


Citations
Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more (12 institutions)
16 Feb 2001, Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations
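The coverage figures in the Venter et al. abstract can be checked with simple arithmetic. The sketch below uses only numbers quoted in that abstract: it derives the mean read length implied by 14.8 billion bp over 27,271,853 reads and shows that 5.11-fold plus 2.9-fold gives the stated "eightfold" effective coverage. To illustrate why higher coverage means fewer gaps, it assumes the idealized Lander-Waterman model, in which reads land uniformly at random and the expected uncovered fraction is exp(-coverage); that model is an assumption added here, not something the abstract states, and real assemblies deviate from it because of repeats and cloning bias.

```python
import math

# Figures quoted in the Venter et al. (2001) abstract.
TOTAL_BASES = 14.8e9            # bp of sequence generated
READS = 27_271_853              # high-quality sequence reads
CELERA_COVERAGE = 5.11          # fold coverage from Celera's own reads
SHREDDED_PUBLIC_COVERAGE = 2.9  # fold coverage from shredded public data


def mean_read_length(total_bases: float, reads: int) -> float:
    """Average bases per read implied by the totals."""
    return total_bases / reads


def uncovered_fraction(coverage: float) -> float:
    """Assumed Lander-Waterman idealization: with reads placed uniformly at
    random, the expected fraction of bases hit by no read is exp(-coverage)."""
    return math.exp(-coverage)


combined = CELERA_COVERAGE + SHREDDED_PUBLIC_COVERAGE
print(f"mean read length ~ {mean_read_length(TOTAL_BASES, READS):.0f} bp")
print(f"combined coverage ~ {combined:.2f}-fold")
for c in (CELERA_COVERAGE, combined):
    print(f"{c:.2f}-fold -> expected uncovered fraction {uncovered_fraction(c):.5f}")
```

Under this idealized model, raising coverage from about 5.1-fold to about 8-fold cuts the expected uncovered fraction from roughly 0.6% to roughly 0.03%, which is consistent in direction with the abstract's statement that the extra coverage reduced the number and size of gaps.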

Journal Article
Robert H. Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, and 219 more (26 institutions)
05 Dec 2002, Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal Article
01 Aug 2003, Science
TL;DR: Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels, and insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.
Abstract: Over 225,000 independent Agrobacterium transferred DNA (T-DNA) insertion events in the genome of the reference plant Arabidopsis thaliana have been created that represent near saturation of the gene space. The precise locations were determined for more than 88,000 T-DNA insertions, which resulted in the identification of mutations in more than 21,700 of the approximately 29,454 predicted Arabidopsis genes. Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels. Insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

5,227 citations

Journal Article
03 Oct 2002, Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations

Journal Article
TL;DR: New features have been implemented to search for plant cis-acting regulatory elements in a query sequence and links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes.
Abstract: PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.

4,184 citations

References
Journal Article
TL;DR: This paper presents a meta-analysis of the response of the immune system to canine coronavirus and infectious disease, and shows clear patterns in the response to antibiotics and, in particular, in the immune response to EMT.

896 citations

Journal Article
TL;DR: The complete sequence of the mitochondrial DNA in the model plant species Arabidopsis thaliana is determined, affording access to the first of its three genomes.
Abstract: We have determined the complete sequence of the mitochondrial DNA in the model plant species Arabidopsis thaliana, affording access to the first of its three genomes. The 366,924 nucleotides code for 57 identified genes, which cover only 10% of the genome. Introns in these genes add about 8%, open reading frames larger than 100 amino acids represent 10% of the genome, duplications account for 7%, remnants of retrotransposons of nuclear origin contribute 4% and integrated plastid sequences amount to 1%, leaving 60% of the genome unaccounted for. With the significant contribution of duplications, imported foreign DNA and the extensive background of apparently functionless sequences, the mosaic structure of the Arabidopsis thaliana mitochondrial genome features many aspects of size-relaxed nuclear genomes.

893 citations
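The mitochondrial-genome abstract itemizes its sequence by category and concludes that 60% is unaccounted for. A minimal sketch of that bookkeeping follows, using only the percentages and the 366,924-nt length quoted there. It assumes, as the abstract's own subtraction implies, that the categories do not overlap; the nucleotide count is approximate because the quoted percentages are rounded.

```python
# Percentages of the 366,924-nt Arabidopsis mitochondrial genome,
# as itemized in the abstract (assumed non-overlapping categories).
GENOME_LENGTH_NT = 366_924

components_percent = {
    "identified genes (57)": 10,
    "introns in those genes": 8,
    "ORFs > 100 amino acids": 10,
    "duplications": 7,
    "retrotransposon remnants": 4,
    "integrated plastid sequences": 1,
}

# Sum the itemized categories and take the remainder as "unaccounted".
accounted_percent = sum(components_percent.values())
unaccounted_percent = 100 - accounted_percent

# Approximate length of the unaccounted portion (percentages are rounded).
unaccounted_nt = GENOME_LENGTH_NT * unaccounted_percent / 100

print(f"accounted for: {accounted_percent}%")
print(f"unaccounted:   {unaccounted_percent}% (~{unaccounted_nt:,.0f} nt)")
```

The six categories sum to 40%, so the remaining 60% corresponds to roughly 220,000 nt of sequence with no assigned origin or function.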

Journal Article
TL;DR: The Toll-Dorsal pathway in Drosophila and the interleukin-1 receptor-NF-kappa B pathway in mammals are homologous signal transduction pathways that mediate several different biological responses.
Abstract: The Toll-Dorsal pathway in Drosophila and the interleukin-1 receptor (IL-1R)-NF-kappa B pathway in mammals are homologous signal transduction pathways that mediate several different biological responses. In Drosophila, genetic analysis of dorsal-ventral patterning of the embryo has defined the series of genes that mediate the Toll-Dorsal pathway. Binding of extracellular ligand activates the transmembrane receptor Toll, which requires the novel protein Tube to activate the cytoplasmic serine/threonine kinase Pelle. Pelle activity controls the degradation of the Cactus protein, which is present in a cytoplasmic complex with the Dorsal protein. Once Cactus is degraded in response to signal, Dorsal is free to move into the nucleus where it regulates transcription of specific target genes. The Toll, tube, pelle, cactus, and dorsal genes also appear to be involved in Drosophila immune response. Because the IL-1R-NF-kappa B pathway plays a role in vertebrate innate immunity and because plant homologues of the Toll-Dorsal pathway are important in plant disease resistance, it is likely that this pathway arose before the divergence of plants and animals as a defense against pathogens.

846 citations

Journal Article
13 Apr 2000, Nature
TL;DR: It is shown that the closely related SHATTERPROOF (SHP1) and SHATTERPROOF2 (SHP2) MADS-box genes are required for fruit dehiscence in Arabidopsis, and that further analysis of the molecular events underlying fruit dehiscence may allow genetic manipulation of pod shatter in crop plants.
Abstract: The fruit, which mediates the maturation and dispersal of seeds, is a complex structure unique to flowering plants. Seed dispersal in plants such as Arabidopsis occurs by a process called fruit dehiscence, or pod shatter. Few studies have focused on identifying genes that regulate this process, in spite of the agronomic value of controlling seed dispersal in crop plants such as canola. Here we show that the closely related SHATTERPROOF (SHP1) and SHATTERPROOF2 (SHP2) MADS-box genes are required for fruit dehiscence in Arabidopsis. Moreover, SHP1 and SHP2 are functionally redundant, as neither single mutant displays a novel phenotype. Our studies of shp1 shp2 fruit, and of plants constitutively expressing SHP1 and SHP2, show that these two genes control dehiscence zone differentiation and promote the lignification of adjacent cells. Our results indicate that further analysis of the molecular events underlying fruit dehiscence may allow genetic manipulation of pod shatter in crop plants.

833 citations
