Author

R. Wambutt

Bio: R. Wambutt is an academic researcher who has contributed to research on the topics Genome project and Gene. The author has an h-index of 6 and has co-authored 6 publications receiving 1550 citations.

Papers
Journal Article
29 Jan 1998-Nature
TL;DR: Analysis of the sequence revealed an average gene density of one gene every 4.8 kilobases, and 54% of the predicted genes had significant similarity to known genes. Other notable features included a disease-resistance gene locus, the distribution of retroelements, and the frequent occurrence of clustered gene families.
Abstract: The plant Arabidopsis thaliana (Arabidopsis) has become an important model species for the study of many aspects of plant biology. The relatively small size of the nuclear genome and the availability of extensive physical maps of the five chromosomes provide a feasible basis for initiating sequencing of the five chromosomes. The YAC (yeast artificial chromosome)-based physical map of chromosome 4 was used to construct a sequence-ready map of cosmid and BAC (bacterial artificial chromosome) clones covering a 1.9-megabase (Mb) contiguous region, and the sequence of this region is reported here. Analysis of the sequence revealed an average gene density of one gene every 4.8 kilobases (kb), and 54% of the predicted genes had significant similarity to known genes. Other interesting features were found, such as the sequence of a disease-resistance gene locus, the distribution of retroelements, the frequent occurrence of clustered gene families, and the sequence of several classes of genes not previously encountered in plants.

832 citations

Journal Article
Klaus F. X. Mayer, C. Schüller, R. Wambutt, George Murphy +230 more · Institutions (21)
16 Dec 1999-Nature
TL;DR: Analysis of 17.38 megabases of unique sequence, representing about 17% of the Arabidopsis genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements.
Abstract: The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.

411 citations

Journal Article
TL;DR: The sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame are reported; aligning them with finished genomic sequence identified a number of genes that had either been completely missed in the analysis of the genomic sequences or been wrongly predicted.
Abstract: With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%–5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies. [The sequence data described in this paper have been submitted to the EMBL database under the accession nos. given in Table 2.]

[Table 2: Functional Classification of Individual cDNAs]

The recent past has witnessed major advances in the determination of the sequence of the human genome (Dunham et al. 1999; Hattori et al. 2000). Although the whole genomic sequence will be completely unraveled in the near future (Collins et al. 1998), the identification of genes and the deciphering of gene structures will extend for a prolonged time, and cDNA sequences will continue to be invaluable tools for this adventure, especially in view of alternative splicing. The primary focus will shift to the functional analysis of the genes and their protein products to finally understand the molecular basis of human life. Current estimates of the number of genes constituting the protein coding repertoire of the human genome vary between 29,000 and >70,000 (Fields et al. 1994; Ewing and Green 2000; Liang et al. 2000; Roest Crollius et al. 2000). However, thus far only some 11,000 cDNA sequences that are supposed to contain the complete protein coding open reading frame (ORF) have been deposited in public databases. The majority of the respective cDNA clones are most likely not accessible. The generation of a freely accessible physical clone set representing all human genes is consequently regarded to have an extremely high impact (Schuler 1997; Pruitt et al. 2000). This would permit the establishment of a catalog of clones to provide the resources needed in the proteomics era, where the functions of proteins, their action in pathways, and their possible relation to disease are deciphered. Until recently, the long-cDNA sequencing project carried out at the Kazusa Institute (Nomura et al. 1994; Nagase et al. 2000) had been the only systematic full-length cDNA sequencing project with a significant output of novel sequence information. A new large-scale cDNA sequencing project coordinated by the National Institutes of Health has recently been announced (Strausberg et al. 1999). We founded a cDNA Consortium in 1997 as part of the German Genome Project, aiming at the characterization of the complete sequences of novel human transcripts at the cDNA level. Here, we report the sequences and analysis of 500 novel human cDNAs that all contain the complete protein coding region.
These cDNAs constitute the most valuable essence of 30,000 clones that have been EST sequenced and 3630 fully sequenced cDNAs. Over 1000 cDNAs that cover the complete coding sequence of already known transcripts have been identified in the EST-sequenced clone set. All clones are made available through the Resource Center of the German Genome Project (RZPD).

185 citations

Journal Article
TL;DR: The results demonstrate that conservation of the genome microstructure can be identified even between monocot and dicot species, and are consistent with the hypothesis that the Arabidopsis genome has undergone multiple duplication events.
Abstract: The nucleotide sequence was determined for a 340-kb segment of rice chromosome 2, revealing 56 putative protein-coding genes. This represents a density of one gene per 6.1 kb, which is higher than was reported for a previously sequenced segment of the rice genome. Sixteen of the putative genes were supported by matches to ESTs. The predicted products of 29 of the putative genes showed similarity to known proteins, and a further 17 genes showed similarity only to predicted or hypothetical proteins identified in genome sequence data. The region contains few transposable elements: one retrotransposon and one transposon. The segment of the rice genome studied had previously been identified as representing a part of rice chromosome 2 that may be homologous to a segment of Arabidopsis chromosome 4. We confirmed the conservation of gene content and order between the two genome segments. In addition, we identified a further four segments of the Arabidopsis genome that contain conserved gene content and order. In total, 22 of the 56 genes identified in the rice genome segment were represented in this set of Arabidopsis genome segments, with at least five genes present, in conserved order, in each segment. These data are consistent with the hypothesis that the Arabidopsis genome has undergone multiple duplication events. Our results demonstrate that conservation of the genome microstructure can be identified even between monocot and dicot species. However, the frequent occurrence of duplication, and subsequent microstructure divergence, within plant genomes may necessitate the integration of subsets of genes present in multiple redundant segments to deduce evolutionary relationships and identify orthologous genes.

73 citations

Patent
18 Aug 2000
TL;DR: In this patent, novel human cDNA sequences of clones, the encoded protein sequences, antibodies and variants thereof are provided; the disclosed sequences find application in a number of ways, including use in profiling assays.
Abstract: Novel human cDNA sequences of clones, the encoded protein sequences of the clones, antibodies and variants thereof are provided. The disclosed sequences find application in a number of ways, including use in profiling assays. In this regard, various assemblages of nucleic acids or proteins are provided that are useful in providing large arrays of human material for implementing large-scale screening strategies. The disclosed sequences may also be used in formulating medicaments, treating various disorders and in certain diagnostic applications.

62 citations


Cited by
Journal Article
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article
TL;DR: A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed and it is estimated that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%.

4,268 citations

Journal Article
05 Apr 2002-Science
TL;DR: A draft sequence of the rice genome was produced by whole-genome shotgun sequencing for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica; the large proportion of rice genes with no recognizable homologs is attributed to a gradient in the GC content of rice coding sequences.
Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.

4,064 citations

Journal Article
15 Dec 2000-Science
TL;DR: The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms and reveals the evolutionary generation of diversity in the regulation of transcription.
Abstract: The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

2,582 citations

Journal Article
TL;DR: The WRKY proteins are a superfamily of transcription factors with up to 100 representatives in Arabidopsis that appear to be involved in the regulation of various physiological programs that are unique to plants, including pathogen defense, senescence and trichome development.

2,447 citations