Showing papers by "Chris Sander" published in 1997


Journal Article
01 May 1997-Proteins
TL;DR: In this paper, the same active-site architecture, and hence the same three-dimensional fold, is predicted for dihydroorotases, allantoinases, hydantoinases, AMP-, adenine and cytosine deaminases, imidazolonepropionase, aryldialkylphosphatase, chlorohydrolases, formylmethanofuran dehydrogenases, and proteins involved in animal neuronal development.
Abstract: The recent determination of the three-dimensional structure of urease revealed striking similarities of enzyme architecture to adenosine deaminase and phosphotriesterase, evidence of a distant evolutionary relationship that had gone undetected by one-dimensional sequence comparisons. Here, based on an analysis of conservation patterns in three dimensions, we report the discovery of the same active-site architecture in an even larger set of enzymes involved primarily in nucleotide metabolism. As a consequence, we predict the three-dimensional fold and details of the active-site architecture for dihydroorotases, allantoinases, hydantoinases, AMP-, adenine and cytosine deaminases, imidazolonepropionase, aryldialkylphosphatase, chlorohydrolases, formylmethanofuran dehydrogenases, and proteins involved in animal neuronal development. Two member families are common to archaea, eubacteria, and eukaryota. Thirteen other functions supported by the same structural motif and conserved chemical mechanism apparently represent later adaptations for different substrate specificities in different cellular contexts. Proteins 28:72-82, 1997. © 1997 Wiley-Liss, Inc.

468 citations
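
The abstract attributes the discovery to "an analysis of conservation patterns in three dimensions" but gives no details. Purely as an illustration of that idea, the sketch below keeps residues whose conservation score passes a cutoff and groups those lying within a distance cutoff of one another in the structure, using single-linkage clustering via union-find. The function name, the 0-to-1 score scale and both cutoffs are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def conserved_3d_clusters(
    coords: Dict[int, Coord],        # residue number -> representative atom position
    conservation: Dict[int, float],  # residue number -> conservation score (assumed 0..1)
    min_conservation: float = 0.8,   # hypothetical cutoff
    max_distance: float = 6.0,       # hypothetical cutoff, in angstroms
) -> List[List[int]]:
    """Group highly conserved residues that are close in 3D (single linkage)."""
    conserved = sorted(
        r for r, s in conservation.items() if s >= min_conservation and r in coords
    )
    parent = {r: r for r in conserved}

    def find(r: int) -> int:
        # Union-find root lookup with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Link every pair of conserved residues within the distance cutoff
    for i, a in enumerate(conserved):
        for b in conserved[i + 1:]:
            if math.dist(coords[a], coords[b]) <= max_distance:
                parent[find(a)] = find(b)

    clusters: Dict[int, List[int]] = {}
    for r in conserved:
        clusters.setdefault(find(r), []).append(r)
    return sorted(clusters.values())
```

With toy coordinates, two conserved residues 3 Å apart form one cluster, while a third conserved residue far away forms its own cluster. Poorly conserved residues are ignored.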


Journal Article
TL;DR: The FSSP database presents a continuously updated structural classification of three-dimensional protein folds that define useful test sets and a standard of truth for assessing the correctness of sequence-sequence or sequence-structure alignments.
Abstract: The FSSP database presents a continuously updated structural classification of three-dimensional protein folds. It is derived using an automatic structure comparison program (Dali) for the all-against-all comparison of over 6000 three-dimensional coordinate sets in the Protein Data Bank (PDB). Sequence-related protein families are covered by a representative set of 813 protein chains. Hierarchical clustering based on structural similarities yields a fold tree that defines 253 fold classes. For each representative protein chain, there is a database entry containing structure-structure alignments with its structural neighbours in the PDB. The database is accessible online through World Wide Web browsers and by anonymous ftp (file transfer protocol). The overview of fold space and the individual data sets provide a rich source of information for the study of both divergent and convergent aspects of molecular evolution, and define useful test sets and a standard of truth for assessing the correctness of sequence-sequence or sequence-structure alignments.

444 citations
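
The FSSP abstract states that hierarchical clustering of structural similarities yields a fold tree. In FSSP, that tree defines 253 fold classes over 813 representative chains. The sketch below shows one generic way to cut such a tree into classes: average-linkage agglomerative clustering over a symmetric pairwise similarity matrix, stopping once the best remaining merge falls below a threshold. It assumes the matrix holds something like Dali Z-scores. The linkage rule, the threshold and the function name are illustrative assumptions, because the abstract does not say how the fold classes were delimited.

```python
from itertools import combinations
from typing import List, Sequence

def fold_classes(sim: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering on a symmetric similarity matrix.

    Merging stops when the best average similarity between two clusters falls
    below `threshold`; the remaining clusters play the role of fold classes.
    """
    clusters: List[List[int]] = [[i] for i in range(len(sim))]

    def avg_sim(a: List[int], b: List[int]) -> float:
        # Average linkage: mean pairwise similarity between members of two clusters
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the most similar pair of clusters
        best_pair, best = max(
            (((x, y), avg_sim(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # cut the tree here: the remaining clusters are the classes
        x, y = best_pair
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)

# Toy similarity matrix for four chains (hypothetical values)
z = [
    [0.0, 20.0, 3.0, 2.0],
    [20.0, 0.0, 4.0, 1.0],
    [3.0, 4.0, 0.0, 15.0],
    [2.0, 1.0, 15.0, 0.0],
]
```

With a threshold of 8, the toy matrix splits into two classes, {0, 1} and {2, 3}. Raising or lowering the threshold moves the cut up or down the same tree.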


Journal Article
TL;DR: It is hypothesised that existing 1D-3D threading methods essentially do not capture more than the fitness of an amino acid sequence for a particular 1D succession of secondary structure segments and residue solvent accessibility.

288 citations
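
The hypothesis is that threading methods mostly capture how well a sequence fits a template's 1D succession of secondary-structure segments and residue solvent accessibility. To make that notion concrete, here is a minimal sketch that scores a query's 1D state string, such as predicted secondary structure, against a template's observed state string by global (Needleman-Wunsch) alignment. The alphabet (H helix, E strand, L loop), the scoring values and the function name are illustrative assumptions, not the paper's scoring. Solvent accessibility could be folded in by using a combined alphabet.

```python
from typing import List

def align_1d_states(query: str, template: str,
                    match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score between two strings of 1D structural states."""
    n, m = len(query), len(template)
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps cost `gap` per position
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + s,  # align state to state
                dp[i - 1][j] + gap,    # gap in template
                dp[i][j - 1] + gap,    # gap in query
            )
    return dp[n][m]
```

For example, a query "HHHLLEEE" against a template "HHHEEE" scores 8 under these values: six matching states (+12) and two gap positions (-4).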


Journal Article
TL;DR: A new and simple method is presented for judging the quality of a protein structure based on the distribution of backbone dihedral angles, resulting in a Ramachandran Z-score that expresses the quality of the Ramachandran plot relative to current state-of-the-art structures.
Abstract: Motivation: Statistical methods that compare observed and expected distributions of experimental observables provide powerful tools for the quality control of protein structures. The distribution of backbone dihedral angles ('Ramachandran plot') has often been used for such quality control, but without a firm statistical foundation. Results: A new and simple method is presented for judging the quality of a protein structure based on the distribution of backbone dihedral angles. Inputs to the method are 60 torsion angle distributions extracted from protein structures solved at high resolution; one for each combination of residue type and tri-state secondary structure. Output for a protein is a Ramachandran Z-score, expressing the quality of the Ramachandran plot relative to current state-of-the-art structures.

230 citations
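
The abstract specifies the inputs and the output but not the formulas. The inputs are 60 distributions, one for each combination of the 20 amino-acid types and 3 secondary-structure states. The output is a Z-score relative to high-resolution structures. Below is a minimal sketch under stated assumptions, not the published procedure:

- each distribution is a table of probabilities over 10-degree (phi, psi) bins;
- a structure's raw score is the mean log-probability of its residues;
- the Z-score compares that raw score with the mean and standard deviation of scores from a reference set.

The bin width, the probability floor for empty bins and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

# Hypothetical table: (residue type, secondary structure) -> {(phi_bin, psi_bin): probability}
Distributions = Dict[Tuple[str, str], Dict[Tuple[int, int], float]]
# One residue: (residue type, secondary structure state, phi, psi), angles in degrees
Residue = Tuple[str, str, float, float]

BIN_WIDTH = 10.0  # assumed bin width in degrees

def bin_angle(angle: float) -> int:
    """Map an angle in degrees to one of 36 bins covering [-180, 180)."""
    return int((angle + 180.0) // BIN_WIDTH) % int(360 / BIN_WIDTH)

def rama_score(residues: Sequence[Residue], dists: Distributions,
               floor: float = 1e-4) -> float:
    """Mean log-probability of each residue's (phi, psi) under its own distribution."""
    logs = []
    for res_type, ss, phi, psi in residues:
        table = dists.get((res_type, ss), {})
        p = table.get((bin_angle(phi), bin_angle(psi)), floor)
        logs.append(math.log(max(p, floor)))  # floor avoids log(0) for empty bins
    return mean(logs)

def rama_z(score: float, reference_scores: Sequence[float]) -> float:
    """Z-score of a structure's raw score relative to a reference set of scores."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)
```

In this sketch, higher raw scores mean more typical backbone conformations. A strongly negative Z-score would flag a Ramachandran plot that is unusual compared with the reference structures.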


Journal Article
TL;DR: Substantial progress has recently been made in the availability of primary and added-value databases and in the development of algorithms and network information services for genome analysis; the pharmaceutical industry has greatly benefited from the accumulation of sequence data through the identification of targets and candidates for the development of drugs, vaccines, diagnostic markers and therapeutic proteins.

79 citations


Journal Article
Liisa Holm, Chris Sander
TL;DR: How protein structure database searching can lead to evolutionary discoveries or the identification of new types of protein architecture is illustrated and the popular question of what constitutes a fold or fold class is revisited.

58 citations


Journal Article
TL;DR: A modified SOM algorithm that includes a convergence test that dynamically controls the learning parameters to adapt them to the learning set instead of being fixed and externally optimized by trial and error is presented.
Abstract: Using a SOM (self-organizing map), we can classify sequences within a protein family into subgroups that generally correspond to biological subcategories. These maps tend to represent sequence similarity as proximity in the map. By combining maps generated at different levels of resolution, we can capture relationships within protein families that could not otherwise be represented in a single map. The underlying representation of the maps enables us to retrieve characteristic sequence patterns for individual subgroups of sequences, and these patterns tend to correspond to functionally important regions. We present a modified SOM algorithm that includes a convergence test that dynamically controls the learning parameters, adapting them to the learning set instead of fixing them and optimizing them externally by trial and error. Given the variability of protein family size and distribution, this feature is necessary. The method was successfully tested on a number of families, and the rab family of small GTPases is used to illustrate its performance.

55 citations
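
The SOM abstract describes a convergence test that adapts the learning parameters to the learning set rather than fixing them by trial and error, but it does not give the rule. The sketch below is a generic stand-in, not the authors' algorithm. It trains a one-dimensional SOM over numeric vectors and halves the learning rate and neighbourhood radius whenever an epoch fails to reduce the mean quantization error by at least `tol`. How protein sequences would be encoded as vectors, and every parameter value, are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_node(nodes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the node closest to x (the best-matching unit)."""
    return min(range(len(nodes)), key=lambda k: _sq_dist(nodes[k], x))

def train_som(data: Sequence[Sequence[float]], n_nodes: int = 2, lr: float = 0.5,
              radius: float = 1.0, tol: float = 1e-3, min_lr: float = 1e-3,
              max_epochs: int = 1000, seed: int = 0) -> List[Vector]:
    """Train a 1D self-organizing map whose learning parameters adapt to the data.

    Hypothetical convergence rule: whenever an epoch fails to lower the mean
    quantization error by at least `tol`, halve the learning rate and the
    neighbourhood radius; stop once the learning rate drops below `min_lr`.
    """
    rng = random.Random(seed)  # only used to shuffle presentation order
    # Deterministic initialization from evenly spaced input vectors
    nodes = [list(data[k * len(data) // n_nodes]) for k in range(n_nodes)]
    prev_err = math.inf
    for _ in range(max_epochs):
        for x in rng.sample(list(data), len(data)):
            bmu = best_node(nodes, x)
            for k in range(n_nodes):
                # Gaussian neighbourhood measured along the 1D map
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + lr * h * (xi - w) for w, xi in zip(nodes[k], x)]
        err = sum(_sq_dist(nodes[best_node(nodes, x)], x) for x in data) / len(data)
        if prev_err - err < tol:  # convergence test: no meaningful improvement
            lr *= 0.5
            radius = max(radius * 0.5, 0.1)
            if lr < min_lr:
                break
        prev_err = err
    return nodes
```

On two well-separated toy clusters, each cluster ends up mapped to its own node. The design choice shown here is that the schedule is driven by the observed error on the learning set rather than by a fixed number of epochs.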


Proceedings Article
21 Jun 1997
TL;DR: The semiautomatic prototype system significantly enhances the efficiency of unifying families of functionally related proteins in spite of long evolutionary distances.
Abstract: The structures of nearly a thousand sequence-unique proteins represent only 300 different 3D shapes. Is structural resemblance between proteins with little sequence similarity the result of physical convergence to favourable folding patterns, or does it reflect a memory of common evolutionary history? Separating these two processes is important for organizing genome data in terms of protein families and for theoretical approaches to protein structure prediction by fold recognition techniques. Achieving this separation requires a combination of structure, sequence and functional analysis of proteins. For this purpose, we are developing a decision support system that scans heterogeneous protein sequence and structure-related databases, and collects or calculates characters indicative of common functional constraints. The criteria include sequence homology, analysis of 3D clusters of conserved residues, conservation of active sites, and keyword analysis of biological function. Even without extensive refinement, applying a combination of these criteria to a test set representing all currently known protein structures yields 87% coverage with 7% false positives, compared to 53% coverage using 1D sequence criteria alone. Thus, the semiautomatic prototype system significantly enhances the efficiency of unifying families of functionally related proteins in spite of long evolutionary distances.

44 citations
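
The abstract reports coverage and false-positive figures for a combination of criteria, but it does not define the combination rule or the two measures. The sketch below assumes the simplest reading:

- a pair of proteins is flagged as related when any criterion fires;
- coverage is the fraction of truly related pairs that get flagged;
- the false-positive figure is the fraction of flagged pairs that are not truly related.

The criterion names and evidence sets are hypothetical toys, not data from the paper.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]
Criterion = Callable[[Pair], bool]

def evaluate_criteria(pairs: Iterable[Pair], criteria: Dict[str, Criterion],
                      related: Set[Pair]) -> Tuple[float, float]:
    """Flag a pair if ANY criterion fires; return (coverage, false-positive fraction).

    coverage       = flagged truly related pairs / all truly related pairs
    false positive = flagged unrelated pairs / all flagged pairs
    """
    flagged = {p for p in pairs if any(rule(p) for rule in criteria.values())}
    true_hits = len(flagged & related)
    coverage = true_hits / len(related) if related else 0.0
    false_pos = (len(flagged) - true_hits) / len(flagged) if flagged else 0.0
    return coverage, false_pos

# Hypothetical evidence sets standing in for real database scans
seq_homologs = {("a", "b")}
shared_active_site = {("b", "c"), ("c", "d")}
criteria: Dict[str, Criterion] = {
    "sequence_homology": lambda p: p in seq_homologs,
    "active_site": lambda p: p in shared_active_site,
}
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
related = {("a", "b"), ("b", "c")}
```

On the toy data, both criteria together give a coverage of 1.0, with one false positive among three flagged pairs. The sequence criterion alone gives a coverage of 0.5 with no false positives. This illustrates how adding non-sequence criteria can raise coverage at some cost in false positives.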


Journal Article
01 Nov 1997-Yeast
TL;DR: It is concluded that the analysis of short open reading frames of the yeast genome leads to biologically interesting discoveries, even though the quantitative yield of new proteins is relatively low.
Abstract: We have analysed short open reading frames (between 150 and 300 base pairs long) of the yeast genome (Saccharomyces cerevisiae) with a two-step strategy. The first step selects a candidate set of open reading frames from the DNA sequence based on statistical evaluation of DNA and protein sequence properties. The second step filters the candidate set by selecting open reading frames with high similarity to other known sequences (from any organism). As a result, we report ten new predicted proteins not present in the current sequence databases. These include a new alcohol dehydrogenase, a protein probably related to the cell cycle, and a homolog of the prokaryotic ribosomal protein L36 that is likely to be a mitochondrial ribosomal protein encoded in the nuclear genome. We conclude that the analysis of short open reading frames leads to biologically interesting discoveries, even though the quantitative yield of new proteins is relatively low. © 1997 John Wiley & Sons, Ltd.

25 citations
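
The first step of the two-step strategy starts from open reading frames of 150 to 300 bp. The sketch below covers only the enumeration of such candidates; the statistical evaluation of DNA and protein properties in step one and the similarity filter of step two are not shown. It assumes ATG start codons, the standard stop codons TAA, TAG and TGA, the forward strand only, and a length that includes the stop codon. The abstract does not state the paper's exact conventions.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def short_orfs(dna: str, min_len: int = 150, max_len: int = 300) -> List[Tuple[int, int]]:
    """Forward-strand ORFs (ATG ... stop) whose length lies in [min_len, max_len] bp.

    Returns (start, end) pairs, 0-based with `end` exclusive; the stop codon is
    included. Within a frame, the first ATG after the previous stop is used.
    """
    dna = dna.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if min_len <= length <= max_len:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return sorted(orfs)
```

For example, "ATG" followed by 48 copies of "GCT" and then "TAA" is exactly 150 bp long. This sketch reports it as the single ORF (0, 150), whereas a 6 bp "ATGTAA" is rejected as too short.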


Journal Article
15 Jun 1997-Yeast
TL;DR: Among the most interesting protein identifications in this DNA fragment are an inositol polyphosphatase, the second gene of this type found in yeast (homologous to the human OCRL gene involved in Lowe's syndrome), a new ADP ribosylation factor of the arf6 subfamily, the first protein containing three C2 domains, and an ORF similar to a Bacillus subtilis cell‐cycle related protein.
Abstract: We have determined the nucleotide sequence of 129,524 bases of yeast (Saccharomyces cerevisiae) chromosome XV. Sequence analysis revealed the presence of 59 non-overlapping open reading frames (ORFs) of length > 300 bp, three tRNA genes, four delta elements and one Ty element. Among the 21 previously known yeast genes (36% of all ORFs in this fragment) were nucleoporin (NUP1), ras protein (RAS1), RNA polymerase III (RPC1) and elongation factor 2 (EF2). Further, 31 ORFs (53% of the total) were found to be homologous to known protein or DNA sequences, or sequence patterns. For seven ORFs (11% of the total) no homology was found. Among the most interesting protein identifications in this DNA fragment are an inositol polyphosphatase, the second gene of this type found in yeast (homologous to the human OCRL gene involved in Lowe's syndrome), a new ADP ribosylation factor of the arf6 subfamily, the first protein containing three C2 domains, and an ORF similar to a Bacillus subtilis cell-cycle related protein. For each ORF, detailed sequence analysis was carried out with full consideration of its biological function, pointing out key regions of interest for further functional analysis.

23 citations


Journal Article
TL;DR: The performance of the GeneQuiz system is compared against the laboriously derived (but highly accurate) manual annotation of previous attempts, to assure potential users of the results that the quality of the analysis is high despite the peculiar biochemical and phylogenetic disposition of this organism.
Abstract: With the completion of the genome sequence of Methanococcus jannaschii (Bult et al., 1996), computational analysis has revealed a number of interesting predicted functions for this organism. Although the success rate of the initial prediction was <40%, due to a conservative attitude towards possible over-interpretation (Venter, 1996), additional efforts to annotate the sequence followed, contributing a significant increase in functional assignments through a combination of different methods (Kyrpides et al., 1996). In our continuing effort to annotate the gene products of each complete genome (Casari et al., 1995), we have analyzed the full genomic sequence of M.jannaschii and predicted gene function by sequence similarity. We use GeneQuiz, a system for large-scale sequence analysis (Scharf et al., 1994), which combines a number of predictive methods with a rule-based engine that increases the success of predictions thanks to a collection of heuristics and a number of benchmarking cycles (Casari et al., 1996). We have maintained a strong interest in the study of Archaea (Ouzounis and Sander, 1992; Ouzounis et al., 1995), and the analysis of the M.jannaschii genome was greatly anticipated. The scope of this communication is threefold: (i) to compare the performance of the GeneQuiz system against the laboriously derived (but highly accurate) manual annotation of the previous attempts; (ii) to discuss some of the cases where GeneQuiz has succeeded or failed; (iii) to assure potential users of the results that the quality of the analysis is high, despite the peculiar biochemical and phylogenetic disposition of this organism. From a total of 1682 chromosomal open reading frames (ORFs), GeneQuiz has assigned function to 774 (46%) with high confidence ('clear' cases), and, with decreasing confidence, 118 (7%) with probable function ('tentative' cases), while 482 (29%) ORFs have a clear homolog whose function remains unknown. 
The remaining 308 cases (18%)

Journal Article
01 Apr 1997