
Showing papers by "Chris Sander" published in 1995



Journal ArticleDOI
TL;DR: The double cubic lattice method (DCLM) is an accurate and rapid approach for computing numerically molecular surface areas and the volume and compactness of molecular assemblies and for generating dot surfaces, and is the method of choice, especially for large molecular complexes and high point densities.
Abstract: The double cubic lattice method (DCLM) is an accurate and rapid approach for computing numerically molecular surface areas (such as the solvent accessible or van der Waals surface) and the volume and compactness of molecular assemblies and for generating dot surfaces. The algorithm has no special memory requirements and can be easily implemented. The computation speed is extremely high, making interactive calculation of surfaces, volumes, and dot surfaces for systems of 1000 and more atoms possible on single-processor workstations. The algorithm can be easily parallelized. The DCLM is an algorithmic variant of the approach proposed by Shrake and Rupley (J. Mol. Biol., 79, 351–371, 1973). However, the application of two cubic lattices—one for grouping neighboring atomic centers and the other for grouping neighboring surface dots of an atom—results in a drastic reduction of central processing unit (CPU) time consumption by avoiding redundant distance checks. This is most noticeable for compact conformations. For instance, the calculation of the solvent accessible surface area of the crystal conformation of bovine pancreatic trypsin inhibitor (entry 4PTI of the Brookhaven Protein Data Bank, 362-point sphere for all 454 nonhydrogen atoms) takes less than 1 second (on a single R3000 processor of an SGI 4D/480, about 5 MFLOP). The DCLM does not depend on the spherical point distribution applied. The quality of unit sphere tesselations is discussed. We propose new ways of subdivision based on the icosahedron and dodecahedron, which achieve constantly low ratios of longest to shortest arcs over the whole frequency range. The DCLM is the method of choice, especially for large molecular complexes and high point densities. Its speed has been compared to the fastest techniques known to the authors, and it was found to be superior, especially when also taking into account the small memory requirement and the flexibility of the algorithm. 
The program text may be obtained on request. © 1995 by John Wiley & Sons, Inc.
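
As an illustration of the idea described in the abstract above, the following is a minimal sketch of a Shrake and Rupley style dot-counting calculation that uses a single cubic lattice to limit neighbour searches. It is not the authors' program: the second lattice (for grouping surface dots) is omitted, the golden-spiral point layout stands in for the icosahedron and dodecahedron subdivisions the paper proposes, and the function names, the 1.4 Å probe radius, and the cell size are assumptions made for this example.

```python
import math
from collections import defaultdict


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def accessible_area(atoms, probe=1.4, n_points=362):
    """atoms: list of (x, y, z, radius). Returns the accessible area of each atom."""
    radii = [a[3] + probe for a in atoms]
    # With this cell size, two overlapping expanded spheres always sit in the
    # same or in adjacent lattice cells, so only 27 cells need to be searched.
    cell = 2.0 * max(radii)
    grid = defaultdict(list)
    for i, (x, y, z, _) in enumerate(atoms):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(i)

    unit = sphere_points(n_points)
    areas = []
    for i, (x, y, z, _) in enumerate(atoms):
        ri = radii[i]
        cx, cy, cz = math.floor(x / cell), math.floor(y / cell), math.floor(z / cell)
        # Collect only those atoms whose expanded sphere overlaps atom i.
        neighbours = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        if j == i:
                            continue
                        xj, yj, zj, _ = atoms[j]
                        d2 = (x - xj) ** 2 + (y - yj) ** 2 + (z - zj) ** 2
                        if d2 < (ri + radii[j]) ** 2:
                            neighbours.append(j)
        # Count dots that are not buried inside any neighbouring sphere.
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + ri * ux, y + ri * uy, z + ri * uz
            buried = False
            for j in neighbours:
                xj, yj, zj, _ = atoms[j]
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < radii[j] ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * ri * ri * exposed / len(unit))
    return areas
```

For a single isolated atom every dot is exposed, so the function returns the full area of the probe-expanded sphere; for two equal spheres whose centres are one expanded radius apart, roughly three quarters of each sphere should remain exposed.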

805 citations


Journal ArticleDOI
TL;DR: A novel method is presented that exploits conservation patterns for the prediction of functional residues in SH2 domains and in the conserved box of cyclins, using a simple but powerful representation of entire proteins, as well as sequence residues as vectors in a generalised ‘sequence space’.
Abstract: The biological activity of a protein typically depends on the presence of a small number of functional residues. Identifying these residues from the amino acid sequences alone would be useful. Classically, strictly conserved residues are predicted to be functional but often conservation patterns are more complicated. Here, we present a novel method that exploits such patterns for the prediction of functional residues. The method uses a simple but powerful representation of entire proteins, as well as sequence residues as vectors in a generalised 'sequence space'. Projection of these vectors onto a lower-dimensional space reveals groups of residues specific for particular subfamilies that are predicted to be directly involved in protein function. Based on the method we present testable predictions for sets of functional residues in SH2 domains and in the conserved box of cyclins.
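
The vector representation and projection mentioned in the abstract above can be sketched roughly as follows. This is an assumption-laden illustration rather than the published method: it encodes each aligned sequence as a 0/1 vector over (position, residue) pairs and finds the leading principal axis by plain power iteration. The encoding, the function names, and the toy alignment in the usage note are all hypothetical.

```python
import math


def one_hot(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Encode each aligned sequence as a 0/1 vector over (position, residue) pairs."""
    idx = {aa: k for k, aa in enumerate(alphabet)}
    width = len(alignment[0]) * len(alphabet)
    rows = []
    for seq in alignment:
        v = [0.0] * width
        for pos, aa in enumerate(seq):
            if aa in idx:  # gaps and unknown symbols stay all-zero
                v[pos * len(alphabet) + idx[aa]] = 1.0
        rows.append(v)
    return rows


def first_axis(rows, iterations=200):
    """Leading principal axis of the centred rows, found by power iteration.

    Returns (axis, scores): axis holds one loading per (position, residue)
    pair, scores holds the projection of each sequence onto that axis.
    """
    n, m = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(m)]
    centred = [[r[k] - mean[k] for k in range(m)] for r in rows]
    axis = [math.sin(k + 1.0) for k in range(m)]  # arbitrary deterministic start
    for _ in range(iterations):
        scores = [sum(c[k] * axis[k] for k in range(m)) for c in centred]
        axis = [sum(scores[i] * centred[i][k] for i in range(n)) for k in range(m)]
        norm = math.sqrt(sum(a * a for a in axis)) or 1.0
        axis = [a / norm for a in axis]
    scores = [sum(c[k] * axis[k] for k in range(m)) for c in centred]
    return axis, scores
```

With a toy alignment such as `["GKSL", "GKTL", "GRSV", "GRTV"]`, the scores separate the first two sequences from the last two, and the largest loadings fall on the K/R and L/V columns that distinguish the two groups. In the spirit of the abstract, such columns would be the candidates for subfamily-specific residues.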

428 citations



Journal ArticleDOI
TL;DR: This work shows that members of structural protein families have a low mutual PropSearch distance when the weights are optimized to discriminate maximally between structural families, and demonstrates the results of database searches using the PropSearch method.

181 citations


Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: This study focuses on replacing side chains as a subtask of model building by homology by choosing position‐specific rather than generalized rotamers and by sorting the residues that have to be modelled as a function of their freedom in rotamer space.
Abstract: In this study we concentrate on replacing side chains as a subtask of model building by homology. Two problems arise. How to determine potential low energy rotamers? And how to avoid the combinatorial explosion that results from the combination of many residues for which multiple good rotamers are predicted? We attempt to solve these problems by choosing position-specific rather than generalized rotamers and by sorting the residues that have to be modelled as a function of their freedom in rotamer space. The practical advantages of our method are the quality of the models for cases of high backbone similarity, the small amount of human intervention needed, and the fact that the method automatically estimates the reliability with which each residue has been modeled. Other methods described in this issue are probably more suitable if large backbone rearrangements or loop insertions and deletions need to be modeled. © 1995 Wiley-Liss, Inc.

142 citations


Journal ArticleDOI
01 Nov 1995-Proteins
TL;DR: Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system.
Abstract: Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.

98 citations


Journal ArticleDOI
TL;DR: This survey is beginning to provide a detailed view of how M. capricolum manages to maintain essential cellular processes with a genome much smaller than that of its bacterial relatives.
Abstract: We report on the analysis of 214kb of the parasitic eubacterium Mycoplasma capricolum sequenced by genomic walking techniques. The 287 putative proteins detected to date represent about half of the estimated total number of 500 predicted for this organism. A large fraction of these (75%) can be assigned a likely function as a result of similarity searches. Several important features of the functional organization of this small genome are already apparent. Among these are (i) the expected relatively large number of enzymes involved in metabolic transport and activation, for efficient use of host cell nutrients; (ii) the presence of anabolic enzymes; (iii) the unexpected diversity of enzymes involved in DNA replication and repair; and (iv) a sizeable number of orthologues (82 so far) in Escherichia coli. This survey is beginning to provide a detailed view of how M. capricolum manages to maintain essential cellular processes with a genome much smaller than that of its bacterial relatives.

98 citations



Journal ArticleDOI
01 Jul 1995-Proteins
TL;DR: The proposed 3D model of TagD is plausible both structurally, with a well packed hydrophobic core, and functionally, as the most conserved residues cluster around the putative nucleotide binding site.
Abstract: The crystal structure of glycerol-3-phosphate cytidylyltransferase from B. subtilis (TagD) is about to be solved. Here, we report a testable structure prediction based on the identification by sequence analysis of a superfamily of functionally diverse but structurally similar nucleotide-binding enzymes. We predict that TagD is a member of this family. The most conserved region in this superfamily resembles the ATP-binding HIGH motif of class I aminoacyl-tRNA synthetases. The predicted secondary structure of cytidylyltransferase and its homologues is compatible with the alpha/beta topography of the class I aminoacyl-tRNA synthetases. The hypothesis of similarity of fold is strengthened by sequence-structure alignment and 3D model building using the known structure of tyrosyl-tRNA synthetase as template. The proposed 3D model of TagD is plausible both structurally, with a well packed hydrophobic core, and functionally, as the most conserved residues cluster around the putative nucleotide binding site. If correct, the model would imply a very ancient evolutionary link between class I tRNA synthetases and the novel cytidylyltransferase superfamily.

89 citations


31 Dec 1995
TL;DR: This work presents a novel heuristic for identifying 3-D similarities between a query structure and the database of known protein structures, which is useful as a rapid preprocessor to a comprehensive protein structure database search system.
Abstract: There are far fewer classes of three-dimensional protein folds than sequence families but the problem of detecting three-dimensional similarities is NP-complete. We present a novel heuristic for identifying 3-D similarities between a query structure and the database of known protein structures. Many methods for structure alignment use a bottom-up approach, identifying first local matches and then solving a combinatorial problem in building up larger clusters of matching substructures. Here the top-down approach is to start with the global comparison and select a rough superimposition using a fast 3-D lookup of secondary structure motifs. The superimposition is then extended to an alignment of C alpha atoms by an iterative dynamic programming step. An all-against-all comparison of 385 representative proteins (150,000 pair comparisons) took 1 day of computer time on a single R8000 processor. In other words, one query structure is scanned against the database in a matter of minutes. The method is rated at 90% reliability at capturing statistically significant similarities. It is useful as a rapid preprocessor to a comprehensive protein structure database search system.

Proceedings Article
01 Jan 1995
TL;DR: A top-down heuristic is proposed for identifying 3D similarities between a query structure and the database of known protein structures, a detection problem that is NP-complete.
Abstract: There are far fewer classes of three-dimensional protein folds than sequence families but the problem of detecting three-dimensional similarities is NP-complete. We present a novel heuristic for identifying 3-D similarities between a query structure and the database of known protein structures. Many methods for structure alignment use a bottom-up approach, identifying first local matches and then solving a combinatorial problem in building up larger clusters of matching substructures. Here, the top-down approach is to start with the global comparison and select a rough superimposition using a fast 3-D lookup of secondary structure motifs. The superimposition is then extended to an alignment of C alpha atoms by an iterative dynamic programming step. An all-against-all comparison of 385 representative proteins (150,000 pair comparisons) took 1 day of computer time on a single R8000 processor. In other words, one query structure is scanned against the database in a matter of minutes. The method is rated at 90% reliability at capturing statistically significant similarities. It is useful as a rapid preprocessor to a comprehensive protein structure database search system.
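
The timing figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes that "150,000 pair comparisons" counts ordered pairs including self-comparisons (385 × 385) and that the one day of computer time is spread evenly over the queries; neither assumption is stated in the source.

```python
n_proteins = 385

# All-against-all, counting ordered pairs and self-comparisons (an assumption).
pairs = n_proteins * n_proteins

# One day of computer time, spread evenly over queries and over pairs.
minutes_per_query = 24 * 60 / n_proteins
seconds_per_pair = 24 * 3600 / pairs

print(pairs)                        # 148225, i.e. about 150,000
print(round(minutes_per_query, 1))  # 3.7 minutes per query structure
print(round(seconds_per_pair, 2))   # 0.58 seconds per pair comparison
```

Under these assumptions, a single query takes under four minutes, which is consistent with the abstract's "a matter of minutes".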

Journal ArticleDOI
TL;DR: An unexpected similarity in three-dimensional structure is reported between glucosyltransferases involved in very different biochemical pathways, with interesting evolutionary and functional implications; the two enzymes are suggested to share a common ancient evolutionary ancestor.
Abstract: We report here an unexpected similarity in three-dimensional structure between glucosyltransferases involved in very different biochemical pathways, with interesting evolutionary and functional implications. One is the DNA modifying enzyme beta-glucosyltransferase from bacteriophage T4, alias UDP-glucose:5-hydroxymethyl-cytosine beta-glucosyltransferase. The other is the metabolic enzyme glycogen phosphorylase, alias 1,4-alpha-D-glucan:orthophosphate alpha-glucosyltransferase. Structural alignment revealed that the entire structure of beta-glucosyltransferase is topographically equivalent to the catalytic core of the much larger glycogen phosphorylase. The match includes two domains in similar relative orientation and connecting helices, with a positional root-mean-square deviation of only 3.4 Å for 256 C alpha atoms. An interdomain rotation seen in the R- to T-state transition of glycogen phosphorylase is similar to that observed in beta-glucosyltransferase on substrate binding. Although not a single functional residue is identical, there are striking similarities in the spatial arrangement and in the chemical nature of the substrates. The functional analogies (beta-glucosyltransferase versus glycogen phosphorylase) are: the ribose ring of UDP versus the pyridoxal ring of the pyridoxal phosphate co-enzyme; the phosphates of UDP versus the phosphate of the co-enzyme and the reactive orthophosphate; and the glucose unit transferred to DNA versus the terminal glucose unit extracted from glycogen. We anticipate the discovery of additional structurally conserved members of the emerging glucosyltransferase superfamily derived from a common ancient evolutionary ancestor of the two enzymes.

Journal ArticleDOI
TL;DR: It is shown that the putative laminin receptor family of eukaryotes and an archaean homologue belong to the previously characterized ribosomal protein family S2 from eubacteria, suggesting that archaea seem to have a mode of expression of genetic information rather similar to eukaryotes, while eubacteria may have proceeded into unique ways of transcription and translation.
Abstract: In a quest for novel functions in archaea, all archaean hypothetical open reading frames (ORFs), as annotated in the Swiss-Prot protein sequence database, were used to search the latest databases for the identification of characterized homologues. Of the 95 hypothetical archaean ORFs, 25 were found to be homologous to another hypothetical archaean ORF, while 36 were homologous to non-archaean proteins, of which as many as 30 were homologous to a characterized protein family. Thus the level of sequence similarity in this set reaches 64%, while the level of function assignment is only 32%. Of the ORFs with predicted functions, 12 homologies are reported here for the first time and represent nine new functions and one gene duplication at an acetyl-CoA synthetase locus. The novel functions include components of the transcriptional and translational apparatus, such as ribosomal proteins, modification enzymes and a translation initiation factor. In addition, new enzymes are identified in archaea, such as cobyric acid synthase, dCTP deaminase and the first archaean homologues of a new subclass of ATP binding proteins found in fungi. Finally, it is shown that the putative laminin receptor family of eukaryotes and an archaean homologue belong to the previously characterized ribosomal protein family S2 from eubacteria. From the present and previous work, the major implication is that archaea seem to have a mode of expression of genetic information rather similar to eukaryotes, while eubacteria may have proceeded into unique ways of transcription and translation. In addition, with the detection of proteins in various metabolic and genetic processes in archaea, we can further predict the presence of additional proteins involved in these processes.
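
The two percentages in the abstract above follow from the ORF counts it gives. The short check below assumes that the 64% figure combines the 25 ORFs matching another hypothetical archaean ORF with the 36 matching non-archaean proteins, and that the 32% figure is the 30 ORFs matching a characterized family; the variable names are illustrative only.

```python
total_orfs = 95
archaean_only_matches = 25    # homologous to another hypothetical archaean ORF
non_archaean_matches = 36     # homologous to non-archaean proteins
characterized_matches = 30    # subset of the 36 with a characterized family

similarity_pct = 100 * (archaean_only_matches + non_archaean_matches) / total_orfs
assignment_pct = 100 * characterized_matches / total_orfs

print(round(similarity_pct))  # 64
print(round(assignment_pct))  # 32
```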

Journal ArticleDOI
TL;DR: The analysis of the 269 open reading frames of yeast chromosome VIII by computational methods has yielded 24 new significant sequence similarities to proteins of known function, including peptidyl‐tRNA hydrolase, a ribosome recycling factor homologue, and a protein similar to cytochrome b translational activator CBS2.
Abstract: The analysis of the 269 open reading frames of yeast chromosome VIII by computational methods has yielded 24 new significant sequence similarities to proteins of known function. The resulting predicted functions include three particularly interesting cases of translation-associated proteins: peptidyl-tRNA hydrolase, a ribosome recycling factor homologue, and a protein similar to cytochrome b translational activator CBS2. The methodological limits of the meaningful transfer of functional information between distant homologues are discussed.

Journal ArticleDOI
TL;DR: Computational analysis shows that this LAC ORF arrangement is conserved in other hsp70 loci in a wide range of organisms, raising questions about possible evolutionary benefits of such a peculiar genomic organization.
Abstract: A clone isolated from a Drosophila auraria heat-shock cDNA library presents two long, antiparallel, coupled (LAC) open reading frames (ORFs). One strand ORF is 1,929 nucleotides long and exhibits great identity (87.5% at the nucleotide level and 94% at the amino acid level) with the hsp70 gene copies of D. melanogaster, while the second strand ORF, in antiparallel in-frame register arrangement, is 1,839 nucleotides long and exhibits 32% identity with a putative, recently identified, NAD+-dependent glutamate dehydrogenase (NAD+-GDH). The overlap of the two ORFs is 1,824 nucleotides long. Computational analysis shows that this LAC ORF arrangement is conserved in other hsp70 loci in a wide range of organisms, raising questions about possible evolutionary benefits of such a peculiar genomic organization.

Journal ArticleDOI
01 Jan 1995-Yeast
TL;DR: The nucleotide sequence of a cosmid containing the centromere region of yeast (Saccharomyces cerevisiae) chromosome IX is determined by using an efficient directed sequencing strategy in combination with automated DNA sequencing on the A.L.F. DNA sequencer.
Abstract: We have determined the nucleotide sequence of a cosmid (pIX338) containing the centromere region of yeast (Saccharomyces cerevisiae) chromosome IX. The complete nucleotide sequence of 33·8 kb was obtained by using an efficient directed sequencing strategy in combination with automated DNA sequencing on the A.L.F. DNA sequencer. Sequence analysis revealed the presence of 17 open reading frames (ORFs), four of them previously known yeast genes (sly12, pan1, sts1 and prl1), a tRNA gene and the centromere motif. Exhaustive database searches detected sequence homologues of known function for as many as 14 of the 17 ORFs. These include a mammalian tyrosine kinase substrate; the Escherichia coli cell cycle protein MinD; the human inositol polyphosphate-5-phosphatase (gene OCRL) involved in Lowe's syndrome, a developmental disorder; and helicases, for which the new yeast member defines a distinct DEAD/H-box subfamily. A surprisingly large fraction of the ORFs (at least six out of 17) in the centromeric region are apparently involved in RNA or DNA binding. The nucleotide sequence reported here has been submitted to the EMBL data library under the accession number X79743.

Journal ArticleDOI
TL;DR: This work has engineered the Kinase 1 and 2 motifs into a protein that has the CMBF and no nucleotide binding activity, the chemotactic protein from Escherichia coli, CheY, which demonstrates that the native structure of the P-loop requires external interactions with the rest of the protein.

Book ChapterDOI
04 Jun 1995
TL;DR: This subdivision of the genomes of four best-known model organisms into energy-, information- and communication-related functions can form a design principle for the construction of computational models of genomes and organisms and, ultimately, the design and fabrication of artificial organisms.
Abstract: How similar are the engineering principles of artificial and natural machines? One way to approach this question is to compare in detail the basic functional components of living cells and human-made machines. Here, we provide some basic material for such a comparison, based on the analysis of functions for a few thousand protein molecules, the most versatile functional components of living cells. The composition of the genomes of four best-known model organisms is analyzed and three major classes of molecular functions are defined: energy-, information- and communication-related. It is interesting that communication-related coding potential has increased in relative numbers during evolution, at the expense of the other two categories, in the progression from prokaryotes to eukaryotes and from unicellular to multicellular organisms. Based on the currently available data, on average 42% of the four genomes code for energy-related proteins, 37% for information-related proteins, and the remaining 21% for communication-related proteins. This subdivision, and future refinements thereof, can form a design principle for the construction of computational models of genomes and organisms and, ultimately, the design and fabrication of artificial organisms.