
Showing papers by "Chris Sander published in 1992"


Journal ArticleDOI
TL;DR: A common evolutionary origin for all of the proteins in this class is proposed, and a pattern of amino acid properties required at each position is defined, which significantly matches sugar kinases, such as fuco-, glucono-, xylulo-, ribulo-, and glycerokinase.
Abstract: The functionally diverse actin, hexokinase, and hsp70 protein families have in common an ATPase domain of known three-dimensional structure. Optimal superposition of the three structures and alignment of many sequences in each of the three families has revealed a set of common conserved residues, distributed in five sequence motifs, which are involved in ATP binding and in a putative interdomain hinge. From the multiple sequence alignment in these motifs a pattern of amino acid properties required at each position is defined. The discriminatory power of the pattern is in part due to the use of several known three-dimensional structures and many sequences and in part to the "property" method of generalizing from observed amino acid frequencies to amino acid fitness at each sequence position. A sequence data base search with the pattern significantly matches sugar kinases, such as fuco-, glucono-, xylulo-, ribulo-, and glycerokinase, as well as the prokaryotic cell cycle proteins MreB, FtsA, and StbA. These are predicted to have subdomains with the same tertiary structure as the ATPase subdomains Ia and IIa of hexokinase, actin, and Hsc70, a very similar ATP binding pocket, and the capacity for interdomain hinge motion accompanying functional state changes. A common evolutionary origin for all of the proteins in this class is proposed.

827 citations


Journal ArticleDOI
TL;DR: Two algorithms are developed to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy and are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory.
Abstract: The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
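The two selection strategies described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy chain labels, and the symmetric `neighbors` map (chains whose similarity exceeds the cutoff) are assumptions, and the rule of always removing the chain with the most remaining neighbors is one plausible reading of "successive thinning out of clusters".

```python
from typing import Dict, List, Set


def select_until_done(ordered: List[str], neighbors: Dict[str, Set[str]]) -> List[str]:
    """First strategy: walk a priority-ordered list, keep each chain that is
    still allowed, and exclude all of its neighbors."""
    excluded: Set[str] = set()
    selected: List[str] = []
    for chain in ordered:
        if chain in excluded:
            continue
        selected.append(chain)
        excluded.add(chain)
        excluded |= neighbors.get(chain, set())
    return selected


def thin_out_clusters(neighbors: Dict[str, Set[str]]) -> List[str]:
    """Second strategy (assumed rule): repeatedly drop the chain with the most
    remaining neighbors until no pair of similar chains is left."""
    graph = {chain: set(adj) for chain, adj in neighbors.items()}
    while True:
        # Ties are broken by chain name so the result is deterministic.
        worst = max(graph, key=lambda c: (len(graph[c]), c), default=None)
        if worst is None or not graph[worst]:
            break
        for other in graph.pop(worst):
            graph[other].discard(worst)
    return sorted(graph)


# Hypothetical toy data: A is similar to both B and C; D has no neighbors.
toy_neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
by_priority = select_until_done(["A", "B", "C", "D"], toy_neighbors)
by_size = thin_out_clusters(toy_neighbors)
```

On this toy input the first strategy keeps the highest-priority chain A (plus D), while the second keeps the larger set B, C, D, which mirrors the trade-off between optimizing a property of the selected chains and maximizing the size of the set.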

817 citations


Journal ArticleDOI
TL;DR: The results lead to the hypothesis that this type of domain has a common tertiary structure and that there is a functional similarity in the recognition mechanism of the sperm receptor system and the TGF‐β receptor complex.

305 citations


Journal ArticleDOI
TL;DR: The database makes explicitly visible architectural similarities in the known part of the universe of protein folds and may be useful for understanding protein folding and for extracting structural modules for protein design.
Abstract: The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 70-30% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database has the choice between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions, and between comparisons that require strictly sequential ordering of segments and comparisons that allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues in the form of a multiple alignment and as a list of matching fragments to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. (ABSTRACT TRUNCATED AT 250 WORDS)

226 citations


Journal ArticleDOI
TL;DR: Atomic solvation preference is recommended for use as a diagnostic tool in model building based on sequence similarity, in folding simulations and in protein design and is computationally fast compared to methods based on surface area calculations.

193 citations


Journal ArticleDOI
01 Oct 1992-Proteins
TL;DR: An extremely efficient Monte Carlo algorithm in rotamer space with simulated annealing and simple potential energy functions is used to optimize the packing of side chains on given backbone models.
Abstract: An unknown protein structure can be predicted with fair accuracy once an evolutionary connection at the sequence level has been made to a protein of known 3-D structure. In model building by homology, one typically starts with a backbone framework, rebuilds new loop regions, and replaces nonconserved side chains. Here, we use an extremely efficient Monte Carlo algorithm in rotamer space with simulated annealing and simple potential energy functions to optimize the packing of side chains on given backbone models. Optimized models are generated within minutes on a workstation, with reasonable accuracy (average of 81% side chain chi 1 dihedral angles correct in the cores of proteins determined at better than 2.5 Å resolution). As expected, the quality of the models decreases with decreasing accuracy of backbone coordinates. If the backbone was taken from a homologous rather than the same protein, about 70% side chain chi 1 angles were modeled correctly in the core in a case of strong homology and about 60% in a case of medium homology. The algorithm can be used in automated, fast, and reproducible model building by homology.
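The general idea of Monte Carlo search in rotamer space with simulated annealing can be sketched as below. This is a toy under stated assumptions, not the published algorithm: the function name `anneal_rotamers`, the geometric cooling schedule, the Metropolis acceptance rule, and the mismatch-counting energy are all illustrative choices, and a real application would plug in a physical potential evaluated on the backbone model.

```python
import math
import random
from typing import Callable, List


def anneal_rotamers(n_rotamers: List[int],
                    energy: Callable[[List[int]], float],
                    steps: int = 5000,
                    t_start: float = 2.0,
                    t_end: float = 0.05,
                    seed: int = 0) -> List[int]:
    """Return the lowest-energy rotamer assignment seen during annealing.

    n_rotamers[i] is the number of discrete rotamers available at residue i;
    a state is a list with one rotamer index per residue.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n) for n in n_rotamers]
    current = energy(state)
    best, best_energy = list(state), current
    for step in range(steps):
        # Assumed schedule: geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(state))
        old = state[pos]
        state[pos] = rng.randrange(n_rotamers[pos])  # propose a new rotamer
        trial = energy(state)
        delta = trial - current
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = trial
            if current < best_energy:
                best, best_energy = list(state), current
        else:
            state[pos] = old  # reject the move and restore the old rotamer
    return best


# Hypothetical toy energy: number of residues whose rotamer differs from a target.
target = [2, 0, 1, 1, 0, 2]


def mismatch_energy(state: List[int]) -> float:
    return float(sum(a != b for a, b in zip(state, target)))


best = anneal_rotamers([3] * len(target), mismatch_energy)
```

The toy energy has no local minima, so it only demonstrates the mechanics of the move set and acceptance rule; side chain packing energies are rugged, which is the situation annealing is meant to handle.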

155 citations


Journal ArticleDOI
TL;DR: It is suggested that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects because the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future.
Abstract: With the completion of the first phase of the European yeast genome sequencing project, the complete DNA sequence of chromosome III of Saccharomyces cerevisiae has become available (Oliver, S. G., et al., 1992, Nature 357, 38-46). We have tested the predictive power of computer sequence analysis of the 176 probable protein products of this chromosome, after exclusion of six problem cases. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted three-dimensional structure to a third of these (14% of the total). The function of the remaining 58% remains to be determined. Of these, about one-third have one or more probable transmembrane segments. Among the most interesting proteins with predicted functions are a new member of the type X polymerase family, a transcription factor with an N-terminal DNA-binding domain related to GAL4, a "fork head" DNA-binding domain previously known only in Drosophila and in mammals, and a putative methyltransferase. Our analysis increased the number of known significant sequence similarities on chromosome III by 13, to now 67. Although the near 40% success rate of identifying unknown protein function by sequence analysis is surprisingly high, the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future. Based on the experience gained in this test study, we suggest that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects.

98 citations


Journal ArticleDOI
TL;DR: The DnaJ family shows typical features of mosaic proteins, which contain different building units (modules) with separate functions, and a surprising similarity to the ring-infected erythrocyte surface antigen of the malaria parasite Plasmodium falciparum.

89 citations


Journal ArticleDOI
10 Dec 1992-Nature

80 citations



Journal ArticleDOI
TL;DR: An atomic model of the fibre shaft of the adenovirus fibre has been constructed by computer modelling techniques and satisfies criteria of extensive hydrogen bonding, reasonable backbone torsion angles, burial of most hydrophobic residues and good packing of the hydrophobic core.

Journal ArticleDOI
TL;DR: Two cDNAs from Zea mays are isolated, cloned, and characterized, encoding proteins related to the ypt protein family, suggesting that they could be involved in the control of secretory processes.
Abstract: We have isolated, cloned, and characterized two cDNAs from Zea mays (L.), denoted yptm1 and yptm2, encoding proteins related to the ypt protein family. Amino acid similarity scores with YPT1 from yeast and ypt from mouse are in the range of 70% for yptm1 and 74% for yptm2, respectively, whereas similarities with p21 ras and other ras-related proteins are less than 40%. Most amino acid residues showing identity are clustered in the GTP/GDP binding domain. In addition, two cysteine residues close to the C-terminal ends, known to be palmitoylated and necessary for membrane binding in all eukaryotic ras-related proteins that have been characterized so far, are conserved in the maize genes as well. Northern blot hybridization analysis of poly(A)+ mRNA from etiolated maize coleoptiles revealed single mRNA species of approximately the same size as the isolated cDNAs. The gene for yptm1 is expressed at very low levels in maize coleoptiles and tissue culture cells. The gene for yptm2 is expressed at higher levels and is differentially represented in RNAs isolated from various organs of maize plants, with its highest level in leaves and flowers. The structural similarity of the genes identified suggests that they could be involved in the control of secretory processes.

Journal ArticleDOI
TL;DR: The results suggest that in the short cytoplasmic tail of LAP tyrosine is required for stabilization of the tight turn and that the aromatic ring system of the tyrosine residue is a contact point to the putative cytoplasmic receptor.
Abstract: Lysosomal acid phosphatase (LAP) is rapidly internalized from the cell surface due to a tyrosine-containing internalization signal in its 19 amino acid cytoplasmic tail. Measuring the internalization of a series of LAP cytoplasmic tail truncation and substitution mutants revealed that the N-terminal 12 amino acids of the cytoplasmic tail are sufficient for rapid endocytosis and that the hexapeptide 411-PGYRHV-416 is the tyrosine-containing internalization signal. Truncation and substitution mutants of amino acid residues following Val416 can prevent internalization even though these residues do not belong to the internalization signal. It was shown recently that part of the LAP cytoplasmic tail peptide corresponding to 410-PPGY-413 forms a well-ordered beta turn structure in solution. Two-dimensional NMR spectroscopy of two modified LAP tail peptides, in which the single tyrosine was substituted either by phenylalanine or by alanine, revealed that the tendency to form a beta turn is reduced by 25% in the phenylalanine-containing peptide and by approximately 50% in the alanine-containing mutant peptide. Our results suggest that in the short cytoplasmic tail of LAP tyrosine is required for stabilization of the tight turn and that the aromatic ring system of the tyrosine residue is a contact point to the putative cytoplasmic receptor.


Journal ArticleDOI
01 Feb 1992-Proteins
TL;DR: Five novel proteins were designed: Shpilka, a sandwich of two four‐stranded β‐sheets, a scaffold on which to explore variations in loop topology; Grendel, a four‐helical membrane anchor, ready for fusion to water‐soluble functional domains; Fingerclasp, a dimer of interdigitating β–β–α units, the simplest variant of the “handshake” structural class.
Abstract: What is the current state of the art in protein design? This question was approached in a recent two-week protein design workshop sponsored by EMBO and held at the EMBL in Heidelberg. The goals were to test available design tools and to explore new design strategies. Five novel proteins were designed: Shpilka, a sandwich of two four-stranded beta-sheets, a scaffold on which to explore variations in loop topology; Grendel, a four-helical membrane anchor, ready for fusion to water-soluble functional domains; Finger-clasp, a dimer of interdigitating beta-beta-alpha units, the simplest variant of the "handshake" structural class; Aida, an antibody binding surface intended to be specific for flavodoxin; Leather, a minimal NAD binding domain, extracted from a larger protein. Each design is available as a set of three-dimensional coordinates, the corresponding amino acid sequence and a set of analytical results. The designs are placed in the public domain for scrutiny, improvement, and possible experimental verification.

Book ChapterDOI
01 Jan 1992
TL;DR: Genetic sequences contain the basic instruction code of living systems, a basic book of life that needs to be deciphered, but the translation rules from the basic code to biological function are not yet fully known.
Abstract: Genetic sequences contain the basic instruction code of living systems, a basic book of life. The period 1992–2010 will see the deciphering of much of this information, in many organisms, including that of the human genome. Unfortunately, the code is written in biological assembler language and needs to be deciphered. The translation rules from the basic code to biological function are not yet fully known. Here, computational molecular biology is challenged to make major contributions. The potential benefits to medical science and biotechnology are huge.

Book ChapterDOI
TL;DR: HHV-6, which was originally isolated from peripheral blood lymphocytes of patients with AIDS and other lymphoproliferative disorders, was isolated from the blood and saliva of others, including healthy individuals and patients with other diseases such as chronic fatigue syndrome, bone marrow transplant recipients, roseola infantum, and autoimmune diseases.
Abstract: HHV-6 is a herpesvirus, that is, an enveloped DNA virus with an icosahedral capsid made up of 162 capsomeres. This virus infects mainly cells of lymphocytic lineage. HHV-6, which was originally isolated from peripheral blood lymphocytes of patients with AIDS and other lymphoproliferative disorders, was isolated from the blood and saliva of others, including healthy individuals and patients with other diseases such as chronic fatigue syndrome, bone marrow transplant recipients, roseola infantum, and autoimmune diseases. In infected cells, HHV-6 is found at various stages of its morphogenesis. Its ultrastructure and morphogenesis closely resemble those of cytomegalovirus. The chapter also presents diagrammatic representation of electron micrographs of HHV-6 infected cells, early stages of the internalization and uncoating of HHV-6 in lymphoblastic cells, morphogenesis of HHV-6-GS in HSB 2 cells, and low magnification view of a HSB 2 cell infected with HHV-6 GS.

Journal ArticleDOI
TL;DR: The quality of a multi-layered network predicting the secondary structure of proteins is improved substantially by using information about evolutionarily conserved amino acids, balancing the training dynamics, and combining uncorrelated networks in a jury.
Abstract: The quality of a multi-layered network predicting the secondary structure of proteins is improved substantially by: (i) using information about evolutionarily conserved amino acids (increase of overall accuracy by six percentage points), (ii) balancing the training dynamics (increase of accuracy for strand), and (iii) combining uncorrelated networks in a jury (increase of two percentage points). In addition, appending a second level structure-to-structure network results in better reproduction of the length of secondary structure segments.
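Point (iii), combining networks in a jury, can be illustrated with a small sketch. The abstract does not say how the jury combines its members, so the rule used here (arithmetic mean of per-state scores followed by picking the top state) is an assumption, as are the function name `jury_predict`, the three-state alphabet, and the toy scores.

```python
from typing import List, Sequence

STATES = ("H", "E", "L")  # helix, strand, loop (assumed three-state alphabet)


def jury_predict(network_outputs: Sequence[Sequence[Sequence[float]]]) -> str:
    """Average the per-state scores of several networks and pick the best state.

    network_outputs[k][i][s] is the score from network k at residue i for
    state STATES[s]; all networks must cover the same residues.
    """
    n_networks = len(network_outputs)
    n_residues = len(network_outputs[0])
    prediction: List[str] = []
    for i in range(n_residues):
        mean = [sum(net[i][s] for net in network_outputs) / n_networks
                for s in range(len(STATES))]
        prediction.append(STATES[mean.index(max(mean))])
    return "".join(prediction)


# Hypothetical scores for three residues from two networks.
net_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
net_b = [[0.6, 0.1, 0.3], [0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]
jury = jury_predict([net_a, net_b])
```

In this toy case the first network alone would call the middle residue strand, while the averaged jury calls it helix, which shows how disagreement between members is resolved by the combined score.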

Journal ArticleDOI
TL;DR: The Graphics Command Interpreter (GCI) is an independent server module that can be interfaced to any program that needs interactive three-dimensional (3D) graphics capabilities and provides the user with facilities to manipulate the view of the displayed 3D objects interactively, independently of the master program.