Showing papers in "Protein Engineering in 1993"


Journal ArticleDOI
TL;DR: The ALSCRIPT program was developed specifically to allow the easy formatting and graphical display of large multiple alignments, and should be easy to learn by anyone familiar with plotting graphs.
Abstract: The ALSCRIPT program described in this article was developed specifically to allow the easy formatting and graphical display of large multiple alignments. Although written originally for the author's use, the interface is relatively friendly, and should be easy to learn by anyone familiar with plotting graphs.

1,105 citations


Journal ArticleDOI
TL;DR: An optimized self-organizing map algorithm has been used to obtain protein topological (proteinotopic) maps and analysis of the proteinotopic map reveals that the network extracts the main secondary structure features even with the small number of examples used.
Abstract: An optimized self-organizing map algorithm has been used to obtain protein topological (proteinotopic) maps. A neural network is able to arrange a set of proteins depending on their ultraviolet circular dichroism spectra in a completely unsupervised learning process. Analysis of the proteinotopic map reveals that the network extracts the main secondary structure features even with the small number of examples used. Some methods to use the proteinotopic map for protein secondary structure prediction are tested showing a good performance in the 200-240 nm wavelength range that is likely to increase as new protein structures are known.

1,010 citations


Journal ArticleDOI
TL;DR: The effects of linker length on binding affinity and degree of aggregation have been examined in the antifluorescein 4-4-20 and anticarcinoma CC49 single-chain Fvs and a new linker sequence was designed in which a proline was placed at the C-terminal side of the proteolytic clip site in the 212 linker.
Abstract: The effects of linker length on binding affinity and degree of aggregation have been examined in the antifluorescein 4-4-20 and anticarcinoma CC49 single-chain Fvs. Longer linkers in the antifluorescein sFvs have higher affinities for fluorescein and aggregate less. A proteolytically susceptible site between Lys8 and Ser9 in the previously reported 212 linker has been identified. A new linker sequence, 218 (GSTSGSGKPGSGEGSTKG), was designed in which a proline was placed at the C-terminal side of the proteolytic clip site in the 212 linker.

342 citations


Journal ArticleDOI
TL;DR: The use of a novel kind of random peptide library for the stepwise engineering of a C-terminal fusion peptide which confers binding activity towards streptavidin is described in this study.
Abstract: The facile detection and purification of a recombinant protein without detailed knowledge about its individual biochemical properties constitutes a problem of general interest in protein engineering. The use of a novel kind of random peptide library for the stepwise engineering of a C-terminal fusion peptide which confers binding activity towards streptavidin is described in this study. Because of its widespread use as part of a variety of conjugates and other affinity reagents, streptavidin constitutes the binding partner of choice both for detection and purification purposes. The streptavidin-affinity tag was engineered at the C-terminus of the VH domain as part of the D1.3 Fv fragment which was functionally expressed in Escherichia coli. Irrespective of whether it was displayed by the VH or the VL domain, the optimized version of the affinity peptide termed 'Strep-tag' allowed the detection of the Fv fragment both on Western blots and in ELISAs by a streptavidin-alkaline phosphatase conjugate. In addition, the one-step purification of the intact Fv fragment carrying a single Strep-tag at the C-terminus of only one of its domains was achieved by affinity chromatography with streptavidin-agarose using very mild elution conditions.

325 citations


Journal ArticleDOI
TL;DR: A modification of the Metropolis Monte Carlo scheme in sequence space with an evolutionary temperature which sets the energy scale is proposed, implying that the design algorithm does not encounter multiple-minima problems and is very fast.
Abstract: We propose a simple algorithm to design a sequence which fits a given protein structure with a given energy. The algorithm is a modification of the Metropolis Monte Carlo scheme in sequence space with an evolutionary temperature which sets the energy scale. There is a one-to-one correspondence between this optimization scheme and the Ising model of ferromagnetism. This analogy implies that the design algorithm does not encounter multiple-minima problems and is very fast. The algorithm is tested by 'predicting' the primary structures of four proteins. In each case the calculated primary structures had statistically significant homology with the natural structures.

237 citations
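
The Metropolis scheme in sequence space described in the abstract above can be illustrated with a minimal sketch. The energy function here (`toy_energy`, which rewards hydrophobic residues at buried positions) and all names and parameter values are hypothetical stand-ins, not the authors' potential or implementation; only the accept/reject rule and the role of the temperature follow the abstract.

```python
import math
import random

# Hypothetical toy model: NOT the energy function used in the paper.
HYDROPHOBIC = set("AVLIMFW")
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, buried):
    """Return -1 for every residue whose polarity suits its position."""
    energy = 0
    for aa, is_buried in zip(seq, buried):
        if (aa in HYDROPHOBIC) == is_buried:
            energy -= 1
    return energy


def metropolis_design(buried, temperature=0.5, steps=2000, seed=0):
    """Metropolis walk over sequences; returns the lowest-energy one seen."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in buried]
    energy = toy_energy(seq, buried)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a point mutation
        new_energy = toy_energy(seq, buried)
        delta = new_energy - energy
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability set by the "evolutionary" temperature.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy
```

Calling `metropolis_design([True, False, True, True])` returns a sequence and its toy energy. Lowering `temperature` makes the walk greedier, which is the sense in which the temperature sets the energy scale.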


Journal ArticleDOI
TL;DR: Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.
Abstract: We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.

225 citations
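
The first stage of the procedure above (grouping chains at 35% or more sequence identity, then choosing one representative per family) might be sketched as follows. Single-linkage grouping, the tie-breaking order (resolution first, then R-factor) and every chain name and number in the usage note are assumptions made for illustration; the abstract does not specify them.

```python
def cluster_families(names, identity, threshold=35.0):
    """Group chains into families by single linkage.

    identity maps a pair (a, b) to percent sequence identity; any pair
    at or above the threshold is placed in the same family.
    """
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


def pick_representatives(families, quality):
    """quality maps chain -> (resolution in Å, R-factor); lower is better."""
    return [min(members, key=lambda m: quality[m]) for members in families]
```

With hypothetical chains `hbA` and `hbB` at 43% identity and all other pairs below 35%, `cluster_families` places the two in one family and leaves the rest as singletons; `pick_representatives` then keeps the better-resolved chain of each family.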


Journal ArticleDOI
TL;DR: A chemical filter is added to the ligand placement algorithm of the molecular docking program DOCK that ranks known inhibitors better than does matching based on shape alone and finds fewer physically unrealistic complexes without reducing the number of complexes resembling the known ligand-receptor configurations.
Abstract: We have added a chemical filter to the ligand placement algorithm of the molecular docking program DOCK. DOCK places ligands in receptors using local shape features. Here we label these shape features by chemical type and insist on complementary matches. We find fewer physically unrealistic complexes without reducing the number of complexes resembling the known ligand-receptor configurations. Approximately 10-fold fewer complexes are calculated and the new algorithm is correspondingly 10-fold faster than the previous shape-only matching. We tested the new algorithm's ability to reproduce three known ligand-receptor complexes: methotrexate in dihydrofolate reductase, deoxyuridine monophosphate in thymidylate synthase and pancreatic trypsin inhibitor in trypsin. The program found configurations within 1 Å of the crystallographic mode, with fewer non-native solutions compared with shape-only matching. We also tested the program's ability to retrieve known inhibitors of thymidylate synthase and dihydrofolate reductase by screening molecular databases against the enzyme structures. Both algorithms retrieved many known inhibitors preferentially to other compounds in the database. The chemical matching algorithm generally ranks known inhibitors better than does matching based on shape alone.

215 citations


Journal ArticleDOI
TL;DR: Analysis by far UV circular dichroism spectroscopy suggests that all Z mutant proteins have similar folds, and these binding properties were used to compare the contribution of each mutated amino acid residue in the Fc interaction.
Abstract: The interactions have been studied between an IgG-binding domain derivative based on domain B of staphylococcal protein A (designated Z) and human immunoglobulin G class 1 (IgG1) and its Fc fragment (Fc1) respectively. Five single amino acid substituted mutant forms of Z were constructed at the gene level, produced intracellularly in Escherichia coli, purified to homogeneity and characterized. Four of these Z variants, designated Z(L17D), Z(N28A), Z(I31A) and Z(K35A), were mutated in residues suggested to be involved in binding, based on the three-dimensional structure of the complex between a one domain protein A molecule and Fc1 [Deisenhofer, J. (1981) Biochemistry, 20, 2361-2370]. The fifth mutant protein, Z(F30A), had a mutation in a phenylalanine residue which was not expected to be involved in the interaction. Analysis by far UV circular dichroism spectroscopy suggests that all Z mutant proteins have similar folds. Their respective binding to human monoclonal IgG1 and to human recombinant Fc1 were studied in a competitive binding assay using radioactively labeled Z as a tracer, demonstrating that the mutant proteins with a substitution in the postulated binding surface showed a weakened binding to both the full-length antibody and the recombinant Fc1. The affinity constants of the interactions as well as relative binding free energies from the parent Z molecule were calculated. These values were similar for each Z variant to both IgG1 and Fc1, suggesting that Fc and not Fab binding was measured also for IgG1. However, the binding strengths differ significantly, and these binding properties were used to compare the contribution of each mutated amino acid residue in the Fc interaction. (ABSTRACT TRUNCATED AT 250 WORDS)

187 citations


Journal ArticleDOI
TL;DR: The results, obtained without using the sequence order of the chains, confirm published structural analogies that use sequence-dependent techniques and extend previous analogies by detecting geometrically equivalent out-of-sequential-order structural elements which cannot be obtained by current techniques.
Abstract: A detailed description of an efficient approach to comparison of protein structures is presented. Given the 3-D coordinate data of the structures to be compared, the system automatically identifies every region of structural similarity between the structures without prior knowledge of an initial alignment. The method uses the geometric hashing technique which was originally developed for model-based object recognition problems in the area of computer vision. It exploits a rotationally and translationally invariant representation of rigid objects, resulting in a highly efficient, fully automated tool. The method is independent of the amino acid sequence and, thus, insensitive to insertions, deletions and displacements of equivalent substructures between the molecules being compared. The method described here is general, identifies 'real' 3-D substructures and is not constrained by the order imposed by the primary chain of the amino acids. Typical structure comparison problems are examined and the results of the new method are compared with the published results from previous methods. These results, obtained without using the sequence order of the chains, confirm published structural analogies that use sequence-dependent techniques. Our results also extend previous analogies by detecting geometrically equivalent out-of-sequential-order structural elements which cannot be obtained by current techniques.

153 citations


Journal ArticleDOI
TL;DR: A consensus assignment is proposed where each residue is assigned to the state determined by at least two of the three methods, so that the artefacts of each algorithm are attenuated and the success rate of prediction methods can be analysed more accurately.
Abstract: Accurate assignments of secondary structures in proteins are crucial for a useful comparison with theoretical predictions. Three major programs which automatically determine the location of helices and strands are used for this purpose, namely DSSP, P-Curve and Define. Their results have been compared for a non-redundant database of 154 proteins. On a residue per residue basis, the percentage match score is only 63% between the three methods. While these methods agree on the overall number of residues in each of the three states (helix, strand or coil), they differ on the number of helices or strands, thus implying a wide discrepancy in the length of assigned structural elements. Moreover, the length distribution of helices and strands points to the existence of artefacts inherent to each assignment algorithm. To overcome these difficulties a consensus assignment is proposed where each residue is assigned to the state determined by at least two of the three methods. With this assignment the artefacts of each algorithm are attenuated. The residues assigned in the same state by the three methods are better predicted than the others. This assignment will thus be useful for analysing the success rate of prediction methods more accurately.

129 citations
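
The consensus rule in the abstract above (each residue takes the state given by at least two of the three programs) is a per-residue majority vote. A minimal sketch follows, assuming the three assignments are already reduced to equal-length strings over H (helix), E (strand) and C (coil). The state letters and the coil fallback for residues where all three programs disagree are assumptions; the abstract does not say how such residues are treated.

```python
from collections import Counter


def consensus_assignment(dssp, pcurve, define, fallback="C"):
    """Per-residue majority vote over three H/E/C assignment strings."""
    if not (len(dssp) == len(pcurve) == len(define)):
        raise ValueError("assignments must cover the same residues")
    consensus = []
    for states in zip(dssp, pcurve, define):
        state, votes = Counter(states).most_common(1)[0]
        # Keep the state only if at least two methods agree on it.
        consensus.append(state if votes >= 2 else fallback)
    return "".join(consensus)


def full_agreement(dssp, pcurve, define):
    """Fraction of residues assigned the same state by all three methods."""
    same = sum(a == b == c for a, b, c in zip(dssp, pcurve, define))
    return same / len(dssp)
```

`full_agreement` corresponds to the three-way match score that the abstract reports as only 63% on its 154-protein database.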


Journal ArticleDOI
TL;DR: Alignment of the sequence to be modelled with each group of homologues facilitates identification of structurally conserved regions of the unknown and leads to an improved model.
Abstract: A 3-D model of a protein can be constructed from its amino acid sequence and the 3-D structures of one or more homologues by annealing three sets of fragments: the structurally conserved regions, structurally variable regions and the side chains. The method encoded in the computer program COMPOSER was assessed by generating 3-D models of eight proteins whose crystal structures are already known and for which 3-D structures of homologues are available. In the structurally conserved regions, differences between modelled and X-ray structures are smaller than the differences between the X-ray structures of the modelled protein and the homologues used to build the model. When several homologues are used, the contributions of the known structures are weighted, preferably by the square of sequence similarity; this is especially important when the similarities of the homologues to the modelled structure differ greatly. The 'collar' extension approach, in which a similar region of different length in a homologue is used to extend the framework, can result in a more accurate model. If known homologues comprise more than one related group of proteins and they are both distantly related to the unknown, then alignment of the sequence to be modelled with each group of homologues facilitates identification of structurally conserved regions of the unknown and leads to an improved model. Models have root mean square differences (r.m.s.d.s) with the structures defined by X-ray analysis of between 0.73 and 1.56 Å for all C alpha atoms, for seven of the eight models. For the model of mucor pepsin, where the closest homologue has 33% sequence identity and 20% of the residues are in structurally variable regions, the r.m.s.d. for the framework region is 1.71 Å and the r.m.s.d. for all C alpha atoms is 3.47 Å.
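
The weighting of homologues by the square of sequence similarity can be illustrated with a small sketch. It assumes the homologue structures are already superposed and reduced to lists of equivalent C alpha coordinates; the function name and data layout are hypothetical, and this is not COMPOSER's actual code.

```python
def weighted_framework(structures, similarities):
    """Average equivalent C alpha positions across homologues.

    structures   : list of homologues, each a list of (x, y, z) tuples,
                   superposed and in one-to-one correspondence
    similarities : sequence similarity of each homologue to the target
    Each homologue is weighted by the square of its similarity.
    """
    weights = [s * s for s in similarities]
    total = sum(weights)
    n_atoms = len(structures[0])
    framework = []
    for i in range(n_atoms):
        framework.append(tuple(
            sum(w * struct[i][axis] for w, struct in zip(weights, structures)) / total
            for axis in range(3)
        ))
    return framework
```

Squaring sharpens the preference for close homologues: with similarities of 0.3 and 0.6, the second structure carries four times the weight of the first rather than twice.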

Journal ArticleDOI
TL;DR: Surprisingly, a few of the reshaped human C21 antibodies exhibited patterns of binding and affinities that were essentially identical to those of mouse C21 antibody.
Abstract: Mouse mAb TES-C21 (C21) recognizes an epitope on human IgE and, therefore, has potential as a therapeutic agent in patients with IgE-mediated allergies such as hay fever, food and drug allergies and extrinsic asthma. The clinical usefulness of mouse antibodies is limited, however, due to their immunogenicity in humans. Mouse C21 antibody was humanized by complementarity determining region (CDR) grafting with the aim of developing an effective and safe therapeutic for the treatment of IgE-mediated allergies.

Journal ArticleDOI
TL;DR: This work has used recursive ensemble mutagenesis (REM) to simultaneously mutate six amino acid residues in a model protein, and found that one iteration of REM yielded a 30-fold increase in the frequency of 'positive' mutants.
Abstract: We have developed a generally applicable experimental procedure to find functional proteins that are many mutational steps from wild type. Optimization algorithms, which are typically used to search for solutions to certain combinatorial problems, have been adapted to the problem of searching the 'sequence space' of proteins. Many of the steps normally performed by a digital computer are embodied in this new molecular genetics technique, termed recursive ensemble mutagenesis (REM). REM uses information gained from previous iterations of combinatorial cassette mutagenesis (CCM) to search sequence space more efficiently. We have used REM to simultaneously mutate six amino acid residues in a model protein. As compared to conventional CCM, one iteration of REM yielded a 30-fold increase in the frequency of 'positive' mutants. Since a multiplicative factor of similar magnitude is expected for the mutagenesis of additional sets of six residues, performing REM on 18 sites is expected to yield an exponential (30,000-fold) increase in the throughput of positive mutants as compared to random [NN(G,C)]18 mutagenesis.

Journal ArticleDOI
TL;DR: Secondary structure predictions can be reliably improved using alignments from an automatic alignment procedure with a mean increase of 6.8%, giving an overall prediction accuracy of 68.5%, if there is a minimum of 25% sequence identity between all sequences in a family.
Abstract: The use of multiple sequence alignments for secondary structure predictions is analysed. Seven different protein families, containing only sequences of known structure, were considered to provide a range of alignment and prediction conditions. Using alignments obtained by spatial superposition of main chain atoms in known tertiary protein structures allowed a mean increase of 8% in secondary structure prediction accuracy, when compared to those obtained from the individual sequences. Substitution of these alignments by those determined directly from an automated sequence alignment algorithm showed variations in the prediction accuracy which correlated with the quality of the multiple alignments and distance of the primary sequence.

Journal ArticleDOI
TL;DR: A molecular model of the M2 channel is presented in which a bundle of four parallel M2 transbilayer helices surrounds a central ion-permeable pore, which provides a molecular model for amantadine-H+ block of M2 channels.
Abstract: The influenza A M2 protein forms cation-selective ion channels which are blocked by the anti-influenza drug amantadine. A molecular model of the M2 channel is presented in which a bundle of four parallel M2 transbilayer helices surrounds a central ion-permeable pore. Analysis of helix amphipathicity was used to aid determination of the orientation of the helices about their long axes. The helices are tilted such that the N-terminal mouth of the pore is wider than the C-terminal mouth. The channel is lined by residues V27, S31 and I42. Residues D24 and D44 are located at opposite mouths of the pore, which is narrowest in the vicinity of I42. Energy profiles for interaction of the channel with Na+, amantadine-H+ and cyclopentylamine-H+ are evaluated. The interaction profile for Na+ exhibits three minima, one at each mouth of the pore, and one in the region of residue S31. The amantadine-H+ profile exhibits a minimum close to S31 and a barrier near residue I42. This provides a molecular model for amantadine-H+ block of M2 channels. The profile for cyclopentylamine-H+ does not exhibit such a barrier. It is predicted that cyclopentylamine-H+ will not act as an M2 channel blocker.

Journal ArticleDOI
TL;DR: The catalytic subunit of mouse cAMP-dependent protein kinase expressed in Escherichia coli was separated into three distinct species using Mono-S ion exchange chromatography and the differences between the isozymes were shown to be due to phosphorylation, with each form differing by 80 mass units corresponding to a single phosphate.
Abstract: The catalytic subunit of mouse cAMP-dependent protein kinase expressed in Escherichia coli was separated into three distinct species using Mono-S ion exchange chromatography. These isoenzymes corresponded to three isoelectric variants with pIs of 6.4 (30%), 7.2 (60%) and 8.2 (10%). The Stokes' radius of each form was 27.7, 27.1 and 26.3 Å respectively. Using electrospray mass spectroscopy the differences between the isozymes were shown to be due to phosphorylation, with each form differing by 80 mass units corresponding to a single phosphate. The fully phosphorylated recombinant enzyme contained four phosphates while the dominant isozyme contained only three. Since the enzyme is not phosphorylated when active site mutations are introduced into the C-subunit, these phosphates are incorporated in an autocatalytic mechanism and are not due to E. coli protein kinases. When the recombinant enzyme was compared with the mammalian porcine heart enzyme significant differences in post-translational modifications were observed. The mammalian enzyme could also be separated into two isozymes. However, in contrast to the recombinant enzyme, the mammalian isozymes displayed an identical mass of 40 840. This correlated with two different post-translational modifications: two phosphates and an N-terminal myristyl moiety. The importance of post-translational modifications, and in particular the phosphorylation state, for the expression of eukaryotic proteins in E. coli is discussed.

Journal ArticleDOI
TL;DR: A new approach is described for the modeling of transmembrane seven helix bundles based on statistically derived environmental preference parameters combined with experimentally determined features of the receptors to create a model for the human beta 2-adrenoreceptor.
Abstract: Transmembrane seven helix bundles form a large family of membrane inserted receptors and are responsible for a wide range of biological functions. Experimental data suggest that their overall structure is similar to bacteriorhodopsin. We describe here a new approach for the modeling of transmembrane seven helix bundles based on statistically derived environmental preference parameters combined with experimentally determined features of the receptors. The method was used to create a model for the human beta 2-adrenoreceptor. This model is physically plausible, is in reasonable agreement with experimental data and may be helpful in planning new receptor engineering experiments.

Journal ArticleDOI
TL;DR: A method is presented that applies structural information from the protein data bank to the ab initio design and characterization of novel metal binding sites in antibody structures, confirming the predictive power of the method.
Abstract: The rational engineering of novel functions into proteins can only be attempted when the underlying structural scaffold on which the new function is displayed and the structure of the target protein are both well understood. To introduce functions mediated by metals it is therefore necessary to identify the principal liganding residues for the chosen metal, the required architecture of the metal-ligand complex and sites within the target protein that could accommodate such sites. Here we present a method that applies structural information from the protein data bank to the ab initio design and characterization of novel metal binding sites. The prediction method has been tested on 28 metalloprotein structures from the Brookhaven Protein Data Bank. It successfully identified > 90% of the metal binding sites. In addition, we have used the method to design and characterize zinc binding sites in two antibody structures. Metal binding studies on one of these putative metalloantibodies showed metal binding, confirming the predictive power of the method.

Journal ArticleDOI
TL;DR: An evaluation function composed of four terms, side chain packing, hydration, hydrogen bonding and local conformation potentials, which were empirically derived from 101 proteins of known structure successfully discriminated truly homologous sequence pairs from non-homologous proteins even when the sequence similarities were very weak.
Abstract: Recent approaches to the 3-D - 1-D compatibility problem have tried to predict protein 3-D structure from sequence. One of the critical factors in this issue is the evaluation of fitness between a given 3-D structure and any sequence mounted on it. We have developed an evaluation function composed of four terms, side chain packing, hydration, hydrogen bonding and local conformation potentials, which were empirically derived from 101 proteins of known structure. The efficiency of the evaluation function was tested in two ways. In the first test, the sequence of protein A is mounted (without gaps) on the structure of protein B which is greater in size than A.

Journal ArticleDOI
TL;DR: It is shown that components of the free energy change can be highly sensitive to the computational details of the simulation leading to the conclusion that free energy calculations cannot currently be used to reliably predict protein stability.
Abstract: The use of free energy simulation techniques in the study of protein stability is critically evaluated. Results from two simulations of the thermostability mutation Asn218 to Ser218 in Subtilisin are presented. It is shown that components of the free energy change can be highly sensitive to the computational details of the simulation leading to the conclusion that free energy calculations cannot currently be used to reliably predict protein stability. The different factors that undermine the reliability are discussed.

Journal ArticleDOI
TL;DR: A theorem is formulated which proves that, contrary to intuition, dead-end rotamer pairs cannot simply be discarded from consideration in the iterative process leading to the further elimination of dead-end rotamers.
Abstract: Recently it has been shown that the dead-end elimination theorem is a powerful tool in the search for the global minimum energy conformation (GMEC) of a large collection of protein side chains given known backbone coordinates and a library of allowed side chain conformational states, also known as rotamers. A side chain placement algorithm based on this theorem iteratively applies this theorem to single as well as to pairs of rotamers leading to the identification of rotamers, single or pairs, that are incompatible with the GMEC and that can thus be qualified as 'dead-ending'. Here we formulate a theorem which proves that contrary to intuition dead-end rotamer pairs cannot simply be discarded from consideration in the iterative process leading to the further elimination of dead-end rotamers. We refer to this theorem as the fuzzy-end elimination theorem. We also describe how the obtained dead-end rotamer pairs can contribute to the search for the GMEC in the protein side chain placement problem. Hence the present work forms a theoretical basis for the correct implementation of a side chain placement algorithm based on the dead-end elimination theorem. In addition, possible future perspectives are presented.
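
For orientation, the single-rotamer dead-end elimination test that this work builds on can be sketched as below. The abstract does not state the criterion; the form used here is the commonly cited one (a rotamer r at position i is dead-ending if its best possible energy exceeds the worst possible energy of some alternative t at the same position). The sketch makes one pass only and does not implement the pair criterion or the fuzzy-end theorem. The data layout and function names are assumptions.

```python
from itertools import product


def dead_ending_rotamers(self_energy, pair_energy):
    """Return the set of (position, rotamer) flagged as dead-ending.

    self_energy[i][r]       : energy of rotamer r at position i with the backbone
    pair_energy[i][j][r][s] : interaction of rotamer r at i with rotamer s at j
    """
    n = len(self_energy)
    dead = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(self_energy[i])):
            # Best case for r: most favourable partner at every other position.
            best_r = self_energy[i][r] + sum(min(pair_energy[i][j][r]) for j in others)
            for t in range(len(self_energy[i])):
                if t == r:
                    continue
                # Worst case for t: least favourable partner everywhere.
                worst_t = self_energy[i][t] + sum(max(pair_energy[i][j][t]) for j in others)
                if best_r > worst_t:
                    dead.add((i, r))
                    break
    return dead


def gmec(self_energy, pair_energy):
    """Brute-force global minimum energy conformation (small cases only)."""
    n = len(self_energy)

    def total(conf):
        e = sum(self_energy[i][conf[i]] for i in range(n))
        e += sum(pair_energy[i][j][conf[i]][conf[j]]
                 for i in range(n) for j in range(i + 1, n))
        return e

    return min(product(*(range(len(rots)) for rots in self_energy)), key=total)
```

On small inputs, the brute-force `gmec` gives a check that no rotamer of the global minimum is ever flagged, which is the guarantee that makes the elimination safe.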

Journal ArticleDOI
TL;DR: This work has classified Greek keys, based on their hydrogen bonding patterns, into three groups with similar three-dimensional structures, and shows the variability of secondary structure segment length and sequences of Greek keys even within one class.
Abstract: The Greek key is a very common structural motif in proteins. It has been traditionally defined as four beta-strands with '+3,-1,-1' topology. This definition encompasses motifs with several different three-dimensional structures. We have classified Greek keys, based on their hydrogen bonding patterns, into three groups with similar three-dimensional structures. All examples of Greek keys in each of these classes have been automatically extracted using a set of programs. Analysis of these examples shows the variability of secondary structure segment length and sequences of Greek keys even within one class. This variability suggests that no single folding pathway is likely to fit all Greek key structures.

Journal ArticleDOI
TL;DR: The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches, and alignments based on contact map overlaps are a powerful alternative to other structure-based alignments.
Abstract: The description of protein structure in the language of side chain contact maps is shown to offer many advantages over more traditional approaches. Because it focuses on side chain interactions, it aids in the discovery, study and classification of similarities between interactions defining particular protein folds and offers new insights into the rules of protein structure. For example, there is a small number of characteristic patterns of interactions between protein supersecondary structural fragments, which can be seen in various non-related proteins. Furthermore, the overlap of the side chain contact maps of two proteins provides a new measure of protein structure similarity. As shown in several examples, alignments based on contact map overlaps are a powerful alternative to other structure-based alignments.
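
The contact map overlap used as a similarity measure above might be sketched as follows. Representing each side chain by a single centroid, the 6.5 Å cutoff, the minimum sequence separation and the function names are all assumptions chosen for illustration; the abstract does not give the authors' contact definition.

```python
import math


def contact_map(centroids, cutoff=6.5, min_separation=2):
    """Set of residue pairs (i, j), i < j, whose side chain centroids
    lie within `cutoff` Å and are at least `min_separation` apart in sequence."""
    contacts = set()
    for i in range(len(centroids)):
        for j in range(i + min_separation, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def contact_overlap(map_a, map_b, alignment):
    """Count contacts of protein A that land on contacts of protein B.

    alignment maps residue indices of A to residue indices of B;
    unaligned residues contribute nothing.
    """
    shared = 0
    for i, j in map_a:
        if i in alignment and j in alignment:
            pair = tuple(sorted((alignment[i], alignment[j])))
            if pair in map_b:
                shared += 1
    return shared
```

An alignment can then be scored by its overlap count, and competing alignments compared by which one preserves more contacts.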

Journal ArticleDOI
TL;DR: The instabilities of the native structures of mutant proteins with an amino acid exchange are estimated by using the contact energy and the number of contacts for each type of amino acid pair to evaluate a transition probability matrix of codon substitutions and a log relatedness odds matrix, which is used as a scoring matrix to measure the similarity between protein sequences.
Abstract: The instabilities of the native structures of mutant proteins with an amino acid exchange are estimated by using the contact energy and the number of contacts for each type of amino acid pair, which were estimated from 18,192 residue-residue contacts observed in 42 crystals of globular proteins. They were then used to evaluate a transition probability matrix of codon substitutions and a log relatedness odds matrix, which is used as a scoring matrix to measure the similarity between protein sequences. To consider amino acid substitutions in homologous proteins, base mutation rates and the effects of the genetic code are also taken into account. The average fitness of an amino acid exchange is approximated to be proportional to the structural stability of the mutant protein, which is then approximated by the average energy change of the protein native structure expected for the amino acid exchange with neglect of the energy change of the denatured state. In global and local homology searches, this scoring matrix tends to yield significantly higher alignment scores than either the unitary matrix or the genetic code matrix, and also may yield higher alignment scores for distantly related protein pairs than MDM78. One of the advantages of this scoring matrix is that the equilibrium frequencies of codons and also base mutation rates can be adjusted.

Journal ArticleDOI
TL;DR: It is concluded that on the level of secondary structure, there is no practical advantage in training on two states, especially given the added margin of error in identifying the structural class of a protein.
Abstract: Can secondary structure prediction be improved by prediction rules that focus on a particular structural class of proteins? To help answer this question, we have assessed the accuracy of prediction for all-helical proteins, using two conceptually different methods and two levels of description. An overall two-state single-residue accuracy of ∼80% can be obtained by a neural network, no matter whether it is trained on two states (helix and non-helix) or first trained on three states (helix, strand and loop) and then evaluated on two states. For four test proteins, this is similar to the accuracy obtained with inductive logic programming.

Journal ArticleDOI
TL;DR: The results suggest that foreign proteins tagged with the duplicated segment could be incorporated into the cellulosome in order to modify the enzymatic properties of the complex.
Abstract: The DNA sequence encoding the duplicated 22 amino acid segment of Clostridium thermocellum endoglucanase CelD was fused to the 3'-terminus of the celC gene encoding C.thermocellum endoglucanase CelC. The presence of the duplicated segment endowed CelC with the capacity to form cytoplasmic inclusion bodies containing active enzyme when the hybrid gene was expressed in Escherichia coli. Inclusion body formation prevented proteolytic cleavage of the duplicated segment. The intact hybrid protein CelC-Cel'D was purified from inclusion bodies and characterized. In contrast to CelC, CelC-Cel'D was able to bind to CipA, a protein acting as a scaffolding component of the C.thermocellum cellulase complex (cellulosome)


Journal ArticleDOI
TL;DR: The investigation suggests that discriminating power is improved in the fingerprint approach because the recognition of individual features is made mutually conditional, and members of protein families possessing all or only part of the fingerprint may be identified.
Abstract: A systematic method for designing discriminating protein sequence fingerprints is described. The approach used is iterative, and diagnostic performance is evaluated in terms of the relative abilities of sequences to match with individual elements of the fingerprint. The method allows complete protein folds to be characterized in terms of a number of separate 'features', without the requirement to define specific intervals between them, and is described here with reference to the derivation of a fingerprint for G-protein-coupled receptors: this comprises the seven hydrophobic regions shown by protein chemistry approaches to be membrane-spanning. The fingerprint is potently diagnostic of all sequences of this type in the database in which it was derived (the OWL composite sequence database, version 8.1), and has continued to perform well on subsequent database updates, identifying 240 receptors in OWL17.0. Results are compared with a commonly used pattern template for this class of receptors. The investigation suggests that discriminating power is improved in the fingerprint approach because the recognition of individual features is made mutually conditional. Furthermore, by avoiding the definition of predetermined feature separations, members of protein families possessing all or only part of the fingerprint may be identified.
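
The idea of matching several features in order, without fixing the intervals between them, can be sketched as follows. This is a simplified illustration and not the authors' iterative procedure: the ungapped identity scoring, the `threshold` value, the function names and the short motifs are all hypothetical, and the motifs are not real receptor features.

```python
def best_hit(sequence, motif, start):
    """Return (score, position) of the best ungapped identity match of
    `motif` in `sequence` at or after `start`; position is -1 if none."""
    best = (0.0, -1)
    for pos in range(start, len(sequence) - len(motif) + 1):
        window = sequence[pos:pos + len(motif)]
        score = sum(a == b for a, b in zip(window, motif)) / len(motif)
        if score > best[0]:
            best = (score, pos)
    return best


def match_fingerprint(sequence, motifs, threshold=0.8):
    """Match motifs in order with free spacing; return the features found.

    Each accepted hit must lie after the previous accepted hit, so the
    features are ordered but the gaps between them are unconstrained.
    """
    hits = []
    cursor = 0
    for motif in motifs:
        score, pos = best_hit(sequence, motif, cursor)
        if pos >= 0 and score >= threshold:
            hits.append((motif, pos, score))
            cursor = pos + len(motif)
    return hits


# Hypothetical three-feature fingerprint and two toy sequences.
motifs = ["LLVA", "WYRC", "GNPF"]
full_match = match_fingerprint("MKLLVAGGGGWYRCTTTTTTTGNPFK", motifs)
partial_match = match_fingerprint("MKLLVAGGGGTTTTTTTGNPFK", motifs)
```

The first sequence matches all three features, while the second matches only the first and third. Reporting the partial match instead of rejecting it corresponds to identifying family members that possess only part of the fingerprint.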

Journal ArticleDOI
TL;DR: In this paper, the authors determine the number of topologies compatible with a six-stranded antiparallel beta-sandwich and present a rationale for the observed predominance of right-handed forms of beta-structures.
Abstract: Chain topology in beta-structured protein domains and handedness associated with it are discussed. Previously, other workers have shown that by considering just two restrictions--structures that are left-handed and/or have loops that cross can be disregarded--the number of topologies associated with such structures is expected to be severely limited. By way of example, we determine the number of topologies compatible with a six-stranded antiparallel beta-sandwich. Without restriction on the type of strand-strand connection allowed, but with elimination of symmetry-related structures, 360 topologies are possible. If connections between parallel strands are disqualified, the number is reduced 10-fold, to 36. The figure is cut to 24 when structures with loop crossings are eliminated. Handedness in these structures is examined in detail and from this a rationale for the observed predominance of right-handed forms of beta-structures is presented. The 24 structures can be considered as a set of right- and left-handed pairs of 12 topologies. All but two of these pairs can be assigned hands on the basis of existing rules. Six of the structures are found to occur in the Brookhaven Protein Databank and all are right-handed. This study provides a basis for protein design projects which might, for example, attempt the synthesis of unobserved protein topologies. Of the 24 structures in the final set, eight are examples of the classic Greek key fold. Thus, the predominance of this motif among all-beta proteins can be attributed in part to these topological constraints. The possible physicochemical origins of the structural selection rules and additional factors which might contribute to the particular favourability of certain structures are also explored.

Journal ArticleDOI
TL;DR: A structure-function analysis of the icosahedral RNA bacteriophage fr coat protein (CP) assembly was undertaken using linker-insertion, deletion and substitution mutagenesis to determine the relative contributions of particular fr CP domains in maintenance of capsid structural integrity as well as the possible capsid assembly mechanism.
Abstract: A structure-function analysis of the icosahedral RNA bacteriophage fr coat protein (CP) assembly was undertaken using linker-insertion, deletion and substitution mutagenesis. Mutations were specifically induced into either pre-existing or artificially created restriction enzyme sites within the fr CP gene expressed in Escherichia coli from a recombinant plasmid. This directs synthesis of wild-type protein that undergoes self-assembly and forms capsid-like particles indistinguishable morphologically and immunologically from native phage particles