
Showing papers by "Toby J. Gibson" published in 1996


Book Chapter
TL;DR: It is argued that using one weight matrix and two gap penalties is too simplistic to be of general use in the most difficult cases, and these parameters are replaced with a large number of new parameters designed primarily to help encourage gaps in loop regions.
Abstract: We have tested CLUSTAL W in a wide variety of situations, and it is capable of handling some very difficult protein alignment problems. If the data set consists of enough closely related sequences so that the first alignments are accurate, then CLUSTAL W will usually find an alignment that is very close to ideal. Problems can still occur if the data set includes sequences of greatly different lengths or if some sequences include long regions that are impossible to align with the rest of the data set. Trying to balance the need for long insertions and deletions in some alignments with the need to avoid them in others is still a problem. The default values for our parameters were tested empirically using test cases of sets of globular proteins where some information as to the correct alignment was available. The parameter values may not be very appropriate with nonglobular proteins. We have argued that using one weight matrix and two gap penalties is too simplistic to be of general use in the most difficult cases. We have replaced these parameters with a large number of new parameters designed primarily to help encourage gaps in loop regions. Although these new parameters are largely heuristic in nature, they perform surprisingly well and are simple to implement. The underlying speed of the progressive alignment approach is not adversely affected. The disadvantage is that the parameter space is now huge; the number of possible combinations of parameters is more than can easily be examined by hand. We justify this by asking the user to treat CLUSTAL W as a data exploration tool rather than as a definitive analysis method. It is not sensible to automatically derive multiple alignments and to trust particular algorithms as being capable of always getting the correct answer. One must examine the alignments closely, especially in conjunction with the underlying phylogenetic tree (or estimate of it) and try varying some of the parameters. 
Outliers (sequences that have no close relatives) should be aligned carefully, as should fragments of sequences. The program will automatically delay the alignment of any sequences that are less than 40% identical to any others until all other sequences are aligned, but this can be set from a menu by the user. It may be useful to build up an alignment of closely related sequences first and to then add in the more distant relatives one at a time or in batches, using the profile alignments and weighting scheme described earlier and perhaps using a variety of parameter settings. We give one example using SH2 domains. SH2 domains are widespread in eukaryotic signalling proteins where they function in the recognition of phosphotyrosine-containing peptides. In the chapter by Bork and Gibson ([11], this volume), Blast and pattern/profile searches were used to extract the set of known SH2 domains and to search for new members. (Profiles used in database searches are conceptually very similar to the profiles used in CLUSTAL W: see the chapters [11] and [13] for profile search methods.) The profile searches detected SH2 domains in the JAK family of protein tyrosine kinases, which were thought not to contain SH2 domains. Although the JAK family SH2 domains are rather divergent, they have the necessary core structural residues as well as the critical positively charged residue that binds phosphotyrosine, leaving no doubt that they are bona fide SH2 domains. The five new JAK family SH2 domains were added sequentially to the existing alignment of 65 SH2 domains using the CLUSTAL W profile alignment option. Figure 6 shows part of the resulting alignment. Despite their divergent sequences, the new SH2 domains have been aligned nearly perfectly with the old set. No insertions were placed in the original SH2 domains. In this example, the profile alignment procedure has produced better results than a one-step full alignment of all 70 SH2 domains, and in considerably less time. 
(ABSTRACT TRUNCATED)

1,654 citations
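
The CLUSTAL W abstract above notes that sequences less than 40% identical to any others are held back until all other sequences are aligned. A minimal Python sketch of that selection step follows. It assumes pairwise percent identities have already been computed, and it reads the rule as "the best identity to any other sequence is below the cutoff". The function name `split_by_divergence` and the input layout are hypothetical illustrations, not part of CLUSTAL W.

```python
from typing import Dict, List, Tuple


def split_by_divergence(
    names: List[str],
    identity: Dict[Tuple[str, str], float],
    cutoff: float = 40.0,
) -> Tuple[List[str], List[str]]:
    """Split sequences into (aligned early, delayed) by best pairwise identity.

    `identity` maps an unordered pair of names to a percent identity.
    This layout is an assumption made for the sketch.
    """

    def pct(a: str, b: str) -> float:
        # Look the pair up in either order; unknown pairs count as 0%.
        return identity.get((a, b), identity.get((b, a), 0.0))

    early: List[str] = []
    delayed: List[str] = []
    for name in names:
        # Best identity of this sequence to any other sequence in the set.
        best = max(
            (pct(name, other) for other in names if other != name),
            default=0.0,
        )
        # Sequences with no close relative are held back until the end.
        (delayed if best < cutoff else early).append(name)
    return early, delayed


# Toy example with made-up identities: C has no relative at or above 40%.
names = ["A", "B", "C"]
identity = {("A", "B"): 85.0, ("A", "C"): 30.0, ("B", "C"): 35.0}
early, delayed = split_by_divergence(names, identity)
```

In this toy input, `A` and `B` would be aligned first and `C` would be added afterwards, which mirrors the advice in the abstract to add distant relatives to an existing alignment of close relatives.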



Journal Article
19 Apr 1996 - Cell
TL;DR: This work presents the three-dimensional solution structure of the KH module, a sequence motif found in a number of proteins that are known to be in close association with RNA, and suggests a potential surface for RNA binding centered on the loop between the first two helices.

280 citations


Journal Article
TL;DR: A dynamic alignment algorithm is reported here which compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting.
Abstract: DNA translation frames can be disrupted for several reasons, including: (i) errors in sequence determination; (ii) RNA processing, such as intron removal and guide RNA editing; (iii) less commonly, polymerase frameshifting during transcription or ribosomal frameshifting during translation. Frameshifts frequently confound computational activities involving homologous sequences, such as database searches and inferences on structure, function or phylogeny made from multiple alignments. A dynamic alignment algorithm is reported here which compares a protein profile (a residue scoring matrix for one or more aligned sequences) against the three translation frames of a DNA strand, allowing frameshifting. The algorithm has been incorporated into a new package, WiseTools, for comparison of biological sequences. A protein profile can be compared against either a DNA sequence or a protein sequence. The program PairWise may be used interactively for alignment of any two sequence inputs. SearchWise can perform combinations of searches through DNA or protein databases by a protein profile or DNA sequence. Routine application of the programs has revealed a set of database entries with frameshifts caused by errors in sequence determination.

159 citations
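
The idea of aligning protein information against DNA while allowing frameshifts can be illustrated with a small dynamic programming sketch in Python. This is not the WiseTools algorithm: it aligns a single protein sequence rather than a profile, uses a simple match/mismatch score, and the function name `frameshift_align_score` and all penalty values are assumptions chosen for illustration. A frameshift is modeled as skipping one or two nucleotides at a fixed cost, and the DNA sequence is assumed to be uppercase.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def frameshift_align_score(protein, dna, match=2, mismatch=-1,
                           frameshift=-3, gap=-4):
    """Best score for the whole protein against any segment of the DNA.

    All scoring parameters are illustrative assumptions.
    """
    n, m = len(protein), len(dna)
    neg_inf = float("-inf")
    # score[i][j]: best score for protein[:i] against DNA ending at position j.
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        score[0][j] = 0  # the alignment may start anywhere in the DNA
    for i in range(1, n + 1):
        for j in range(m + 1):
            best = score[i - 1][j] + gap  # residue aligned to no codon
            if j >= 3:
                aa = CODON_TABLE.get(dna[j - 3:j], "X")
                sub = match if aa == protein[i - 1] else mismatch
                best = max(best, score[i - 1][j - 3] + sub)  # codon vs residue
                best = max(best, score[i][j - 3] + gap)  # codon with no residue
            for skip in (1, 2):  # frameshift: skip one or two nucleotides
                if j >= skip:
                    best = max(best, score[i][j - skip] + frameshift)
            score[i][j] = best
    return max(score[n])  # the alignment may end anywhere in the DNA


# In-frame DNA encoding M-K-V, and a copy with one extra base before GTT.
in_frame = frameshift_align_score("MKV", "ATGAAAGTT")
shifted = frameshift_align_score("MKVL", "ATGAAACGTTCTG")
```

In the second call the inserted `C` breaks the reading frame; the skip transition lets the alignment recover the downstream codons at the cost of one frameshift penalty, instead of scoring the remaining residues as mismatches.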


Book Chapter
TL;DR: This chapter discusses the main strategies currently in use in applying motif and profile searches, making clear both their powers and pitfalls, and demonstrates their usage with two well-known domains as examples.
Abstract: This chapter discusses the main strategies currently in use in applying motif and profile searches, making clear both their powers and pitfalls, and demonstrates their usage with two well-known domains as examples. There are numerous examples in which predictions based on motif and profile searches were useful as guides in further research while themselves being verified by various experimental approaches. In the hope of predicting a function for a protein under study, fast homology search programs are almost universally used. The current standard seems to be the basic local alignment search tool (BLAST) series of programs, accessible via several World Wide Web (WWW) servers. These programs undertake a database search for a query sequence and are usually the sole search undertaken. Two additional points should be noted: soon all sequences will have homologs in public databases, and motif and profile search methods are being actively developed at a number of institutions and can be expected to improve significantly. As a result, these methods will continue to be very valuable tools.

158 citations


Journal Article
TL;DR: Clues to the function of frataxin are provided by the mitochondrial location, a clinically similar ataxia with vitamin E deficiency, and certain neuropathies with mitochondrial DNA instability caused by mutations in nuclear genes.

140 citations


Journal Article
TL;DR: Genetic Data Environment (GDE) analysis of the coiled coil repeat regions of dystrophin and utrophin in comparison with the spectrins has allowed us to compare the structural arrangement of the coiled coil structures in these proteins.
Abstract: Dystrophin and utrophin form flexible links between the actin cytoskeleton and the cell membrane. Utrophin is found in all cell types, whilst dystrophin expression is restricted to muscle and neuronal tissues. Mutations in the dystrophin gene cause a break in this protein link, leading to membrane damage and cell death, as typified by the X-linked myopathies, Duchenne and Becker muscular dystrophies. The N-terminal regions of dystrophin and utrophin are anchored to the actin cytoskeleton while their C termini are linked to a group of integral transmembrane proteins. The central repeating coiled coil region is believed to act as a flexible shock absorber, separating the N and C termini and effectively protecting the cell membrane from being damaged by the underlying contractile machinery. The high degree of sequence and predicted structural and functional similarity between dystrophin and utrophin and other members of the spectrin family of proteins has led to the assumption that dystrophin and utrophin associate with themselves in a manner similar to that seen with α- and β-spectrin and α-actinin, namely, as antiparallel dimers. Genetic Data Environment (GDE) analysis of the coiled coil repeat regions of dystrophin and utrophin in comparison with the spectrins [1] has allowed us to compare the structural arrangement of the coiled coil structures in these proteins. Fourier analysis of dystrophin repeats [2] revealed a 'one long, one short' helix arrangement rather than the triple helical bundle originally proposed by Speicher and Marchesi [3], with the three helix repeat being formed by the nested

10 citations