
Showing papers in "Bioinformatics in 1993"


Journal ArticleDOI
TL;DR: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment that simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.
Abstract: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment. The new algorithm allows questions important in the design of mutagenesis experiments to be quickly answered since positions in the alignment that show unusual or interesting residue substitution patterns may be rapidly identified. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation between any group of amino acids. Sequences in the alignment are gathered into subgroups on the basis of sequence similarity, functional, evolutionary or other criteria. All pairs of subgroups are then compared to highlight positions that confer the unique features of each subgroup. The algorithm is encoded in the computer program AMAS (Analysis of Multiply Aligned Sequences) which provides a textual summary of the analysis and an annotated (boxed, shaded and/or coloured) multiple sequence alignment. The algorithm is illustrated by application to an alignment of 67 SH2 domains where patterns of conserved hydrophobic residues that constitute the protein core are highlighted. The analysis of charge conservation across annexin domains identifies the locations at which conserved charges change sign. The algorithm simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.

649 citations


Journal ArticleDOI
TL;DR: GEPASI allows the automatic generation of a sequence of simulations with different combinations of parameter values, effectively scanning a hyper-solid in parameter space, which allows for both very quick and systematic study of biochemical pathway models.
Abstract: GEPASI is a software system for modelling chemical and biochemical reaction networks on computers running Microsoft Windows. For any system of up to 45 metabolites and 45 reactions, each with any user-defined or one of 35 predefined rate equations, one can produce trajectories of the metabolite concentrations and obtain a steady state (if it does exist). When steady-state solutions are produced, elasticity and control coefficients, as defined in metabolic control analysis, are calculated. GEPASI also allows the automatic generation of a sequence of simulations with different combinations of parameter values, effectively scanning a hyper-solid in parameter space. Together with the ability to produce user-defined columnar data files, these features allow for both very quick and systematic study of biochemical pathway models. The source code (in C) is available on request from the author, and while the user interface is dependent on having MS-Windows as the operating system, the numerical part is portable to other operating systems. GEPASI is suitable both for research and educational purposes. Although GEPASI was written with biochemical pathways in mind, it can equally be used to simulate other dynamical systems.

544 citations


Journal ArticleDOI
TL;DR: A package of programs (run by a management program called TREECON) was developed for the construction and drawing of evolutionary trees and the modules TREE, ROOT and DRAW are applicable to any kind of dissimilarity matrix.
Abstract: A package of programs (run by a management program called TREECON) was developed for the construction and drawing of evolutionary trees. The program MATRIX calculates dissimilarity values and can perform bootstrap analysis on nucleic acid sequences. TREE implements different evolutionary tree constructing methods based on distance matrices. Because some of these methods produce unrooted evolutionary trees, a program ROOT places a root on the tree. Finally, the program DRAW draws the evolutionary tree, changes its size or topology, and produces drawings suitable for publication. Whereas MATRIX is suited only for nucleic acids, the modules TREE, ROOT and DRAW are applicable to any kind of dissimilarity matrix. The programs run on IBM-compatible microcomputers using the DOS operating system.

419 citations


Journal ArticleDOI
TL;DR: SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns.
Abstract: SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns. SRS supports the data structure of these libraries by providing special indices for implementing lists of subentities (e.g. feature tables) or hierarchically structured data-fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, representation of individual data-fields within the system (design of indices) and structuring other data needed during retrieval. This ensures flexibility required for coping with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports various input and output formats but is particularly well adapted to the GCG programs.

300 citations


Journal ArticleDOI
TL;DR: DCSE is a sequence alignment editor that shifts characters or entire blocks of aligned characters rather than inserting or deleting gaps; although it can be used on protein sequence alignments, it is especially targeted at the examination of RNA.
Abstract: DCSE provides a user-friendly package for the creation and editing of sequence alignments. The program runs on different platforms, including microcomputers and workstations. Apart from available hardware, the program is not limited in the size of the alignment it can handle. It deviates more from classical text editors than other available sequence editors because it uses a different approach towards editing. It shifts characters or entire blocks of aligned characters, rather than inserting or deleting gaps in the sequences. Alignment of a new sequence to an existing alignment is partly automated. Although DCSE can be used on protein sequence alignments, it is especially targeted at the examination of RNA. The secondary structure for every sequence can be incorporated easily in the alignment. DCSE also has extensive built-in support for finding and checking secondary structure elements. A sophisticated system of markers allows notation of special positions in an alignment. This system can be used to store information such as the position of hidden breaks, introns and tertiary structure interactions.

244 citations


Journal ArticleDOI
TL;DR: The results show that there may exist weak pairwise correlations within the signals and that the proposed weight array method can help to better discriminate these signals.
Abstract: A new method of sequence analysis, using a weight array method (WAM), which generalizes the traditional Staden weight matrix method (WMM), is proposed. With the help of a statistical mechanical model, the discriminant function is identified with the energy function describing macromolecular interactions. The method is applied to the study of 5'-splice signals in Schizosaccharomyces pombe pre-mRNA sequences. The results show that there may exist weak pairwise correlations within the signals and that our method can help to better discriminate these signals. Experiments are proposed to test the predictions of the theory.

191 citations


Journal ArticleDOI
TL;DR: A graphic program has been developed to calculate the secondary structure content of proteins from their circular dichroism spectrum using a reference model for alpha-helix, beta-sheet and beta-turn.
Abstract: A graphic program has been developed to calculate the secondary structure content of proteins from their circular dichroism spectrum. All the information concerning analysis and results are given on a single screen. The actual and the theoretical spectra are plotted to allow visual inspection of the fit quality. The percentages of secondary structure and statistical parameters (r.m.s., residuals) are provided. The program is fully interactive for spectra analysis. Moreover, cursors driven by a mouse or arrow keys are moveable onto spectra yielding all the information concerning a given wavelength, such as the theoretical and experimental ellipticities, wavelength, values of reference model for alpha-helix, beta-sheet and beta-turn. Interfaces are provided for the CONTIN program of Provencher and Glockner.

191 citations


Journal ArticleDOI
TL;DR: The MASCOT multiple alignment system is developed, which can sustain the reliability of alignment even when the similarity of sequences is low, and achieves high-quality alignment by employing three-way alignment in addition to two-way alignment.
Abstract: A multiple alignment methodology that can produce high-quality alignment is extremely important for predicting the structure of unknown proteins. Nearly all the methodologies developed so far have employed two-way alignment only. Although these methods are fast, the alignments they produce lose reliability as the similarity of sequences reduces. We developed the MASCOT multiple alignment system. MASCOT can sustain the reliability of alignment even when the similarity of sequences is low. MASCOT achieves high-quality alignment by employing three-way alignment in addition to two-way alignment. The resultant alignments are refined by simulated annealing to higher quality. We also use a cluster analysis of sequences to produce highly reliable alignments.

174 citations


Journal ArticleDOI
TL;DR: A new method is introduced for predicting protein secondary structure from amino acid sequences using hidden Markov models (HMMs); although the implementation is 'without grammar' (no rule for the appearance patterns of secondary structure), the result was reasonable.
Abstract: The purpose of this paper is to introduce a new method for analyzing the amino acid sequences of proteins using the hidden Markov model (HMM), which is a type of stochastic model. Secondary structures such as helix, sheet and turn are learned by HMMs, and these HMMs are applied to new sequences whose structures are unknown. The output probabilities from the HMMs are used to predict the secondary structures of the sequences. The authors tested this prediction system on approximately 100 sequences from a public database (Brookhaven PDB). Although the implementation is 'without grammar' (no rule for the appearance patterns of secondary structure) the result was reasonable.

151 citations


Journal ArticleDOI
TL;DR: Four algorithms, A-D, were developed to align two groups of biological sequences, which are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences.
Abstract: Four algorithms, A-D, were developed to align two groups of biological sequences. Algorithm A is equivalent to the conventional dynamic programming method widely used for aligning ordinary sequences, whereas algorithms B-D are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences. Rigorous optimization of the 'sum of pairs' (SP) score is achieved by algorithm D, whose average performance is close to O(MNL^2), where M and N are numbers of sequences included in the two groups and L is the mean length of the sequences. Algorithm B uses some approximations to cope with profile-based operations, whereas algorithm C is a simpler variant of algorithm D. These group-to-group alignment algorithms were applied to multiple sequence alignment with two iterative strategies: a progressive method based on a given binary tree and a randomized grouping-realignment method. The advantages and disadvantages of the four algorithms are discussed on the basis of the results of examinations of several protein families.

142 citations


Journal ArticleDOI
TL;DR: SCAMP is a general-purpose simulator of metabolic and chemical networks that accepts metabolic models described in a biochemical language and enables novice as well as experienced users rapidly to build and simulate metabolic systems.
Abstract: SCAMP is a general-purpose simulator of metabolic and chemical networks. The program is written in C and is portable to all computer systems that support an ANSI C compiler. SCAMP accepts metabolic models described in a biochemical language, and this enables novice as well as experienced users rapidly to build and simulate metabolic systems. The language is sufficiently flexible to enable other types of model to be built, e.g. chemostat or ecological models. The language offers many facilities, including: the ability to describe metabolic pathways of any structure and possessing any kinetics using normal chemical notation; optionally build models directly from the differential equations; differing compartment volumes; access to flux, concentration and rate of change information; detection of conserved cycles; access to all coefficients and elasticities of metabolic control analysis; user-defined forcing functions at the model boundaries; user-defined monitoring functions; user-configurable output of any quantity. From the model description SCAMP can either generate C code for later compilation to produce fast executable stand-alone models or run-time code for input to a run-time interpreter for immediate execution. The simulator also incorporates an inbuilt symbolic differentiator for evaluating the Jacobian and elasticity matrices.

Journal ArticleDOI
TL;DR: The software, CURVATURE, can thus be used to investigate possible roles of curvature in modulation of gene expression and for location of curved portions of DNA, which may play an important role in sequence-specific protein-DNA interactions.
Abstract: Software is presented to plot the sequence-dependent spatial trajectory of the DNA double helix and/or distribution of curvature along the DNA molecule. The nearest-neighbor wedge model is implemented to calculate overall DNA path using local helix parameters: helix twist angle, wedge (deflection) angle and direction (of deflection) angle. The procedures described proved to be very convenient as tools for investigation of a relationship between overall DNA curvature and its gel electrophoretic mobility. All parameters of the model had been estimated from experimental data. Using these wedge parameters the program takes, as input, any DNA sequence and calculates the likely degree of curvature at each point along the molecule. This information is displayed both graphically and in the form of simplified representations of curved double helices. The software, CURVATURE, can thus be used to investigate possible roles of curvature in modulation of gene expression and for location of curved portions of DNA, which may play an important role in sequence-specific protein-DNA interactions.

Journal ArticleDOI
TL;DR: This paper defines DNA sequences to be simple if they contain repeated occurrences of certain 'words' and can therefore be encoded in a small number of bits; this definition includes minisatellites and microsatellites.
Abstract: A new method, 'algorithmic significance', is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain 'words' and thus can be encoded in a small number of bits. Such a definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identification of simple sequences based on the proposed method has been installed at the Internet address pythia/anl.gov.

Journal ArticleDOI
TL;DR: A refined algorithm together with a computer procedure for determining the complete set of non-negative, steady-state fluxes in biochemical reaction systems of any complexity, with or without some flux rates fixed, is given and it is shown that this set is a convex polyhedron, which may or may not be bounded.
Abstract: A refined algorithm together with a computer procedure for determining the complete set of non-negative, steady-state fluxes in biochemical reaction systems of any complexity, with or without some flux rates fixed, is given. It is shown that this set is a convex polyhedron, which may or may not be bounded. The algorithm is illustrated by several examples; one of them concerns intermediary metabolism. A computer code in standard C is presented.

Journal ArticleDOI
TL;DR: ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences.
Abstract: ANREP is a system for finding matches to patterns composed of (i) spacing constraints called 'spacers', and (ii) approximate matches to 'motifs' that are, recursively, patterns composed of 'atomic' symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed and an appendix gives a concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.

Journal ArticleDOI
TL;DR: The temperature parallel algorithm of simulated annealing is considered to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization.
Abstract: We have developed simulated annealing algorithms to solve the problem of multiple sequence alignment. The algorithm was shown to give the optimal solution as confirmed by the rigorous dynamic programming algorithm for three-sequence alignment. To overcome long execution times for simulated annealing, we utilized a parallel computer. A sequential algorithm, a simple parallel algorithm and the temperature parallel algorithm were tested on a problem. The results were compared with the result obtained by a conventional tree-based algorithm where alignments were merged by two-way dynamic programming. Every annealing algorithm produced a better energy value than the conventional algorithm. The best energy value, which probably represents the optimal solution, was reached within a reasonable time by both of the parallel annealing algorithms. We consider the temperature parallel algorithm of simulated annealing to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization. The algorithm is also useful for refining multiple alignments obtained by other heuristic methods.

Journal ArticleDOI
TL;DR: A program to aid in the search of primers for specific polymerase chain reaction (PCR) amplification of highly variable genomes involves the derivation of variability profiles to identify optimal regions for PCR amplification, taking into account stability of DNA-primer hybrids.
Abstract: A program to aid in the search of primers for specific polymerase chain reaction (PCR) amplification of highly variable genomes is presented. It involves the derivation of variability profiles to identify optimal regions for PCR amplification, taking into account stability of DNA-primer hybrids. An application of the program to foot-and-mouth disease virus diagnosis is presented.

Journal ArticleDOI
TL;DR: The two main approaches to the calculation of the Malthusian parameter, its error and confidence intervals have been implemented in a program and have been compared by means of an example.
Abstract: The intrinsic rate of natural increase or Malthusian parameter plays a key role in fields as diverse as ecology, genetics, demography and evolution. It characterizes the growth of a population in a determinate environment. Since its rigorous statistical estimation requires intensive calculation, the use of a computer becomes essential. The two main approaches to the calculation of the Malthusian parameter, its error and confidence intervals have been implemented in a program and have been compared by means of an example.
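
As a rough illustration of why the calculation is numerical: the abstract does not name the equation it solves, but the intrinsic rate r is conventionally defined as the root of the Euler-Lotka equation, sum over ages x of exp(-r x) l_x m_x = 1, where l_x is survival to age x and m_x is fecundity at age x. The sketch below assumes that definition and solves it by bisection. The function names, the life-table layout and the search bracket are hypothetical, and the error and confidence-interval estimation described in the abstract is not covered.

```python
import math


def euler_lotka(r, ages, survival, fecundity):
    """Left-hand side of the Euler-Lotka equation minus 1 (zero at the root)."""
    return sum(math.exp(-r * x) * l * m
               for x, l, m in zip(ages, survival, fecundity)) - 1.0


def intrinsic_rate(ages, survival, fecundity, lo=-5.0, hi=5.0, tol=1e-12):
    """Find r by bisection; assumes all ages are positive, so the
    left-hand side decreases monotonically with r."""
    if euler_lotka(lo, ages, survival, fecundity) < 0 or \
       euler_lotka(hi, ages, survival, fecundity) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if euler_lotka(mid, ages, survival, fecundity) > 0:
            lo = mid   # sum still too large: r must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical life table: one age class that doubles each generation.
r = intrinsic_rate(ages=[1], survival=[1.0], fecundity=[2.0])
```

For this toy life table the root is ln 2, since exp(-r) * 2 = 1. Resampling schemes for the error of r would call `intrinsic_rate` repeatedly on perturbed life tables, which is where the computational cost arises.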

Journal ArticleDOI
TL;DR: An efficient algorithm is described to locate locally optimal alignments between two sequences allowing for insertions and deletions, and is fast enough to be used on a conventional workstation to scan large sequence databanks.
Abstract: An efficient algorithm is described to locate locally optimal alignments between two sequences allowing for insertions and deletions. The algorithm is based on that of Smith and Waterman which returns the single best local alignment. However, the algorithm described here permits all non-intersecting locally optimal alignments to be determined in a single pass through the comparison matrix. The algorithm simplifies the location of repeats, multiple domains and shuffled motifs, and is fast enough to be used on a conventional workstation to scan large sequence databanks.
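
For orientation, the baseline that the abstract builds on can be sketched as follows. This is the classical Smith-Waterman recurrence returning only the single best local alignment score, not the paper's single-pass method for all non-intersecting alignments. The scoring values (match, mismatch, linear gap penalty) and the function name are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, (i, j) of the cell where it ends).

    Uses a linear gap penalty; scores are illustrative, not from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]   # comparison matrix, first row/column are 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


score, end = smith_waterman_score("TTACGTT", "GGACGGG")
```

Here the shared segment "ACG" gives a score of 6 under the assumed scoring. Finding further, non-intersecting local optima requires additional bookkeeping over the same matrix, which is the contribution the abstract describes.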

Journal ArticleDOI
TL;DR: A simple approach to scan quickly a large protein sequence database for homology is described, in which protein sequences are grouped into families of closely related proteins, each family being characterized by its average dipeptide composition.
Abstract: A simple approach to scan quickly a large protein sequence database for homology is described. The approach used is strictly dependent on the database organization. A database has been compiled in which protein sequences are grouped into families of closely related proteins, each family being characterized by its average dipeptide composition. A new entry in the database can be allocated in a family by comparing its dipeptide composition with the average dipeptide composition of the families.
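
The allocation step can be sketched as below. The abstract does not state how compositions are compared, so the Euclidean distance used here is an assumption, as are all function names and the toy sequences.

```python
from collections import Counter
import math


def dipeptide_composition(seq):
    """Relative frequency of each overlapping dipeptide in a sequence."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {dp: n / total for dp, n in counts.items()} if total else {}


def average_composition(seqs):
    """Mean dipeptide composition over the members of one family."""
    avg = Counter()
    for s in seqs:
        for dp, f in dipeptide_composition(s).items():
            avg[dp] += f / len(seqs)
    return dict(avg)


def distance(p, q):
    """Euclidean distance between two compositions (assumed measure)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def assign_family(seq, families):
    """Return the name of the family whose average composition is closest."""
    comp = dipeptide_composition(seq)
    return min(families, key=lambda name: distance(comp, families[name]))


# Hypothetical toy families, for illustration only.
families = {
    "gly_rich": average_composition(["GGGGAGGG", "GGAGGGGG"]),
    "lys_rich": average_composition(["KKKKEKKK", "KKEKKKKK"]),
}
best = assign_family("GGGAGG", families)
```

Because each family is reduced to a single vector of at most 400 dipeptide frequencies, a new entry is compared against one vector per family rather than against every sequence, which is the source of the speed the abstract refers to.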

Journal ArticleDOI
TL;DR: SRS (Sequence Retrieval System), an indexing system for flat file libraries that provides fast access to individual library entries via retrieval by keywords from various data fields, is now also able to build indices using cross-references that most libraries provide.
Abstract: SRS (Sequence Retrieval System), an indexing system for flat file libraries, provides fast access to individual library entries via retrieval by keywords from various data fields. SRS is now also able to build indices using cross-references that most libraries provide. Fifteen libraries of DNA and protein sequences and structures have been selected. These libraries interact with at least one other by means of cross-references. Indexing these cross-references allows a complete network of libraries to be built. In the network an entry from one library can be linked in principle to every other library. If two libraries are not directly cross-referenced, the linkage can be made with a succession of single links between neighbouring, cross-referenced libraries. A new operator has been added to the query language of SRS for convenient specification of links amongst complete libraries or entry sets generated by previous queries on particular libraries. All the information in the network can now be used to retrieve an entry in a specific library, e.g. the full information given in amino acid sequence entries from SwissProt can now be used to retrieve related tertiary structure entries from PDB. Furthermore, a search in a single library can be extended to a search in the complete library network, e.g. all entries in all databases pertaining to elastase can be found.

Journal ArticleDOI
TL;DR: It is shown that comparison of the vocabularies can distinguish among different families and the latter from random sequences, and is reasonably efficient for localizing functional domains in the amino acid sequences.
Abstract: A new method for distinguishing among protein families based on the analysis of oligopeptide composition of amino acid sequences is presented. It is assumed that any protein family can be characterized by a set of essential oligopeptides (oligopeptide vocabulary). A simple approach to find such a vocabulary is suggested. It is shown that comparison of the vocabularies can distinguish among different families and the latter from random sequences. This comparison can be successfully made with a small set of frequencies of 25 dipeptides (or tripeptides). No preliminary alignment is necessary. It is established that characteristic peptides are located in the regions of functional value, as shown for GTP-binding domains of the translation elongation factors. It is demonstrated that this method is reasonably efficient for localizing functional domains in the amino acid sequences. The average error of prediction does not exceed three or four amino acid residues as shown for several functional domains.

Journal ArticleDOI
TL;DR: The program ODS is very general in the types of data that can be utilized for chromosome reconstruction: any trait that can be scored in a presence-absence manner, such as hybridized synthetic oligonucleotides, restriction endonuclease recognition sites or single copy landmarks, can be used for analysis.
Abstract: In the program ODS we provide a methodology for quickly ordering random clones into a physical map. The process of ordering individual clones with respect to their position along a chromosome is based on the similarity of binary signatures assigned to each clone. This binary signature is obtained by hybridizing each clone to a panel of oligonucleotide probes. By using the fact that the amount of overlap between any two clones is reflected in the similarity of their binary signatures, it is possible to reconstruct a chromosome by minimizing the sum of linking distances between an ordered sequence of clones. Unlike other programs for physical mapping, ODS is very general in the types of data that can be utilized for chromosome reconstruction. Any trait that can be scored in a presence-absence manner, such as hybridized synthetic oligonucleotides, restriction endonuclease recognition sites or single copy landmarks, can be used for analysis. Furthermore, the computational requirements for the construction of large physical maps can be measured in a matter of hours on work-stations such as the VAX2000.
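
The objective described above, the sum of linking distances along an ordering of clones, can be sketched with Hamming distance between binary signatures. The greedy nearest-neighbour ordering shown here is only a simple stand-in for whatever optimiser ODS uses, which the abstract does not specify. Clone names, signatures and function names are hypothetical.

```python
def hamming(a, b):
    """Number of probes at which two equal-length binary signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def total_linking_distance(order, signatures):
    """Sum of distances between consecutive clones in the given order."""
    return sum(hamming(signatures[u], signatures[v])
               for u, v in zip(order, order[1:]))


def greedy_order(signatures, start):
    """Nearest-neighbour heuristic (a stand-in, not the ODS optimiser)."""
    remaining = set(signatures) - {start}
    order = [start]
    while remaining:
        last = signatures[order[-1]]
        # sorted() makes tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda c: hamming(last, signatures[c]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical signatures: each character is presence (1) or absence (0) of a probe.
signatures = {"c1": "11000", "c2": "01100", "c3": "00110", "c4": "00011"}
order = greedy_order(signatures, "c1")
```

With these toy signatures, overlapping clones share probes, so the ordering that follows the true overlaps has a smaller total linking distance than a scrambled one such as c1, c3, c2, c4.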

Journal ArticleDOI
TL;DR: Methods for computing several 'robustness measures' at each position of a given alignment are presented, all of which are very space-efficient and used to locate particularly well-conserved regions in the beta-globin gene locus control region and in the 5' flank of the gamma-globin gene.
Abstract: Within a single alignment of two DNA sequences or two protein sequences, some regions may be much better conserved than others. Such strong conservation may reveal a region that possesses an important function. When alignments are so long that it is infeasible, or at least undesirable, to inspect them in complete detail, it is helpful to have an automatic process that computes information about the varying degree of conservation along the alignment and displays the information in a graphical representation that is readily assimilated. This paper presents methods for computing several such 'robustness measures' at each position of a given alignment. These methods are all very space-efficient; they use only space proportional to the sum of the two sequence lengths. To illustrate their effectiveness, one of the methods is used to locate particularly well-conserved regions in the beta-globin gene locus control region and in the 5' flank of the gamma-globin gene.

Journal ArticleDOI
TL;DR: This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments that is effective for exposing the existence and locations of conserved regions.
Abstract: Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. This approach is sometimes effective for exposing the existence and locations of conserved regions, which can then be aligned by more sensitive multiple-alignment methods. This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments.

Journal ArticleDOI
TL;DR: The database was useful for the analysis of the relationship between chemical structures and amino acid sequence motifs and one of these motifs shared by different enzymes was S-G-G-L-D, which was conserved in argininosuccinate synthase and asparagine synthase.
Abstract: Recently we have constructed a database, the Enzyme-Reaction Database, which links a chemical structure to amino acid sequences of enzymes that recognize the chemical structure as their ligand. The total number of enzymes registered in the database is 1103 with 6668 NBRF-PIR entry codes and 1756 chemical compounds. The chemical structures and chemical names for 842 compounds are registered in the Chemical-Structure Database on the MACCS system. For each enzyme, the sequences were divided into clusters, and multiply aligned in each cluster to extract a conserved sequence. A total of 158,781 five-residue-long fragments were constructed from 433 conserved sequences and compared among different clusters of different enzymes. One of these motifs shared by different enzymes was S-G-G-L-D. The motif was conserved in both argininosuccinate synthase (EC 6.3.4.5) and asparagine synthase (glutamine-hydrolysing) (EC 6.3.5.4). This result showed that the database was useful for the analysis of the relationship between chemical structures and amino acid sequence motifs.

Journal ArticleDOI
TL;DR: A computer module that includes multiple alignments, secondary structure prediction, and site and pattern search has been developed and integrated into the ANTHEPROT software for protein sequence analysis, and all methods are connected in an interactive graphic manner.
Abstract: A computer module that includes multiple alignments, secondary structure prediction, and site and pattern search has been developed and integrated into our ANTHEPROT software for protein sequence analysis. All the programs can be invoked from within any routine, thus yielding multiple pathways to obtain final results. All the results are graphically displayed. The main feature of this module is that all methods are connected in an interactive graphic manner. This module has been designed to display easily the potential sites with conserved predicted structures.

Journal ArticleDOI
TL;DR: The SIGNAL SCAN transcription factor database format has changed and the program output format has been improved, and new features allow the user to update the SIGNAL SCAN database automatically, to retrieve original journal citations and to develop user signal databases.
Abstract: SIGNAL SCAN is a program that utilizes a transcription factor database to find potential transcription factor binding sites in DNA sequences. The program is now in its third version. The SIGNAL SCAN transcription factor database format has changed and the program output format has been improved. New features allow the user to update the SIGNAL SCAN database automatically, to retrieve original journal citations and to develop user signal databases. The program now uses an indexing algorithm, improving scanning speed by a factor of 3. SIGNAL SCAN is now network compatible and is available for IBM-compatible PC, Unix and VMS platforms.

Journal ArticleDOI
TL;DR: A set of programs written in C language with the GL library and under UNIX has been developed for generating compact, pleasant and non-overlapping displays of secondary structures of ribonucleic acids.
Abstract: A set of programs written in C language with the GL library and under UNIX has been developed for generating compact, pleasant and non-overlapping displays of secondary structures of ribonucleic acids. The first program, rnasearch, implements a new search procedure that dynamically rearranges overlapping portions of the two-dimensional drawing while preserving clear and readable displays of the two-dimensional structure. The algorithm is fast (the execution time for the command rnasearch is 38.6 s for the 16S rRNA of Escherichia coli with 1542 bases), accepts outputs from two-dimensional prediction programs and therefore allows for rapid comparison between the various two-dimensional folds generated. A second program, rnadisplay, allows the graphical display of the computed two-dimensional structures on a graphics workstation. Otherwise, it is possible to obtain a paper output of the two-dimensional structure by using the program print2D which builds a Postscript file. Moreover the two-dimensional drawing can be labelled for representing data coming from chemical modifications and/or enzymatic cleavages. Application to a few secondary structures such as RNaseP, 5S rRNA and 16S rRNA are given.

Journal ArticleDOI
TL;DR: A fast, sensitive pattern-matching algorithm that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids is presented, using a fast, dynamic programming algorithm.
Abstract: Pattern-matching algorithms are a powerful tool for finding similarities and relationships among the steadily growing amount of known protein sequences. We present a fast, sensitive pattern-matching algorithm that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids, using a fast, dynamic programming algorithm. Selected examples will demonstrate applications and advantages of our approach.