
Showing papers in "Bioinformatics in 1989"


Journal ArticleDOI
TL;DR: A strategy is described for the rapid alignment of many long nucleic acid or protein sequences on a microcomputer based on progressively aligning sequences according to the branching order in an initial phylogenetic tree.
Abstract: A strategy is described for the rapid alignment of many long nucleic acid or protein sequences on a microcomputer. The program described can handle up to 100 sequences of 1200 residues each. The approach is based on progressively aligning sequences according to the branching order in an initial phylogenetic tree. The results obtained using the package appear to be as sensitive as those from any other available method.

1,609 citations


Journal ArticleDOI
TL;DR: A full-function multiple sequence editor for use with IBM-PC compatible microcomputers that permits the simultaneous editing, manipulation and display of several macromolecular sequences with the same degree of ease and generality afforded by commercial word processors.
Abstract: We have developed a full-function multiple sequence editor (ESEE) for use with IBM-PC compatible microcomputers. ESEE permits the simultaneous editing, manipulation and display of several macromolecular sequences with the same degree of ease and generality afforded by commercial word processors, but with correct line wrapping. In addition to multiple sequence alignment, ESEE can serve as a universal front-end for analysis programs and as a utility for producing publication-quality figures.

500 citations


Journal ArticleDOI
TL;DR: Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs in nucleic acid and protein sequences.
Abstract: This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences. A higher level structure, the pattern, is defined as a list of motifs. A pattern also specifies the permitted ranges of spacing allowed between its constituent motifs. Equations for calculating the expected numbers of matches to patterns are given.
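As a rough illustration of what "expected number of matches" means for a single motif, the sketch below computes the simple expectation under independent residue probabilities. It is not the probability-generating-function treatment of the paper, and the representation of a motif as a list of allowed-residue sets, together with the names `motif_match_probability` and `expected_matches`, are assumptions made here for illustration.

```python
from math import prod


def motif_match_probability(motif, residue_freqs):
    """Probability that one random window matches the motif.

    motif: list of sets, one set of allowed residues per position.
    residue_freqs: dict mapping residue -> independent probability.
    """
    return prod(sum(residue_freqs[r] for r in allowed) for allowed in motif)


def expected_matches(motif, residue_freqs, seq_len):
    """Expected number of (possibly overlapping) matches in a random sequence."""
    windows = max(seq_len - len(motif) + 1, 0)
    return windows * motif_match_probability(motif, residue_freqs)


# Hypothetical example: a TATA-like motif with one ambiguous position.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
tata = [{"T"}, {"A"}, {"T"}, {"A"}, {"A", "T"}, {"A"}]
print(expected_matches(tata, uniform, 1000))
```

The expectation is linear in the number of windows, so it holds even though overlapping windows are not independent; the full distribution of match counts is where generating functions become necessary.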

160 citations


Journal ArticleDOI
TL;DR: A two-step multiple alignment strategy is presented that allows rapid alignment of a set of homologous sequences and comparison of pre-aligned groups of sequences, allowing for storage of aligned sequences and successive alignment of any number of sequences.
Abstract: A two-step multiple alignment strategy is presented that allows rapid alignment of a set of homologous sequences and comparison of pre-aligned groups of sequences. Examples are given demonstrating the improvement in the quality of alignments when comparing entire groups instead of single sequences. The modular design of computer programs based on this algorithm allows for storage of aligned sequences and successive alignment of any number of sequences.

124 citations


Journal ArticleDOI
TL;DR: A computer tool to aid the discovery of new motifs in nucleic acid sequences by creating dictionaries of related subsequences that are analysed to look for the commonest or best-defined subsequences.
Abstract: We describe a computer tool to aid the discovery of new motifs in nucleic acid sequences. A typical use would be to analyse a set of upstream regions from a family of related genes in order to find possible control sequences. The heart of the method is the creation of dictionaries of related subsequences. These dictionaries can then be analysed to look for the commonest or best-defined subsequences, those that occur in the highest number of different sequences, or for those in equivalent positions within the family. We show the application of the method to a set of E. coli promoter sequences.

103 citations


Journal ArticleDOI
TL;DR: A genome mapping system has been developed that reads and assembles data from clones analysed by restriction enzyme fragmentation and polyacrylamide gel electrophoresis; input data can be most effectively obtained by the use of a scanning densitometer and image-processing package.
Abstract: A genome mapping system has been developed that reads and assembles data from clones analysed by restriction enzyme fragmentation and polyacrylamide gel electrophoresis. Input data for the system can be most effectively obtained by the use of a scanning densitometer and image-processing package, such as that described in this article. The image-processing procedure involves preliminary location of bands, cooperative tracking of lanes by correlation of adjacent bands, a precise densitometric pass, alignment of the marker bands with the standard, optional interactive editing, and normalization of the accepted bands.

94 citations


Journal ArticleDOI
TL;DR: Two Macintosh programs written for multivariate data analysis and multivariate data graphical display are presented; GraphMu is designed for drawing collections of elementary graphics, thus allowing comparisons between variables, individuals, and principal axes planes of multivariate methods.
Abstract: Two Macintosh programs written for multivariate data analysis and multivariate data graphical display are presented. MacMul includes principal component analysis (PCA), correspondence analysis (CA) and multiple correspondence analysis (MCA), with a complete, original and unified set of numerical aids to interpretation. GraphMu is designed for drawing collections of elementary graphics (curves, maps, graphical models) thus allowing comparisons between variables, individuals, and principal axes planes of multivariate methods. Both programs are self-documented applications and make full use of the user-oriented graphical interface of the Macintosh to simplify the process of analysing data sets. An example is described to show the results obtained on a small ecological data set.

64 citations


Journal ArticleDOI
TL;DR: A method for assessing the preserved stem-loops of RNA secondary structures is presented: frequently recurring helical stems in a set of secondary structures resulting from the simulated folding of a given RNA are assessed, and consensus structural motifs can then be selected to construct a secondary structure of the RNA.
Abstract: A method for assessing the preserved stem-loops of RNA secondary structures is presented. Frequently recurring helical stems in a set of secondary structures resulting from the simulated folding process of a given RNA are assessed and consensus structural motifs can then be selected to construct a secondary structure of the RNA. Alternatively, it can be applied to a series of 'optimal' and 'suboptimal' secondary structures computed using the dynamic program developed by Williams and Tinoco. To demonstrate the power and the usefulness of the program we give examples of this procedure.

47 citations


Journal ArticleDOI
TL;DR: Two programs, MOTIF and PATTERN, that scan sequences for matches to user-defined motifs and patterns of motifs based on identity and set membership are described.
Abstract: Two programs, MOTIF and PATTERN, that scan sequences for matches to user-defined motifs and patterns of motifs based on identity and set membership are described. The programs use a simple and logical notation to define motifs, and may be used either interactively or by using command line parameters (suitable for batch processing). The two programs described also incorporate a simple, yet reliable, algorithm that automatically detects in which of six possible formats the sequence entry is written.
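A minimal sketch of scanning by identity and set membership follows. The notation used here (plain letters for identity, square brackets for a set of permitted residues) is a hypothetical stand-in, since the abstract does not give the notation MOTIF and PATTERN actually use; `parse_motif` and `scan` are likewise illustrative names.

```python
def parse_motif(text):
    """Parse a hypothetical notation: a plain letter matches by identity,
    a bracketed group such as [AT] matches by set membership."""
    positions = []
    i = 0
    while i < len(text):
        if text[i] == "[":
            end = text.index("]", i)
            positions.append(set(text[i + 1:end]))
            i = end + 1
        else:
            positions.append({text[i]})
            i += 1
    return positions


def scan(sequence, motif):
    """Return the 0-based start of every window that matches the motif."""
    width = len(motif)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if all(sequence[start + k] in motif[k] for k in range(width))
    ]


motif = parse_motif("TATA[AT]A")
hits = scan("GGTATAAAGCTATATACC", motif)
print(hits)
```

Treating identity as a one-element set keeps the matching loop uniform; a pattern of several motifs with spacing ranges would layer on top of `scan` by checking distances between hit positions.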

32 citations


Journal ArticleDOI
TL;DR: This review describes the current methods of their construction and their use in the determination of protein function, and offers guidelines on interpreting data obtained.
Abstract: Protein sequence motifs are acquiring increasing prominence in the area of sequence analysis. This review describes the current methods of their construction and their use in the determination of protein function, and offers guidelines on interpreting data obtained. An appendix is attached which refers to 200 motifs of various kinds.

31 citations


Journal ArticleDOI
TL;DR: A computer program that allows interactive sequence comparison using residue physicochemical characteristics and multilength segmental comparisons is described and the results are compared with those of ALIGN and BESTFIT.
Abstract: A computer program that allows interactive sequence comparison is described. It graphically displays a search matrix using residue physicochemical characteristics and multilength segmental comparisons. The user selects through a mousing device and screen pointer the sequence spans to be matched. The results of this method are compared with those of ALIGN and BESTFIT.

Journal ArticleDOI
TL;DR: The programs described herein function as part of a suite of programs designed for pairwise alignment, multiple alignment, generation of randomized sequences, production of alignment scores and a sorting routine for analysis of the alignments produced.
Abstract: The programs described herein function as part of a suite of programs designed for pairwise alignment, multiple alignment, generation of randomized sequences, production of alignment scores and a sorting routine for analysis of the alignments produced. The sequence alignment programs penalize gaps (absences of residues) within regions of protein secondary structure and have the added option of 'fingerprinting' structurally or functionally important protein-residues. The multiple alignment program is based upon the sequence alignment method of Needleman and Wunsch and the multiple alignment extension of Barton and Sternberg. Our application includes the feature of optionally weighting active site, monomer--monomer, ligand contact or other important template residues to bias the alignment toward matching these residues. A sum-score for the alignments is introduced, which is independent of gap penalties. This score more adequately reflects the character of the alignments for a given scoring matrix than the gap-penalty-dependent total score described previously in the literature. In addition, individual amino acid similarity scores at each residue position in the alignments are printed with the alignment output to enable immediate quantitative assessment of homology at key sections of the aligned chains.
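The idea of penalizing gaps more heavily inside secondary-structure regions can be sketched with a Needleman-Wunsch score that takes a per-position gap cost. This is only an illustration under stated assumptions: identity scoring replaces a real similarity matrix, gap costs are linear, and the `gap_a` parameter and function name are hypothetical. The fingerprint weighting, the Barton and Sternberg multiple-alignment extension, and the sum-score are not reproduced.

```python
def nw_score(a, b, match=1, mismatch=-1, gap_a=None, default_gap=-2):
    """Global alignment score (Needleman-Wunsch) with linear gap costs.

    gap_a[i] is the score for aligning residue a[i] against a gap, so
    positions inside a structural element can be made costlier to gap.
    Residues of b aligned against a gap always cost default_gap.
    """
    if gap_a is None:
        gap_a = [default_gap] * len(a)
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_a[i - 1]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + default_gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair
            up = score[i - 1][j] + gap_a[i - 1]    # a[i-1] against a gap
            left = score[i][j - 1] + default_gap   # b[j-1] against a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


# Uniform gap cost versus a heavier cost at positions 2 and 3 of the first sequence.
print(nw_score("GATTACA", "GATACA"))
print(nw_score("GATTACA", "GATACA", gap_a=[-2, -2, -5, -5, -2, -2, -2]))
```

With the heavier cost on the two T positions, the best alignment moves the gap elsewhere and accepts a mismatch instead, which is the behaviour the structure-aware penalty is meant to encourage.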

Journal ArticleDOI
TL;DR: Some of the basic principles of the theory of dynamical systems are reviewed, the definitions of chaos and strange attractors are introduced, and their implications in biology are discussed.
Abstract: In this paper we review some of the basic principles of the theory of dynamical systems. We introduce the reader to the definition of chaos and strange attractors and we discuss their implications in biology.

Journal ArticleDOI
TL;DR: The secondary structure prediction methods have been translated with the possibility of predicting a set or subset of proteins and of saving the predicted states into a single file.
Abstract: The translation has been realized on an IBM-PC-compatible computer. The program has also been checked on the IBM PS/2 series. Options are given for graphic cards (EGA, CGA, VGA, Hercules). The input of sequences is made via an editor which allows the sequences to be corrected. All the graphics can be displayed on the screen together with a movable cursor on the curve. The coordinates of this cursor are displayed at the same time above the profile as well as the given amino acid position (see Figure 1). This improvement allows the user to identify both the amino acid (with its position) and its corresponding value. Facilities have also been included to permit the plot of proteins of any length by means of variable scaling factors. A new option predicts the amphiphilicity of α-helices and β-sheets (Eisenberg et al., 1982). The possibility of displaying the helical wheel (orthogonal projection of α-helices) is also offered. The secondary structure prediction methods have been translated with the possibility of predicting a set or subset of proteins and of saving the predicted states into a single file. For the HOMOL and DIRINFO programs, the possibility of scanning the α-helix, β-sheet, β-turn and coil potentials is given by a graphic display. The GOR method has been implemented as described by Gibrat et al. (1987) using the new set of parameters.

Journal ArticleDOI
TL;DR: It is observed that some oligonucleotides show a statistical behaviour and a regional distribution similar to that of known signal sequences, and that the frequency distribution of oligonucleotides of length 5 and 6 tends to become bimodal, indicating the existence of a population of very frequent oligonucleotides.
Abstract: The large body of nucleic acid sequence data now available offers a unique opportunity for the characterization of individual oligonucleotides which may be specific to sequence functional domains. We have prepared algorithms for the study of the frequency distribution of all oligonucleotides of length 2-6 in DNA sequences. We have implemented them in the study of 634 mammalian DNA sequences spanning 1.782 Mb, and have obtained the distribution of the ratio between the observed frequency of oligonucleotides and their expected frequency based on independent nucleotide probabilities. We then studied the distribution of oligonucleotides (or k-tuples) of each length in a subset of 129 complete mammalian genes spanning 0.607 Mb. Eight distinct genomic regions, namely 5'-non-transcribed, first exon, first intron, intermediate exons, intermediate introns, last intron, last exon and 3'-non-transcribed, were considered. We observed that some oligonucleotides show a statistical behaviour and a regional distribution similar to that of known signal sequences. Moreover the frequency distribution of oligonucleotides of length 5 and 6 tends to become bimodal, indicating the existence of a population of very frequent oligonucleotides.
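The ratio described above, observed k-tuple frequency divided by the frequency expected from independent nucleotide probabilities, can be sketched as follows. The function name and the choice to estimate nucleotide probabilities from the same sequence are assumptions for illustration; the abstract does not say how the authors' algorithms are organised.

```python
from collections import Counter


def observed_expected_ratios(seq, k):
    """Ratio of observed k-tuple frequency to the frequency expected
    from independent single-nucleotide probabilities."""
    total = len(seq)
    base_prob = {base: count / total for base, count in Counter(seq).items()}

    windows = total - k + 1
    kmer_counts = Counter(seq[i:i + k] for i in range(windows))

    ratios = {}
    for kmer, count in kmer_counts.items():
        expected = 1.0
        for base in kmer:
            expected *= base_prob[base]
        ratios[kmer] = (count / windows) / expected
    return ratios


ratios = observed_expected_ratios("ACACGTACAC", 2)
for kmer in sorted(ratios):
    print(kmer, round(ratios[kmer], 3))
```

A ratio well above 1 marks a k-tuple that is over-represented relative to base composition alone; only k-tuples that actually occur are reported, so absent ones (ratio 0) would need to be enumerated separately.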


Journal ArticleDOI
TL;DR: A new analytical method has been used to examine the set of 40 exon/intron boundaries within the rat embryonic myosin heavy chain (MHCemb) gene and work out a more detailed set of recognition sequence requirements for the splicing of nuclear pre-mRNA.
Abstract: A new analytical method has been used to examine the set of 40 exon/intron boundaries within the rat embryonic myosin heavy chain (MHCemb) gene. It has also been applied to an additional set of 850 splice sequences selected from GenBank. Strong evidence is obtained for the involvement of 3' ends but not 5' ends of exon sequences in splice site recognition. It can be determined that signal sequences of 5' intron ends concentrate near the splice borders, while the distributions of the 3' intron ends have a diffuse character. The possibility of re-interpreting some known features, in terms of the absence of certain elements rather than the presence of elements forming sequence determinants, is discussed. The analysis undertaken enabled us to work out a more detailed set of recognition sequence requirements for the splicing of nuclear pre-mRNA. In addition to requirements which have already been established we suggest the following: the 'AG-absence' in the immediate 3' terminal intron sequences; and a minimal match between a particular sequence and the known exon/intron consensus sequence of 5' splice junctions.

Journal ArticleDOI
TL;DR: A new version of the MULTAN multiple alignment program is described which is capable of aligning up to 200 strings of up to 6000 characters each of any string data.
Abstract: Description of a new version of the MULTAN multiple alignment program which is capable of aligning up to 200 strings of up to 6000 characters each of any string data.

Journal ArticleDOI
TL;DR: The ARIZONA dATAbASE of Aqueous Solubility was developed and is the largest and most comprehensive compilation of aqueous solubility available for unionised organic compounds.
Abstract: The ARIZONA dATAbASE of Aqueous Solubility was developed. At the present time, it is the largest and most comprehensive compilation of aqueous solubility available for unionised organic compounds. The solubility data, which are extracted from various scientific articles, are objectively evaluated by five criteria: temperature, solute purity, equilibration, analysis and data accuracy. These criteria are used to calculate a weighting factor. The weighting factor is used to obtain a «recommended value» for solubility.


Journal ArticleDOI
TL;DR: A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms and the effects on the power of the test of the scoring method, word length, sequence length, and sequence composition are examined.
Abstract: A method is developed, based on word-searching, which provides a rapid test for the statistical significance of DNA sequence similarities for use in databank searching. The method makes allowance for the lengths and dinucleotide compositions of the sequences being compared. A way is also described to calculate the power of the test, i.e. the probability of detecting a given similarity as being statistically significant. The effects on the power of the test of the scoring method, word length, sequence length, and sequence composition are examined. A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms.

Journal ArticleDOI
TL;DR: A novel single pass charge-coupling algorithm 'Q-COUPLE', which should be usable as a separate add-on subroutine with many one-dimensional finite difference diffusion calculations, is proposed and shown to be invariant with respect to the direction of sweep.
Abstract: The importance of interionic charge coupling in chemical and biological diffusion problems is discussed, and the Nernst-Planck (ionic) and Onsager-Fuoss (neutral component) methods are considered. A novel single pass charge-coupling algorithm 'Q-COUPLE' is proposed, which should be usable as a separate add-on subroutine with many one-dimensional finite difference diffusion calculations. Its mode of operation is explained with the help of elementary electrostatics and by reference to listings in BASIC. The algorithm is being applied in a finite difference model of diffusion-with-reaction in dental plaque, with 12 ions or ionizable molecules diffusing and interacting with fixed charges. It is shown to be invariant with respect to the direction of sweep, and in the simple case of coupled diffusion of a single polyvalent electrolyte is found to compare well with the analytical solution. Advantages and limitations of the proposal are discussed.

Journal ArticleDOI
TL;DR: A computer program (CLEAVAGE) based on the same algorithm and developed in Applesoft BASIC under the operating system DOS 3.3 for an Apple IIe, a microcomputer widely used in biochemistry laboratories.
Abstract: We present a computer program (CLEAVAGE) based on the same algorithm and developed in Applesoft BASIC under the operating system DOS 3.3 for an Apple IIe, a microcomputer widely used in biochemistry laboratories.

Journal ArticleDOI
TL;DR: A computer program is designed in which the initial values of kinetic parameters are estimated by a linear method and then refined by a non-linear least-squares method to choose an adequate model.
Abstract: Usually the first step of an enzyme kinetic analysis is to test whether a simple Michaelis-Menten or Hill model fits the measured results. To cope with this task we designed a computer program in which the initial values of kinetic parameters are estimated by a linear method and then refined by a non-linear least-squares method. Graphical illustration of the linear regression calculations helps to choose an adequate model.
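The two-stage approach (linear initial estimate, then non-linear least-squares refinement) can be sketched for the Michaelis-Menten model. The abstract does not name the linear method or the refinement algorithm, so the double-reciprocal (Lineweaver-Burk) regression and the Gauss-Newton iteration used here are assumptions, as are the function names and the example data.

```python
def linear_estimate(s, v):
    """Initial (Vmax, Km) from a least-squares line through (1/S, 1/v)."""
    x = [1.0 / si for si in s]
    y = [1.0 / vi for vi in v]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # equals Km / Vmax
    intercept = my - slope * mx       # equals 1 / Vmax
    vmax = 1.0 / intercept
    return vmax, slope * vmax


def refine(s, v, vmax, km, iterations=20):
    """Gauss-Newton refinement of v = Vmax*S/(Km+S) on untransformed data."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for si, vi in zip(s, v):
            denom = km + si
            r = vi - vmax * si / denom
            j1 = si / denom                # d(model)/d(Vmax)
            j2 = -vmax * si / denom ** 2   # d(model)/d(Km)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * r
            b2 += j2 * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-300:
            break
        vmax += (b1 * a22 - b2 * a12) / det
        km += (a11 * b2 - a12 * b1) / det
    return vmax, km


# Hypothetical noise-free data generated with Vmax = 10 and Km = 4.
substrate = [1.0, 2.0, 5.0, 10.0, 20.0]
rate = [10.0 * si / (4.0 + si) for si in substrate]
vmax0, km0 = linear_estimate(substrate, rate)
print(refine(substrate, rate, vmax0, km0))
```

The reciprocal transform distorts the error structure of real measurements, which is why the linear fit serves only as a starting point; an undamped Gauss-Newton step can diverge from a poor start, so a production fit would add step control.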

Journal ArticleDOI
TL;DR: The computer program HYLAS generates from a standard DNA letter sequence a three-dimensional space curve (H curve) which embodies the entire information content of the original nucleotide sequence; the curves can be marked at specific nucleotide locations, annotated, rotated for observation from any viewing angle, and manipulated for convenient side-by-side comparisons.
Abstract: The computer program HYLAS generates from a standard DNA letter sequence a three-dimensional space curve (H curve) which embodies the entire information content of the original nucleotide sequence. The program can display H curves either as two-dimensional (front and side view) projections or as stereo-pair images. The curves can be marked at specific nucleotide locations, annotated, rotated for observation from any viewing angle, and manipulated for convenient side-by-side comparisons. Unlike the cumbersome letter sequences, H curves can be drastically condensed in size without losing their ability to reflect the global nucleotide-distribution pattern of the entire DNA sequence. Often, biologically important loci can be visually identified on the H curves. HYLAS is written in FORTRAN with separate mainframe (IBM-VM/CMS) and microcomputer (MS-DOS) versions. It uses the Tektronix-TCS library of graphic subroutines.

Journal ArticleDOI
TL;DR: All sequence data files used and generated by the SDSE package conform to the standard GenBank database format, thus allowing the use of any sequence retrieved from this databank, as well as the application of other packages to analyse, manipulate or retrieve simulated sequences.
Abstract: An algorithm to simulate DNA sequence evolution under a general stochastic model, including as particular cases all the previously used schemes of nucleotide substitution, is described. The simulation is carried out on finite, variable length, DNA sequences through a strict stochastic process, according to the particular substitution rates imposed by each scheme. Five FORTRAN programs, running on an IBM PC and compatibles, carry out all the tasks needed for the simulation. They are menu driven and interfaced to the system through a principal menu. All sequence data files used and generated by the SDSE package conform to the standard GenBank database format, thus allowing the use of any sequence retrieved from this databank, as well as the application of other packages to analyse, manipulate or retrieve simulated sequences.

Journal ArticleDOI
Andrzej Galat
TL;DR: A FORTRAN-77 program is described which was applied for analysis of optimized structures and computer-generated dynamics trajectories of DNA and DNA-drug complexes and can be used for detailed analysis of dynamics structures of DNA and their complexes with intercalating drugs.
Abstract: A FORTRAN-77 program is described which was applied for analysis of optimized structures and computer-generated dynamics trajectories of DNA and DNA-drug complexes. The CORDAN program (coordinates analysis) also can be used for various manipulations of DNA-drug complexes, i.e. inversion of asymmetric sites or rebuilding the structure of the intercalator, among others. These procedures can find application in drug design. Analysis of dynamics trajectory of neocarzinostatin antibiotic (NCS) intercalated to the A-DNA form of 5'GGATGGGAG:5'CTCCCATCC is presented. The procedures described can be used for detailed analysis of dynamics structures of DNA and their complexes with intercalating drugs.

Journal ArticleDOI
TL;DR: To aid the animal taxonomist to recognize these genera, a computer program named DORY has been developed for the IBM PC/XT/AT and compatibles.
Abstract: The order Dorylaimida includes many soil, freshwater and plant parasitic nematodes. In 1963 they were included in 56 genera. After the splitting of some already known genera and the description of new ones, their number had risen to 219 in 1988. To aid the animal taxonomist to recognize these genera, we have developed a computer program named DORY. DORY has been developed for the IBM PC/XT/AT and compatibles. This program was developed using dBASE III version 1.10.

Journal ArticleDOI
TL;DR: ELBAMAP (Electrophoresis Band Management Package) is a 41-kbyte program written in Borland Turbo Pascal, which will run on IBM PC compatible machines and has been used successfully for storage and comparison of plasmid DNA digest patterns analysed by agarose gel electrophoresis and seed protomer profiles analysed by SDS-polyacrylamide gel electrophoresis.
Abstract: ELBAMAP (Electrophoresis Band Management Package) is a 41-kbyte program written in Borland Turbo Pascal, which will run on IBM PC compatible machines. The program consists of 16 procedures involved in data entry, calculation, comparison and graphic representation of banding patterns. These procedures are selected interactively in response to a main menu and a series of prompts. Database files may be stored for subsequent analysis or addition of data. Pairwise comparisons of banding patterns may be stored as a similarity matrix suitable for use by other packages which will carry out multivariate or cluster analysis. The system has been used successfully for storage and comparison of plasmid DNA digest patterns analysed by agarose gel electrophoresis and seed protomer profiles analysed by SDS-polyacrylamide gel electrophoresis.
Introduction: The collection of banding data from electrophoresis gels is routine in biological laboratories. Sophisticated densitometers and associated software are available for quantitative comparison of electrophoresis bands. However, in many instances the presence or absence of bands is the only criterion required for comparison. Cataloguing and comparing banding patterns from a large number of samples, which may have been analysed on different gels over a period of time, is often an arduous task. Electrophoretic banding patterns often represent a form of diagnostic 'fingerprint'. Acquisition of a novel pattern requires comparison with existing patterns. Similarity coefficients may be used in this situation. A set of normalized band sizes or fragment lengths which comprise a banding pattern are used as input to the ELBAMAP database for further manipulation. Methods are available for data logging of electrophoresis banding patterns using densitometers or digitizers (Mansur-Vergara et al., 1984; Gupta and Robbelen, 1986). To facilitate gel-to-gel comparisons, the relative mobility of fragments may be standardized to common units such as molecular weight or base pairs (Schaffer and Sederoff, 1981). ELBAMAP allows the setting up of a database of banding patterns, treating the bands as discrete units, without weighting.
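A pairwise similarity matrix over presence/absence banding patterns might be built as in the sketch below. The text does not say which similarity coefficient ELBAMAP uses, so the Dice coefficient is an assumption here, as are the function names, the exact-match comparison of normalized band sizes, and the example patterns.

```python
def dice_similarity(bands_a, bands_b):
    """Dice coefficient 2*|shared| / (|A| + |B|), treating each band as a
    discrete, unweighted unit (present or absent)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_matrix(patterns):
    """Pairwise similarity matrix for a dict of sample name -> band sizes."""
    names = sorted(patterns)
    matrix = [
        [dice_similarity(patterns[x], patterns[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Hypothetical normalized fragment lengths (base pairs) for three samples.
patterns = {
    "isolate_1": [1000, 750, 500],
    "isolate_2": [1000, 500, 250],
    "isolate_3": [900, 400],
}
names, matrix = similarity_matrix(patterns)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Exact matching assumes band sizes have already been normalized to common units; real gel data would usually need a tolerance window when deciding that two bands are the same.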

Journal ArticleDOI
TL;DR: The Barcelonagram is applied to real bacterial growth in different and significant experimental conditions, and the diverse contributions of these simulated results to the understanding of the microbiological processes and the general reliability of the simulation is discussed.
Abstract: The Barcelonagram is a Monte Carlo simulator recently designed in order to take account of the behaviour of living systems. In this paper we apply this technique to real bacterial growth in different and significant experimental conditions, namely (i) the growth of the Serratia marcescens in a minimal glucose-limited medium, (ii) the temperature effect on the anaerobic growth of the same strain, (iii) the growth of the Escherichia coli in a minimal medium and (iv) the normal specific growth rate of bacterial populations against the available substrate concentration. In the context of these different cases we discuss the diverse contributions of these simulated results to the understanding of the microbiological processes and the general reliability of the simulation considered as a third alternative besides both (and together with!) experience and mathematical modelling.