Showing papers in "Bioinformatics in 1993"

Journal Article•DOI•

Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation

Craig D. Livingstone¹, Geoffrey J. Barton¹•Institutions (1)

01 Dec 1993-Bioinformatics

TL;DR: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment that simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.

Abstract: An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment. The new algorithm allows questions important in the design of mutagenesis experiments to be quickly answered since positions in the alignment that show unusual or interesting residue substitution patterns may be rapidly identified. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation between any group of amino acids. Sequences in the alignment are gathered into subgroups on the basis of sequence similarity, functional, evolutionary or other criteria. All pairs of subgroups are then compared to highlight positions that confer the unique features of each subgroup. The algorithm is encoded in the computer program AMAS (Analysis of Multiply Aligned Sequences) which provides a textual summary of the analysis and an annotated (boxed, shaded and/or coloured) multiple sequence alignment. The algorithm is illustrated by application to an alignment of 67 SH2 domains where patterns of conserved hydrophobic residues that constitute the protein core are highlighted. The analysis of charge conservation across annexin domains identifies the locations at which conserved charges change sign. The algorithm simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.
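The set-based description can be sketched roughly as follows. This is a minimal illustration and not AMAS itself: the property table is a small, hand-picked subset invented for the example, and the function names `shared_properties` and `compare_subgroups` are assumptions. Each alignment position in a subgroup is summarised by the properties that every residue shares, and two subgroups are compared by the properties conserved in only one of them.

```python
# Illustrative property sets: a small hypothetical table, NOT the one used by AMAS.
PROPERTIES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "charged": set("KRHDE"),
    "aromatic": set("FWYH"),
    "aliphatic": set("ILV"),
}

def shared_properties(residues):
    """Return the names of the properties held by every residue in the group."""
    residues = set(residues)
    return {name for name, members in PROPERTIES.items() if residues <= members}

def compare_subgroups(column_a, column_b):
    """Properties conserved in only one of two subgroups at one alignment position."""
    a, b = shared_properties(column_a), shared_properties(column_b)
    return a - b, b - a

# One position, two subgroups: the charge is conserved in both but changes sign.
only_a, only_b = compare_subgroups("KKR", "DEE")
print(sorted(only_a), sorted(only_b))
```

In this toy case both subgroups conserve "charged", so the comparison isolates the sign change, which is the kind of position the annexin example in the abstract refers to.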

649 citations

Journal Article•DOI•

GEPASI: A software package for modelling the dynamics, steady states and control of biochemical and other systems

Pedro Mendes¹•Institutions (1)

Aberystwyth University¹

01 Oct 1993-Bioinformatics

TL;DR: GEPASI allows the automatic generation of a sequence of simulations with different combinations of parameter values, effectively scanning a hyper-solid in parameter space, which allows for both very quick and systematic study of biochemical pathway models.

Abstract: GEPASI is a software system for modelling chemical and biochemical reaction networks on computers running Microsoft Windows. For any system of up to 45 metabolites and 45 reactions, each with any user-defined or one of 35 predefined rate equations, one can produce trajectories of the metabolite concentrations and obtain a steady state (if it does exist). When steady-state solutions are produced, elasticity and control coefficients, as defined in metabolic control analysis, are calculated. GEPASI also allows the automatic generation of a sequence of simulations with different combinations of parameter values, effectively scanning a hyper-solid in parameter space. Together with the ability to produce user-defined columnar data files, these features allow for both very quick and systematic study of biochemical pathway models. The source code (in C) is available on request from the author, and while the user interface is dependent on having MS-Windows as the operating system, the numerical part is portable to other operating systems. GEPASI is suitable both for research and educational purposes. Although GEPASI was written with biochemical pathways in mind, it can equally be used to simulate other dynamical systems.
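Scanning a hyper-solid in parameter space amounts to evaluating the model at every combination of the chosen parameter values. A minimal sketch of that idea, assuming a toy two-step pathway whose steady state has a closed form; the model, the function names and the parameter names are hypothetical and have nothing to do with GEPASI's own code:

```python
from itertools import product

def steady_state_b(k1, k2, a=1.0):
    """Steady-state [B] for the toy pathway A -> B -> (sink) with mass-action rates.

    d[B]/dt = k1*a - k2*[B] = 0 gives [B] = k1*a/k2.
    """
    return k1 * a / k2

def scan(grid):
    """Evaluate the model at every combination of the listed parameter values."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, steady_state_b(**params)))
    return results

# A 2 x 2 grid: four simulations, one per corner of the scanned rectangle.
for params, b in scan({"k1": [0.5, 1.0], "k2": [1.0, 2.0]}):
    print(params, b)
```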

544 citations

Journal Article•DOI•

TREECON: A software package for the construction and drawing of evolutionary trees

Yves Van de Peer¹, Rupert De Wachter¹•Institutions (1)

University of Antwerp¹

01 Apr 1993-Bioinformatics

TL;DR: A package of programs (run by a management program called TREECON) was developed for the construction and drawing of evolutionary trees and the modules TREE, ROOT and DRAW are applicable to any kind of dissimilarity matrix.

Abstract: A package of programs (run by a management program called TREECON) was developed for the construction and drawing of evolutionary trees. The program MATRIX calculates dissimilarity values and can perform bootstrap analysis on nucleic acid sequences. TREE implements different evolutionary tree constructing methods based on distance matrices. Because some of these methods produce unrooted evolutionary trees, a program ROOT places a root on the tree. Finally, the program DRAW draws the evolutionary tree, changes its size or topology, and produces drawings suitable for publication. Whereas MATRIX is suited only for nucleic acids, the modules TREE, ROOT and DRAW are applicable to any kind of dissimilarity matrix. The programs run on IBM-compatible microcomputers using the DOS operating system.
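The abstract does not say which dissimilarity measures MATRIX offers. As an assumption for illustration only, the sketch below uses the uncorrected p-distance (fraction of differing sites) to show what a dissimilarity matrix for aligned nucleic acid sequences looks like; the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances with zeros on the diagonal."""
    n = len(seqs)
    return [
        [0.0 if i == j else p_distance(seqs[i], seqs[j]) for j in range(n)]
        for i in range(n)
    ]

aligned = ["ACGTACGT", "ACGTACGA", "ACCTAC-A"]
for row in distance_matrix(aligned):
    print(row)
```

Any matrix of this shape could then be passed to a distance-based tree-building method, which is why the downstream modules are independent of the sequence type.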

419 citations

Journal Article•DOI•

SRS—an indexing and retrieval tool for flat file data libraries

Thure Etzold, Patrick Argos

01 Feb 1993-Bioinformatics

TL;DR: SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns.

Abstract: SRS (Sequence Retrieval System) is an information indexing and retrieval system designed for libraries with a flat file format such as the EMBL nucleotide sequence databank, the SwissProt protein sequence databank or the Prosite library of protein subsequence consensus patterns. SRS supports the data structure of these libraries by providing special indices for implementing lists of subentities (e.g. feature tables) or hierarchically structured data-fields (e.g. taxonomic classification). A language (ODD) has been designed for the convenient specification of library format and organization, representation of individual data-fields within the system (design of indices) and structuring other data needed during retrieval. This ensures flexibility required for coping with different library formats, which are subject to continuous change. Queries and inspection of retrieved entries can be performed from a user interface with pull-down menus and windows. SRS supports various input and output formats but is particularly well adapted to the GCG programs.

300 citations

Journal Article•DOI•

DCSE, an interactive tool for sequence alignment and secondary structure research

P. De Rijk¹, R De Wachter¹•Institutions (1)

University of Antwerp¹

01 Dec 1993-Bioinformatics

TL;DR: DCSE is a sequence alignment editor that shifts characters or entire blocks of aligned characters rather than inserting or deleting gaps; although it can be used on protein sequence alignments, it is especially targeted at the examination of RNA.

Abstract: DCSE provides a user-friendly package for the creation and editing of sequence alignments. The program runs on different platforms, including microcomputers and workstations. Apart from available hardware, the program is not limited in the size of the alignment it can handle. It deviates more from classical text editors than other available sequence editors because it uses a different approach towards editing. It shifts characters or entire blocks of aligned characters, rather than inserting or deleting gaps in the sequences. Alignment of a new sequence to an existing alignment is partly automated. Although DCSE can be used on protein sequence alignments, it is especially targeted at the examination of RNA. The secondary structure for every sequence can be incorporated easily in the alignment. DCSE also has extensive built-in support for finding and checking secondary structure elements. A sophisticated system of markers allows notation of special positions in an alignment. This system can be used to store information such as the position of hidden breaks, introns and tertiary structure interactions.

244 citations

Journal Article•DOI•

A weight array method for splicing signal analysis

Michael Q. Zhang¹, Thomas G. Marr•Institutions (1)

Cold Spring Harbor Laboratory¹

01 Oct 1993-Bioinformatics

TL;DR: The results show that there may exist weak pairwise correlations within the signals and that the proposed weight array method can help to better discriminate these signals.

Abstract: A new method of sequence analysis, using a weight array method (WAM), which generalizes the traditional Staden weight matrix method (WMM), is proposed. With the help of a statistical mechanical model, the discriminant function is identified with the energy function describing macromolecular interactions. The method is applied to the study of 5'-splice signals in Schizosaccharomyces pombe pre-mRNA sequences. The results show that there may exist weak pairwise correlations within the signals and that our method can help to better discriminate these signals. Experiments are proposed to test the predictions of the theory.
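One way to picture the step from a weight matrix to a weight array is to let the probability of a base at each position depend on the base at the preceding position. The sketch below assumes exactly that (a position-specific first-order model with pseudocounts and a uniform background); it is an illustration of the general idea, not the authors' implementation, and `train_wam` and `wam_score` are hypothetical names.

```python
import math

ALPHABET = "ACGT"

def train_wam(sites, pseudocount=1.0):
    """Estimate position-specific first-order conditional probabilities from aligned sites."""
    length = len(sites[0])
    first = {b: pseudocount for b in ALPHABET}
    # trans[i][prev][cur]: weight of base `cur` at position i given `prev` at i-1.
    # trans[0] is a placeholder and is never used (position 0 has no predecessor).
    trans = [{p: {c: pseudocount for c in ALPHABET} for p in ALPHABET} for _ in range(length)]
    for s in sites:
        first[s[0]] += 1
        for i in range(1, length):
            trans[i][s[i - 1]][s[i]] += 1
    total = sum(first.values())
    first = {b: n / total for b, n in first.items()}
    for i in range(1, length):
        for p in ALPHABET:
            row_total = sum(trans[i][p].values())
            trans[i][p] = {c: n / row_total for c, n in trans[i][p].items()}
    return first, trans

def wam_score(seq, model, background=0.25):
    """Log-odds score of seq under the model versus a uniform background."""
    first, trans = model
    score = math.log(first[seq[0]] / background)
    for i in range(1, len(seq)):
        score += math.log(trans[i][seq[i - 1]][seq[i]] / background)
    return score

# Toy training set of 5'-splice-site-like hexamers (made up for the example).
model = train_wam(["GTAAGT", "GTAAGT", "GTATGT", "GTAAGA"])
print(wam_score("GTAAGT", model), wam_score("CCCCCC", model))
```

Dropping the dependence on the previous base reduces this to an ordinary weight matrix, which is the sense in which the array form generalizes it.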

191 citations

Journal Article•DOI•

An interactive graphic program for calculating the secondary structure content of proteins from circular dichroism spectrum.

Gilbert Deléage¹, Christophe Geourjon¹•Institutions (1)

Independent Bank¹

01 Apr 1993-Bioinformatics

TL;DR: A graphic program has been developed to calculate the secondary structure content of proteins from their circular dichroism spectrum using a reference model for alpha-helix, beta-sheet and beta-turn.

Abstract: A graphic program has been developed to calculate the secondary structure content of proteins from their circular dichroism spectrum. All the information concerning analysis and results is given on a single screen. The actual and the theoretical spectra are plotted to allow visual inspection of the fit quality. The percentages of secondary structure and statistical parameters (r.m.s., residuals) are provided. The program is fully interactive for spectra analysis. Moreover, cursors driven by a mouse or arrow keys can be moved onto the spectra, yielding all the information concerning a given wavelength, such as the theoretical and experimental ellipticities, the wavelength, and the values of the reference model for alpha-helix, beta-sheet and beta-turn. Interfaces are provided for the CONTIN program of Provencher and Glockner.

191 citations

Journal Article•DOI•

MASCOT: multiple alignment system for protein sequences based on three-way dynamic programming

Makoto Hirosawa, Masaki Hoshida, Masato Ishikawa, Tomoyuki Toya

01 Apr 1993-Bioinformatics

TL;DR: The MASCOT multiple alignment system is developed, which can sustain the reliability of alignment even when the similarity of sequences is low, and achieves high-quality alignment by employing three-way alignment in addition to two-way alignment.

Abstract: A multiple alignment methodology that can produce high-quality alignment is extremely important for predicting the structure of unknown proteins. Nearly all the methodologies developed so far have employed two-way alignment only. Although these methods are fast, the alignments they produce lose reliability as the similarity of sequences reduces. We developed the MASCOT multiple alignment system. MASCOT can sustain the reliability of alignment even when the similarity of sequences is low. MASCOT achieves high-quality alignment by employing three-way alignment in addition to two-way alignment. The resultant alignments are refined by simulated annealing to higher quality. We also use a cluster analysis of sequences to produce highly reliable alignments.

174 citations

Journal Article•DOI•

Prediction of protein secondary structure by the hidden Markov model.

Kiyoshi Asai, Satoru Hayamizu, Ken'ichi Handa

01 Apr 1993-Bioinformatics

TL;DR: A new method is introduced for analyzing the amino acid sequences of proteins using the hidden Markov model (HMM), a type of stochastic model; the implementation is 'without grammar' (no rule for the appearance patterns of secondary structure), yet the result was reasonable.

Abstract: The purpose of this paper is to introduce a new method for analyzing the amino acid sequences of proteins using the hidden Markov model (HMM), which is a type of stochastic model. Secondary structures such as helix, sheet and turn are learned by HMMs, and these HMMs are applied to new sequences whose structures are unknown. The output probabilities from the HMMs are used to predict the secondary structures of the sequences. The authors tested this prediction system on approximately 100 sequences from a public database (Brookhaven PDB). Although the implementation is 'without grammar' (no rule for the appearance patterns of secondary structure) the result was reasonable.

151 citations

Journal Article•DOI•

Optimal alignment between groups of sequences and its application to multiple sequence alignment

Osamu Gotoh

01 Jun 1993-Bioinformatics

TL;DR: Four algorithms, A-D, were developed to align two groups of biological sequences, which are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences.

Abstract: Four algorithms, A-D, were developed to align two groups of biological sequences. Algorithm A is equivalent to the conventional dynamic programming method widely used for aligning ordinary sequences, whereas algorithms B-D are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences. Rigorous optimization of the 'sum of pairs' (SP) score is achieved by algorithm D, whose average performance is close to O(MNL²), where M and N are numbers of sequences included in the two groups and L is the mean length of the sequences. Algorithm B uses some approximations to cope with profile-based operations, whereas algorithm C is a simpler variant of algorithm D. These group-to-group alignment algorithms were applied to multiple sequence alignment with two iterative strategies: a progressive method based on a given binary tree and a randomized grouping–realignment method. The advantages and disadvantages of the four algorithms are discussed on the basis of the results of examinations of several protein families.
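For readers unfamiliar with the 'sum of pairs' score, the sketch below computes it for a finished alignment. It is a simplification under stated assumptions: every gap column is charged a flat per-column penalty, whereas the point of algorithms B-D is to cost insertions and deletions more accurately when internal gaps are present. The scoring values and function names are hypothetical.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of symbols (flat gap cost, an assumption of this sketch)."""
    if a == "-" and b == "-":
        return 0  # a gap aligned with a gap contributes nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sp_score(alignment):
    """Sum pair_score over every pair of rows and every column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    return sum(
        pair_score(r1[i], r2[i])
        for r1, r2 in combinations(alignment, 2)
        for i in range(length)
    )

print(sp_score(["ACGT", "AC-T", "AGGT"]))
```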

142 citations

Journal Article•DOI•

SCAMP: a general-purpose simulator and metabolic control analysis program

Herbert M. Sauro

01 Aug 1993-Bioinformatics

TL;DR: SCAMP is a general-purpose simulator of metabolic and chemical networks that accepts metabolic models described in a biochemical language and enables novice as well as experienced users rapidly to build and simulate metabolic systems.

Abstract: SCAMP is a general-purpose simulator of metabolic and chemical networks. The program is written in C and is portable to all computer systems that support an ANSI C compiler. SCAMP accepts metabolic models described in a biochemical language, and this enables novice as well as experienced users rapidly to build and simulate metabolic systems. The language is sufficiently flexible to enable other types of model to be built, e.g. chemostat or ecological models. The language offers many facilities, including: the ability to describe metabolic pathways of any structure and possessing any kinetics using normal chemical notation; optionally build models directly from the differential equations; differing compartment volumes; access to flux, concentration and rate of change information; detection of conserved cycles; access to all coefficients and elasticities of metabolic control analysis; user-defined forcing functions at the model boundaries; user-defined monitoring functions; user-configurable output of any quantity. From the model description SCAMP can either generate C code for later compilation to produce fast executable stand-alone models or run-time code for input to a run-time interpreter for immediate execution. The simulator also incorporates an inbuilt symbolic differentiator for evaluating the Jacobian and elasticity matrices.

Journal Article•DOI•

CURVATURE: software for the analysis of curved DNA

E. S. Shpigelman, Edward N. Trifonov¹, Alexander Bolshoy¹•Institutions (1)

Weizmann Institute of Science¹

01 Aug 1993-Bioinformatics

TL;DR: The software, CURVATURE, can thus be used to investigate possible roles of curvature in modulation of gene expression and for location of curved portions of DNA, which may play an important role in sequence-specific protein–DNA interactions.

Abstract: Software is presented to plot the sequence-dependent spatial trajectory of the DNA double helix and/or distribution of curvature along the DNA molecule. The nearest-neighbor wedge model is implemented to calculate overall DNA path using local helix parameters: helix twist angle, wedge (deflection) angle and direction (of deflection) angle. The procedures described proved to be very convenient as tools for investigation of a relationship between overall DNA curvature and its gel electrophoretic mobility. All parameters of the model had been estimated from experimental data. Using these wedge parameters the program takes, as input, any DNA sequence and calculates the likely degree of curvature at each point along the molecule. This information is displayed both graphically and in the form of simplified representations of curved double helices. The software, CURVATURE, can thus be used to investigate possible roles of curvature in modulation of gene expression and for location of curved portions of DNA, which may play an important role in sequence-specific protein–DNA interactions.

Journal Article•DOI•

Discovering simple DNA sequences by the algorithmic significance method

Aleksandar Milosavljevic¹, Jerzy Jurka¹•Institutions (1)

Linus Pauling Institute¹

01 Aug 1993-Bioinformatics

TL;DR: This paper defines DNA sequences to be simple if they contain repeated occurrences of certain 'words' and thus can be encoded in a small number of bits; this definition includes minisatellites and microsatellites.

Abstract: A new method, 'algorithmic significance', is proposed as a tool for discovery of patterns in DNA sequences. The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. In this sense, the method can be viewed as a formal version of the Occam's Razor principle. In this paper the method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain 'words' and thus can be encoded in a small number of bits. Such a definition includes minisatellites and microsatellites. A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time. An electronic mail server for identification of simple sequences based on the proposed method has been installed at the Internet address pythia@anl.gov.

Journal Article•DOI•

Refined algorithm and computer program for calculating all non-negative fluxes admissible in steady states of biochemical reaction systems with or without some flux rates fixed.

Ronny Schuster¹, Stefan Schuster•Institutions (1)

University of Bordeaux¹

01 Feb 1993-Bioinformatics

TL;DR: A refined algorithm together with a computer procedure for determining the complete set of non-negative, steady-state fluxes in biochemical reaction systems of any complexity, with or without some flux rates fixed, is given and it is shown that this set is a convex polyhedron, which may or may not be bounded.

Abstract: A refined algorithm together with a computer procedure for determining the complete set of non-negative, steady-state fluxes in biochemical reaction systems of any complexity, with or without some flux rates fixed, is given. It is shown that this set is a convex polyhedron, which may or may not be bounded. The algorithm is illustrated by several examples; one of them concerns intermediary metabolism. A computer code in standard C is presented.
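The conditions that define an admissible flux vector can be written down directly: with stoichiometric matrix N and flux vector v, steady state requires N·v = 0, and admissibility additionally requires every component of v to be non-negative. The sketch below only checks a given vector against these two conditions on a made-up network; it does not enumerate the full polyhedron as the paper's algorithm does, and the function name is hypothetical.

```python
def is_admissible(stoich, fluxes, tol=1e-9):
    """True if N.v = 0 (steady state) and every flux is non-negative."""
    if any(v < -tol for v in fluxes):
        return False
    return all(
        abs(sum(n * v for n, v in zip(row, fluxes))) <= tol
        for row in stoich
    )

# Toy network (hypothetical): v1: -> A, v2: A -> B, v3: B ->, v4: A ->
# Rows are metabolites A and B, columns are reactions v1..v4.
N = [
    [1, -1, 0, -1],
    [0, 1, -1, 0],
]

print(is_admissible(N, [2, 1, 1, 1]))   # balanced and non-negative
print(is_admissible(N, [1, 2, 2, -1]))  # balanced, but v4 is negative
print(is_admissible(N, [1, 1, 0, 0]))   # B accumulates
```

Adding two admissible vectors of this homogeneous system gives another admissible vector, which is consistent with the solution set being convex.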

Journal Article•DOI•

A system for pattern matching applications on biosequences.

Gerhard Mehldau¹, Gene Myers•Institutions (1)

University of Arizona¹

01 Jun 1993-Bioinformatics

TL;DR: ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences.

Abstract: ANREP is a system for finding matches to patterns composed of (i) spacing constraints called 'spacers', and (ii) approximate matches to 'motifs' that are, recursively, patterns composed of 'atomic' symbols. A user specifies such patterns via a declarative, free-format and strongly typed language called A that is presented here in a tutorial style through a series of progressively more complex examples. The sample patterns are for protein and DNA sequences, the application domain for which ANREP was specifically created. ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences. The performance of ANREP is discussed and an appendix gives a concise specification of syntax and semantics. A portable C software package implementing ANREP is available via anonymous remote file transfer.

Journal Article•DOI•

Multiple sequence alignment by parallel simulated annealing

Masato Ishikawa, Tomoyuki Toya, Masaki Hoshida, Katsumi Nitta, Atushi Ogiwara, Minoru Kanehisa

01 Jun 1993-Bioinformatics

TL;DR: The temperature parallel algorithm of simulated annealing is considered to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization.

Abstract: We have developed simulated annealing algorithms to solve the problem of multiple sequence alignment. The algorithm was shown to give the optimal solution as confirmed by the rigorous dynamic programming algorithm for three-sequence alignment. To overcome long execution times for simulated annealing, we utilized a parallel computer. A sequential algorithm, a simple parallel algorithm and the temperature parallel algorithm were tested on a problem. The results were compared with the result obtained by a conventional tree-based algorithm where alignments were merged by two-way dynamic programming. Every annealing algorithm produced a better energy value than the conventional algorithm. The best energy value, which probably represents the optimal solution, was reached within a reasonable time by both of the parallel annealing algorithms. We consider the temperature parallel algorithm of simulated annealing to be the most suitable for finding the optimal multiple sequence alignment because the algorithm does not require any scheduling for optimization. The algorithm is also useful for refining multiple alignments obtained by other heuristic methods.

Journal Article•DOI•

Design of primers for PCR amplification of highly variable genomes.

Joaquín Dopazo, Ana Rodríguez, Juan-Carlos Saiz, Francisco Sobrino

01 Apr 1993-Bioinformatics

TL;DR: A program to aid in the search of primers for specific polymerase chain reaction (PCR) amplification of highly variable genomes involves the derivation of variability profiles to identify optimal regions for PCR amplification, taking into account stability of DNA-primer hybrids.

Abstract: A program to aid in the search of primers for specific polymerase chain reaction (PCR) amplification of highly variable genomes is presented. It involves the derivation of variability profiles to identify optimal regions for PCR amplification, taking into account stability of DNA-primer hybrids. An application of the program to foot-and-mouth disease virus diagnosis is presented.

Journal Article•DOI•

Estimation of the intrinsic rate of natural increase and its error by both algebraic and resampling approaches

Ana Taberner¹, Pedro Castañera, Enrique Silvestre, Joaquín Dopazo•Institutions (1)

Spanish National Research Council¹

01 Oct 1993-Bioinformatics

TL;DR: The two main approaches to the calculation of the Malthusian parameter, its error and confidence intervals have been implemented in a program and have been compared by means of an example.

Abstract: The intrinsic rate of natural increase or Malthusian parameter plays a key role in fields as diverse as ecology, genetics, demography and evolution. It characterizes the growth of a population in a determinate environment. Since its rigorous statistical estimation requires intensive calculation, the use of a computer becomes essential. The two main approaches to the calculation of the Malthusian parameter, its error and confidence intervals have been implemented in a program and have been compared by means of an example.
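The abstract does not give the equations used. A standard definition, assumed here, is that the Malthusian parameter r is the root of the Euler-Lotka equation, sum over ages x of exp(-r·x)·l_x·m_x = 1, where l_x is survivorship to age x and m_x is fecundity at age x. The sketch below finds that root by bisection; the function names and the bracketing interval are hypothetical choices, and error estimation by resampling would amount to repeating this calculation on resampled cohorts.

```python
import math

def euler_lotka(r, ages, lx, mx):
    """Left side of the Euler-Lotka equation minus 1; zero at the intrinsic rate."""
    return sum(math.exp(-r * x) * l * m for x, l, m in zip(ages, lx, mx)) - 1.0

def intrinsic_rate(ages, lx, mx, lo=-5.0, hi=5.0, tol=1e-12):
    """Solve for r by bisection (the left side decreases in r for positive ages)."""
    if euler_lotka(lo, ages, lx, mx) < 0 or euler_lotka(hi, ages, lx, mx) > 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if euler_lotka(mid, ages, lx, mx) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy life table (made up): reproduction at ages 1 and 2.
print(intrinsic_rate(ages=[1, 2], lx=[1.0, 0.5], mx=[1.0, 2.0]))
```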

Journal Article•DOI•

An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps

Geoffrey J. Barton

01 Dec 1993-Bioinformatics

TL;DR: An efficient algorithm is described to locate locally optimal alignments between two sequences allowing for insertions and deletions, and is fast enough to be used on a conventional workstation to scan large sequence databanks.

Abstract: An efficient algorithm is described to locate locally optimal alignments between two sequences allowing for insertions and deletions. The algorithm is based on that of Smith and Waterman which returns the single best local alignment. However, the algorithm described here permits all non-intersecting locally optimal alignments to be determined in a single pass through the comparison matrix. The algorithm simplifies the location of repeats, multiple domains and shuffled motifs, and is fast enough to be used on a conventional workstation to scan large sequence databanks.
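As background, the Smith and Waterman recurrence that the abstract builds on can be sketched as below. This is the textbook version that reports only the single best local score, with a linear gap penalty chosen for brevity; it does not reproduce the paper's extension for collecting all non-intersecting local alignments in one pass. Scoring values and the function name are assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Only two rows of the comparison matrix are kept at a time, so memory grows with the length of the shorter of the two loops rather than with the full matrix.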

Journal Article•DOI•

Classification of protein sequences by their dipeptide composition

Pasquale Petrilli

01 Apr 1993-Bioinformatics

TL;DR: A simple approach to scan quickly a large protein sequence database for homology is described, in which protein sequences are grouped into families of closely related proteins, each family being characterized by its average dipeptide composition.

Abstract: A simple approach to scan quickly a large protein sequence database for homology is described. The approach used is strictly dependent on the database organization. A database has been compiled in which protein sequences are grouped into families of closely related proteins, each family being characterized by its average dipeptide composition. A new entry in the database can be allocated in a family by comparing its dipeptide composition with the average dipeptide composition of the families.
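A rough sketch of allocation by dipeptide composition follows. The abstract does not state how compositions are compared, so Euclidean distance is used here purely as an assumption; the family names, sequences and function names are made up for the example.

```python
from collections import Counter
import math

def dipeptide_composition(seq):
    """Relative frequency of each overlapping dipeptide (seq must have length >= 2)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {dp: n / total for dp, n in counts.items()}

def family_average(seqs):
    """Average dipeptide composition over the members of one family."""
    comps = [dipeptide_composition(s) for s in seqs]
    keys = set().union(*comps)
    return {k: sum(c.get(k, 0.0) for c in comps) / len(comps) for k in keys}

def distance(comp_a, comp_b):
    """Euclidean distance between two compositions (an assumed measure)."""
    keys = set(comp_a) | set(comp_b)
    return math.sqrt(sum((comp_a.get(k, 0.0) - comp_b.get(k, 0.0)) ** 2 for k in keys))

def assign_family(seq, families):
    """Name of the family whose average composition is closest to seq's."""
    comp = dipeptide_composition(seq)
    return min(families, key=lambda name: distance(comp, families[name]))

families = {
    "gly_rich": family_average(["GGAGGAGG", "GAGGGAGG"]),
    "lys_rich": family_average(["KKEKKEKK", "KEKKKEKK"]),
}
print(assign_family("GGGAGG", families))
```

Because each family is reduced to one averaged vector, a new entry is compared against the number of families rather than against every sequence, which is where the speed of such a scan would come from.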

Journal Article•DOI•

Transforming a set of biological flat file libraries to a fast access network.

Thure Etzold, Patrick Argos

01 Feb 1993-Bioinformatics

TL;DR: SRS (Sequence Retrieval System), an indexing system for flat file libraries, provides fast access to individual library entries via retrieval by keywords from various data fields, and is now also able to build indices using cross-references that most libraries provide.

Abstract: SRS (Sequence Retrieval System), an indexing system for flat file libraries, provides fast access to individual library entries via retrieval by keywords from various data fields. SRS is now also able to build indices using cross-references that most libraries provide. Fifteen libraries of DNA and protein sequences and structures have been selected. These libraries interact with at least one other by means of cross-references. Indexing these cross-references allows a complete network of libraries to be built. In the network an entry from one library can be linked in principle to every other library. If two libraries are not directly cross-referenced, the linkage can be made with a succession of single links between neighbouring, cross-referenced libraries. A new operator has been added to the query language of SRS for convenient specification of links amongst complete libraries or entry sets generated by previous queries on particular libraries. All the information in the network can now be used to retrieve an entry in a specific library, e.g. the full information given in amino acid sequence entries from SwissProt can now be used to retrieve related tertiary structure entries from PDB. Furthermore, a search in a single library can be extended to a search in the complete library network, e.g. all entries in all databases pertaining to elastase can be found.

Journal Article•DOI•

A novel method of protein sequence classification based on oligopeptide frequency analysis and its application to search for functional sites and to domain localization

Victor V. Solovyev¹, Kira S. Makarova¹•Institutions (1)

Russian Academy¹

01 Feb 1993-Bioinformatics

TL;DR: It is shown that comparison of the vocabularies can distinguish among different families and the latter from random sequences, and is reasonably efficient for localizing functional domains in the amino acid sequences.

Abstract: A new method for distinguishing among protein families based on the analysis of oligopeptide composition of amino acid sequences is presented. It is assumed that any protein family can be characterized by a set of essential oligopeptides (oligopeptide vocabulary). A simple approach to find such a vocabulary is suggested. It is shown that comparison of the vocabularies can distinguish among different families and the latter from random sequences. This comparison can be successfully made with a small set of frequencies of 25 dipeptides (or tripeptides). No preliminary alignment is necessary. It is established that characteristic peptides are located in the regions of functional value, as shown for GTP-binding domains of the translation elongation factors. It is demonstrated that this method is reasonably efficient for localizing functional domains in the amino acid sequences. The average error of prediction does not exceed three or four amino acid residues as shown for several functional domains.

Journal Article•DOI•

ODS: ordering DNA sequences--a physical mapping algorithm based on simulated annealing.

Anthony Cuticchia¹, Jonathan Arnold², William E. Timberlake²•Institutions (2)

Johns Hopkins University¹, University of Georgia²

01 Apr 1993-Bioinformatics

TL;DR: The program ODS is very general in the types of data that can be utilized for chromosome reconstruction: any trait that can be scored in a presence–absence manner, such as hybridized synthetic oligonucleotides, restriction endonuclease recognition sites or single copy landmarks, can be used for analysis.

Abstract: In the program ODS we provide a methodology for quickly ordering random clones into a physical map. The process of ordering individual clones with respect to their position along a chromosome is based on the similarity of binary signatures assigned to each clone. This binary signature is obtained by hybridizing each clone to a panel of oligonucleotide probes. By using the fact that the amount of overlap between any two clones is reflected in the similarity of their binary signatures, it is possible to reconstruct a chromosome by minimizing the sum of linking distances between an ordered sequence of clones. Unlike other programs for physical mapping, ODS is very general in the types of data that can be utilized for chromosome reconstruction. Any trait that can be scored in a presence--absence manner, such as hybridized synthetic oligonucleotides, restriction endonuclease recognition sites or single copy landmarks, can be used for analysis. Furthermore, the computational requirements for the construction of large physical maps can be measured in a matter of hours on work-stations such as the VAX2000.
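The ordering problem described in this abstract can be sketched as follows. This is not the ODS code: the use of Hamming distance between binary signatures as the linking distance, the segment-reversal move, the geometric cooling schedule and all parameter values are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def hamming(a, b):
    """Number of probes on which two binary signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def total_linking_distance(order, signatures):
    """Sum of distances between neighbouring clones in a proposed order."""
    return sum(
        hamming(signatures[order[i]], signatures[order[i + 1]])
        for i in range(len(order) - 1)
    )


def anneal_order(signatures, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over clone orders (needs at least two clones)."""
    rng = random.Random(seed)
    order = list(range(len(signatures)))
    rng.shuffle(order)
    cost = total_linking_distance(order, signatures)
    best, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: reverse a randomly chosen segment of the order.
        i, j = sorted(rng.sample(range(len(order)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_cost = total_linking_distance(candidate, signatures)
        delta = cand_cost - cost
        # Always accept improvements; accept worse orders with
        # probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            order, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
    return best, best_cost
```

Because the cost depends only on a presence-absence vector per clone, the same sketch applies to any of the data types the abstract lists, provided each clone is scored as a tuple of 0s and 1s.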

Journal Article•DOI•

Locating well-conserved regions within a pairwise alignment

Kun-Mao Chao¹, Ross C. Hardison¹, Webb Miller¹•Institutions (1)

Pennsylvania State University¹

01 Aug 1993-Bioinformatics

TL;DR: Methods for computing several 'robustness measures' at each position of a given alignment are presented, all of which are very space-efficient; one of them is used to locate particularly well-conserved regions in the beta-globin gene locus control region and in the 5' flank of the gamma-globin gene.

Abstract: Within a single alignment of two DNA sequences or two protein sequences, some regions may be much better conserved than others. Such strong conservation may reveal a region that possesses an important function. When alignments are so long that it is infeasible, or at least undesirable, to inspect them in complete detail, it is helpful to have an automatic process that computes information about the varying degree of conservation along the alignment and displays the information in a graphical representation that is readily assimilated. This paper presents methods for computing several such 'robustness measures' at each position of a given alignment. These methods are all very space-efficient; they use only space proportional to the sum of the two sequence lengths. To illustrate their effectiveness, one of the methods is used to locate particularly well-conserved regions in the beta-globin gene locus control region and in the 5' flank of the gamma-globin gene.

Journal Article•DOI•

Building multiple alignments from pairwise alignments.

Webb Miller¹•Institutions (1)

Pennsylvania State University¹

01 Apr 1993-Bioinformatics

TL;DR: This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments, an approach that is sometimes effective for exposing the existence and locations of conserved regions.

Abstract: Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. This approach is sometimes effective for exposing the existence and locations of conserved regions, which can then be aligned by more sensitive multiple-alignment methods. This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments.

Journal Article•DOI•

Searching for amino acid sequence motifs among enzymes: the Enzyme-Reaction Database.

Mikita Suyama, Atsushi Ogiwara, Takaaki Nishioka, Jun'ichi Oda

01 Feb 1993-Bioinformatics

TL;DR: The database was useful for the analysis of the relationship between chemical structures and amino acid sequence motifs; one motif shared by different enzymes, S-G-G-L-D, was conserved in argininosuccinate synthase and asparagine synthase.

Abstract: Recently we have constructed a database--the Enzyme-Reaction Database--which links a chemical structure to amino acid sequences of enzymes that recognize the chemical structure as their ligand. The total number of enzymes registered in the database is 1103 with 6668 NBRF-PIR entry codes and 1756 chemical compounds. The chemical structures and chemical names for 842 compounds are registered in the Chemical-Structure Database on the MACCS system. For each enzyme, the sequences were divided into clusters, and multiply aligned in each cluster to extract a conserved sequence. A total of 158,781 five-residue-long fragments were constructed from 433 conserved sequences and compared among different clusters of different enzymes. One of these motifs shared by different enzymes was S-G-G-L-D. The motif was conserved in both argininosuccinate synthase (EC 6.3.4.5) and asparagine synthase (glutamine-hydrolysing) (EC 6.3.5.4). This result showed that the database was useful for the analysis of the relationship between chemical structures and amino acid sequence motifs.
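The fragment comparison described in this abstract can be sketched in a few lines. The sketch assumes that each conserved sequence is a plain string of one-letter residue codes with no gap or wildcard symbols, which the abstract does not state; the function names and the cluster labels in the usage note are hypothetical.

```python
def fragments(sequence, length=5):
    """All overlapping fragments of the given length in one sequence."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def shared_fragments(conserved_by_cluster, length=5):
    """Map each fragment to the clusters it occurs in, keeping only
    fragments found in more than one cluster."""
    owners = {}
    for cluster, sequences in conserved_by_cluster.items():
        for seq in sequences:
            for frag in fragments(seq, length):
                owners.setdefault(frag, set()).add(cluster)
    return {frag: found for frag, found in owners.items() if len(found) > 1}
```

For example, with two invented toy strings (not real enzyme sequences), `shared_fragments({"clusterA": ["MKSGGLDTA"], "clusterB": ["QQSGGLDVV"]})` reports `SGGLD` as the only five-residue fragment common to both clusters.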

Journal Article•DOI•

Interactive and graphic coupling between multiple alignments, secondary structure predictions and motif/pattern scanning into proteins.

Christophe Geourjon¹, Gilbert Deléage¹•Institutions (1)

Claude Bernard University Lyon 1¹

01 Feb 1993-Bioinformatics

TL;DR: A computer module that includes multiple alignments, secondary structure prediction, and site and pattern search has been developed and integrated into the ANTHEPROT software for protein sequence analysis, and all methods are connected in an interactive graphic manner.

Abstract: A computer module that includes multiple alignments, secondary structure prediction, and site and pattern search has been developed and integrated into our ANTHEPROT software for protein sequence analysis. All the programs can be invoked from within any routine, thus yielding multiple pathways to obtain final results. All the results are graphically displayed. The main feature of this module is that all methods are connected in an interactive graphic manner. This module has been designed to display easily the potential sites with conserved predicted structures.

Journal Article•DOI•

SIGNAL SCAN 3.0: new database and program features.

Dan S. Prestridge¹, Gary D. Stormo²•Institutions (2)

University of Minnesota¹, University of Colorado Boulder²

01 Feb 1993-Bioinformatics

TL;DR: The SIGNAL SCAN transcription factor database format has changed and the program output format has been improved, and new features allow the user to update the SIGNAL SCAN database automatically, to retrieve original journal citations and to develop user signal databases.

Abstract: SIGNAL SCAN is a program that utilizes a transcription factor database to find potential transcription factor binding sites in DNA sequences. The program is now in its third version. The SIGNAL SCAN transcription factor database format has changed and the program output format has been improved. New features allow the user to update the SIGNAL SCAN database automatically, to retrieve original journal citations and to develop user signal databases. The program now uses an indexing algorithm, improving scanning speed by a factor of 3. SIGNAL SCAN is now network compatible and is available for IBM-compatible PC, Unix and VMS platforms.

Journal Article•DOI•

Automatic display of RNA secondary structures

G. Muller¹, Ch. Gaspin¹, A. Etienne¹, Eric Westhof¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Oct 1993-Bioinformatics

TL;DR: A set of programs written in C language with the GL library and under UNIX has been developed for generating compact, pleasant and non-overlapping displays of secondary structures of ribonucleic acids.

Abstract: A set of programs written in C language with the GL library and under UNIX has been developed for generating compact, pleasant and non-overlapping displays of secondary structures of ribonucleic acids. The first program, rnasearch, implements a new search procedure that dynamically rearranges overlapping portions of the two-dimensional drawing while preserving clear and readable displays of the two-dimensional structure. The algorithm is fast (the execution time for the command rnasearch is 38.6 s for the 16S rRNA of Escherichia coli with 1542 bases), accepts outputs from two-dimensional prediction programs and therefore allows for rapid comparison between the various two-dimensional folds generated. A second program, rnadisplay, allows the graphical display of the computed two-dimensional structures on a graphics workstation. Otherwise, it is possible to obtain a paper output of the two-dimensional structure by using the program print2D which builds a Postscript file. Moreover the two-dimensional drawing can be labelled for representing data coming from chemical modifications and/or enzymatic cleavages. Application to a few secondary structures such as RNaseP, 5S rRNA and 16S rRNA are given.

Journal Article•DOI•

A fast, sensitive pattern-matching approach for protein sequences

K. Rohde¹, Peer Bork•Institutions (1)

Max Delbrück Center for Molecular Medicine¹

01 Apr 1993-Bioinformatics

TL;DR: A fast, sensitive pattern-matching algorithm is presented that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids and is based on dynamic programming.

Abstract: Pattern-matching algorithms are a powerful tool for finding similarities and relationships among the steadily growing amount of known protein sequences. We present a fast, sensitive pattern-matching algorithm that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids, using a fast, dynamic programming algorithm. Selected examples will demonstrate applications and advantages of our approach.
