
Showing papers in "Journal of Computational Biology in 1995"


Journal ArticleDOI
TL;DR: This paper proposes a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods, and promises to be very fast and practical for DNA sequence assembly.
Abstract: Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
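The SBH half of this hybrid can be illustrated with a toy spectrum walk. The sketch below (illustrative, not the paper's algorithm) computes the set of k-mers of a string and reassembles it by repeatedly appending the unique k-mer that overlaps the current suffix in k − 1 symbols; it deliberately fails when an extension is ambiguous, which is exactly the situation that repeats create.

```python
from collections import Counter

def spectrum(s, k):
    """All length-k subwords of s (the SBH 'spectrum'), with multiplicities."""
    return Counter(s[i:i+k] for i in range(len(s) - k + 1))

def reconstruct(spec, start, length):
    """Greedy walk: extend by the unique unused k-mer whose (k-1)-prefix
    matches the current suffix. Succeeds only when every extension is
    unambiguous (no repeated (k-1)-mers in the target)."""
    spec = dict(spec)
    k = len(start)
    s = start
    spec[start] -= 1
    while len(s) < length:
        suffix = s[-(k - 1):]
        nexts = [w for w, c in spec.items() if c > 0 and w.startswith(suffix)]
        if len(nexts) != 1:
            return None  # ambiguous extension or dead end
        s += nexts[0][-1]
        spec[nexts[0]] -= 1
    return s
```

For example, the spectrum of "ACGTTGCA" with k = 4 walks back to the original string from the seed "ACGT".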

336 citations


Journal ArticleDOI
TL;DR: The maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods.
Abstract: We introduce a maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus. The method compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods. Maximum discrimination HMMs are more sensitive for detecting distant sequence homologs than various other HMM methods or BLAST when tested on globin and protein kinase catalytic domain sequences. Key words: hidden Markov model; database searching; sequence consensus; sequence weighting
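The scoring machinery underlying any such HMM approach is the forward algorithm; the maximum discrimination training criterion itself is the paper's contribution and is not reproduced here. A minimal sketch of forward-algorithm likelihood scoring, with illustrative parameters:

```python
import math

def forward_loglik(seq, states, init, trans, emit):
    """Forward algorithm: log P(seq | HMM). `trans[i][j]` and
    `emit[i][symbol]` are ordinary probabilities; sums are kept in
    probability space for clarity, which is fine for short sequences."""
    alpha = [init[i] * emit[i][seq[0]] for i in states]
    for sym in seq[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][sym]
                 for j in states]
    return math.log(sum(alpha))
```

A one-state model with uniform emissions over {A,C,G,T} assigns each sequence of length n the log-likelihood n·log(1/4), a handy sanity check.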

272 citations


Journal ArticleDOI
TL;DR: The fragment assembly problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem.
Abstract: The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov–Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the un...
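The graph-theoretic view can be made concrete with one classic reduction, removing transitive overlap edges. This is a standard transformation on overlap graphs, not necessarily one of the paper's specific reductions:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (>= min_len)."""
    for L in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:L]):
            return L
    return 0

def overlap_graph(frags, min_len=3):
    """Directed graph: edge i -> j weighted by the suffix/prefix overlap."""
    g = {i: {} for i in range(len(frags))}
    for i, a in enumerate(frags):
        for j, b in enumerate(frags):
            if i != j:
                o = overlap(a, b, min_len)
                if o:
                    g[i][j] = o
    return g

def remove_transitive(g):
    """Drop edge a->c whenever a->b and b->c exist: the layout survives,
    but the graph to explore shrinks (a sketch of one such reduction)."""
    drop = set()
    for a in g:
        for b in g[a]:
            for c in g.get(b, {}):
                if c in g[a] and c != a:
                    drop.add((a, c))
    return {a: {b: o for b, o in nbrs.items() if (a, b) not in drop}
            for a, nbrs in g.items()}
```

On three fragments sampled left to right from one string, the direct edge from the first to the third fragment is transitive and gets removed.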

258 citations


Journal ArticleDOI
TL;DR: It is shown that four simplified models of the physical mapping problem lead to NP-complete decision problems: colored unit interval graph completion, the maximum interval subgraph, the pathwidth of a bipartite graph, and the k-consecutive ones problem for k ≥ 2.
Abstract: Physical mapping is a central problem in molecular biology and the human genome project. The problem is to reconstruct the relative position of fragments of DNA along the genome from information on their pairwise overlaps. We show that four simplified models of the problem lead to NP-complete decision problems: Colored unit interval graph completion, the maximum interval (or unit interval) subgraph, the pathwidth of a bipartite graph, and the k -consecutive ones problem for k ≥ 2. These models have been chosen to reflect various features typical in biological data, including false-negative and positive errors, small width of the map, and chimericism. Key words: physical mapping; NP-completeness; interval graphs; k-consecutive ones problem
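The classical consecutive-ones property underlying these models can be tested by brute force on tiny instances; polynomial algorithms via PQ-trees exist, and the paper's point is that the error-tolerant and k ≥ 2 variants are NP-complete. An illustrative exponential check:

```python
from itertools import permutations

def _consecutive(row):
    """True if the 1s in the row occupy one contiguous block."""
    ones = [i for i, v in enumerate(row) if v]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def has_consecutive_ones(matrix):
    """Brute-force C1P test: does some column order make the 1s in
    every row consecutive? Exponential; tiny instances only."""
    ncols = len(matrix[0])
    for perm in permutations(range(ncols)):
        if all(_consecutive([row[c] for c in perm]) for row in matrix):
            return True
    return False
```

The minimal obstruction with three rows covering all three pairs of three columns has no valid ordering, since three columns admit only two adjacencies.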

220 citations


Journal ArticleDOI
TL;DR: This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies, motivated by the increasing dispersion and heterogeneity of biological data.
Abstract: Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.

200 citations


Journal ArticleDOI
TL;DR: The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph to find optimal alignments of multiple protein or DNA sequences.
Abstract: The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branch-and-bound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph.
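The dynamic programming graph that MSA prunes generalizes the familiar two-sequence alignment lattice to many dimensions. A sketch of that pairwise base case with unit costs (illustrative parameters, not MSA's sum-of-pairs machinery):

```python
def align_cost(a, b, mismatch=1, gap=1):
    """Pairwise alignment DP (edit distance form): the two-sequence
    face of the lattice that MSA explores in higher dimensions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i * gap
    for j in range(n + 1):
        d[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            d[i][j] = min(sub, d[i - 1][j] + gap, d[i][j - 1] + gap)
    return d[m][n]
```

For k sequences the lattice has one cell per tuple of prefix lengths, which is why pruning via bounds and shortest-path estimates matters.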

165 citations


Journal ArticleDOI
TL;DR: An extensive review of methods for prediction of functional sites, tRNA, and protein-coding genes is presented, along with possible further directions of research in this area of computational molecular biology.
Abstract: Recognition of function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for prediction of functional sites, tRNA, and protein-coding genes and discuss possible further directions of research in this area. Key words: DNA sequence analysis; functional sites; genes; protein-coding regions; exons; introns; prediction; tRNA

128 citations


Journal ArticleDOI
TL;DR: This work proposes an architecture for query-based interoperation that includes a number of novel components of an information infrastructure for molecular biology that bridge the heterogeneities that exist between biological DBs at several different levels.
Abstract: To realize the full potential of biological databases (DBs) requires more than the interactive, hypertext flavor of database interoperation that is now so popular in the bioinformatics community. Interoperation based on declarative queries to multiple network-accessible databases will support analyses and investigations that are orders of magnitude faster and more powerful than what can be accomplished through interactive navigation. I present a vision of the capabilities that a query-based interoperation infrastructure should provide, and identify assumptions underlying, and requirements of, this vision. I then propose an architecture for query-based interoperation that includes a number of novel components of an information infrastructure for molecular biology. These components include a knowledge base that describes relationships among the conceptualizations used in different biological databases, a module that can determine the DBs that are relevant to a particular query, a module that can tr...

120 citations


Journal ArticleDOI
TL;DR: The model is employed for embedding one phylogeny tree into another via the so-called duplication/speciation principle, requiring that the duplicated gene evolves in such a way that each of the contemporary species involved bears only one of the diverged gene copies.
Abstract: In the framework of the problem of combining different gene trees into a unique species phylogeny, a model for duplication/speciation/loss events along the evolutionary tree is introduced. The model is employed for embedding one phylogeny tree into another via the so-called duplication/speciation principle, requiring that the duplicated gene evolves in such a way that each of the contemporary species involved bears only one of the diverged gene copies. The number of biologically meaningful elements in the embedding result (duplications, losses, information gaps) is considered an (asymmetric) dissimilarity measure between the trees. The model's duplication concept is compared with the one defined previously in terms of a mapping procedure for the trees. A graph-theoretic reformulation of the measure is derived.
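The duplication/speciation principle is usually operationalized via the LCA mapping between a gene tree and a species tree. The sketch below counts duplications under that standard mapping, assuming binary trees encoded as nested 2-tuples with distinct species names at the leaves (an illustrative encoding, not the paper's formalism):

```python
def lca_reconcile(gene_tree, species_tree):
    """Count gene duplications by the standard LCA mapping: an internal
    gene-tree node is a duplication when it maps to the same
    species-tree node as one of its children."""
    depth, parent = {}, {}

    def index(node, d=0, par=None):
        depth[node], parent[node] = d, par
        if isinstance(node, tuple):
            for ch in node:
                index(ch, d + 1, node)
    index(species_tree)

    def lca(u, v):
        while u != v:
            if depth[u] >= depth[v]:
                u = parent[u]
            else:
                v = parent[v]
        return u

    dups = 0

    def walk(g):
        nonlocal dups
        if not isinstance(g, tuple):
            return g  # leaf: a species name present in species_tree
        maps = [walk(ch) for ch in g]
        m = lca(maps[0], maps[1])
        if m in maps:
            dups += 1
        return m

    walk(gene_tree)
    return dups
```

A gene tree with two copies from species A forces one duplication; a gene tree congruent with the species tree needs none.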

106 citations


Journal ArticleDOI
TL;DR: This work presents several algorithms to infer how clones overlap, given data about each clone, focusing on the data used to map human chromosomes 21 and Y, in which relatively short substrings, or probes, are extracted from the ends of clones.
Abstract: The goal of physical mapping of the genome is to reconstruct a strand of DNA given a collection of overlapping fragments, or clones, from the strand. We present several algorithms to infer how the clones overlap, given data about each clone. We focus on data used to map human chromosomes 21 and Y, in which relatively short substrings, or probes, are extracted from the ends of clones. The substrings are long enough to be unique with high probability. The data we are given is an incidence matrix of clones and probes. In the absence of error, the correct placement can be found easily using a PQ-tree. The data are never free from error, however, and algorithms are differentiated by their performance in the presence of errors. We approach errors from two angles: by detecting and removing them, and by using algorithms that are robust in the presence of errors. We have also developed a strategy to recover noiseless data through an interactive process that detects anomalies in the data and retests questionable entries in the incidence matrix of clones and probes. We evaluate the effectiveness of our algorithms empirically, using simulated data as well as real data from human chromosome 21.

101 citations


Journal ArticleDOI
TL;DR: Different Markov chain models, with either stationary or periodic transition probabilities, are considered; one finding is that many overabundant words are one-letter mutations of avoided palindromes.
Abstract: Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are an...
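The first step of such an analysis is estimating the expected count of a word under a fitted Markov model. The sketch below uses the plug-in estimate for a first-order stationary model and a crude Poisson-style variance; the conditional variance derived in the paper is sharper:

```python
from collections import Counter

def expected_count(seq, word):
    """Plug-in estimate of E[count of `word`] under a first-order
    stationary Markov model fitted to `seq`: product of dinucleotide
    counts divided by product of interior single-letter counts."""
    c1 = Counter(seq)
    c2 = Counter(seq[i:i+2] for i in range(len(seq) - 1))
    num = 1.0
    for i in range(len(word) - 1):
        num *= c2[word[i:i+2]]
    den = 1.0
    for ch in word[1:-1]:
        den *= c1[ch]
    return num / den

def z_score(seq, word):
    """Standardized deviation using a Poisson-style variance
    (a simplification of the paper's conditional variance)."""
    obs = sum(seq.startswith(word, i) for i in range(len(seq)))
    exp = expected_count(seq, word)
    return (obs - exp) / exp ** 0.5
```

On "AAAA" the word "AAA" occurs twice while the model expects 3·3/4 = 2.25, giving a small negative z-score.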

Journal ArticleDOI
TL;DR: A mathematical model for the polymerase chain reaction and its mutations is constructed using the theory of branching processes and a method for estimating the mutation rate based on pairwise differences is proposed.
Abstract: We construct a mathematical model for the polymerase chain reaction and its mutations using the theory of branching processes. Under this model we study the number of mutations in a random...
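A branching-process view of PCR is easy to simulate. The sketch below uses a whole-molecule mutation probability per replication, a simplification of the paper's per-base model; all parameter values are illustrative:

```python
import random

def simulate_pcr(cycles, eff, mu, n0=10, rng=None):
    """Toy branching-process PCR: each molecule is replicated with
    probability `eff` per cycle, and each replication introduces one
    new mutation with probability `mu` (whole-molecule simplification).
    Returns the per-molecule mutation counts after all cycles."""
    rng = rng or random.Random(0)
    pool = [0] * n0  # mutation count carried by each molecule
    for _ in range(cycles):
        new = []
        for m in pool:
            if rng.random() < eff:
                new.append(m + (1 if rng.random() < mu else 0))
        pool += new
    return pool
```

With efficiency 1 and mutation probability 1, the pool doubles each cycle, the founders stay mutation-free, and the deepest lineage carries one mutation per cycle, a useful sanity check on the tree structure.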

Journal ArticleDOI
TL;DR: The conclusion is that decision trees are a highly effective tool for identifying protein coding regions on DNA sequences ranging from 54 to 162 base pairs in length.
Abstract: Genes in eukaryotic DNA cover hundreds or thousands of base pairs, while the regions of those genes that code for proteins may occupy only a small percentage of the sequence. Identifying the coding regions is of vital importance in understanding these genes. Many recent research efforts have studied computational methods for distinguishing between coding and noncoding regions, and several promising results have been reported. We describe here a new approach, using a machine learning system that builds decision trees from the data. This approach combines several coding measures to produce classifiers with consistently higher accuracies than previous methods, on DNA sequences ranging from 54 to 162 base pairs in length. The algorithm is very efficient, and it can easily be adapted to different sequence lengths. Our conclusion is that decision trees are a highly effective tool for identifying protein coding regions.
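A one-level "decision tree" over a single coding measure already illustrates the approach; the paper's system combines several measures in a full tree. The feature values below are illustrative stand-ins for a coding measure such as a codon-usage score:

```python
def best_stump(values, labels):
    """One-level decision tree: choose the threshold on a single
    coding measure that best separates coding (True) from noncoding
    (False) training examples. Returns (accuracy, threshold)."""
    best = (-1.0, 0.0)
    for t in sorted(set(values)):
        preds = [v >= t for v in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        best = max(best, (acc, t))
    return best
```

A full tree recursively applies such splits over several measures, which is how the combined classifier gains accuracy over any single measure.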

Journal ArticleDOI
TL;DR: A mathematical model to treat the polymerase chain reaction (PCR), where the accumulation of new molecules during a PCR cycle is regarded as a randomly bifurcating tree, enables an approximate formula for the distribution of the number of replications that have occurred between a pair of molecules to be computed.
Abstract: We introduce a mathematical model to treat the polymerase chain reaction (PCR), where we regard the accumulation of new molecules during a PCR cycle as a randomly bifurcating tree. This model enables an approximate formula for the distribution of the number of replications that have occurred between a pair of molecules to be computed.

Journal ArticleDOI
TL;DR: It is shown that building an optimal decision tree is NP-complete, then an approximation algorithm is given that gives trees within a constant multiplicative factor of optimal, and it is demonstrated that subsequence queries are significantly more powerful than substring queries, matching the information theoretic lower bound.
Abstract: We consider an interactive approach to DNA sequencing by hybridization, where we are permitted to ask questions of the form "is s a substring of the unknown sequence S?", where s is a specific query string. We are not told where s occurs in S, nor how many times it occurs, just whether or not s is a substring of S. Our goal is to determine the exact contents of S using as few queries as possible. Through interaction, far fewer queries are necessary than using conventional fixed sequencing by hybridization (SBH) sequencing chips. We provide tight bounds on the complexity of reconstructing unknown strings from substring queries. Our lower bound, which holds even for a stronger model that returns the number of occurrences of s as a substring of S, relies on interesting arguments based on de Bruijn sequences. We also demonstrate that subsequence queries are significantly more powerful than substring queries, matching the information theoretic lower bound. Finally, in certain applications, something may already be known about the unknown string, and hence it can be determined faster than an arbitrary string. We show that building an optimal decision tree is NP-complete, then give an approximation algorithm that gives trees within a constant multiplicative factor of optimal.
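The interactive strategy can be sketched as greedy one-symbol extension: query s+a for each letter a, extend on a yes answer, and switch to left extension when the right end is exhausted. This simple version is only guaranteed correct when each queried string occurs at most once in the target, a much weaker setting than the paper's bounds cover:

```python
def reconstruct_by_queries(is_substring, seed, alphabet="ACGT"):
    """Adaptive substring queries: greedily extend a known seed to the
    right, then to the left, one symbol per round of queries."""
    s = seed
    grown = True
    while grown:  # extend right
        grown = False
        for a in alphabet:
            if is_substring(s + a):
                s, grown = s + a, True
                break
    grown = True
    while grown:  # extend left
        grown = False
        for a in alphabet:
            if is_substring(a + s):
                s, grown = a + s, True
                break
    return s
```

With the oracle answering membership in a repeat-free target, the seed grows back to the full string.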

Journal ArticleDOI
TL;DR: This paper examines the construction of physical maps from hybridization data between sequence tag sites (STS) probes and clones of genomic fragments and proves that only certain types of mapping information can be reliably calculated by any algorithm.
Abstract: An important tool in the analysis of genomic sequences is the physical map. In this paper we examine the construction of physical maps from hybridization data between sequence tag sites (STS) probes and clones of genomic fragments. An algorithmic theory of the mapping process, a proposed performance evaluation procedure, and several new algorithmic strategies for mapping are given. A unifying theme for these developments is the idea of a "conservative extension." An algorithm, measure of algorithm quality, or description of physical map is a conservative extension if it is a generalization for data with errors of a corresponding concept in the error-free case. In our algorithmic theory we show that the nature of hybridization experiments imposes inherent limitations on the mapping information recorded in the experimental data. We prove that only certain types of mapping information can be reliably calculated by any algorithm. A test generator is then presented along with quantitative measures for determining how much of the possible information is being computed by a given algorithm. Weaknesses and strengths of these measures are discussed. Each of the new algorithms presented in this paper is based on combinatorial optimizations. Despite the fact that all the optimizations are NP-complete, we have developed algorithmic tools for the design of competitive approximation algorithms. We apply our performance evaluation program to our algorithms and obtain solid evidence that the algorithms are capable of retrieving high-level reliable mapping information.

Journal ArticleDOI
TL;DR: This work considers molecular models for computing and derives a DNA-based mechanism for solving intractable problems through massive parallelism, and suggests that such methods might reduce the effort needed to solve otherwise difficult tasks.
Abstract: We consider molecular models for computing and derive a DNA-based mechanism for solving intractable problems through massive parallelism. In principle, such methods might reduce the effort needed to solve otherwise difficult tasks, such as factoring large numbers, a computationally intensive task whose intractability forms the basis for much of modern cryptography. Key words: DNA; nanotechnology; recombination; site-directed mutagenesis; intractability; combinatorial search; NP-completeness

Journal ArticleDOI
TL;DR: This paper proposes criteria that facilitate characterizing, evaluating, and comparing heterogeneous molecular biology database systems.
Abstract: Molecular biology data are distributed among multiple databases. Although containing related data, these databases are often isolated and are characterized by various degrees of heterogeneity: they usually represent different views (schemas) of the scientific domain and are implemented using different data management systems. Currently, several systems support managing data in heterogeneous molecular biology databases. Lack of clear criteria for characterizing such systems precludes comprehensive evaluations of these systems or determining their relationships in terms of shared goals and facilities. In this paper, we propose criteria that would facilitate characterizing, evaluating, and comparing heterogeneous molecular biology database systems. Key words: characterization criteria, heterogeneous database systems, molecular biology databases

Journal ArticleDOI
TL;DR: This method applies a recently developed Hadamard matrix-based technique to describe elements of I(T) in terms of edge-disjoint packings of subtrees in T, and thereby complements earlier more algebraic treatments.
Abstract: Linear invariants are useful tools for testing phylogenetic hypotheses from aligned DNA/ RNA sequences, particularly when the sites evolve at different rates. Here we give a simple, graph theoretic classification for each phylogenetic tree T, of its associated vector space I(T) of linear invariants under the Jukes–Cantor one-parameter model of nucleotide substitution. We also provide an easily described basis for I(T), and show that if T is a binary (fully resolved) phylogenetic tree with n sequences at its leaves then: dim I(T) = 4^n − F_{2n−2}, where F_n is the nth Fibonacci number. Our method applies a recently developed Hadamard matrix-based technique to describe elements of I(T) in terms of edge-disjoint packings of subtrees in T, and thereby complements earlier more algebraic treatments. Key words: Phylogenetic invariants; trees; forests; Hadamard matrix; Jukes–Cantor model
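With the convention F_1 = F_2 = 1, the dimension formula is easy to tabulate. A sketch (the superscript and subscript placement in the formula is reconstructed from the statement above, so treat the indexing as an assumption):

```python
def fib(n):
    """Fibonacci numbers with F_0 = 0, F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def invariant_dim(n):
    """dim I(T) = 4^n - F_{2n-2} for a binary tree with n leaves,
    as read from the formula in the abstract."""
    return 4 ** n - fib(2 * n - 2)
```

The 4^n term is the number of site patterns on n sequences, which is why the dimension is naturally expressed as that count minus a Fibonacci correction.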

Journal ArticleDOI
TL;DR: Techniques are derived to estimate the conditional probability of gene function, given ORF length, based on evidence both from the databases and from simulation for Saccharomyces cerevisiae.
Abstract: The length of an open reading frame (ORF) is one important piece of evidence often used in locating new genes, particularly in organisms where splicing is rare. However, there have been no systematic studies quantifying the degree of correlation between length of ORF, on the one hand, and likelihood of gene function, on the other. In this paper, techniques are derived to estimate the conditional probability of gene function, given ORF length, based on evidence both from the databases and from simulation. Several complete chromosomes of Saccharomyces cerevisiae have now been sequenced, and considerable effort is being expended on locating and characterizing the genes in these sequences. Thus, we illustrate the techniques for this organism.
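A simple Bayesian version of the length argument uses a random-sequence null in which each codon is a stop codon with probability 3/64. The prior and the likelihood under the gene model below are illustrative stand-ins for the database and simulation estimates the paper derives:

```python
def p_functional(orf_codons, prior=0.01, p_len_given_gene=1.0):
    """Bayes estimate of P(real gene | ORF spans >= L codons).
    Null model: each codon is independently a stop with prob 3/64,
    so P(ORF length >= L | not a gene) = (61/64)^L.
    `prior` and `p_len_given_gene` are illustrative placeholders."""
    p_len_given_null = (61 / 64) ** orf_codons
    num = prior * p_len_given_gene
    return num / (num + (1 - prior) * p_len_given_null)
```

Long ORFs quickly become overwhelming evidence: 300 codons without a stop is vanishingly unlikely under the null, while a 10-codon ORF barely moves the prior.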

Journal ArticleDOI
TL;DR: A correlation method that runs in linear time and incorporates pairwise dependencies between amino acid residues at multiple distances to assess the conditional probability that a given residue is part of a given 3D structure is presented.
Abstract: The identification of protein sequences that fold into certain known three-dimensional (3D) structures, or motifs, is evaluated through a probabilistic analysis of their one-dimensional (1...

Journal ArticleDOI
TL;DR: A coherent language base for describing and working with characteristics of combinatorial optimization problems is introduced, which is at once general enough to be used in all such problems and precise enough to allow subtle concepts in this field to be discussed unambiguously.
Abstract: This article introduces a coherent language base for describing and working with characteristics of combinatorial optimization problems, which is at once general enough to be used in all such problems and precise enough to allow subtle concepts in this field to be discussed unambiguously. An example is provided of how this nomenclature is applied to an instance of the phylogeny problem. Also noted is the beneficial effect, on the landscape of the solution space, of transforming the observed data to account for multiple changes of character state.

Journal ArticleDOI
TL;DR: It is shown that the general case of determining perfect compatibility of generalized ordered characters is an NP-complete problem, but can be solved in polynomial time for a special case.
Abstract: We propose a new model of computation for deriving phylogenetic trees based upon a generalization of qualitative characters. The model we propose is based upon recent experimental research in molecular biology. We show that the general case of determining perfect compatibility of generalized ordered characters is an NP-complete problem, but can be solved in polynomial time for a special case.

Journal ArticleDOI
TL;DR: The results suggest that it is unlikely that the multiple sequence tree alignment problem has polynomial-time algorithms that produce either optimal solutions or approximate solutions whose cost may be arbitrarily close to optimal.
Abstract: We give a simple proof which shows that the multiple sequence tree alignment problem from molecular biology is both NP-complete and MAX SNP-hard. Our proof of MAX SNP-hardness is simpler than that given previously by Wang and Jiang. These results suggest that it is unlikely that the multiple sequence tree alignment problem has polynomial-time algorithms that produce either optimal solutions or approximate solutions whose cost may be arbitrarily close to optimal. Key words: multiple sequence tree alignment, computational complexity, approximability, NP-complete, MAX SNP-hard

Journal ArticleDOI
TL;DR: The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.
Abstract: A method is presented for determining within strict bounds the probability of matching a regular expression with a match start point in a given section of a random data string. The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.
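For patterns without optional characters, the match probability at a fixed start position simply factorizes over positions; optional characters are what force the exponential enumeration. A sketch of the fixed-length case over a 20-letter protein alphabet, with the pattern given as a list of allowed-character sets:

```python
def match_prob(pattern, alphabet_size=20):
    """Probability that a uniform random string matches, at one fixed
    start position, a ProSite-style pattern with no optional elements:
    the product over positions of |allowed set| / |alphabet|."""
    p = 1.0
    for allowed in pattern:
        p *= len(allowed) / alphabet_size
    return p
```

A pattern like A-[ACDE]-x has probability (1/20)(4/20)(20/20) = 0.01 per start position.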

Journal ArticleDOI
TL;DR: The structure of strings with large amounts of overlap is studied, and an algorithm is given that finds a superstring whose length is no more than 2 3/4 times that of the optimal superstring, matching the bound of previous algorithms.
Abstract: Given a collection of strings S = {s1,..., sn} over an alphabet Σ, a superstring α of S is a string containing each si as a substring, that is, for each i, 1 ≤ i ≤ n, α contains a block of...
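The standard baseline for this problem is the greedy merge heuristic, sketched below; it is not the paper's 2 3/4-factor algorithm, but it is the usual point of comparison (conjectured, not proven, to be within a factor of 2 of optimal):

```python
def overlap_len(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for L in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:L]):
            return L
    return 0

def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge the pair with the largest
    overlap until one superstring remains."""
    # discard strings already contained in another string
    s = [x for x in strings if not any(x != y and x in y for y in strings)]
    while len(s) > 1:
        o, i, j = max(((overlap_len(a, b), i, j)
                       for i, a in enumerate(s)
                       for j, b in enumerate(s) if i != j),
                      key=lambda t: t[0])
        merged = s[i] + s[j][o:]
        s = [x for k, x in enumerate(s) if k not in (i, j)] + [merged]
    return s[0]
```

On {"ACG", "CGT", "GTA"} the greedy merges recover a length-5 superstring containing all three blocks.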

Journal ArticleDOI
TL;DR: An algorithm is presented to construct lattice models of polymers with side chains; a dynamic programming search makes finding the global minimum of the error function for a given lattice-to-chain orientation both fast and complete.
Abstract: An algorithm to construct lattice models of polymers with side chains is presented. A search for the global minimum of the error function for a given lattice-to-chain orientation is done by dynamic programming, making the search both fast and complete. Application of the algorithm is illustrated by constructing lattice models for 12 proteins of different sizes and structural types. Key words: protein structure, lattice model, dynamic programming

Journal ArticleDOI
TL;DR: Algorithms for the perfect phylogeny problem restricted to binary characters are presented, including two online algorithms that can process any sequence of additions and deletions of species and characters.
Abstract: We present algorithms for the perfect phylogeny problem restricted to binary characters. The first algorithm is faster than a previous algorithm by Gusfield when the input matrix for the p...
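For binary characters, the existence of a perfect phylogeny reduces to the classical pairwise four-gamete test, shown below as a naive check; the algorithms discussed here and in Gusfield's earlier work are faster than this O(nm^2) scan:

```python
from itertools import combinations

def perfect_phylogeny_exists(matrix):
    """Four-gamete test for binary characters: a perfect phylogeny
    exists iff no pair of columns exhibits all four patterns
    00, 01, 10, 11 across the rows (species)."""
    ncols = len(matrix[0])
    for i, j in combinations(range(ncols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return False
    return True
```

Two characters displaying all four joint states cannot both change exactly once on any tree, which is the intuition behind the test.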


Journal ArticleDOI
TL;DR: In the process of constructing high-resolution restriction maps via greedy algorithms, a classical anomaly known as fragment collapsing introduces errors into the maps.
Abstract: In the process of constructing high-resolution restriction maps via greedy algorithms, a classical anomaly, known as fragment collapsing, introduces errors into the maps that impedes furth...