Showing papers by "Richard Durbin" published in 1995


Journal Article
29 Dec 1995-Gene
TL;DR: DOTTER, a dot-plot program for X-windows that can compare DNA or protein sequences, and also DNA versus protein, is presented, along with an illustration of how DOTTER can improve gene modelling.

765 citations
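
To make the dot-plot idea in the entry above concrete, here is a minimal sketch in Python. It is not DOTTER's algorithm: it assumes a plain identity comparison over a fixed-length sliding window, and the names `dot_plot`, `render`, `window`, and `min_matches` are hypothetical choices made for illustration only.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) start positions whose windows are similar.

    A dot is placed at (i, j) when the window of length `window` starting
    at i in seq_a and at j in seq_b agree in at least `min_matches`
    positions. Plain identity scoring is an assumption of this sketch.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: one row per position in seq_a."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "TTACGTAA"
    print(render(a, b, dot_plot(a, b, window=3, min_matches=3)))
```

Diagonal runs of `*` in the rendered grid correspond to stretches where the two sequences are similar; lowering `min_matches` relative to `window` tolerates mismatches at the cost of more background noise.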


Journal Article
TL;DR: The maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods.
Abstract: We introduce a maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus. The method compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods. Maximum discrimination HMMs are more sensitive for detecting distant sequence homologs than various other HMM methods or BLAST when tested on globin and protein kinase catalytic domain sequences. Key words: hidden Markov model; database searching; sequence consensus; sequence weighting

272 citations


Journal Article
TL;DR: This work cloned and sequenced the homologue of the HD gene in the pufferfish, Fugu rubripes, and describes a detailed example of sequence comparison between human and Fugu, which illustrates the power of the pufferfish genome as a model system in the analysis of human genes.
Abstract: The Huntington's disease (HD) gene encodes a novel protein with as yet no known function. In order to identify the functionally important domains of this protein, we have cloned and sequenced the homologue of the HD gene in the pufferfish, Fugu rubripes. The Fugu HD gene spans only 23 kb of genomic DNA, compared to the 170 kb human gene, and yet all 67 exons are conserved. The first coding exon, the site of the disease-causing triplet repeat, is highly conserved. However, the glutamine repeat in Fugu consists of just four residues. We also show that gene order may be conserved over longer stretches of the two genomes. Our work describes a detailed example of sequence comparison between human and Fugu, and illustrates the power of the pufferfish genome as a model system in the analysis of human genes.

134 citations


Book Chapter
TL;DR: This chapter provides an overview of ACeDB for the C. elegans user, focusing in particular on the Macintosh version Macace, and describes methods to obtain ACeDB and documentation for it, ways to access and use the information in ACeDB, and examines the use of ACeDB as a laboratory-based data managing system.
Abstract: Publisher Summary: This chapter discusses ACeDB (A Caenorhabditis elegans Data Base) and Macace. ACeDB is a data management and display system that contains a wide range of genomic and other information about C. elegans. This chapter provides an overview of ACeDB for the C. elegans user, focusing in particular on the Macintosh version Macace. This chapter describes methods to obtain ACeDB and documentation for it, ways to access and use the information in ACeDB, and examines the use of ACeDB as a laboratory-based data managing system. ACeDB is distributed primarily by anonymous file transfer program (ftp), which allows users to copy the database onto their own local computer. As a result, people can have full rapid access to all the data but they must have powerful computing systems and a non-negligible amount of hard disk space. Updates are released periodically and are announced via an e-mail mailing list and through the ACeDB and C. elegans news groups. For the UNIX version, these updates can be obtained by anonymous ftp and added to the local database by a simple command inside ACeDB. During every update, the Macintosh version is typically recopied in its entirety.

74 citations


Journal Article
TL;DR: The concept of a hidden Markov model (HMM) is extended to evolutionary trees, giving a tree-HMM that allows what may be loosely regarded as learnable affine-type gap penalties for alignments. An alignment algorithm is defined that is guaranteed to increase the likelihood of the model for a given tree topology, but it fails to find global optima for realistic sequence sets.
Abstract: There has been considerable interest in the problem of making maximum likelihood (ML) evolutionary trees which allow insertions and deletions. This problem is partly one of formulation: how does one define a probabilistic model for such trees which treats insertion and deletion in a biologically plausible manner? A possible answer to this question is proposed here by extending the concept of a hidden Markov model (HMM) to evolutionary trees. The model, called a tree-HMM, allows what may be loosely regarded as learnable affine-type gap penalties for alignments. These penalties are expressed in HMMs as probabilities of transitions between states. In the tree-HMM, this idea is given an evolutionary embodiment by defining trees of transitions. Just as the probability of a tree composed of ungapped sequences is computed, by Felsenstein's method, using matrices representing the probabilities of substitutions of residues along the edges of the tree, so the probabilities in a tree-HMM are computed by substitution matrices for both residues and transitions. How to define these matrices by a ML procedure using an algorithm that learns from a database of protein sequences is shown here. Given these matrices, one can define a tree-HMM likelihood for a set of sequences, assuming a particular tree topology and an alignment of the sequences to the model. If one could efficiently find the alignment which maximizes (or comes close to maximizing) this likelihood, then one could search for the optimal tree topology for the sequences. An alignment algorithm is defined here which, given a particular tree topology, is guaranteed to increase the likelihood of the model. Unfortunately, it fails to find global optima for realistic sequence sets. Thus further research is needed to turn the tree-HMM into a practical phylogenetic tool.

53 citations
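
The abstract above refers to Felsenstein's method for computing the probability of a tree of ungapped sequences from per-edge substitution matrices. A minimal sketch of that ungapped, single-site calculation follows. It does not implement the tree-HMM itself (no transition substitution matrices), and the tree encoding, the two-letter alphabet, and the names `prune` and `site_likelihood` are assumptions made for illustration.

```python
def prune(node, alphabet):
    """Return the conditional likelihood vector for `node`.

    A leaf is an observed symbol (str). An internal node is a list of
    (child, matrix) pairs, where matrix[x][y] is the probability that the
    child has state y given that this node has state x (hypothetical
    encoding chosen for this sketch).
    """
    n = len(alphabet)
    if isinstance(node, str):
        # Observed leaf: likelihood 1 for the observed symbol, 0 otherwise.
        return [1.0 if a == node else 0.0 for a in alphabet]
    result = [1.0] * n
    for child, matrix in node:
        child_like = prune(child, alphabet)
        for x in range(n):
            # Sum over the child's possible states along this edge.
            result[x] *= sum(matrix[x][y] * child_like[y] for y in range(n))
    return result


def site_likelihood(root, alphabet, prior):
    """Likelihood of one alignment column, given a prior over root states."""
    cond = prune(root, alphabet)
    return sum(p * l for p, l in zip(prior, cond))


if __name__ == "__main__":
    alphabet = "AB"                      # toy two-state alphabet (assumption)
    edge = [[0.9, 0.1], [0.1, 0.9]]      # toy substitution matrix (assumption)
    tree = [("A", edge), ([("A", edge), ("B", edge)], edge)]
    print(site_likelihood(tree, alphabet, [0.5, 0.5]))
```

For an alignment with several independent columns, the per-site likelihoods would be multiplied (or their logarithms summed); the tree-HMM described in the abstract additionally applies the same style of calculation to transitions.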


Journal Article
TL;DR: The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.
Abstract: A method is presented for determining within strict bounds the probability of matching a regular expression with a match start point in a given section of a random data string. The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.

19 citations
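
As a rough illustration of what bounding a match probability means in the entry above, the sketch below handles only the simplest case: a fixed-length pattern with no optional characters, matched against a string of independent, identically distributed symbols. It is not the paper's method. It uses the elementary bounds p <= P <= min(1, n * p) for n candidate start positions, and the pattern encoding and function names are hypothetical.

```python
from math import prod


def match_probability_at_start(pattern, freqs):
    """Probability that the pattern matches at one fixed start position.

    `pattern` is a list of strings, each listing the symbols allowed at
    that position (hypothetical encoding). `freqs` maps each symbol to its
    probability in the random string, assumed i.i.d.
    """
    return prod(sum(freqs[c] for c in allowed) for allowed in pattern)


def section_bounds(pattern, freqs, n_starts):
    """Loose (lower, upper) bounds on a match starting in n_starts positions.

    Lower bound: probability of a match at any single start position.
    Upper bound: union bound over all start positions, capped at 1.
    Assumes the string extends far enough past the section for a full match.
    """
    if n_starts < 1:
        return 0.0, 0.0
    p = match_probability_at_start(pattern, freqs)
    return p, min(1.0, n_starts * p)


if __name__ == "__main__":
    uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    pattern = ["A", "ACGT", "CG"]   # A, then any base, then C or G
    print(section_bounds(pattern, uniform_dna, n_starts=4))
```

These bounds are much looser than the strict bounds the abstract describes, and optional characters (variable-length matches) are exactly the case this sketch leaves out.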