Showing papers by "Richard Durbin" published in 1995

Open Access

Journal Article•DOI•

A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis.

Erik L. L. Sonnhammer, Richard Durbin¹•Institutions (1)

29 Dec 1995-Gene

TL;DR: DOTTER, a dot-plot program for X-windows that can compare DNA or protein sequences, and also DNA versus protein, is presented, together with an illustration of how it can improve gene modelling.

765 citations

Journal Article•DOI•

Maximum Discrimination Hidden Markov Models of Sequence Consensus

Sean R. Eddy¹, Graeme Mitchison, Richard Durbin•Institutions (1)

Washington University in St. Louis¹

01 Jan 1995-Journal of Computational Biology

TL;DR: The maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods.

Abstract: We introduce a maximum discrimination method for building hidden Markov models (HMMs) of protein or nucleic acid primary sequence consensus. The method compensates for biased representation in sequence data sets, superseding the need for sequence weighting methods. Maximum discrimination HMMs are more sensitive for detecting distant sequence homologs than various other HMM methods or BLAST when tested on globin and protein kinase catalytic domain sequences. Key words: hidden Markov model; database searching; sequence consensus; sequence weighting

272 citations

Journal Article•DOI•

Comparative sequence analysis of the human and pufferfish Huntington's disease genes.

Sarah Baxendale, Sarah Abdulla, Greg Elgar¹, David Buck, Mary Berks, Gos Micklem, Richard Durbin, Gill Bates², Sydney Brenner¹, Stephan Beck, Hans Lehrach•Institutions (2)

University of Cambridge¹, Guy's Hospital²

01 May 1995-Nature Genetics

TL;DR: This work cloned and sequenced the homologue of the HD gene in the pufferfish, Fugu rubripes, and describes a detailed example of sequence comparison between human and Fugu, which illustrates the power of the pufferfish genome as a model system in the analysis of human genes.

Abstract: The Huntington's disease (HD) gene encodes a novel protein with as yet no known function. In order to identify the functionally important domains of this protein, we have cloned and sequenced the homologue of the HD gene in the pufferfish, Fugu rubripes. The Fugu HD gene spans only 23 kb of genomic DNA, compared to the 170 kb human gene, and yet all 67 exons are conserved. The first coding exon, the site of the disease-causing triplet repeat, is highly conserved. However, the glutamine repeat in Fugu consists of just four residues. We also show that gene order may be conserved over longer stretches of the two genomes. Our work describes a detailed example of sequence comparison between human and Fugu, and illustrates the power of the pufferfish genome as a model system in the analysis of human genes.

134 citations

Book Chapter•DOI•

ACeDB and macace.

Frank H. Eeckman, Richard Durbin

01 Jan 1995-Methods in Cell Biology

TL;DR: This chapter provides an overview of ACeDB for the C. elegans user, focusing in particular on the Macintosh version Macace, and describes methods to obtain ACeDB and documentation for it, ways to access and use the information in ACeDB, and examines the use of ACeDB as a laboratory-based data managing system.

Abstract (publisher summary): This chapter discusses ACeDB (A Caenorhabditis elegans Data Base) and Macace. ACeDB is a data management and display system that contains a wide range of genomic and other information about C. elegans. This chapter provides an overview of ACeDB for the C. elegans user, focusing in particular on the Macintosh version Macace. This chapter describes methods to obtain ACeDB and documentation for it, ways to access and use the information in ACeDB, and examines the use of ACeDB as a laboratory-based data managing system. ACeDB is distributed primarily by anonymous file transfer program (ftp), which allows users to copy the database onto their own local computer. As a result, people can have full rapid access to all the data but they must have powerful computing systems and a non-negligible amount of hard disk space. Updates are released periodically and are announced via an e-mail mailing list and through the ACeDB and C. elegans news groups. For the UNIX version, these updates can be obtained by anonymous ftp and added to the local database by a simple command inside ACeDB. During every update, the Macintosh version is typically recopied in its entirety.

74 citations

Journal Article•DOI•

Tree-based maximal likelihood substitution matrices and hidden Markov models

Graeme Mitchison¹, Richard Durbin¹•Institutions (1)

Laboratory of Molecular Biology¹

01 Dec 1995-Journal of Molecular Evolution

TL;DR: The concept of a hidden Markov model (HMM) is extended to evolutionary trees, giving a tree-HMM that allows what may be loosely regarded as learnable affine-type gap penalties for alignments. An alignment algorithm is defined that is guaranteed to increase the likelihood of the model but fails to find global optima for realistic sequence sets.

Abstract: There has been considerable interest in the problem of making maximum likelihood (ML) evolutionary trees which allow insertions and deletions. This problem is partly one of formulation: how does one define a probabilistic model for such trees which treats insertion and deletion in a biologically plausible manner? A possible answer to this question is proposed here by extending the concept of a hidden Markov model (HMM) to evolutionary trees. The model, called a tree-HMM, allows what may be loosely regarded as learnable affine-type gap penalties for alignments. These penalties are expressed in HMMs as probabilities of transitions between states. In the tree-HMM, this idea is given an evolutionary embodiment by defining trees of transitions. Just as the probability of a tree composed of ungapped sequences is computed, by Felsenstein's method, using matrices representing the probabilities of substitutions of residues along the edges of the tree, so the probabilities in a tree-HMM are computed by substitution matrices for both residues and transitions. How to define these matrices by a ML procedure using an algorithm that learns from a database of protein sequences is shown here. Given these matrices, one can define a tree-HMM likelihood for a set of sequences, assuming a particular tree topology and an alignment of the sequences to the model. If one could efficiently find the alignment which maximizes (or comes close to maximizing) this likelihood, then one could search for the optimal tree topology for the sequences. An alignment algorithm is defined here which, given a particular tree topology, is guaranteed to increase the likelihood of the model. Unfortunately, it fails to find global optima for realistic sequence sets. Thus further research is needed to turn the tree-HMM into a practical phylogenetic tool.
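The abstract refers to Felsenstein's method for computing the probability of a tree of ungapped sequences from per-edge substitution matrices. The following is a minimal sketch of that pruning recursion for a single alignment column, not the paper's tree-HMM. It assumes a four-letter DNA alphabet, a Jukes-Cantor substitution model, and a uniform root distribution; the names `jc_matrix`, `partial`, and `site_likelihood` and the nested-list tree encoding are hypothetical choices made for illustration.

```python
import math

ALPHABET = "ACGT"  # assumed toy alphabet; the paper works with protein sequences


def jc_matrix(t):
    """Jukes-Cantor substitution probabilities for branch length t (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial(node):
    """Return L[x] = P(observed leaves below node | node carries residue x).

    A leaf is a one-character string; an internal node is a list of
    (subtree, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if a == node else 0.0 for a in ALPHABET]
    result = [1.0] * 4
    for child, t in node:
        P = jc_matrix(t)
        child_L = partial(child)
        for x in range(4):
            # Sum over the child's residue y, weighted by the edge's substitution matrix.
            result[x] *= sum(P[x][y] * child_L[y] for y in range(4))
    return result


def site_likelihood(tree):
    """Likelihood of one ungapped column, assuming a uniform root distribution."""
    return sum(0.25 * v for v in partial(tree))


# Hypothetical three-leaf tree: ((A:0.2, G:0.2):0.1, A:0.1)
tree = [("A", 0.1), ([("A", 0.2), ("G", 0.2)], 0.1)]
print(site_likelihood(tree))
```

In the tree-HMM described in the abstract, the same style of recursion is applied with substitution matrices for transitions as well as residues; that extension is not shown here.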

53 citations

Journal Article•DOI•

Method for calculation of probability of matching a bounded regular expression in a random data string.

Roger F. Sewell, Richard Durbin

01 Jan 1995-Journal of Computational Biology

TL;DR: The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.

Abstract: A method is presented for determining within strict bounds the probability of matching a regular expression with a match start point in a given section of a random data string. The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.
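To make the idea of bounding a match probability concrete, here is a much simpler sketch than the method in the paper. It assumes a fixed-length pattern with no optional characters, an i.i.d. random string with known character frequencies, and elementary bounds (independence of non-overlapping start points for the lower bound, the union bound for the upper bound). The function names and the example pattern are hypothetical.

```python
import math


def position_match_prob(pattern, freqs):
    """Probability that a fixed-length pattern matches at one given start point.

    pattern: list of sets of allowed characters, one set per position.
    freqs:   dict mapping each character to its i.i.d. background frequency.
    """
    p = 1.0
    for allowed in pattern:
        p *= sum(freqs[c] for c in allowed)
    return p


def match_prob_bounds(pattern, freqs, n_starts):
    """Lower and upper bounds on P(at least one match among n_starts start points)."""
    p = position_match_prob(pattern, freqs)
    # Start points spaced len(pattern) apart read disjoint characters,
    # so their match events are independent in an i.i.d. string.
    k = math.ceil(n_starts / len(pattern))
    lower = 1.0 - (1.0 - p) ** k
    upper = min(1.0, n_starts * p)  # union (Bonferroni) bound
    return lower, upper


uniform_dna = {c: 0.25 for c in "ACGT"}
pattern = [{"A"}, {"C", "G"}, {"T"}]  # hypothetical pattern, written A-[CG]-T
lower, upper = match_prob_bounds(pattern, uniform_dna, n_starts=100)
print(lower, upper)
```

These bounds are loose when matches are likely, and they do not handle optional characters, which are the source of the exponential cost mentioned in the abstract.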

19 citations