SciSpace (formerly Typeset)

Showing papers by "Richard Durbin" published in 1997


Journal Article
01 Jul 1997-Proteins
TL;DR: A database based on hidden Markov model profiles (HMMs), which combines high quality and completeness, and a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified.
Abstract: Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high-quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new Kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins 28:405-420, 1997. © 1997 Wiley-Liss, Inc.
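The abstract describes building family models from seed alignments and using a library of them to annotate new sequences. A minimal sketch of that workflow, assuming a much simpler model than Pfam's: an ungapped, match-state-only profile (per-column log-odds scores, the special case of a profile HMM with no insert or delete states), a uniform amino-acid background, and two toy families invented for illustration. The function names and the score threshold are hypothetical; Pfam's real HMMs handle gaps, which this sketch does not.

```python
import math

# Assumption: 20 standard amino acids with a uniform background distribution.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)


def build_profile(seed_alignment, pseudocount=1.0):
    """Per-column log-odds scores (bits) from an ungapped seed alignment.

    Match states only: no insert/delete states, so the seed must be gap-free.
    """
    length = len(seed_alignment[0])
    assert all(len(s) == length for s in seed_alignment), "seed must be ungapped"
    profile = []
    for col in range(length):
        counts = {a: pseudocount for a in ALPHABET}
        for seq in seed_alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({a: math.log2((counts[a] / total) / BACKGROUND) for a in ALPHABET})
    return profile


def best_hit(profile, sequence):
    """Best-scoring ungapped window as (score in bits, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


def annotate(library, sequence, threshold=0.0):
    """List (family, start, score) for every family whose best hit clears the threshold."""
    hits = []
    for family, profile in library.items():
        score, start = best_hit(profile, sequence)
        if score >= threshold:
            hits.append((family, start, round(score, 2)))
    return sorted(hits, key=lambda h: -h[2])


# Hypothetical toy families; not real Pfam seeds.
library = {
    "toy_family_A": build_profile(["CWKC", "CWRC", "CFKC"]),
    "toy_family_B": build_profile(["GGDE", "GADE", "GGEE"]),
}
print(annotate(library, "MSTCWKCLLP"))  # [('toy_family_A', 3, 6.36)]
```

The pseudocount keeps unseen residues from scoring minus infinity; a real family library would also need per-family thresholds rather than a single cut-off.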

1,283 citations


Journal Article
01 Dec 1997-Genomics
TL;DR: A comprehensive analysis of protein domain families in Wormpep 11, which comprises 7299 proteins, finds that over two-thirds of the currently known human proteins are likely to have a homologue in the whole C. elegans genome and that a significant number of proteins are well conserved between C. elegans and H. influenzae.

142 citations


Proceedings Article
21 Jun 1997
TL;DR: This work illustrates the Dynamite syntax and flexibility by showing definitions for dynamic programming routines to align two protein sequences under the assumption that they are both polytopic transmembrane proteins, with the simultaneous assignment of transmembrane helices.
Abstract: We have developed a code generating language, called Dynamite, specialised for the production and subsequent manipulation of complex dynamic programming methods for biological sequence comparison. From a relatively simple text definition file, Dynamite will produce a variety of implementations of a dynamic programming method, including database searches and linear space alignments. The speed of the generated code is comparable to hand-written code, and the additional flexibility has proved invaluable in designing and testing new algorithms. An innovation is a flexible labelling system, which can be used to annotate the original sequences with biological information. We illustrate the Dynamite syntax and flexibility by showing definitions for dynamic programming routines (i) to align two protein sequences under the assumption that they are both polytopic transmembrane proteins, with the simultaneous assignment of transmembrane helices, and (ii) to align protein information to genomic DNA, allowing for introns and sequencing error.
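The abstract lists linear space alignment among the variants Dynamite generates. As a hand-written illustration of the underlying idea (not Dynamite syntax or Dynamite output), here is a minimal score-only global alignment that keeps just two rows of the dynamic programming matrix, assuming a simple match/mismatch/linear-gap scoring scheme chosen for illustration.

```python
def nw_score_linear_space(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score using two DP rows.

    Memory is O(len(b)) instead of O(len(a) * len(b)). Hirschberg's
    divide-and-conquer method reuses this score-only pass to recover the
    full alignment in linear space.
    """
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b (move down), gap in a (move right).
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # only the previous row is ever needed
    return prev[-1]


print(nw_score_linear_space("ACGT", "ACGT"))  # 4
print(nw_score_linear_space("ACGT", "AGT"))   # 1
```

Swapping the recurrence (for example, adding extra states for transmembrane segments or introns) is exactly the kind of change that is tedious to hand-code for every variant, which is the motivation the abstract gives for generating such routines from a definition file.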

118 citations


Journal Article
TL;DR: Tc7 shares with Tc1 all the sequences minimally required to parasitize upon the Tc1 transposition machinery, and the genomic distribution of Tc7 shows a striking clustering on the X chromosome, where two-thirds of the elements are located.
Abstract: We have found a novel transposon in the genome of Caenorhabditis elegans. Tc7 is a 921 bp element, made up of two 345 bp inverted repeats separated by a unique, internal sequence. Tc7 does not contain an open reading frame. The outer 38 bp of the inverted repeat show 36 matches with the outer 38 bp of Tc1. This region of Tc1 contains the Tc1-transposase binding site. Furthermore, Tc7 is flanked by TA dinucleotides, just like Tc1, which presumably correspond to the target duplication generated upon integration. Since Tc7 does not encode its own transposase but contains the Tc1-transposase binding site at its extremities, we tested the ability of Tc7 to jump upon forced expression of Tc1 transposase in somatic cells. Under these conditions Tc7 jumps at a frequency similar to Tc1. The target site choice of Tc7 is identical to that of Tc1. These data suggest that Tc7 shares with Tc1 all the sequences minimally required to parasitize upon the Tc1 transposition machinery. The genomic distribution of Tc7 shows a striking clustering on the X chromosome where two thirds of the elements (20 out of 33) are located. Related transposons in C. elegans do not show this asymmetric distribution.
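The structural features described above (terminal inverted repeats, similarity of the outer ends to Tc1, and TA dinucleotides flanking the insertion) can be checked with simple string operations. A minimal sketch, assuming uppercase DNA strings; the helper names and the toy sequences are hypothetical and are not the real Tc7 or Tc1 sequences.

```python
# Map each base to its Watson-Crick complement (uppercase DNA only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_matches(element, length):
    """Matches between the left end and the reverse complement of the right end.

    A perfect terminal inverted repeat of `length` bp scores `length`.
    """
    left = element[:length]
    right_rc = reverse_complement(element[-length:])
    return sum(x == y for x, y in zip(left, right_rc))


def outer_end_matches(seq_a, seq_b, length):
    """Matches between the outer `length` bp of two elements (cf. 36 of 38)."""
    return sum(x == y for x, y in zip(seq_a[:length], seq_b[:length]))


def has_ta_flanks(genome, start, end):
    """True if genome[start:end] is flanked by a TA dinucleotide on both sides."""
    return start >= 2 and genome[start - 2:start] == "TA" and genome[end:end + 2] == "TA"


# Toy data invented for illustration only.
left_ir = "CAGTGCTGG"
element = left_ir + "AAAATTTT" + reverse_complement(left_ir)
genome = "GGG" + "TA" + element + "TA" + "CCC"
start = genome.index(element)
end = start + len(element)

print(inverted_repeat_matches(element, len(left_ir)))            # 9
print(has_ta_flanks(genome, start, end))                         # True
print(outer_end_matches("ACGTACGTAC", "ACGTTCGTAC", 10))         # 9
```

Counting matches position by position mirrors the "36 matches in 38 bp" style of comparison in the abstract; detecting repeats with indels would need an alignment step instead.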

31 citations


Journal Article
TL;DR: An accessory program ‘Angler’ can be used to browse sectional Nomarski images of the worm embryo during early development, and to relate these images to overlaid cell lineage data and 3-D schematic views of cell positions.

23 citations