Author

Richard Durbin

Bio: Richard Durbin is an academic researcher at the University of Cambridge. He has contributed to research on topics including genome and population. He has an h-index of 125 and has co-authored 319 publications receiving 207,192 citations. Previous affiliations include the Wellcome Trust Sanger Institute and the University of Manchester.
Topics: Genome, Population, Genomics, Gene, Sequence assembly


Papers
Journal Article
Miriam Schmidts, Yuqing Hou, Claudio Cortes, Dorus A. Mans and 183 more (37 institutions)
TL;DR: TCTEX1D2 mutations are identified as a cause of Jeune asphyxiating thoracic dystrophy with partially penetrant inheritance, and TCTEX1D2 is defined as an integral component of the evolutionarily conserved retrograde IFT machinery.
Abstract: The analysis of individuals with ciliary chondrodysplasias can shed light on sensitive mechanisms controlling ciliogenesis and cell signalling that are essential to embryonic development and survival. Here we identify TCTEX1D2 mutations causing Jeune asphyxiating thoracic dystrophy with partially penetrant inheritance. Loss of TCTEX1D2 impairs retrograde intraflagellar transport (IFT) in humans and the protist Chlamydomonas, accompanied by destabilization of the retrograde IFT dynein motor. We thus define TCTEX1D2 as an integral component of the evolutionarily conserved retrograde IFT machinery. In complex with several IFT dynein light chains, it is required for correct vertebrate skeletal formation but may be functionally redundant under certain conditions.

53 citations

Journal Article
TL;DR: The concept of a hidden Markov model (HMM) is extended to evolutionary trees, giving a model that allows what may loosely be regarded as learnable affine-type gap penalties for alignments; an alignment algorithm is defined for it, but it fails to find global optima for realistic sequence sets.
Abstract: There has been considerable interest in the problem of making maximum likelihood (ML) evolutionary trees which allow insertions and deletions. This problem is partly one of formulation: how does one define a probabilistic model for such trees which treats insertion and deletion in a biologically plausible manner? A possible answer to this question is proposed here by extending the concept of a hidden Markov model (HMM) to evolutionary trees. The model, called a tree-HMM, allows what may be loosely regarded as learnable affine-type gap penalties for alignments. These penalties are expressed in HMMs as probabilities of transitions between states. In the tree-HMM, this idea is given an evolutionary embodiment by defining trees of transitions. Just as the probability of a tree composed of ungapped sequences is computed, by Felsenstein's method, using matrices representing the probabilities of substitutions of residues along the edges of the tree, so the probabilities in a tree-HMM are computed by substitution matrices for both residues and transitions. How to define these matrices by a ML procedure using an algorithm that learns from a database of protein sequences is shown here. Given these matrices, one can define a tree-HMM likelihood for a set of sequences, assuming a particular tree topology and an alignment of the sequences to the model. If one could efficiently find the alignment which maximizes (or comes close to maximizing) this likelihood, then one could search for the optimal tree topology for the sequences. An alignment algorithm is defined here which, given a particular tree topology, is guaranteed to increase the likelihood of the model. Unfortunately, it fails to find global optima for realistic sequence sets. Thus further research is needed to turn the tree-HMM into a practical phylogenetic tool.

53 citations
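The tree-HMM abstract builds on Felsenstein's method for computing the probability of an ungapped alignment column on a tree from per-edge substitution matrices. A minimal Python sketch of that pruning computation follows, assuming a Jukes-Cantor substitution model, uniform root base frequencies and a nested-list tree encoding; these choices are illustrative, not taken from the paper, and the tree-HMM's transition matrices are not modelled.

```python
import math

BASES = "ACGT"

def jc_matrix(t):
    """Jukes-Cantor probabilities P[i][j] of base i becoming base j along a
    branch of length t (expected substitutions per site). Assumed model."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial_likelihood(node, column):
    """L[i] = probability of the leaf bases below `node`, given base i at `node`.

    Hypothetical encoding: a leaf is a sequence name (str); an internal node
    is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = column[node]
        return [1.0 if b == observed else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        P = jc_matrix(t)
        child_L = partial_likelihood(child, column)
        for i in range(4):
            # Sum over the unobserved base j at the child end of the branch.
            result[i] *= sum(P[i][j] * child_L[j] for j in range(4))
    return result

def site_likelihood(tree, column):
    """Likelihood of one ungapped column, assuming uniform root base frequencies."""
    return sum(0.25 * x for x in partial_likelihood(tree, column))

tree = [("human", 0.1), ([("mouse", 0.2), ("rat", 0.2)], 0.1)]
print(site_likelihood(tree, {"human": "A", "mouse": "A", "rat": "G"}))
```

Because the leaf probabilities are marginalized over every internal base, the likelihoods of all 64 possible three-leaf columns sum to one.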

Journal Article
TL;DR: Using Java, this work has developed a new visualization tool that allows effective comparative genome sequence analysis and presents the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus.
Abstract: Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis, but few exist for an overall view and control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented.

52 citations

Journal Article
David R. Bentley, Panagiotis Deloukas, Andrew Dunham, Lisa French, Simon G. Gregory, Sean Humphray, Andrew J. Mungall, Mark T. Ross, Nigel P. Carter, Ian Dunham, Carol Scott, K. J. Ashcroft, A. L. Atkinson, K. Aubin, David Beare, Graeme Bethel, N. Brady, J. C. Brook, D. C. Burford, W. D. Burrill, C. Burrows, Adam Butler, C. Carder, J. J. Catanese, C. M. Clee, S. M. Clegg, V. Cobley, A. J. Coffey, Charlotte G. Cole, John E. Collins, J. S. Conquer, R. A. Cooper, K. M. Culley, Elisabeth Dawson, F. L. Dearden, Richard Durbin, P. J. De Jong, P. D. Dhami, M. E. Earthrowl, Carol A. Edwards, R. Evans, Christopher J. Gillson, J. Ghori, L. D. Green, Rhian Gwilliam, K. S. Halls, S. Hammond, G. L. Harper, R. W. Heathcott, Jane L. Holden, E. Holloway, B. L. Hopkins, P. J. Howard, Gareth R. Howell, E. J. Huckle, Jaime Hughes, P. J. Hunt, Sarah E. Hunt, M. Izmajlowicz, C. A. Jones, Soumi Joseph, G. Laird, Cordelia Langford, M. H. Lehvaslaiho, M. A. Leversha, Owen T. McCann, Louise McDonald, Jennifer McDowall, G. L. Maslen, D. Mistry, Nicholas K. Moschonas, Vassos Neocleous, D. M. Pearson, K. J. Phillips, K. M. Porter, S. R. Prathalingam, Y. H. Ramsey, S. A. Ranby, C. M. Rice, Jane Rogers, L. J. Rogers, Theologia Sarafidou, D. J. Scott, G. J. Sharp, C. J. Shaw-Smith, Luc J. Smink, Carol Soderlund, E. C. Sotheran, Helen E. Steingruber, John Sulston, A. Taylor, Rohan Taylor, A. A. Thorpe, E. J. Tinsley, Georgina Warry, Adam Whittaker, Pamela Whittaker, S. H. Williams, T. E. Wilmer, Richard Wooster, C. L. Wright
15 Feb 2001, Nature
TL;DR: This map-first approach established the long-range organization of the maps early in the project, and measuring the remaining gaps makes it possible to assess chromosome length and coverage in sequenced clones.
Abstract: We constructed maps for eight chromosomes (1, 6, 9, 10, 13, 20, X and (previously) 22), representing one-third of the genome, by building landmark maps, isolating bacterial clones and assembling contigs. By this approach, we could establish the long-range organization of the maps early in the project, and all contig extension, gap closure and problem-solving were simplified by containment within local regions. The maps currently represent more than 94% of the euchromatic (gene-containing) regions of these chromosomes in 176 contigs, and contain 96% of the chromosome-specific markers in the human gene map. By measuring the remaining gaps, we can assess chromosome length and coverage in sequenced clones.

50 citations

Journal Article
TL;DR: mitoVGP is a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads; it produced complete mitogenome assemblies for 100 vertebrate species of the VGP.
Abstract: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.

48 citations
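The mitoVGP abstract mentions similarity-based identification of mitochondrial reads but does not say how similarity is measured. As a hypothetical illustration only, the sketch below keeps reads that share a large fraction of k-mers with a related reference mitogenome; the function names, k and the 0.5 threshold are assumptions, and this is not mitoVGP's actual method.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def recruit_reads(reads, reference, k=15, min_shared=0.5):
    """Names of reads whose k-mers are mostly found in the reference mitogenome.

    Hypothetical filter. The reference is extended by its first k-1 bases so
    that k-mers spanning the origin of the circular genome are included, and
    both strands are indexed.
    """
    circular = reference + reference[:k - 1]
    ref_kmers = kmers(circular, k) | kmers(reverse_complement(circular), k)
    kept = []
    for name, seq in reads:
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to compare
        shared = len(read_kmers & ref_kmers) / len(read_kmers)
        if shared >= min_shared:
            kept.append(name)
    return kept
```

Extending the reference by k-1 bases is a cheap way to respect the circular topology of a mitogenome; without it, reads that cross the origin would lose their junction k-mers and could be missed.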


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations
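The new criterion for triggering extensions in gapped BLAST is usually summarised as the "two-hit" method: an ungapped extension starts only when two non-overlapping word hits fall on the same diagonal within a distance A of each other. A simplified sketch, assuming exact word matches (protein BLAST also seeds on neighbourhood words scoring above a threshold T) and an illustrative default of A = 40; the function names are hypothetical, and the gapped extension and PSI-BLAST's position-specific matrix are not shown.

```python
from collections import defaultdict

def word_hits(query, subject, w=3):
    """(q, s) start positions where the same length-w word occurs in both sequences.

    Simplification: exact word matches only. Hits are produced in order of
    increasing subject position s.
    """
    index = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    hits = []
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], []):
            hits.append((q, s))
    return hits

def two_hit_triggers(query, subject, w=3, A=40):
    """Pairs of hits that would trigger an ungapped extension under a two-hit rule.

    A hit triggers an extension when an earlier, non-overlapping hit lies on
    the same diagonal (s - q) no more than A residues before it.
    """
    last_on_diagonal = {}  # diagonal -> query position of the last counted hit
    triggers = []
    for q, s in word_hits(query, subject, w):
        d = s - q
        prev = last_on_diagonal.get(d)
        if prev is not None and q - prev < w:
            continue  # overlaps the previous hit on this diagonal: ignore it
        if prev is not None and q - prev <= A:
            triggers.append(((prev, prev + d), (q, s)))
        last_on_diagonal[d] = q
    return triggers
```

Requiring a second hit discards most isolated word matches before any extension work is done, which is where the speed-up over single-hit triggering comes from.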

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations
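A SAM alignment line has eleven mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. A minimal Python reader for one such line, written for illustration; it decodes the FLAG bits and the reference span of the CIGAR string, ignores header lines and tag types, and is not SAMtools' own code. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "first_in_pair",
    0x80: "second_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list    # optional TAG:TYPE:VALUE fields, left as raw strings

    def flags(self):
        """Names of the FLAG bits that are set, in bit order."""
        return [name for bit, name in FLAG_BITS.items() if self.flag & bit]

def reference_span(cigar):
    """Reference bases covered by an alignment: M, D, N, = and X consume the reference."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def parse_sam_line(line):
    """Split one tab-delimited alignment line into a SamRecord."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("SAM alignment lines need at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], f[11:])
```

For example, a FLAG of 99 decodes to paired, proper_pair, mate_reverse and first_in_pair, and the CIGAR 5M2I3M1D4M spans 13 reference bases because the 2-base insertion does not consume the reference.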

Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package, based on backward search with the Burrows–Wheeler Transform (BWT), that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous number of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
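The core of BWA is backward search over the Burrows–Wheeler Transform. The sketch below shows only exact-match backward search on a toy FM-index (suffix array, C table and occurrence counts), assuming a naive suffix-array construction; BWA's handling of mismatches and gaps, and its compressed index structures, are not reproduced, and the function names are illustrative.

```python
def bwt_index(text):
    """Build a toy FM-index for text: suffix array, C table and occurrence counts."""
    text += "$"                                          # unique sentinel, sorts first
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)               # i == 0 wraps to '$'
    alphabet = sorted(set(text))
    C, total = {}, 0
    for ch in alphabet:                                  # C[ch] = count of symbols < ch
        C[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}  # occ[ch][k] = ch count in bwt[:k]
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, C, occ

def backward_search(pattern, sa, C, occ):
    """Sorted 0-based start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)                                  # half-open suffix-array interval
    for ch in reversed(pattern):                         # extend the match leftwards
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

sa, C, occ = bwt_index("GATTACAGATTACA")
print(backward_search("GATTA", sa, C, occ))
```

Each pattern character narrows the suffix-array interval with two occurrence lookups, so the search cost grows with pattern length rather than with the length of the reference.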

Journal Article
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

43,540 citations

Journal Article
TL;DR: Trimmomatic was developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and it is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations
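Correct handling of paired-end data means both mates are trimmed and then routed to paired or unpaired output depending on which survive. A simplified Python sketch of a sliding-window quality trim plus that routing, loosely modelled on Trimmomatic's SLIDINGWINDOW and MINLEN steps; Phred+33 qualities, the window size, quality threshold and minimum length are assumptions, the function names are hypothetical, and the cut rule is simpler than Trimmomatic's own.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]

def sliding_window_trim(seq, qual, window=4, min_avg=20):
    """Truncate the read at the start of the first window whose mean quality < min_avg.

    Simplified rule; reads shorter than the window are returned unchanged.
    """
    scores = phred_scores(qual)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual

def trim_pair(read1, read2, min_len=36, **kw):
    """Trim both mates of a pair; return (paired, unpaired) lists of (seq, qual).

    Both mates go to `paired` only when both are at least min_len long after
    trimming; otherwise any single survivor goes to `unpaired`.
    """
    t1 = sliding_window_trim(*read1, **kw)
    t2 = sliding_window_trim(*read2, **kw)
    keep1, keep2 = len(t1[0]) >= min_len, len(t2[0]) >= min_len
    if keep1 and keep2:
        return [t1, t2], []
    return [], [t for t, keep in ((t1, keep1), (t2, keep2)) if keep]
```

Keeping the two outputs separate preserves mate ordering in the paired output, which downstream aligners rely on when both files are read in lockstep.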