Author

Richard Durbin

Bio: Richard Durbin is an academic researcher at the University of Cambridge. The author has contributed to research on the topics Genome and Population, has an h-index of 125, and has co-authored 319 publications receiving 207,192 citations. Previous affiliations of Richard Durbin include the Wellcome Trust Sanger Institute and the University of Manchester.
Topics: Genome, Population, Genomics, Gene, Sequence assembly


Papers
Journal Article
Anna-Sapfo Malaspinas, Michael C. Westaway, Craig Muller, Vitor C. Sousa, Oscar Lao, Isabel Alves, Anders Bergström, Georgios Athanasiadis, Jade Yu Cheng, Jacob E. Crawford, Tim H. Heupink, Enrico Macholdt, Stephan Peischl, Simon Rasmussen, Stephan Schiffels, Sankar Subramanian, Joanne L. Wright, Anders Albrechtsen, Chiara Barbieri, Isabelle Dupanloup, Anders Eriksson, Ashot Margaryan, Ida Moltke, Irina Pugach, Thorfinn Sand Korneliussen, Ivan P. Levkivskyi, J. Víctor Moreno-Mayar, Shengyu Ni, Fernando Racimo, Martin Sikora, Yali Xue, Farhang Aghakhanian, Nicolas Brucato, Søren Brunak, Paula F. Campos, Warren Clark, Sturla Ellingvåg, Gudjugudju Fourmile, Pascale Gerbault, Darren Injie, George Koki, Matthew Leavesley, Betty Logan, Aubrey Lynch, Elizabeth Matisoo-Smith, Peter McAllister, Alexander J. Mentzer, Mait Metspalu, Andrea Bamberg Migliano, Les Murgha, Maude E. Phipps, William Pomat, Doc Reynolds, François-Xavier Ricaut, Peter Siba, Mark G. Thomas, Thomas Wales, Colleen Ma Run Wall, Stephen Oppenheimer, Chris Tyler-Smith, Richard Durbin, Joe Dortch, Andrea Manica, Mikkel H. Schierup, Robert Foley, Marta Mirazón Lahr, Claire Bowern, Jeffrey D. Wall, Thomas Mailund, Mark Stoneking, Rasmus Nielsen, Manjinder S. Sandhu, Laurent Excoffier, David M. Lambert, Eske Willerslev
13 Oct 2016-Nature
TL;DR: A population expansion in northeast Australia during the Holocene epoch associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama–Nyungan languages is inferred.
Abstract: The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama–Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25–40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated ~10–32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama–Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51–72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.

389 citations

Journal Article
TL;DR: New species and data types now available at WormBase are described, and enhancements to the curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base are detailed.
Abstract: WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.

389 citations

Journal Article
TL;DR: Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation.
Abstract: The natural genetics of an organism is determined by the distribution of sequences of its genome. Here we present one- to four-fold, with some deeper, coverage of the genome sequences of over seventy isolates of the domesticated baker's yeast, Saccharomyces cerevisiae, and its closest relative, the wild S. paradoxus, which has never been associated with human activity. These were collected from numerous geographic locations and sources (including wild, clinical, baking, wine, laboratory and food spoilage). These sequences provide an unprecedented view of the population structure, natural (and artificial) selection and genome evolution in these species. Variation in gene content, SNPs, indels, copy numbers and transposable elements provides insights into the evolution of different lineages. Phenotypic variation broadly correlates with global genome-wide phylogenetic relationships; however, there is no correlation with source. S. paradoxus populations are well delineated along geographic boundaries, while the variation among worldwide S. cerevisiae isolates shows less differentiation and is comparable to a single S. paradoxus population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation.

377 citations

Journal Article
TL;DR: Investigation indicates that many of the incorrect gene predictions from GeneWise were due to transposons with valid protein-coding genes and the remaining cases are pseudogenes or possible annotation oversights.
Abstract: The GeneWise method for combining gene prediction and homology searches was applied to the 2.9-Mb region from Drosophila melanogaster. The results from the Genome Annotation Assessment Project (GASP) showed that GeneWise provided reasonably accurate gene predictions. Further investigation indicates that many of the incorrect gene predictions from GeneWise were due to transposons with valid protein-coding genes and the remaining cases are pseudogenes or possible annotation oversights.

374 citations

Journal Article
TL;DR: In this paper, the authors compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that the average Finn has more low-frequency loss-of-function variants and complete gene knockouts.
Abstract: Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5-5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10⁻⁸) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5×10⁻¹¹⁷). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3×10⁻⁴), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.

367 citations


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net

45,957 citations

Journal Article
TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
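
Illustrative note: the backward search with the Burrows–Wheeler Transform mentioned in the BWA abstract above can be sketched for the exact-match case as follows. This is a minimal teaching sketch, not BWA's implementation: it uses a naive suffix sort and full occurrence tables, handles no mismatches or gaps, and the names `bwt_index` and `backward_search` are hypothetical.

```python
from collections import Counter


def bwt_index(text):
    """Build a suffix array, C table and Occ table for `text`.

    Naive O(n^2 log n) construction; only suitable for small examples.
    """
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0)
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text strictly smaller than ch
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = bwt_index("GATTACATTAC")
print(backward_search("TTAC", sa, c_table, occ))
```

Each loop iteration narrows the suffix-array interval using only the C and Occ tables, so query time depends on the pattern length rather than on the reference length; production aligners replace the full tables used here with sampled, compressed ones.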

Journal Article
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

43,540 citations

Journal Article
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations