Showing papers by "Richard Durbin" published in 2008

Open Access

Journal Article•DOI•

Accurate whole human genome sequencing using reversible terminator chemistry

David R. Bentley¹, Shankar Balasubramanian², Harold Swerdlow¹, Harold Swerdlow³ +198 more•Institutions (4)

06 Nov 2008-Nature

TL;DR: An approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost is reported, effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

Abstract: DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

3,802 citations

Journal Article•DOI•

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan, Richard Durbin•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Nov 2008-Genome Research

TL;DR: This work describes MAQ, software that builds assemblies by mapping shotgun short reads to a reference genome and uses quality scores to derive genotype calls for the consensus sequence of a diploid genome, e.g., from a human sample.

Abstract: New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
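The mapping-quality idea in the abstract above can be illustrated with a small sketch. It assumes the common Phred convention, Q = -10 log10(P(alignment is wrong)), and a deliberately simplified likelihood in which each candidate position is scored only by the base qualities at its mismatches. This is not MAQ's actual model; the function names and the cap of 60 are hypothetical choices made for illustration.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled quality into an error probability."""
    return 10 ** (-q / 10)


def alignment_likelihood(mismatch_quals):
    """Toy likelihood of a read at one candidate position.

    Assumption: matching bases contribute a factor of ~1, and each
    mismatching base contributes its sequencing error probability.
    """
    likelihood = 1.0
    for q in mismatch_quals:
        likelihood *= phred_to_prob(q)
    return likelihood


def mapping_quality(candidates, cap=60):
    """Phred-scaled probability that the best candidate position is wrong.

    candidates: one list per candidate position, holding the base
    qualities observed at that position's mismatches.
    """
    if not candidates:
        raise ValueError("at least one candidate position is required")
    likelihoods = [alignment_likelihood(c) for c in candidates]
    p_wrong = 1.0 - max(likelihoods) / sum(likelihoods)
    if p_wrong <= 0:
        return cap  # a single candidate: report the (hypothetical) cap
    return min(cap, round(-10 * math.log10(p_wrong)))
```

Under these assumptions, a read with two equally good placements gets a very low score, while a read whose only alternative placement needs a high-quality mismatch gets a score close to that base quality.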

2,927 citations

Journal Article•DOI•

The diploid genome sequence of an Asian individual.

Jun Wang, Wei Wang¹, Ruiqiang Li¹, Ruiqiang Li², Yingrui Li³, Yingrui Li⁴, Yingrui Li¹, Geng Tian¹, Geng Tian⁵, Laurie Goodman¹, Wei Fan¹, Junqing Zhang¹, Jun Li¹, Juanbin Zhang¹, Yiran Guo⁵, Yiran Guo¹, Binxiao Feng¹, Heng Li⁶, Heng Li¹, Yao Lu¹, Xiaodong Fang¹, Huiqing Liang¹, Zhenglin Du¹, Dong Li¹, Yiqing Zhao¹, Yiqing Zhao⁵, Yujie Hu⁵, Yujie Hu¹, Zhenzhen Yang¹, Hancheng Zheng¹, Ines Hellmann⁷, Michael Inouye⁶, John E. Pool⁷, Xin Yi¹, Xin Yi⁵, Jing Zhao¹, Jinjie Duan¹, Yan Zhou¹, Junjie Qin⁵, Junjie Qin¹, Lijia Ma¹, Lijia Ma⁵, Guoqing Li¹, Zhentao Yang¹, Guojie Zhang⁵, Guojie Zhang¹, Bin Yang¹, Chang Yu¹, Fang Liang⁵, Fang Liang¹, Wenjie Li¹, Shaochuan Li¹, Dawei Li¹, Peixiang Ni¹, Jue Ruan¹, Jue Ruan⁵, Qibin Li⁵, Qibin Li¹, Hongmei Zhu¹, Dongyuan Liu¹, Zhike Lu¹, Ning Li⁵, Ning Li¹, Guangwu Guo⁵, Guangwu Guo¹, Jianguo Zhang¹, Jia Ye¹, Lin Fang¹, Qin Hao⁵, Qin Hao¹, Quan Chen¹, Quan Chen⁴, Yu Liang⁵, Yu Liang¹, Yeyang Su⁵, Yeyang Su¹, A. san¹, A. san⁵, Cuo Ping¹, Cuo Ping⁵, Shuang Yang¹, Fang Chen¹, Fang Chen⁵, Li Li¹, Ke Zhou¹, Hongkun Zheng¹, Hongkun Zheng², Yuanyuan Ren¹, Ling Yang¹, Yang Gao¹, Yang Gao³, Guohua Yang⁸, Guohua Yang¹, Zhuo Li¹, Xiaoli Feng¹, Karsten Kristiansen², Gane Ka-Shu Wong¹, Gane Ka-Shu Wong⁹, Rasmus Nielsen⁷, Richard Durbin⁶, Lars Bolund¹, Lars Bolund¹⁰, Xiuqing Zhang³, Xiuqing Zhang¹, Songgang Li⁴, Songgang Li¹, Songgang Li⁸, Huanming Yang¹, Huanming Yang⁸, Jian Wang⁸, Jian Wang¹•Institutions (10)

Beijing Genomics Institute¹, University of Southern Denmark², Beijing Institute of Genomics³, Peking University⁴, Chinese Academy of Sciences⁵, Wellcome Trust Sanger Institute⁶, University of California, Berkeley⁷, Shenzhen University⁸, University of Alberta⁹, Aarhus University¹⁰

06 Nov 2008-Nature

TL;DR: Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly, and the potential usefulness of next-generation sequencing technologies for personal genomics.

Abstract: Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

963 citations

Journal Article•DOI•

Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing

Peter J. Campbell¹, Philip J. Stephens¹, Erin Pleasance¹, Sarah O’Meara¹, Heng Li¹, Thomas Santarius¹, Thomas Santarius², Lucy Stebbings¹, Catherine Leroy¹, Sarah Edkins¹, Claire Hardy¹, Jon W. Teague¹, Andrew Menzies¹, Ian Goodhead¹, Daniel J. Turner¹, C M Clee¹, Michael A. Quail¹, Antony V. Cox¹, Clive Gavin Brown¹, Richard Durbin¹, Matthew E. Hurles¹, Paul A.W. Edwards², Graham R. Bignell¹, Michael R. Stratton¹, P. Andrew Futreal¹•Institutions (2)

Wellcome Trust Sanger Institute¹, University of Cambridge²

01 Jun 2008-Nature Genetics

TL;DR: The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.

Abstract: Human cancers often carry many somatically acquired genomic rearrangements, some of which may be implicated in cancer development. However, conventional strategies for characterizing rearrangements are laborious and low-throughput and have low sensitivity or poor resolution. We used massively parallel sequencing to generate sequence reads from both ends of short DNA fragments derived from the genomes of two individuals with lung cancer. By investigating read pairs that did not align correctly with respect to each other on the reference human genome, we characterized 306 germline structural variants and 103 somatic rearrangements to the base-pair level of resolution. The patterns of germline and somatic rearrangement were markedly different. Many somatic rearrangements were from amplicons, although rearrangements outside these regions, notably including tandem duplications, were also observed. Some somatic rearrangements led to abnormal transcripts, including two from internal tandem duplications and two fusion transcripts created by interchromosomal rearrangements. Germline variants were predominantly mediated by retrotransposition, often involving AluY and LINE elements. The results demonstrate the feasibility of systematic, genome-wide characterization of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.
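The read-pair reasoning in the abstract above (pairs that "did not align correctly with respect to each other") can be sketched as a simple classifier. This is an illustrative toy, not the authors' pipeline: it assumes a forward-reverse paired-end library, an expected insert-size window of 100 to 600 bp, and hypothetical category labels and field names.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair, min_insert=100, max_insert=600):
    """Label a mapped read pair by how it deviates from expectation.

    Assumption: a concordant pair maps to one chromosome, with the
    leftmost read on '+' and the rightmost on '-', separated by a
    distance inside [min_insert, max_insert].
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by genomic coordinate.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion_candidate"
    if left_strand == "-" and right_strand == "+":
        # Outward-facing reads, as expected across a tandem duplication.
        return "everted"
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion_candidate"
    if span < min_insert:
        return "insertion_candidate"
    return "concordant"
```

In practice, single discordant pairs are noisy, so a caller built on this idea would typically require several independent pairs supporting the same breakpoint before reporting a rearrangement.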

899 citations

Journal Article•DOI•

A large genome center's improvements to the Illumina sequencing system.

Michael A. Quail¹, Iwanka Kozarewa¹, Frances Smith¹, Aylwyn Scally¹, Philip J. Stephens¹, Richard Durbin¹, Harold Swerdlow¹, Daniel J. Turner¹•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Dec 2008-Nature Methods

TL;DR: A set of improvements to the standard Illumina protocols is described that makes library preparation more reliable in a high-throughput environment, reduces bias, tightens the insert size distribution and reliably yields large amounts of data.

Abstract: The Wellcome Trust Sanger Institute is one of the world's largest genome centers, and a substantial amount of our sequencing is performed with 'next-generation' massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases. Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high-throughput environment, to reduce bias, tighten insert size distribution and reliably obtain high yields of data.

730 citations

Journal Article•DOI•

A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis

Thomas A. Down¹, Vardhman K. Rakyan², Daniel J. Turner³, Paul Flicek⁴, Heng Li³, Eugene Kulesha⁴, Stefan Gräf⁴, Nathan Johnson⁴, Javier Herrero⁴, Eleni M. Tomazou³, Natalie P. Thorne⁵, Liselotte Bäckdahl⁶, Marlis Herberth⁵, Kevin L. Howe⁵, David K. Jackson³, Marcos Mateo Miretti³, John C. Marioni⁵, Ewan Birney⁴, Tim Hubbard³, Richard Durbin³, Simon Tavaré⁵, Stephan Beck⁶•Institutions (6)

Wellcome Trust/Cancer Research UK Gurdon Institute¹, Queen Mary University of London², Wellcome Trust Sanger Institute³, European Bioinformatics Institute⁴, University of Cambridge⁵, University College London⁶

01 Jul 2008-Nature Biotechnology

TL;DR: This work has developed a cross-platform algorithm—Bayesian tool for methylation analysis (Batman)—for analyzing methylated DNA immunoprecipitation profiles generated using oligonucleotide arrays or next-generation sequencing, developed to provide a high-resolution whole-genome DNA methylation profile (DNA methylome) of a mammalian genome.

Abstract: DNA methylation is an indispensable epigenetic modification required for regulating the expression of mammalian genomes. Immunoprecipitation-based methods for DNA methylome analysis are rapidly shifting the bottleneck in this field from data generation to data analysis, necessitating the development of better analytical tools. In particular, an inability to estimate absolute methylation levels remains a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling. To address this issue, we developed a cross-platform algorithm, Bayesian tool for methylation analysis (Batman), for analyzing methylated DNA immunoprecipitation (MeDIP) profiles generated using oligonucleotide arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). We developed the latter approach to provide a high-resolution whole-genome DNA methylation profile (DNA methylome) of a mammalian genome. Strong correlation of our data, obtained using mature human spermatozoa, with those obtained using bisulfite sequencing suggests that combining MeDIP-seq or MeDIP-chip with Batman provides a robust, quantitative and cost-effective functional genomic strategy for elucidating the function of DNA methylation.

651 citations

Journal Article•DOI•

BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals.

Ina Poser¹, Mihail Sarov¹, Mihail Sarov², James R. A. Hutchins³, Jean-Karim Hériché⁴, Yusuke Toyoda¹, Andrei Pozniakovsky¹, Daniela Weigl⁵, Anja Nitzsche¹, Björn Hegemann³, Alexander W. Bird¹, Laurence Pelletier⁶, Laurence Pelletier¹, Ralf Kittler⁷, Ralf Kittler¹, Sujun Hua, Ronald Naumann¹, Martina Augsburg¹, Martina M. Sykora³, Helmut Hofemeister², Youming Zhang, Kim Nasmyth⁸, Kevin P. White⁷, Steffen Dietzel⁵, Karl Mechtler³, Richard Durbin⁴, A. Francis Stewart², Jan-Michael Peters³, Frank Buchholz¹, Anthony A. Hyman¹•Institutions (8)

Max Planck Society¹, Dresden University of Technology², Research Institute of Molecular Pathology³, Wellcome Trust Sanger Institute⁴, Ludwig Maximilian University of Munich⁵, University of Toronto⁶, University of Chicago⁷, University of Oxford⁸

01 May 2008-Nature Methods

TL;DR: A fast and reliable pipeline to study protein function in mammalian cells based on protein tagging in bacterial artificial chromosomes (BACs) is described and it is shown that BAC transgenes can be rapidly and reliably generated using 96-well-format recombineering.

Abstract: The interpretation of genome sequences requires reliable and standardized methods to assess protein function at high throughput. Here we describe a fast and reliable pipeline to study protein function in mammalian cells based on protein tagging in bacterial artificial chromosomes (BACs). The large size of the BAC transgenes ensures the presence of most, if not all, regulatory elements and results in expression that closely matches that of the endogenous gene. We show that BAC transgenes can be rapidly and reliably generated using 96-well-format recombineering. After stable transfection of these transgenes into human tissue culture cells or mouse embryonic stem cells, the localization, protein-protein and/or protein-DNA interactions of the tagged protein are studied using generic, tag-based assays. The same high-throughput approach will be generally applicable to other model systems. NOTE: In the version of this article initially published online, the name of one individual was misspelled in the Acknowledgments. The second sentence of the Acknowledgments paragraph should read, “We thank I. Cheesman for helpful discussions.” The error has been corrected for all versions of the article.

617 citations

Journal Article•DOI•

Population genomics of domestic and wild yeasts

David M. Carter¹, Gianni Liti², Alan M. Moses³, Leopold Parts¹, Stephen A. James, Robert P. Davey, Ian N. Roberts, Anders Blomberg⁴, Jonas Warringer⁴, Austin Burt, Vassiliki Koufopanou, Isheng J. Tsai, Casey M. Bergman³, Douda Bensasson³, Michael J. T. O’Kelly², Alexander van Oudenaarden², David B. H. Barton², Elizabeth Bailes², Matthew Jones¹, Michael A. Quail¹, Ian Goodhead¹, Sarah Sims¹, Frances Smith⁵, Richard Durbin⁶, Edward J. Louis⁷•Institutions (7)

Wellcome Trust Sanger Institute¹, University of Nottingham², University of Toronto³, University of Gothenburg⁴, Imperial College London⁵, University of Manchester⁶, Massachusetts Institute of Technology⁷

19 Jun 2008-Nature Precedings

TL;DR: Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation.

Abstract: The natural genetics of an organism is determined by the distribution of sequences of its genome. Here we present one- to four-fold, with some deeper, coverage of the genome sequences of over seventy isolates of the domesticated baker's yeast, Saccharomyces cerevisiae, and its closest relative, the wild S. paradoxus, which has never been associated with human activity. These were collected from numerous geographic locations and sources (including wild, clinical, baking, wine, laboratory and food spoilage). These sequences provide an unprecedented view of the population structure, natural (and artificial) selection and genome evolution in these species. Variation in gene content, SNPs, indels, copy numbers and transposable elements provides insights into the evolution of different lineages. Phenotypic variation broadly correlates with global genome-wide phylogenetic relationships; however, there is no correlation with source. S. paradoxus populations are well delineated along geographic boundaries, while the variation among worldwide S. cerevisiae isolates shows less differentiation and is comparable to a single S. paradoxus population. Rather than one or two domestication events leading to the extant baker's yeasts, the population structure of S. cerevisiae shows a few well defined geographically isolated lineages and many different mosaics of these lineages, supporting the notion that human influence provided the opportunity for outbreeding and production of new combinations of pre-existing variation.

377 citations

Book Chapter•DOI•

Accounting for non-genetic factors improves the power of eQTL studies

Oliver Stegle¹, Anitha Kannan², Richard Durbin³, John Winn²•Institutions (3)

University of Cambridge¹, Microsoft², Wellcome Trust Sanger Institute³

30 Mar 2008

TL;DR: This work presents a model that explicitly accounts for non-genetic factors so as to improve significantly the power of an expression Quantitative Trait Loci study, and exploits the inherent block structure of haplotype data to further enhance its sensitivity.

Abstract: The recent availability of large-scale data sets profiling single nucleotide polymorphisms (SNPs) and gene expression across different human populations has directed much attention towards discovering patterns of genetic variation and their association with gene regulation. The influence of environmental, developmental and other factors on gene expression can obscure such associations. We present a model that explicitly accounts for non-genetic factors so as to improve significantly the power of an expression Quantitative Trait Loci (eQTL) study. Our method also exploits the inherent block structure of haplotype data to further enhance its sensitivity. On data from the HapMap project, we find more than three times as many significant associations as a standard eQTL method.

35 citations

Journal Article•DOI•

Inferring Selection on Amino Acid Preference in Protein Domains

Alan M. Moses¹, Richard Durbin¹•Institutions (1)

Wellcome Trust Sanger Institute¹

18 Dec 2008-Molecular Biology and Evolution

TL;DR: It is shown that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains, and that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect.

Abstract: Models that explicitly account for the effect of selection on new mutations have been proposed to account for "codon bias" or the excess of "preferred" codons that results from selection for translational efficiency and/or accuracy. In principle, such models can be applied to any mutation that results in a preferred allele, but in most cases, the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered "preferred alleles" just as are synonymous mutations leading to codons for more abundant tRNAs. We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived allele frequency distribution and polymorphism to divergence ratios of preferred and unpreferred differences and find evidence for both negative and positive selections acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains and find that it is consistent with the quantitative effects that we observe.
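The rule proposed in the abstract above, that a mutation toward a more common amino acid at a given domain position counts as a "preferred allele", can be sketched in a few lines. This is only an illustration of that comparison: it assumes the position is represented as one column of a domain alignment given as a string, that gaps are written as '-', and that ties get a hypothetical "tie" label; the function names are not from the paper.

```python
from collections import Counter


def column_frequencies(column):
    """Relative frequency of each amino acid in one alignment column.

    Assumption: gap characters ('-') are ignored.
    """
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def classify_mutation(column, ancestral, derived):
    """Label an amino acid change as preferred or unpreferred.

    The change is 'preferred' when the derived residue is more common
    at this domain position than the ancestral residue.
    """
    freqs = column_frequencies(column)
    f_ancestral = freqs.get(ancestral, 0.0)
    f_derived = freqs.get(derived, 0.0)
    if f_derived > f_ancestral:
        return "preferred"
    if f_derived < f_ancestral:
        return "unpreferred"
    return "tie"
```

For example, in a column such as "LLLLIVLL--", a change from I to L would be labeled preferred and a change from L to V unpreferred.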

11 citations

Journal Article•DOI•

Erratum: BAC TransgeneOmics: A high-throughput method for exploration of protein function in mammals (Nature Methods (2008) vol. 5 (409-415))

01 Aug 2008-Nature Methods

TL;DR: In the version of this supplementary file originally posted online, the supplementary figure legends were missing; the error was corrected online as of 30 July 2008.

Abstract: Nat. Methods 5, 409–415 (2008). In the version of this supplementary file originally posted online, the supplementary figure legends were missing. The error has been corrected online as of 30 July 2008. The authors also originally omitted an acknowledgment thanking Roberto Iacone for helpful discussions in setting up the 96-well format procedure.
