Showing papers by "Richard Durbin" published in 2015


Journal ArticleDOI
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4  +514 moreInstitutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations


01 Oct 2015
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

3,247 citations


Journal ArticleDOI
01 Oct 2015-Nature
TL;DR: Insights are described from sequencing whole genomes or exomes of nearly 10,000 individuals from population-based and disease collections, along with the population structure and functional annotation of rare and low-frequency variants.
Abstract: The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.

948 citations


01 Jan 2015
TL;DR: The contribution of rare and low-frequency variants to human traits is largely unexplored; this paper describes insights from sequencing whole genomes or exomes of nearly 10,000 individuals from population-based and disease collections.
Abstract: The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.

824 citations


Journal ArticleDOI
Maanasa Raghavan1, Matthias Steinrücken2, Matthias Steinrücken3, Kelley Harris2, Stephan Schiffels4, Simon Rasmussen5, Michael DeGiorgio6, Anders Albrechtsen1, Cristina Valdiosera1, Cristina Valdiosera7, María C. Ávila-Arcos1, María C. Ávila-Arcos8, Anna-Sapfo Malaspinas1, Anders Eriksson9, Anders Eriksson10, Ida Moltke1, Mait Metspalu11, Mait Metspalu12, Julian R. Homburger8, Jeffrey D. Wall13, Omar E. Cornejo14, J. Víctor Moreno-Mayar1, Thorfinn Sand Korneliussen1, Tracey Pierre1, Morten Rasmussen8, Morten Rasmussen1, Paula F. Campos1, Paula F. Campos15, Peter de Barros Damgaard1, Morten E. Allentoft1, John Lindo16, Ene Metspalu12, Ene Metspalu11, Ricardo Rodríguez-Varela17, Josefina Mansilla, Celeste Henrickson18, Andaine Seguin-Orlando1, Helena Malmström19, Thomas W. Stafford20, Thomas W. Stafford1, Suyash Shringarpure8, Andrés Moreno-Estrada8, Monika Karmin12, Monika Karmin11, Kristiina Tambets11, Anders Bergström4, Yali Xue4, Vera Warmuth21, Andrew D. Friend9, Joy S. Singarayer22, Paul J. Valdes23, Francois Balloux, Ilán Leboreiro, Jose Luis Vera, Héctor Rangel-Villalobos24, Davide Pettener25, Donata Luiselli25, Loren G. Davis26, Evelyne Heyer27, Christoph P. E. Zollikofer28, Marcia S. Ponce de León28, Colin Smith7, Vaughan Grimes29, Vaughan Grimes30, Kelly-Anne Pike29, Michael Deal29, Benjamin T. Fuller31, Bernardo Arriaza32, Vivien G. Standen32, Maria F. Luz, Francois Ricaut33, Niede Guidon, Ludmila P. Osipova34, Ludmila P. Osipova35, Mikhail Voevoda35, Mikhail Voevoda34, Olga L. Posukh34, Olga L. Posukh35, Oleg Balanovsky, Maria Lavryashina36, Yuri Bogunov, Elza Khusnutdinova34, Elza Khusnutdinova37, Marina Gubina, Elena Balanovska, Sardana A. Fedorova38, Sergey Litvinov34, Sergey Litvinov11, Boris Malyarchuk34, Miroslava Derenko34, M. J. Mosher39, David Archer40, Jerome S. Cybulski41, Jerome S. Cybulski42, Barbara Petzelt, Joycelynn Mitchell, Rosita Worl, Paul Norman8, Peter Parham8, Brian M. Kemp14, Toomas Kivisild9, Toomas Kivisild11, Chris Tyler-Smith4, Manjinder S. Sandhu4, Manjinder S. Sandhu43, Michael H. Crawford44, Richard Villems12, Richard Villems11, David Glenn Smith45, Michael R. Waters46, Ted Goebel46, John R. Johnson47, Ripan S. Malhi16, Mattias Jakobsson19, David J. Meltzer1, David J. Meltzer48, Andrea Manica9, Richard Durbin4, Carlos Bustamante8, Yun S. Song2, Rasmus Nielsen2, Eske Willerslev1
21 Aug 2015-Science
TL;DR: The results suggest that there has been gene flow between some Native Americans from both North and South America and groups related to East Asians and Australo-Melanesians, the latter possibly through an East Asian route that might have included ancestors of modern Aleutian Islanders.
Abstract: How and when the Americas were populated remains contentious. Using ancient and modern genome-wide data, we found that the ancestors of all present-day Native Americans, including Athabascans and Amerindians, entered the Americas as a single migration wave from Siberia no earlier than 23 thousand years ago (ka) and after no more than an 8000-year isolation period in Beringia. After their arrival to the Americas, ancestral Native Americans diversified into two basal genetic branches around 13 ka, one that is now dispersed across North and South America and the other restricted to North America. Subsequent gene flow resulted in some Native Americans sharing ancestry with present-day East Asians (including Siberians) and, more distantly, Australo-Melanesians. Putative "Paleoamerican" relict populations, including the historical Mexican Pericues and South American Fuego-Patagonians, are not directly related to modern Australo-Melanesians as suggested by the Paleoamerican Model.

459 citations


Journal ArticleDOI
Hou-Feng Zheng1, Vincenzo Forgetta1, Yi-Hsiang Hsu2, Yi-Hsiang Hsu3  +171 moreInstitutions (55)
01 Oct 2015-Nature
TL;DR: Evidence is provided that low‐frequency non‐coding variants have large effects on BMD and fracture, thereby providing rationale for whole‐genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.
Abstract: The extent to which low-frequency (minor allele frequency (MAF) between 1-5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; n_cases = 98,742 and n_controls = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.

410 citations
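
The frequency bands used in the abstract above (rare: MAF ≤ 1%; low-frequency: MAF between 1% and 5%) can be illustrated with a minimal Python sketch. The function name `classify_maf`, the "common" label for MAF above 5%, and the handling of the exact 5% boundary are assumptions made for illustration, not definitions taken from the paper.

```python
def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency (as a fraction, 0 to 0.5) into a frequency class.

    Thresholds follow the abstract: rare is MAF <= 1%, low-frequency is 1-5%.
    Treating exactly 5% as low-frequency and anything above as "common"
    is an assumption of this sketch.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


# The two variants reported in the abstract, with their stated MAFs.
reported_variants = {"rs11692564": 0.016, "rs148771817": 0.012}
labels = {rsid: classify_maf(maf) for rsid, maf in reported_variants.items()}
```

Both reported variants fall in the low-frequency band under these thresholds, which matches how the abstract describes them.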


Journal ArticleDOI
TL;DR: It is shown that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling, and a method for combining WGS panels to improve variant coverage and downstream imputation accuracy is presented.
Abstract: Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

318 citations
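
The panel sizes quoted in the abstract above are consistent with each diploid genome contributing two haplotypes. The short Python sketch below checks that arithmetic; the variable names are illustrative, and the combined total is simply the sum of the two quoted figures (whether the merged panel retains every haplotype is an assumption here).

```python
# Each diploid genome contributes two phased haplotypes.
HAPLOTYPES_PER_GENOME = 2

uk10k_genomes = 3781                      # whole genomes, from the abstract
uk10k_haplotypes = uk10k_genomes * HAPLOTYPES_PER_GENOME

thousand_genomes_haplotypes = 2184        # haplotypes, from the abstract

# Assumption: the combined panel keeps all haplotypes from both sources.
combined_haplotypes = uk10k_haplotypes + thousand_genomes_haplotypes
```

Under that assumption the UK10K contribution works out to the 7,562 haplotypes stated in the abstract.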


Journal ArticleDOI
18 Dec 2015-Science
TL;DR: The discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small crater lake in Tanzania are reported and mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi are suggested.
Abstract: The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 meters in diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet, and trophic morphology. With whole-genome sequences of 146 fish, we identified 98 clearly demarcated genomic “islands” of high differentiation and demonstrated the association of genotypes across these islands with divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight-vision–associated genes), hormone signaling, and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.

316 citations


Journal ArticleDOI
TL;DR: A model where ASE requires genetic variability in cis, a difference in the sequence of both alleles, but where the magnitude of the ASE effect depends on trans genetic and environmental factors that interact with the cis genetic variants is proposed.
Abstract: Understanding the genetic architecture of gene expression is an intermediate step in understanding the genetic architecture of complex diseases. RNA sequencing technologies have improved the quantification of gene expression and allow measurement of allele-specific expression (ASE). ASE is hypothesized to result from the direct effect of cis regulatory variants, but a proper estimation of the causes of ASE has not been performed thus far. In this study, we take advantage of a sample of twins to measure the relative contributions of genetic and environmental effects to ASE, and we find substantial effects from gene × gene (G×G) and gene × environment (G×E) interactions. We propose a model where ASE requires genetic variability in cis, a difference in the sequence of both alleles, but where the magnitude of the ASE effect depends on trans genetic and environmental factors that interact with the cis genetic variants.

200 citations


Journal ArticleDOI
TL;DR: The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain; this survey of genomic and phenotypic variation in 161 natural isolates represents a rich resource to examine genotype-phenotype relationships in a tractable model.
Abstract: Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.

174 citations


Journal ArticleDOI
TL;DR: The models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity and improved analysis tools and updated data reporting formats are required.
Abstract: The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.

Journal ArticleDOI
TL;DR: Both the haplotype and MSMC analyses suggest a predominant northern route out of Africa via Egypt, pointing to Egypt as the more likely gateway in the exodus to the rest of the world.
Abstract: The predominantly African origin of all modern human populations is well established, but the route taken out of Africa is still unclear. Two alternative routes, via Egypt and Sinai or across the Bab el Mandeb strait into Arabia, have traditionally been proposed as feasible gateways in light of geographic, paleoclimatic, archaeological, and genetic evidence. Distinguishing among these alternatives has been difficult. We generated 225 whole-genome sequences (225 at 8× depth, of which 8 were increased to 30×; Illumina HiSeq 2000) from six modern Northeast African populations (100 Egyptians and five Ethiopian populations each represented by 25 individuals). West Eurasian components were masked out, and the remaining African haplotypes were compared with a panel of sub-Saharan African and non-African genomes. We showed that masked Northeast African haplotypes overall were more similar to non-African haplotypes and more frequently present outside Africa than were any sets of haplotypes derived from a West African population. Furthermore, the masked Egyptian haplotypes showed these properties more markedly than the masked Ethiopian haplotypes, pointing to Egypt as the more likely gateway in the exodus to the rest of the world. Using five Ethiopian and three Egyptian high-coverage masked genomes and the multiple sequentially Markovian coalescent (MSMC) approach, we estimated the genetic split times of Egyptians and Ethiopians from non-African populations at 55,000 and 65,000 years ago, respectively, whereas that of West Africans was estimated to be 75,000 years ago. Both the haplotype and MSMC analyses thus suggest a predominant northern route out of Africa via Egypt.

Journal ArticleDOI
TL;DR: High-resolution immunofluorescence analysis of human respiratory cilia can improve diagnosis of PCD in patients with loss-of-function mutations as well as missense variants.
Abstract: Primary ciliary dyskinesia (PCD) is a genetically heterogeneous recessive disorder caused by several distinct defects in genes responsible for ciliary beating, leading to defective mucociliary clearance often associated with randomization of left/right body asymmetry. Individuals with PCD caused by defective radial spoke (RS) heads are difficult to diagnose owing to lack of gross ultrastructural defects and absence of situs inversus. Thus far, most mutations identified in human radial spoke genes (RSPH) are loss-of-function mutations, and missense variants have been rarely described. We studied the consequences of different RSPH9, RSPH4A, and RSPH1 mutations on the assembly of the RS complex to improve diagnostics in PCD. We report 21 individuals with PCD (16 families) with biallelic mutations in RSPH9, RSPH4A, and RSPH1, including seven novel mutations comprising missense variants, and performed high-resolution immunofluorescence analysis of human respiratory cilia. Missense variants are frequent genetic defects in PCD with RS defects. Absence of RSPH4A due to mutations in RSPH4A results in deficient axonemal assembly of the RS head components RSPH1 and RSPH9. RSPH1 mutant cilia, lacking RSPH1, fail to assemble RSPH9, whereas RSPH9 mutations result in axonemal absence of RSPH9, but do not affect the assembly of the other head proteins, RSPH1 and RSPH4A. Interestingly, our results were identical in individuals carrying loss-of-function mutations, missense variants, or one amino acid deletion. Immunofluorescence analysis can improve diagnosis of PCD in patients with loss-of-function mutations as well as missense variants. RSPH4A is the core protein of the RS head.

Journal ArticleDOI
TL;DR: It is demonstrated that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function as well as common variants that explain ≥20% of the variance in TSH and FT4.
Abstract: Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

Journal ArticleDOI
TL;DR: The broad phenotypic spectrum and pathobiochemistry of individuals with autosomal‐recessive ECHS1 deficiency is described.
Abstract: OBJECTIVE: Short-chain enoyl-CoA hydratase (ECHS1) is a multifunctional mitochondrial matrix enzyme that is involved in the oxidation of fatty acids and essential amino acids such as valine. Here, we describe the broad phenotypic spectrum and pathobiochemistry of individuals with autosomal-recessive ECHS1 deficiency. METHODS: Using exome sequencing, we identified ten unrelated individuals carrying compound heterozygous or homozygous mutations in ECHS1. Functional investigations in patient-derived fibroblast cell lines included immunoblotting, enzyme activity measurement, and a palmitate loading assay. RESULTS: Patients showed a heterogeneous phenotype with disease onset in the first year of life and course ranging from neonatal death to survival into adulthood. The most prominent clinical features were encephalopathy (10/10), deafness (9/9), epilepsy (6/9), optic atrophy (6/10), and cardiomyopathy (4/10). Serum lactate was elevated and brain magnetic resonance imaging showed white matter changes or a Leigh-like pattern resembling disorders of mitochondrial energy metabolism. Analysis of patients' fibroblast cell lines (6/10) provided further evidence for the pathogenicity of the respective mutations by showing reduced ECHS1 protein levels and reduced 2-enoyl-CoA hydratase activity. While serum acylcarnitine profiles were largely normal, in vitro palmitate loading of patient fibroblasts revealed increased butyrylcarnitine, unmasking the functional defect in mitochondrial β-oxidation of short-chain fatty acids. Urinary excretion of 2-methyl-2,3-dihydroxybutyrate (a potential derivative of acryloyl-CoA in the valine catabolic pathway) was significantly increased, indicating impaired valine oxidation. INTERPRETATION: In conclusion, we define the phenotypic spectrum of a new syndrome caused by ECHS1 deficiency. We speculate that both the β-oxidation defect and the block in l-valine metabolism, with accumulation of toxic methacrylyl-CoA and acryloyl-CoA, contribute to the disorder that may be amenable to metabolic treatment approaches.


Journal ArticleDOI
Miriam Schmidts1, Yuqing Hou2, Claudio Cortes3, Dorus A. Mans4  +183 moreInstitutions (37)
TL;DR: TCTEX1D2 mutations causing Jeune asphyxiating thoracic dystrophy with partially penetrant inheritance are identified and defined as an integral component of the evolutionarily conserved retrograde IFT machinery.
Abstract: The analysis of individuals with ciliary chondrodysplasias can shed light on sensitive mechanisms controlling ciliogenesis and cell signalling that are essential to embryonic development and survival. Here we identify TCTEX1D2 mutations causing Jeune asphyxiating thoracic dystrophy with partially penetrant inheritance. Loss of TCTEX1D2 impairs retrograde intraflagellar transport (IFT) in humans and the protist Chlamydomonas, accompanied by destabilization of the retrograde IFT dynein motor. We thus define TCTEX1D2 as an integral component of the evolutionarily conserved retrograde IFT machinery. In complex with several IFT dynein light chains, it is required for correct vertebrate skeletal formation but may be functionally redundant under certain conditions.

Posted ContentDOI
14 Nov 2015-bioRxiv
TL;DR: Exome sequencing of 3,222 British Pakistani-heritage adults with high parental relatedness revealed 1,111 rare-variant homozygous likely loss of function (rhLOF) genotypes predicted to disrupt (knockout) 781 genes, and phased genome sequencing of a healthy PRDM9 knockout mother, her child and controls showed meiotic recombination sites localised away from PRDM9-dependent hotspots, demonstrating PRDM9 redundancy in humans.
Abstract: Complete gene knockouts are highly informative about gene function. We exome sequenced 3,222 British Pakistani-heritage adults with high parental relatedness, discovering 1,111 rare-variant homozygous likely loss of function (rhLOF) genotypes predicted to disrupt (knockout) 781 genes. Based on depletion of rhLOF genotypes, we estimate that 13.6% of knockouts are incompatible with adult life, finding on average 1.6 heterozygous recessive lethal LOF variants per adult. Linking to lifelong health records, we observed no association of rhLOF genotypes with prescription- or doctor-consultation rate, and no disease-related phenotypes in 33 of 42 individuals with rhLOF genotypes in recessive Mendelian disease genes. Phased genome sequencing of a healthy PRDM9 knockout mother, her child and controls, showed meiotic recombination sites localised away from PRDM9-dependent hotspots, demonstrating PRDM9 redundancy in humans.

Posted ContentDOI
17 Jul 2015-bioRxiv
TL;DR: Today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and it is estimated that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland.
Abstract: British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, ranging from 2,300 until 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

Journal ArticleDOI
TL;DR: Overall HLOF genes are enriched for olfactory receptor function and are expressed in testes more often than expected, consistent with reduced purifying selection and incipient pseudogenisation.
Abstract: Homozygous loss of function (HLOF) variants provide a valuable window on gene function in humans, as well as an inventory of the human genes that are not essential for survival and reproduction. All humans carry at least a few HLOF variants, but the exact number of inactivated genes that can be tolerated is currently unknown—as are the phenotypic effects of losing function for most human genes. Here, we make use of 1432 whole exome sequences from five European populations to expand the catalogue of known human HLOF mutations; after stringent filtering of variants in our dataset, we identify a total of 173 HLOF mutations, 76 (44%) of which have not been observed previously. We find that population isolates are particularly well suited to surveys of novel HLOF genes because individuals in such populations carry extensive runs of homozygosity, which we show are enriched for novel, rare HLOF variants. Further, we make use of extensive phenotypic data to show that most HLOFs, ascertained in population-based samples, appear to have little detectable effect on the phenotype. On the contrary, we document several genes directly implicated in disease that seem to tolerate HLOF variants. Overall HLOF genes are enriched for olfactory receptor function and are expressed in testes more often than expected, consistent with reduced purifying selection and incipient pseudogenisation.

Posted ContentDOI
23 Dec 2015-bioRxiv
TL;DR: A reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry is described, leading to a large increase in the number of SNPs tested in association studies.
Abstract: We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1%, a large increase in the number of SNPs tested in association studies and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

Journal ArticleDOI
TL;DR: It is demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases the power to discover biologically relevant associations.
Abstract: Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping and, in a guided fashion, to summarize biologically relevant sources of variation. Here, we show how the derived factors summarizing pathway expression can be used to analyze the relationships between expression, heritability, and aging. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarize patterns of gene expression, both to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" that summarized patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P < 5.38 × 10^-5). These phenotypes are more heritable (h^2 = 0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolizing sugars and fatty acids; others relate to insulin signaling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.

Posted ContentDOI
21 Jul 2015-bioRxiv
TL;DR: This study carried out low-read depth whole-genome sequencing in 568 individuals from three Italian founder populations and compared it to data from other Italian and European populations from the 1000 Genomes Project, reporting depletion of homozygous genotypes at potentially detrimental sites in the founder populations and patterns consistent with consanguinity driving the accelerated purging of highly deleterious mutations.
Abstract: Purging through inbreeding defines the process through which deleterious alleles can be removed from populations by natural selection when exposed in homozygosis through the occurrence of consanguineous marriage. In this study we carried out low-read depth (4-10x) whole-genome sequencing in 568 individuals from three Italian founder populations, and compared it to data from other Italian and European populations from the 1000 Genomes Project. We show depletion of homozygous genotypes at potentially detrimental sites in the founder populations compared to outbred populations and observe patterns consistent with consanguinity driving the accelerated purging of highly deleterious mutations.

01 Jan 2015
TL;DR: The status of known vertebrate genome projects is summarized, standards for pronouncing a genome as sequenced or completed are recommended, and a present and future vision of the landscape of Genome 10K is provided.
Abstract: The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.

Posted ContentDOI
06 Mar 2015-bioRxiv
TL;DR: It is demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases the power to discover biologically relevant associations.
Abstract: Statistical factor analysis methods have previously been used to remove noise components from high-dimensional data prior to genetic association mapping, and in a guided fashion to summarise biologically relevant sources of variation. Here we show how the derived factors summarising pathway expression can be used to analyse the relationships between expression, heritability and ageing. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarise patterns of gene expression, both to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" which summarised patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38E-5). These phenotypes are more heritable (h^2=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolising sugars and fatty acids; others relate to insulin signalling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.
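
The Bonferroni threshold quoted in the abstract above can be reproduced with a short calculation. This is only an illustrative sketch: it assumes a family-wise significance level of 0.05 (not stated in the abstract) corrected over all 930 pathway phenotypes, and the variable names are hypothetical.

```python
# Illustrative check of the Bonferroni threshold quoted in the abstract.
# Assumption: a family-wise alpha of 0.05, corrected over every pathway phenotype.
n_pathways = 186             # KEGG pathways (from the abstract)
phenotypes_per_pathway = 5   # phenotypes derived per pathway (from the abstract)
alpha = 0.05                 # assumed significance level, not given in the abstract

# Number of tests = total number of pathway phenotypes.
n_tests = n_pathways * phenotypes_per_pathway

# Bonferroni correction divides alpha by the number of tests.
threshold = alpha / n_tests

print(n_tests)
print(f"{threshold:.2e}")
```

Under these assumptions the test count matches the 930 phenotypes reported, and the corrected threshold rounds to the P<5.38E-5 value given in the abstract.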

Posted ContentDOI
12 Oct 2015-bioRxiv
TL;DR: In this article, the authors carried out low-read depth (4-10x) whole-genome sequencing in 568 individuals from three Italian founder populations, and compared it to data from other Italian and European populations from the 1000 Genomes Project.
Abstract: Purging through inbreeding occurs when consanguineous marriage increases the rate at which deleterious alleles are present in a homozygous state. In this study we carried out low-read depth (4-10x) whole-genome sequencing in 568 individuals from three Italian founder populations, and compared it to data from other Italian and European populations from the 1000 Genomes Project. We show extended consanguinity and depletion of homozygous genotypes at potentially detrimental sites in the founder populations compared to outbred populations. However, according to simulations, these patterns are not compatible with the hypothesis of consanguinity driving the purging of highly deleterious mutations. Therefore, we conclude that genetic drift and the founder effect should be responsible for the observed purging of deleterious variants.


Journal ArticleDOI
TL;DR: The original version of this article noted incorrect affiliations for members of the UK10K Consortium, and contained typographical errors in the spelling of the UK10K Consortium and consortium members Valentina Iotchkova and Michael Quail.
Abstract: Nature Communications 6, Article number: 5681 (2015); doi: 10.1038/ncomms6681; Published 6 March 2015; Updated 20 May 2015. The original version of this Article noted incorrect affiliations for members of the UK10K Consortium, and contained typographical errors in the spelling of the UK10K Consortium and consortium members Valentina Iotchkova and Michael Quail. In addition, the author J. Brent Richards was incorrectly duplicated in the list of consortium members as Brent Richards. These errors have now been corrected in the PDF and HTML versions of this Article.

Journal ArticleDOI
TL;DR: This entry is a correction notice for the article with DOI 10.1371/journal.pgen.1004798.
Abstract: [This corrects the article DOI: 10.1371/journal.pgen.1004798.]