
Showing papers in "PLOS Genetics in 2014"


Journal ArticleDOI
TL;DR: A novel statistical methodology is developed to assess whether two association signals are consistent with a shared causal variant; its output statistics can be derived from single-SNP summary statistics, making it possible to perform systematic meta-analysis-type comparisons across multiple GWAS datasets.
Abstract: Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
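
To make the idea of testing for a shared causal variant from single-SNP summary statistics more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes at most one causal variant per trait in the region, uses Wakefield-style approximate Bayes factors, and the prior standard deviation (`prior_sd`) and the prior probabilities `p1`, `p2`, `p12` are illustrative defaults that the abstract does not specify. The function names are hypothetical.

```python
import math

def wakefield_abf(beta, se, prior_sd=0.15):
    """Approximate Bayes factor (association vs. null) for one SNP.

    prior_sd is an assumed prior standard deviation of the true effect.
    """
    z = beta / se
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # assumed prior variance of the true effect
    r = w / (w + v)
    return math.sqrt(1.0 - r) * math.exp(0.5 * z * z * r)

def coloc_posteriors(stats1, stats2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of five hypotheses for one region.

    stats1, stats2: lists of (beta, se) for the same SNPs in the same order.
    Returns [H0, H1, H2, H3, H4], where H3 means two distinct causal
    variants and H4 means one shared causal variant.
    Plain floats are used, so very large z-scores could overflow.
    """
    abf1 = [wakefield_abf(b, s) for b, s in stats1]
    abf2 = [wakefield_abf(b, s) for b, s in stats2]
    s1, s2 = sum(abf1), sum(abf2)
    s12 = sum(a * b for a, b in zip(abf1, abf2))  # same SNP causal for both
    weights = [
        1.0,                                  # H0: no association
        p1 * s1,                              # H1: trait 1 only
        p2 * s2,                              # H2: trait 2 only
        p1 * p2 * max(s1 * s2 - s12, 0.0),    # H3: distinct variants
        p12 * s12,                            # H4: shared variant
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Toy region with three SNPs (made-up numbers).
shared = coloc_posteriors([(0.6, 0.1), (0.0, 0.1), (0.0, 0.1)],
                          [(0.6, 0.1), (0.0, 0.1), (0.0, 0.1)])
distinct = coloc_posteriors([(0.6, 0.1), (0.0, 0.1), (0.0, 0.1)],
                            [(0.0, 0.1), (0.6, 0.1), (0.0, 0.1)])
```

In the toy data, the same SNP drives both signals in `shared`, so most posterior mass should fall on H4, whereas in `distinct` the signals sit on different SNPs and H3 should dominate.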

1,711 citations


Journal ArticleDOI
TL;DR: It is found that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations, and a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals is developed.
Abstract: Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
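
The abstract compares methods by switch error rate without defining it. As a rough illustration, the Python sketch below computes one common form of the metric for a single individual: among consecutive heterozygous sites, the fraction at which the inferred phase flips relative to the true phase. The function name and input layout are hypothetical, and the paper's exact evaluation procedure may differ.

```python
def switch_error_rate(true_h1, true_h2, inferred_h1):
    """Fraction of consecutive heterozygous site pairs with a phase switch.

    true_h1, true_h2: the two true haplotypes (lists of 0/1 alleles).
    inferred_h1: one of the two inferred haplotypes (the other is implied).
    """
    # Only heterozygous sites carry phase information.
    het = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]
    if len(het) < 2:
        return 0.0
    # At each het site, does the inferred haplotype follow true_h1?
    on_h1 = [inferred_h1[i] == true_h1[i] for i in het]
    # A switch occurs whenever that answer changes between neighbouring sites.
    switches = sum(1 for prev, cur in zip(on_h1, on_h1[1:]) if prev != cur)
    return switches / (len(het) - 1)

# Toy example: five heterozygous sites and one homozygous site (the last).
t1 = [0, 1, 0, 1, 0, 0]
t2 = [1, 0, 1, 0, 1, 0]
rate = switch_error_rate(t1, t2, [0, 1, 1, 0, 1, 0])
```

Note that returning the other haplotype in its entirety (a pure label swap) counts as zero switches, since the labelling of the two haplotypes is arbitrary.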

555 citations


Journal ArticleDOI
TL;DR: It is found that none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade, suggesting that a re-evaluation of past hypotheses regarding dog origins is necessary.
Abstract: To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.

504 citations


Journal ArticleDOI
TL;DR: A probabilistic framework is presented that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation, together with a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays.
Abstract: Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large scale meta-analysis of four blood lipids traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in this data.

487 citations


Journal ArticleDOI
TL;DR: This work hypothesized that environmental fluctuations among seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster that would drive repeatable adaptive oscillations at balanced polymorphisms, identified hundreds of polymorphisms whose frequency oscillates among seasons, and argued that these loci are subject to strong, temporally variable selection.
Abstract: In many species, genomic data have revealed pervasive adaptive evolution indicated by the fixation of beneficial alleles. However, when selection pressures are highly variable along a species' range or through time, adaptive alleles may persist at intermediate frequencies for long periods. So-called “balanced polymorphisms” have long been understood to be an important component of standing genetic variation, yet direct evidence of the strength of balancing selection and the stability and prevalence of balanced polymorphisms has remained elusive. We hypothesized that environmental fluctuations among seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster that would drive repeatable adaptive oscillations at balanced polymorphisms. We identified hundreds of polymorphisms whose frequency oscillates among seasons and argue that these loci are subject to strong, temporally variable selection. We show that these polymorphisms respond to acute and persistent changes in climate and are associated in predictable ways with seasonally variable phenotypes. In addition, our results suggest that adaptively oscillating polymorphisms are likely millions of years old, with some possibly predating the divergence between D. melanogaster and D. simulans. Taken together, our results are consistent with a model of balancing selection wherein rapid temporal fluctuations in climate over generational time promote adaptive genetic diversity at loci underlying polygenic variation in fitness-related phenotypes.

486 citations


Journal ArticleDOI
TL;DR: This analysis uncovers a number of putative signals of local adaptation, and develops methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FST comparisons to test for over-dispersion of genetic values among populations.
Abstract: Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We use a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model, we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FST comparisons to test for over-dispersion of genetic values among populations. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles, and also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
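
The "simple weighted sums of allele frequencies" mentioned in the abstract can be sketched in a few lines of Python. This is an illustrative reading only: the factor of 2 assumes diploid individuals and purely additive effects, and the function name, input layout, and toy numbers are hypothetical rather than taken from the paper.

```python
def genetic_values(effect_sizes, allele_freqs):
    """Estimated mean additive genetic value for each population.

    effect_sizes: GWAS additive effect per locus (effect allele).
    allele_freqs: allele_freqs[m][l] is the effect-allele frequency at
                  locus l in population m.
    Assumes diploidy and additivity, hence the factor of 2.
    """
    return [
        2.0 * sum(a * p for a, p in zip(effect_sizes, freqs))
        for freqs in allele_freqs
    ]

# Two loci, two populations (made-up numbers).
values = genetic_values([0.5, -0.2], [[0.1, 0.5], [0.3, 0.5]])
```

Here the two populations differ only at the first locus, so their estimated genetic values differ by twice the effect size times the frequency difference at that locus.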

482 citations


Journal ArticleDOI
TL;DR: Mutations of the SHANK genes were detected in the whole spectrum of autism with a gradient of severity in cognitive impairment and the clinical relevance of these genes remains to be ascertained.
Abstract: SHANK genes code for scaffold proteins located at the post-synaptic density of glutamatergic synapses. In neurons, SHANK2 and SHANK3 have a positive effect on the induction and maturation of dendritic spines, whereas SHANK1 induces the enlargement of spine heads. Mutations in SHANK genes have been associated with autism spectrum disorders (ASD), but their prevalence and clinical relevance remain to be determined. Here, we performed a new screen and a meta-analysis of SHANK copy-number and coding-sequence variants in ASD. Copy-number variants were analyzed in 5,657 patients and 19,163 controls, coding-sequence variants were ascertained in 760 to 2,147 patients and 492 to 1,090 controls (depending on the gene), and individuals carrying de novo or truncating SHANK mutations underwent an extensive clinical investigation. Copy-number variants and truncating mutations in SHANK genes were present in ∼1% of patients with ASD: mutations in SHANK1 were rare (0.04%) and present in males with normal IQ and autism; mutations in SHANK2 were present in 0.17% of patients with ASD and mild intellectual disability; mutations in SHANK3 were present in 0.69% of patients with ASD and up to 2.12% of the cases with moderate to profound intellectual disability. In summary, mutations of the SHANK genes were detected in the whole spectrum of autism with a gradient of severity in cognitive impairment. Given the rare frequency of SHANK1 and SHANK2 deleterious mutations, the clinical relevance of these genes remains to be ascertained. In contrast, the frequency and the penetrance of SHANK3 mutations in individuals with ASD and intellectual disability (more than 1 in 50) warrant its consideration for mutation screening in clinical practice.

452 citations


Journal ArticleDOI
TL;DR: Results show that both chondrocytes prior to initial ossification and growth plate chondrocytes before or after birth have the capacity to undergo transdifferentiation to become osteoblasts.
Abstract: One of the crucial steps in endochondral bone formation is the replacement of a cartilage matrix produced by chondrocytes with bone trabeculae made by osteoblasts. However, the precise sources of osteoblasts responsible for trabecular bone formation have not been fully defined. To investigate whether cells derived from hypertrophic chondrocytes contribute to the osteoblast pool in trabecular bones, we genetically labeled either hypertrophic chondrocytes by Col10a1-Cre or chondrocytes by tamoxifen-induced Agc1-CreERT2 using EGFP, LacZ or Tomato expression. Both Cre drivers were specifically active in chondrocytic cells and not in perichondrium, in periosteum or in any of the osteoblast lineage cells. These in vivo experiments allowed us to follow the fate of cells labeled in Col10a1-Cre or Agc1-CreERT2-expressing chondrocytes. After the labeling of chondrocytes, both during prenatal development and after birth, abundant labeled non-chondrocytic cells were present in the primary spongiosa. These cells were distributed throughout trabeculae surfaces and later were present in the endosteum, and embedded within the bone matrix. Co-expression studies using osteoblast markers indicated that a proportion of the non-chondrocytic cells derived from chondrocytes labeled by Col10a1-Cre or by Agc1-CreERT2 were functional osteoblasts. Hence, our results show that both chondrocytes prior to initial ossification and growth plate chondrocytes before or after birth have the capacity to undergo transdifferentiation to become osteoblasts. The osteoblasts derived from Col10a1-expressing hypertrophic chondrocytes represent about sixty percent of all mature osteoblasts in endochondral bones of one-month-old mice. A similar process of chondrocyte to osteoblast transdifferentiation was involved during bone fracture healing in adult mice. Thus, in addition to cells in the periosteum, chondrocytes represent a major source of osteoblasts contributing to endochondral bone formation in vivo.

433 citations


Journal ArticleDOI
TL;DR: Functional analyses demonstrated that identified candidate genes affect pancreatic β- and α-cells as Exoc3l silencing reduced exocytosis and overexpression of Cdkn1a, Pde7b and Sept9 perturbed insulin and glucagon secretion in clonal β- and α-cells, respectively.
Abstract: Impaired insulin secretion is a hallmark of type 2 diabetes (T2D). Epigenetics may affect disease susceptibility. To describe the human methylome in pancreatic islets and determine the epigenetic basis of T2D, we analyzed DNA methylation of 479,927 CpG sites and the transcriptome in pancreatic islets from T2D and non-diabetic donors. We provide a detailed map of the global DNA methylation pattern in human islets, β- and α-cells. Genomic regions close to the transcription start site showed low degrees of methylation and regions further away from the transcription start site such as the gene body, 3'UTR and intergenic regions showed a higher degree of methylation. While CpG islands were hypomethylated, the surrounding 2 kb shores showed an intermediate degree of methylation, whereas regions further away (shelves and open sea) were hypermethylated in human islets, β- and α-cells. We identified 1,649 CpG sites and 853 genes, including TCF7L2, FTO and KCNQ1, with differential DNA methylation in T2D islets after correction for multiple testing. The majority of the differentially methylated CpG sites had an intermediate degree of methylation and were underrepresented in CpG islands (∼ 7%) and overrepresented in the open sea (∼ 60%). 102 of the differentially methylated genes, including CDKN1A, PDE7B, SEPT9 and EXOC3L2, were differentially expressed in T2D islets. Methylation of CDKN1A and PDE7B promoters in vitro suppressed their transcriptional activity. Functional analyses demonstrated that identified candidate genes affect pancreatic β- and α-cells as Exoc3l silencing reduced exocytosis and overexpression of Cdkn1a, Pde7b and Sept9 perturbed insulin and glucagon secretion in clonal β- and α-cells, respectively. Together, our data can serve as a reference methylome in human islets. We provide new target genes with altered DNA methylation and expression in human T2D islets that contribute to perturbed insulin and glucagon secretion. These results highlight the importance of epigenetics in the pathogenesis of T2D.

411 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that the average Finn has more low-frequency loss-of-function variants and complete gene knockouts.
Abstract: Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5-5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10⁻⁸) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5×10⁻¹¹⁷). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3×10⁻⁴), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.

367 citations


Journal ArticleDOI
TL;DR: The major technological and biological breakthroughs achieved are reviewed, the remaining challenges to overcome are described, and a glimpse into the promise of recent and future developments is provided.
Abstract: Advances in whole-genome and whole-transcriptome amplification have permitted the sequencing of the minute amounts of DNA and RNA present in a single cell, offering a window into the extent and nature of genomic and transcriptomic heterogeneity which occurs in both normal development and disease. Single-cell approaches stand poised to revolutionise our capacity to understand the scale of genomic, epigenomic, and transcriptomic diversity that occurs during the lifetime of an individual organism. Here, we review the major technological and biological breakthroughs achieved, describe the remaining challenges to overcome, and provide a glimpse into the promise of recent and future developments.

Journal ArticleDOI
TL;DR: C-NHEJ is conservative but adaptable, and the accuracy of the repair is dictated by the structure of the DNA ends rather than by the C-NHEJ machinery; this adaptability is beneficial for the development of the immune repertoire and the resistance to ionizing radiation.
Abstract: DNA double-strand breaks (DSBs) are harmful lesions leading to genomic instability or diversity. Non-homologous end-joining (NHEJ) is a prominent DSB repair pathway, which has long been considered to be error-prone. However, recent data have pointed to the intrinsic precision of NHEJ. Three reasons can account for the apparent fallibility of NHEJ: 1) the existence of a highly error-prone alternative end-joining process; 2) the adaptability of canonical C-NHEJ (Ku- and Xrcc4/ligase IV–dependent) to imperfect complementary ends; and 3) the requirement to first process chemically incompatible DNA ends that cannot be ligated directly. Thus, C-NHEJ is conservative but adaptable, and the accuracy of the repair is dictated by the structure of the DNA ends rather than by the C-NHEJ machinery. We present data from different organisms that describe the conservative/versatile properties of C-NHEJ. The advantages of the adaptability/versatility of C-NHEJ are discussed for the development of the immune repertoire and the resistance to ionizing radiation, especially at low doses, and for targeted genome manipulation.

Journal ArticleDOI
TL;DR: It is concluded that DNMs represent a major cause of moderate or severe ID.
Abstract: Genetics is believed to have an important role in intellectual disability (ID). Recent studies have emphasized the involvement of de novo mutations (DNMs) in ID but the extent to which they contribute to its pathogenesis and the identity of the corresponding genes remain largely unknown. Here, we report a screen for DNMs in subjects with moderate or severe ID. We sequenced the exomes of 41 probands and their parents, and confirmed 81 DNMs affecting the coding sequence or consensus splice sites (1.98 DNMs/proband). We observed a significant excess of de novo single nucleotide substitutions and loss-of-function mutations in these cases compared to control subjects, suggesting that at least a subset of these variations are pathogenic. A total of 12 likely pathogenic DNMs were identified in genes previously associated with ID (ARID1B, CHD2, FOXG1, GABRB3, GATAD2B, GRIN2B, MBD5, MED13L, SETBP1, TBR1, TCF4, WDR45), resulting in a diagnostic yield of ∼29%. We also identified 12 possibly pathogenic DNMs in genes (HNRNPU, WAC, RYR2, SET, EGR1, MYH10, EIF2C1, COL4A3BP, CHMP2A, PPP1CB, VPS4A, PPP2R2B) that have not previously been causally linked to ID. Interestingly, no case was explained by inherited mutations. Protein network analysis indicated that the products of many of these known and candidate genes interact with each other or with products of other ID-associated genes further supporting their involvement in ID. We conclude that DNMs represent a major cause of moderate or severe ID.

Journal ArticleDOI
TL;DR: The spectrum of mutations identified provides insights into the genetics underlying the micro-evolution of a laboratory strain, and identifies mutations involved in stress responses, mating efficiency, and virulence.
Abstract: Cryptococcus neoformans is a pathogenic basidiomycetous yeast responsible for more than 600,000 deaths each year. It occurs as two serotypes (A and D) representing two varieties (i.e. grubii and neoformans, respectively). Here, we sequenced the genome and performed an RNA-Seq-based analysis of the C. neoformans var. grubii transcriptome structure. We determined the chromosomal locations, analyzed the sequence/structural features of the centromeres, and identified origins of replication. The genome was annotated based on automated and manual curation. More than 40,000 introns populating more than 99% of the expressed genes were identified. Although most of these introns are located in the coding DNA sequences (CDS), over 2,000 introns in the untranslated regions (UTRs) were also identified. Poly(A)-containing reads were employed to locate the polyadenylation sites of more than 80% of the genes. Examination of the sequences around these sites revealed a new poly(A)-site-associated motif (AUGHAH). In addition, 1,197 miscRNAs were identified. These miscRNAs can be spliced and/or polyadenylated, but do not appear to have obvious coding capacities. Finally, this genome sequence enabled a comparative analysis of strain H99 variants obtained after laboratory passage. The spectrum of mutations identified provides insights into the genetics underlying the micro-evolution of a laboratory strain, and identifies mutations involved in stress responses, mating efficiency, and virulence.

Journal ArticleDOI
TL;DR: The geographic distribution of admixture proportions in this sample reveals extensive population structure, illustrating the continuing impact of demographic history on the genetic diversity of Latin America.
Abstract: The current genetic makeup of Latin America has been shaped by a history of extensive admixture between Africans, Europeans and Native Americans, a process taking place within the context of extensive geographic and social stratification. We estimated individual ancestry proportions in a sample of 7,342 subjects ascertained in five countries (Brazil, Chile, Colombia, Mexico and Peru). These individuals were also characterized for a range of physical appearance traits and for self-perception of ancestry. The geographic distribution of admixture proportions in this sample reveals extensive population structure, illustrating the continuing impact of demographic history on the genetic diversity of Latin America. Significant ancestry effects were detected for most phenotypes studied. However, ancestry generally explains only a modest proportion of total phenotypic variation. Genetically estimated and self-perceived ancestry correlate significantly, but certain physical attributes have a strong impact on self-perception and bias self-perception of ancestry relative to genetically estimated ancestry.

Journal ArticleDOI
TL;DR: It is argued that cattle migration, movement and trading followed by admixture have been important forces in shaping modern bovine genomic variation.
Abstract: The domestication and development of cattle has considerably impacted human societies, but the histories of cattle breeds and populations have been poorly understood especially for African, Asian, and American breeds. Using genotypes from 43,043 autosomal single nucleotide polymorphism markers scored in 1,543 animals, we evaluate the population structure of 134 domesticated bovid breeds. Regardless of the analytical method or sample subset, the three major groups of Asian indicine, Eurasian taurine, and African taurine were consistently observed. Patterns of geographic dispersal resulting from co-migration with humans and exportation are recognizable in phylogenetic networks. All analytical methods reveal patterns of hybridization which occurred after divergence. Using 19 breeds, we map the cline of indicine introgression into Africa. We infer that African taurine possess a large portion of wild African auroch ancestry, causing their divergence from Eurasian taurine. We detect exportation patterns in Asia and identify a cline of Eurasian taurine/indicine hybridization in Asia. We also identify the influence of species other than Bos taurus taurus and B. t. indicus in the formation of Asian breeds. We detect the pronounced influence of Shorthorn cattle in the formation of European breeds. Iberian and Italian cattle possess introgression from African taurine. American Criollo cattle originate from Iberia, and not directly from Africa with African ancestry inherited via Iberian ancestors. Indicine introgression into American cattle occurred in the Americas, and not Europe. We argue that cattle migration, movement and trading followed by admixture have been important forces in shaping modern bovine genomic variation.

Journal ArticleDOI
TL;DR: An online tool is provided for calculating the power of detecting genetic (co)variation using genome-wide SNP data and it is shown that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships.
Abstract: We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
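
The stated proportionality for the univariate case can be turned into a back-of-the-envelope calculator. The Python sketch below is an approximation under explicit assumptions, not the authors' online tool: it takes the sampling variance of the SNP-heritability estimate to be exactly 1 / (number of pairwise contrasts × variance of the off-diagonal genetic relationships), with the proportionality constant set to 1, and the default `var_rel = 2e-5` is an assumed illustrative value for unrelated individuals genotyped at common SNPs. The function name is hypothetical.

```python
import math

def approx_se_h2(n, var_rel=2e-5):
    """Approximate standard error of a SNP-heritability estimate.

    n: number of unrelated individuals.
    var_rel: assumed variance of off-diagonal SNP-derived relationships.
    Uses var(h2) ~ 1 / (n_pairs * var_rel), with n_pairs = n(n-1)/2;
    the constant of proportionality is an assumption.
    """
    n_pairs = n * (n - 1) / 2.0   # number of pairwise contrasts
    return math.sqrt(1.0 / (n_pairs * var_rel))

se_5k = approx_se_h2(5_000)
se_10k = approx_se_h2(10_000)
```

Under this approximation the standard error scales roughly as 1/N, so doubling the sample size about halves it, which is the practical point for planning study sizes.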

Journal ArticleDOI
TL;DR: Large-scale proteomic analysis of the mouse liver revealed a high temporal coordination in the abundance of proteins involved in the same metabolic process, such as xenobiotic detoxification, and identified many other essential cellular processes in which protein levels are under circadian control.
Abstract: Circadian clocks are endogenous oscillators that drive the rhythmic expression of a broad array of genes, orchestrating metabolism and physiology. Recent evidence indicates that post-transcriptional and post-translational mechanisms play essential roles in modulating temporal gene expression for proper circadian function, particularly for the molecular mechanism of the clock. Due to technical limitations in large-scale, quantitative protein measurements, it remains unresolved to what extent the circadian clock regulates metabolism by driving rhythms of protein abundance. Therefore, we aimed to identify global circadian oscillations of the proteome in the mouse liver by applying in vivo SILAC mouse technology in combination with state-of-the-art mass spectrometry. Among the 3000 proteins accurately quantified across two consecutive cycles, 6% showed circadian oscillations with a defined phase of expression. Interestingly, daily rhythms of one fifth of the liver proteins were not accompanied by changes at the transcript level. The oscillations of almost half of the cycling proteome were delayed by more than six hours with respect to the corresponding rhythmic mRNA. Strikingly, we observed that the length of the time lag between mRNA and protein cycles varies across the day. Our analysis revealed a high temporal coordination in the abundance of proteins involved in the same metabolic process, such as xenobiotic detoxification. Apart from liver-specific metabolic pathways, we identified many other essential cellular processes in which protein levels are under circadian control, for instance vesicle trafficking and protein folding. Our large-scale proteomic analysis thus reveals that circadian post-transcriptional and post-translational mechanisms play a key role in the temporal orchestration of liver metabolism and physiology.
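
The idea of a "defined phase of expression" and of a delay between mRNA and protein rhythms can be sketched with a simple cosinor fit. This is not the authors' method, which the abstract does not specify. The function names, the 24-hour period, and the synthetic time series below are all hypothetical, and the closed-form fit assumes evenly spaced samples covering a whole number of periods.

```python
import math

PERIOD_H = 24.0  # assumed circadian period in hours


def fit_cosinor(times_h, values, period_h=PERIOD_H):
    """Fit y = mesor + A*cos(omega*(t - peak)) by projection.

    Valid only for evenly spaced samples spanning whole periods,
    where the cosine and sine regressors are orthogonal.
    """
    n = len(values)
    omega = 2 * math.pi / period_h
    mesor = sum(values) / n
    a = 2 / n * sum(v * math.cos(omega * t) for t, v in zip(times_h, values))
    b = 2 / n * sum(v * math.sin(omega * t) for t, v in zip(times_h, values))
    amplitude = math.hypot(a, b)
    # Time of peak expression, wrapped into [0, period).
    peak_h = (math.atan2(b, a) / omega) % period_h
    return mesor, amplitude, peak_h


def phase_delay(mrna_peak_h, protein_peak_h, period_h=PERIOD_H):
    """Hours by which the protein peak lags the mRNA peak (circular)."""
    return (protein_peak_h - mrna_peak_h) % period_h


# Synthetic example: samples every 3 h across two full cycles.
times = [3 * i for i in range(16)]
mrna = [5 + 1.0 * math.cos(2 * math.pi * (t - 2) / 24) for t in times]
protein = [10 + 2.0 * math.cos(2 * math.pi * (t - 8) / 24) for t in times]

_, _, mrna_peak = fit_cosinor(times, mrna)
_, protein_amp, protein_peak = fit_cosinor(times, protein)
delay = phase_delay(mrna_peak, protein_peak)
```

In this synthetic case the protein rhythm was constructed to peak six hours after its mRNA, which is the kind of lag the abstract reports for almost half of the cycling proteome.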

Journal ArticleDOI
TL;DR: It is found that different Arabidopsis accessions exhibited different communities, indicating that plant host genetic factors shape the associated microbiota, thus harboring significant potential for the identification of novel plant factors affecting the microbiota of the communities.
Abstract: The identity of plant host genetic factors controlling the composition of the plant microbiota and the extent to which plant genes affect associated microbial populations is currently unknown. Here, we use a candidate gene approach to investigate host effects on the phyllosphere community composition and abundance. To reduce the environmental factors that might mask genetic factors, the model plant Arabidopsis thaliana was used in a gnotobiotic system and inoculated with a reduced complexity synthetic bacterial community composed of seven strains representing the most abundant phyla in the phyllosphere. From a panel of 55 plant mutants with alterations in the surface structure, cell wall, defense signaling, secondary metabolism, and pathogen recognition, a small number of single host mutations displayed an altered microbiota composition and/or abundance. Host alleles that resulted in the strongest perturbation of the microbiota relative to the wild-type were lacs2 and pec1. These mutants affect cuticle formation and led to changes in community composition and an increased bacterial abundance relative to the wild-type plants, suggesting that different bacteria can benefit from a modified cuticle to different extents. Moreover, we identified ein2, which is involved in ethylene signaling, as a host factor modulating the community's composition. Finally, we found that different Arabidopsis accessions exhibited different communities, indicating that plant host genetic factors shape the associated microbiota, thus harboring significant potential for the identification of novel plant factors affecting the microbiota of the communities.

Journal ArticleDOI
TL;DR: The largest genomic survey to date of 101 EFT (65 tumors and 36 cell lines) is reported, finding that EFT has a very low mutational burden but frequent deleterious mutations in the cohesin complex subunit STAG2 and that 11% of tumors pathologically diagnosed as EFT lack a typical EWSR1 fusion oncogene and these tumors do not have a characteristic Ewing sarcoma gene expression signature.
Abstract: The Ewing sarcoma family of tumors (EFT) is a group of highly malignant small round blue cell tumors occurring in children and young adults. We report here the largest genomic survey to date of 101 EFT (65 tumors and 36 cell lines). Using a combination of whole genome sequencing and targeted sequencing approaches, we discover that EFT has a very low mutational burden (0.15 mutations/Mb) but frequent deleterious mutations in the cohesin complex subunit STAG2 (21.5% tumors, 44.4% cell lines), homozygous deletion of CDKN2A (13.8% and 50%) and mutations of TP53 (6.2% and 71.9%). We additionally note an increased prevalence of the BRCA2 K3326X polymorphism in EFT patient samples (7.3%) compared to population data (OR 7.1, p = 0.006). Using whole transcriptome sequencing, we find that 11% of tumors pathologically diagnosed as EFT lack a typical EWSR1 fusion oncogene and that these tumors do not have a characteristic Ewing sarcoma gene expression signature. We identify samples harboring novel fusion genes including FUS-NCATc2 and CIC-FOXO4 that may represent distinct small round blue cell tumor variants. In an independent EFT tissue microarray cohort, we show that STAG2 loss as detected by immunohistochemistry may be associated with more advanced disease (p = 0.15) and a modest decrease in overall survival (p = 0.10). These results significantly advance our understanding of the genomic and molecular underpinnings of Ewing sarcoma and provide a foundation towards further efforts to improve diagnosis, prognosis, and precision therapeutics testing.

Journal ArticleDOI
TL;DR: A new algorithm for ARG inference, implemented in ARGweaver, uses hidden Markov models to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation called "threading."
Abstract: The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.

Journal ArticleDOI
TL;DR: It is demonstrated that by directly targeting a common promoter cis-element (G-box), HY5 and PIFs form a dynamic activation-suppression transcriptional module responsive to light and temperature cues that provides a simple, direct mechanism through which environmental change can redirect transcriptional control of genes required for photosynthesis and photoprotection.
Abstract: The ability to interpret daily and seasonal alterations in light and temperature signals is essential for plant survival. This is particularly important during seedling establishment when the phytochrome photoreceptors activate photosynthetic pigment production for photoautotrophic growth. Phytochromes accomplish this partly through the suppression of phytochrome interacting factors (PIFs), negative regulators of chlorophyll and carotenoid biosynthesis, whereas the bZIP transcription factor long hypocotyl 5 (HY5), a potent PIF antagonist, promotes photosynthetic pigment accumulation in response to light. Here we demonstrate that by directly targeting a common promoter cis-element (G-box), HY5 and PIFs form a dynamic activation-suppression transcriptional module responsive to light and temperature cues. This antagonistic regulatory module provides a simple, direct mechanism through which environmental change can redirect transcriptional control of genes required for photosynthesis and photoprotection. In the regulation of photopigment biosynthesis genes, HY5 and PIFs do not operate alone, but with the circadian clock. However, sudden changes in light or temperature conditions can trigger changes in HY5 and PIF abundance that adjust the expression of common target genes to optimise photosynthetic performance and growth.

Journal ArticleDOI
TL;DR: Two emerging opportunities now stand to revolutionize the understanding of the everyday life of the human genome: network genomics analyses examining how systems-level capabilities emerge from groups of individual socially sensitive genomes and near-real-time transcriptional biofeedback to empirically optimize individual well-being.
Abstract: A growing literature in human social genomics has begun to analyze how everyday life circumstances influence human gene expression. Social-environmental conditions such as urbanity, low socioeconomic status, social isolation, social threat, and low or unstable social status have been found to associate with differential expression of hundreds of gene transcripts in leukocytes and diseased tissues such as metastatic cancers. In leukocytes, diverse types of social adversity evoke a common conserved transcriptional response to adversity (CTRA) characterized by increased expression of proinflammatory genes and decreased expression of genes involved in innate antiviral responses and antibody synthesis. Mechanistic analyses have mapped the neural “social signal transduction” pathways that stimulate CTRA gene expression in response to social threat and may contribute to social gradients in health. Research has also begun to analyze the functional genomics of optimal health and thriving. Two emerging opportunities now stand to revolutionize our understanding of the everyday life of the human genome: network genomics analyses examining how systems-level capabilities emerge from groups of individual socially sensitive genomes and near-real-time transcriptional biofeedback to empirically optimize individual well-being in the context of the unique genetic, geographic, historical, developmental, and social contexts that jointly shape the transcriptional realization of our innate human genomic potential for thriving.

Journal ArticleDOI
TL;DR: A genome-wide association study and analysis of known genetic risk loci for AD dementia using neuropathologic data from 4,914 brain autopsies discovered new genetic associations with specific neuropathologic features and aligned known genetic risk for AD dementia with specific neuropathologic changes in the largest brain autopsy study of AD and related dementias.
Abstract: Alzheimer's disease (AD) and related dementias are a major public health challenge and present a therapeutic imperative for which we need additional insight into molecular pathogenesis. We performed a genome-wide association study and analysis of known genetic risk loci for AD dementia using neuropathologic data from 4,914 brain autopsies. Neuropathologic data were used to define clinico-pathologic AD dementia or controls, assess core neuropathologic features of AD (neuritic plaques, NPs; neurofibrillary tangles, NFTs), and evaluate commonly co-morbid neuropathologic changes: cerebral amyloid angiopathy (CAA), Lewy body disease (LBD), hippocampal sclerosis of the elderly (HS), and vascular brain injury (VBI). Genome-wide significance was observed for clinico-pathologic AD dementia, NPs, NFTs, CAA, and LBD with a number of variants in and around the apolipoprotein E gene (APOE). GalNAc transferase 7 (GALNT7), ATP-Binding Cassette, Sub-Family G (WHITE), Member 1 (ABCG1), and an intergenic region on chromosome 9 were associated with NP score; and Potassium Large Conductance Calcium-Activated Channel, Subfamily M, Beta Member 2 (KCNMB2) was strongly associated with HS. Twelve of the 21 non-APOE genetic risk loci for clinically-defined AD dementia were confirmed in our clinico-pathologic sample: CR1, BIN1, CLU, MS4A6A, PICALM, ABCA7, CD33, PTK2B, SORL1, MEF2C, ZCWPW1, and CASS4 with 9 of these 12 loci showing larger odds ratio in the clinico-pathologic sample. Correlation of effect sizes for risk of AD dementia with effect size for NFTs or NPs showed positive correlation, while those for risk of VBI showed a moderate negative correlation. The other co-morbid neuropathologic features showed only nominal association with the known AD loci. 
Our study discovered new genetic associations with specific neuropathologic features and aligned known genetic risk for AD dementia with specific neuropathologic changes in the largest brain autopsy study of AD and related dementias.

Journal ArticleDOI
TL;DR: FGFR2 fusions and ERRFI1 mutations may represent novel targets in sporadic intrahepatic cholangiocarcinoma and trials should be characterized in larger cohorts of patients with these aberrations.
Abstract: Advanced cholangiocarcinoma continues to harbor a difficult prognosis and therapeutic options have been limited. During the course of a clinical trial of whole-genome sequencing seeking druggable targets, we examined six patients with advanced cholangiocarcinoma. Integrated genome-wide and whole transcriptome sequence analyses were performed on tumors from these six patients, who had advanced, sporadic intrahepatic cholangiocarcinoma (SIC), to identify potential therapeutically actionable events. Among the somatic events captured in our analysis, we uncovered two novel therapeutically relevant genomic contexts that, when acted upon, resulted in preliminary evidence of anti-tumor activity. Genome-wide structural analysis of sequence data revealed recurrent translocation events involving the FGFR2 locus in three of six assessed patients. These observations and supporting evidence triggered the use of FGFR inhibitors in these patients. In one example, preliminary anti-tumor activity of pazopanib (in vitro FGFR2 IC50≈350 nM) was noted in a patient with an FGFR2-TACC3 fusion. After progression on pazopanib, the same patient also had stable disease on ponatinib, a pan-FGFR inhibitor (in vitro FGFR2 IC50≈8 nM). In an independent non-FGFR2 translocation patient, exome and transcriptome analysis revealed an allele-specific somatic nonsense mutation (E384X) in ERRFI1, a direct negative regulator of EGFR activation. Rapid and robust disease regression was noted in this ERRFI1-inactivated tumor when treated with erlotinib, an EGFR kinase inhibitor. FGFR2 fusions and ERRFI1 mutations may represent novel targets in sporadic intrahepatic cholangiocarcinoma and trials should be characterized in larger cohorts of patients with these aberrations.

Journal ArticleDOI
TL;DR: Increasing R-loop levels by treatment with the DNA topoisomerase inhibitor camptothecin leads to up-regulation of repressive chromatin marks, resulting in FXN transcriptional silencing, suggesting that R-loops act as an initial trigger to promote FXN and FMR1 silencing.
Abstract: Friedreich ataxia (FRDA) and Fragile X syndrome (FXS) are among 40 diseases associated with expansion of repeated sequences (TREDs). Although their molecular pathology is not well understood, formation of repressive chromatin and unusual DNA structures over repeat regions were proposed to play a role. Our study now shows that RNA/DNA hybrids (R-loops) form in patient cells on expanded repeats of the endogenous FXN and FMR1 genes, associated with FRDA and FXS. These transcription-dependent R-loops are stable, co-localise with the repressive H3K9me2 chromatin mark and impede RNA Polymerase II transcription in patient cells. We investigated the interplay between repressive chromatin marks and R-loops on the FXN gene. We show that a decrease in the repressive H3K9me2 chromatin mark has no effect on R-loop levels. Importantly, increasing R-loop levels by treatment with the DNA topoisomerase inhibitor camptothecin leads to up-regulation of repressive chromatin marks, resulting in FXN transcriptional silencing. This provides a direct molecular link between R-loops and the pathology of TREDs, suggesting that R-loops act as an initial trigger to promote FXN and FMR1 silencing. Thus R-loops represent a common feature of nucleotide expansion disorders and provide a new target for therapeutic interventions.

Journal ArticleDOI
TL;DR: ONSEN, an LTR-copia type retrotransposon in Arabidopsis thaliana, has acquired a heat-responsive element recognized by plant-derived heat stress defense factors, resulting in transcription and production of full-length extrachromosomal DNA under elevated temperatures.
Abstract: Retrotransposons are major components of plant and animal genomes. They amplify by reverse transcription and reintegration into the host genome but their activity is usually epigenetically silenced. In plants, genomic copies of retrotransposons are typically associated with repressive chromatin modifications installed and maintained by RNA-directed DNA methylation. To escape this tight control, retrotransposons employ various strategies to avoid epigenetic silencing. Here we describe the mechanism developed by ONSEN, an LTR-copia type retrotransposon in Arabidopsis thaliana. ONSEN has acquired a heat-responsive element recognized by plant-derived heat stress defense factors, resulting in transcription and production of full length extrachromosomal DNA under elevated temperatures. Further, the ONSEN promoter is free of CG and CHG sites, and the reduction of DNA methylation at the CHH sites is not sufficient to activate the element. Since dividing cells have a more pronounced heat response, the extrachromosomal ONSEN DNA, capable of reintegrating into the genome, accumulates preferentially in the meristematic tissue of the shoot. The recruitment of a major plant heat shock transcription factor in periods of heat stress exploits the plant's heat stress response to achieve the transposon's activation, making it impossible for the host to respond appropriately to stress without losing control over the invader.

Journal ArticleDOI
TL;DR: The findings failed to demonstrate evidence for recent clonal transmission of cephalosporin-resistant E. coli strains from poultry to humans, as has been suggested based on traditional, low-resolution typing methods, and suggest that cephalosporin resistance genes are mainly disseminated in animals and humans via distinct plasmids.
Abstract: Third-generation cephalosporins are a class of β-lactam antibiotics that are often used for the treatment of human infections caused by Gram-negative bacteria, especially Escherichia coli. Worryingly, the incidence of human infections caused by third-generation cephalosporin-resistant E. coli is increasing worldwide. Recent studies have suggested that these E. coli strains, and their antibiotic resistance genes, can spread from food-producing animals, via the food-chain, to humans. However, these studies used traditional typing methods, which may not have provided sufficient resolution to reliably assess the relatedness of these strains. We therefore used whole-genome sequencing (WGS) to study the relatedness of cephalosporin-resistant E. coli from humans, chicken meat, poultry and pigs. One strain collection included pairs of human and poultry-associated strains that had previously been considered to be identical based on Multi-Locus Sequence Typing, plasmid typing and antibiotic resistance gene sequencing. The second collection included isolates from farmers and their pigs. WGS analysis revealed considerable heterogeneity between human and poultry-associated isolates. The most closely related pairs of strains from both sources carried 1263 Single-Nucleotide Polymorphisms (SNPs) per Mbp core genome. In contrast, epidemiologically linked strains from humans and pigs differed by only 1.8 SNPs per Mbp core genome. WGS-based plasmid reconstructions revealed three distinct plasmid lineages (IncI1- and IncK-type) that carried cephalosporin resistance genes of the Extended-Spectrum Beta-Lactamase (ESBL)- and AmpC-types. The plasmid backbones within each lineage were virtually identical and were shared by genetically unrelated human and animal isolates. Plasmid reconstructions from short-read sequencing data were validated by long-read DNA sequencing for two strains. 
Our findings failed to demonstrate evidence for recent clonal transmission of cephalosporin-resistant E. coli strains from poultry to humans, as has been suggested based on traditional, low-resolution typing methods. Instead, our data suggest that cephalosporin resistance genes are mainly disseminated in animals and humans via distinct plasmids.

Journal ArticleDOI
TL;DR: Underlying genetic background variation is responsible for most heterogeneity between human iPS cell lines, and hIPSCs are a stable, robust and powerful platform for large-scale studies of the function of genetic differences between individuals.
Abstract: Human iPS cells have been generated using a diverse range of tissues from a variety of donors using different reprogramming vectors. However, these cell lines are heterogeneous, which presents a limitation for their use in disease modeling and personalized medicine. To explore the basis of this heterogeneity we generated 25 iPS cell lines under normalised conditions from the same set of somatic tissues across a number of donors. RNA-seq data sets from each cell line were compared to identify the major contributors to transcriptional heterogeneity. We found that genetic differences between individual donors were the major cause of transcriptional variation between lines. In contrast, residual signatures from the somatic cell of origin, so-called epigenetic memory, contributed relatively little to transcriptional variation. Thus, underlying genetic background variation is responsible for most heterogeneity between human iPS cell lines. We conclude that epigenetic effects in hIPSCs are minimal, and that hIPSCs are a stable, robust and powerful platform for large-scale studies of the function of genetic differences between individuals. Our data also suggest that future studies using hIPSCs as a model system should focus most effort on collection of large numbers of donors, rather than generating large numbers of lines from the same donor.

Journal ArticleDOI
TL;DR: This study indicates that combining IBD based projection and KNN algorithm is an efficient imputation method for inferring large missing genotype segments and shows that the A-D test is a useful complement for GWAS analysis of complex quantitative traits.
Abstract: Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits using high-density SNP markers in maize. Here, we expanded our association panel size from 368 to 513 inbred lines with 0.5 million high-quality SNPs using a two-step data-imputation method which combines identity by descent (IBD) based projection and a k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits with a panel of 513 inbred lines, applying both a mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified using the MLM method at the Bonferroni-corrected threshold −log10 (P) >5.74 (α = 1). Between one and 34 loci per trait (107 loci for plant height) were identified for the 17 traits using the A-D test at the Bonferroni-corrected threshold −log10 (P) >7.05 (α = 0.05) using 556809 SNPs. Many known loci and new candidate loci were only observed by the A-D test, a few of which were also detected in independent linkage analysis. This study indicates that combining IBD based projection and the KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we showed that the A-D test is a useful complement for GWAS analysis of complex quantitative traits. Especially for traits with abnormal phenotype distribution, controlled by moderate effect loci or rare variations, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
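
The two Bonferroni-corrected thresholds quoted in the abstract can be checked with a short calculation. This is a minimal sketch, assuming that both thresholds divide the stated α by the 556809 SNPs reported for the A-D test; the function name is hypothetical.

```python
import math


def bonferroni_neg_log10(alpha, n_tests):
    """Return the -log10(P) cutoff for a Bonferroni-corrected alpha."""
    return -math.log10(alpha / n_tests)


N_SNPS = 556809  # SNP count quoted in the abstract

# alpha = 1 (MLM threshold) and alpha = 0.05 (A-D test threshold).
mlm_threshold = bonferroni_neg_log10(1.0, N_SNPS)
ad_threshold = bonferroni_neg_log10(0.05, N_SNPS)
```

Under this assumption the computed cutoffs land close to the 5.74 and 7.05 values reported in the abstract, with the α = 1 threshold corresponding to an expectation of one false positive across all SNPs.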