Showing papers in "PLOS Genetics in 2015"


Journal ArticleDOI
Thomas W. Winkler, Anne E. Justice, Mariaelisa Graff, Llilda Barata, and 435 more authors (106 institutions)
TL;DR: In this paper, the authors performed meta-analyses of 114 studies with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium.
Abstract: Genome-wide association studies (GWAS) have identified more than 100 genetic variants contributing to BMI, a measure of body size, or waist-to-hip ratio (adjusted for BMI, WHRadjBMI), a measure of body shape. Body size and shape change as people grow older and these changes differ substantially between men and women. To systematically screen for age- and/or sex-specific effects of genetic variants on BMI and WHRadjBMI, we performed meta-analyses of 114 studies (up to 320,485 individuals of European descent) with genome-wide chip and/or Metabochip data by the Genetic Investigation of Anthropometric Traits (GIANT) Consortium. Each study tested the association of up to ~2.8M SNPs with BMI and WHRadjBMI in four strata (men ≤50y, men >50y, women ≤50y, women >50y) and summary statistics were combined in stratum-specific meta-analyses. We then screened for variants that showed age-specific effects (G x AGE), sex-specific effects (G x SEX) or age-specific effects that differed between men and women (G x AGE x SEX). For BMI, we identified 15 loci (11 previously established for main effects, four novel) that showed significant (FDR<5%) age-specific effects, of which 11 had larger effects in younger (<50y) than in older adults (≥50y). No sex-dependent effects were identified for BMI. For WHRadjBMI, we identified 44 loci (27 previously established for main effects, 17 novel) with sex-specific effects, of which 28 showed larger effects in women than in men, five showed larger effects in men than in women, and 11 showed opposite effects between sexes. No age-dependent effects were identified for WHRadjBMI. This is the first genome-wide interaction meta-analysis to report convincing evidence of age-dependent genetic effects on BMI. In addition, we confirm the sex-specificity of genetic effects on WHRadjBMI. These results may provide further insights into the biology that underlies weight change with age or the sexual dimorphism of body shape.

584 citations
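
The age- and sex-specific screens described in the abstract above rest on comparing effect estimates between strata. A minimal Python sketch of one such comparison follows. It assumes the two stratum estimates are independent and approximately normal, which may not match the exact test used in the paper; the function name `stratum_difference_test` and the example numbers are illustrative only.

```python
import math

def stratum_difference_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two stratum-specific effects.

    Assumes independent, approximately normal estimates (an assumption of
    this sketch, not a detail confirmed by the abstract).
    """
    # Standard error of the difference of two independent estimates
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (beta_a - beta_b) / se_diff
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical effect sizes for a younger and an older stratum
z, p = stratum_difference_test(0.05, 0.01, 0.02, 0.01)
print(f"z = {z:.3f}, p = {p:.4f}")
```

In a genome-wide screen, the resulting p-values would still need multiple-testing control, such as the FDR threshold mentioned in the abstract.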


Journal ArticleDOI
TL;DR: The results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.
Abstract: Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.

439 citations


Journal ArticleDOI
TL;DR: In this paper, the authors develop a statistical test based on a measure of haplotype homozygosity (H12) that detects both hard and soft sweeps with similar power, and use it to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the DGRP.
Abstract: Adaptation from standing genetic variation or recurrent de novo mutation in large populations should commonly generate soft rather than hard selective sweeps. In contrast to a hard selective sweep, in which a single adaptive haplotype rises to high population frequency, in a soft selective sweep multiple adaptive haplotypes sweep through the population simultaneously, producing distinct patterns of genetic variation in the vicinity of the adaptive site. Current statistical methods were expressly designed to detect hard sweeps and most lack power to detect soft sweeps. This is particularly unfortunate for the study of adaptation in species such as Drosophila melanogaster, where all three confirmed cases of recent adaptation resulted in soft selective sweeps and where there is evidence that the effective population size relevant for recent and strong adaptation is large enough to generate soft sweeps even when adaptation requires mutation at a specific single site at a locus. Here, we develop a statistical test based on a measure of haplotype homozygosity (H12) that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a large population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 candidates reveals that in all cases multiple haplotypes are present at high frequencies, consistent with signatures of soft sweeps. We further develop a second haplotype homozygosity statistic (H2/H1) that, in combination with H12, is capable of differentiating hard from soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are much more easily generated by soft rather than hard sweeps. We discuss the implications of these results for the study of adaptation in Drosophila and in species with large census population sizes.

394 citations
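
The haplotype homozygosity statistics named in the abstract above can be illustrated with a short Python sketch. It assumes the commonly cited definitions: H1 is the sum of squared haplotype frequencies, H12 pools the two most frequent haplotypes before squaring, and H2 is H1 without the most frequent haplotype. The function name `haplotype_homozygosity` and the toy samples are hypothetical, and windowing over a genome is omitted.

```python
from collections import Counter

def haplotype_homozygosity(haplotypes):
    """Return (H1, H12, H2/H1) for the haplotypes observed in one window.

    `haplotypes` is a non-empty list of hashable haplotype labels,
    one per sampled chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Haplotype frequencies, most frequent first
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    # H12: treat the two most frequent haplotypes as a single class
    h12 = (p1 + p2) ** 2 + sum(p * p for p in freqs[2:])
    # H2: homozygosity excluding the most frequent haplotype
    h2 = h1 - p1 * p1
    return h1, h12, h2 / h1

# Toy windows: one dominant haplotype versus two co-dominant haplotypes
hard_like = ["A"] * 8 + ["B", "C"]
soft_like = ["A"] * 4 + ["B"] * 4 + ["C", "D"]
print(haplotype_homozygosity(hard_like))
print(haplotype_homozygosity(soft_like))
```

In these toy windows both samples have a fairly high H12, while H2/H1 is much larger when two haplotypes share high frequency, which is the contrast the abstract uses to separate soft from hard sweeps.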


Journal ArticleDOI
TL;DR: It is suggested that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize and may provide local enhancer activities that stimulate stress-responsive gene expression.
Abstract: Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress-induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.

352 citations


Journal ArticleDOI
TL;DR: A Bayesian mixture model is used that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples, and that can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs.
Abstract: Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

351 citations


Journal ArticleDOI
TL;DR: Results indicate that plant GAPDHs can affect multiple aspects of plant immunity in diverse sub-cellular compartments.
Abstract: Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is an important enzyme in energy metabolism with diverse cellular regulatory roles in vertebrates, but few reports have investigated the importance of plant GAPDH isoforms outside of their role in glycolysis. While animals possess one GAPDH isoform, plants possess multiple isoforms. In this study, cell biological and genetic approaches were used to investigate the role of GAPDHs during plant immune responses. Individual Arabidopsis GAPDH knockouts (KO lines) exhibited enhanced disease resistance phenotypes upon inoculation with the bacterial plant pathogen Pseudomonas syringae pv. tomato. KO lines exhibited accelerated programmed cell death and increased electrolyte leakage in response to effector triggered immunity. Furthermore, KO lines displayed increased basal ROS accumulation as visualized using the fluorescent probe H2DCFDA. The gapa1-2 and gapc1 KOs exhibited constitutive autophagy phenotypes in the absence of nutrient starvation. Due to the high sequence conservation between vertebrate and plant cytosolic GAPDH, our experiments focused on cytosolic GAPC1 cellular dynamics using a complemented GAPC1-GFP line. Confocal imaging coupled with an endocytic membrane marker (FM4-64) and endosomal trafficking inhibitors (BFA, Wortmannin) demonstrated cytosolic GAPC1 is localized to the plasma membrane and the endomembrane system, in addition to the cytosol and nucleus. After perception of bacterial flagellin, GAPC1 dynamically responded with a significant increase in size of fluorescent puncta and enhanced nuclear accumulation. Taken together, these results indicate that plant GAPDHs can affect multiple aspects of plant immunity in diverse sub-cellular compartments.

310 citations


Journal ArticleDOI
TL;DR: It is shown that amhy, a Y-specific duplicate of the anti-Müllerian hormone (amh) gene, induces male sex determination in Nile tilapia and the conserved roles of TGF-β signaling pathway in fish sex determination are highlighted.
Abstract: Variation in the TGF-β signaling pathway is emerging as an important mechanism by which gonadal sex determination is controlled in teleosts. Here we show that amhy, a Y-specific duplicate of the anti-Müllerian hormone (amh) gene, induces male sex determination in Nile tilapia. amhy is a tandem duplicate located immediately downstream of amhΔ-y on the Y chromosome. The coding sequence of amhy was identical to the X-linked amh (amh) except for a missense SNP (C/T) which changes an amino acid (Ser/Leu92) in the N-terminal region. amhy lacks 5608 bp of promoter sequence that is found in the X-linked amh homolog. The amhΔ-y contains several insertions and deletions in the promoter region, and even a 5 bp insertion in exonVI that results in a premature stop codon and thus a truncated protein product lacking the TGF-β binding domain. Both amhy and amhΔ-y expression is restricted to XY gonads from 5 days after hatching (dah) onwards. CRISPR/Cas9 knockout of amhy in XY fish resulted in male to female sex reversal, while mutation of amhΔ-y alone could not. In contrast, overexpression of Amhy in XX fish, using a fosmid transgene that carries the amhy/amhΔ-y haplotype or a vector containing amhy ORF under the control of CMV promoter, resulted in female to male sex reversal, while overexpression of AmhΔ-y alone in XX fish could not. Knockout of the anti-Müllerian hormone receptor type II (amhrII) in XY fish also resulted in 100% male to female sex reversal. Taken together, these results strongly suggest that the duplicated amhy with a missense SNP is the candidate sex determining gene and amhy/amhrII signal is essential for male sex determination in Nile tilapia. These findings highlight the conserved roles of TGF-β signaling pathway in fish sex determination.

280 citations


Journal ArticleDOI
TL;DR: It is established that YAP1 largely exerts its transcriptional control via distal enhancers that are marked by H3K27 acetylation and that YAP1 is necessary for this chromatin mark at bound enhancers and the activity of the associated genes.
Abstract: YAP1 is a major effector of the Hippo pathway and a well-established oncogene. Elevated YAP1 activity due to mutations in Hippo pathway components or YAP1 amplification is observed in several types of human cancers. Here we investigated its genomic binding landscape in YAP1-activated cancer cells, as well as in non-transformed cells. We demonstrate that TEAD transcription factors mediate YAP1 chromatin-binding genome-wide, further explaining their dominant role as primary mediators of YAP1-transcriptional activity. Moreover, we show that YAP1 largely exerts its transcriptional control via distal enhancers that are marked by H3K27 acetylation and that YAP1 is necessary for this chromatin mark at bound enhancers and the activity of the associated genes. This work establishes YAP1-mediated transcriptional regulation at distal enhancers and provides an expanded set of target genes resulting in a fundamental source to study YAP1 function in a normal and cancer setting.

276 citations


Journal ArticleDOI
TL;DR: This article develops theory that leads to a precise definition of parameters arising in high dimensional genomic regressions, and its results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.
Abstract: Whole-genome regression methods are being increasingly used for the analysis and prediction of complex traits and diseases. In human genetics, these methods are commonly used for inferences about genetic parameters, such as the amount of genetic variance among individuals or the proportion of phenotypic variance that can be explained by regression on molecular markers. This is so even though some of the assumptions commonly adopted for data analysis are at odds with important quantitative genetic concepts. In this article we develop theory that leads to a precise definition of parameters arising in high dimensional genomic regressions; we focus on the so-called genomic heritability: the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. We propose a definition of this parameter that is framed within the classical quantitative genetics theory and show that the genomic heritability and the trait heritability parameters are equal only when all causal variants are typed. Further, we discuss how the genomic variance and genomic heritability, defined as quantitative genetic parameters, relate to parameters of statistical models commonly used for inferences, and indicate potential inferential problems that are assessed further using simulations. When a large proportion of the markers used in the analysis are in linkage equilibrium (LE) with QTL, the likelihood function can be misspecified. This can induce a sizable finite-sample bias and, possibly, lack of consistency of likelihood (or Bayesian) estimates. This situation can be encountered if the individuals in the sample are distantly related and linkage disequilibrium spans over short regions. This bias does not negate the use of whole-genome regression models as predictive machines; however, our results indicate that caution is needed when using marker-based regressions for inferences about population parameters such as the genomic heritability.

255 citations


Journal ArticleDOI
TL;DR: Proteotoxic stress imposed by proteasome inhibition or expression of polyglutamine expanded huntingtin induces p62 phosphorylation at its ubiquitin-association (UBA) domain that regulates its binding to ubiquitinated proteins, suggesting a potential novel drug target for the treatment of proteinopathies including Huntington's disease.
Abstract: Disruption of proteostasis, or protein homeostasis, is often associated with aberrant accumulation of misfolded proteins or protein aggregates. Autophagy offers protection to cells by removing toxic protein aggregates and injured organelles in response to proteotoxic stress. However, the exact mechanism whereby autophagy recognizes and degrades misfolded or aggregated proteins has yet to be elucidated. Mounting evidence demonstrates the selectivity of autophagy, which is mediated through autophagy receptor proteins (e.g. p62/SQSTM1) linking autophagy cargos and autophagosomes. Here we report that proteotoxic stress imposed by proteasome inhibition or expression of polyglutamine expanded huntingtin (polyQ-Htt) induces p62 phosphorylation at its ubiquitin-association (UBA) domain that regulates its binding to ubiquitinated proteins. We find that autophagy-related kinase ULK1 phosphorylates p62 at a novel phosphorylation site S409 in the UBA domain. Interestingly, phosphorylation of p62 by ULK1 does not occur upon nutrient starvation, in spite of its role in canonical autophagy signaling. ULK1 also phosphorylates S405, while S409 phosphorylation critically regulates S405 phosphorylation. We find that S409 phosphorylation destabilizes the UBA dimer interface, and increases binding affinity of p62 to ubiquitin. Furthermore, lack of S409 phosphorylation causes accumulation of p62, aberrant localization of autophagy proteins and inhibition of the clearance of ubiquitinated proteins or polyQ-Htt. Therefore, our data provide mechanistic insights into the regulation of selective autophagy by ULK1 and p62 upon proteotoxic stress. Our study suggests a potential novel drug target in developing autophagy-based therapeutics for the treatment of proteinopathies including Huntington's disease.

243 citations


Journal ArticleDOI
TL;DR: It is proposed that mtDNA content represents a novel biomarker with potential value for in vitro fertilisation (IVF) treatment, revealing chromosomally normal blastocysts incapable of producing a viable pregnancy.
Abstract: Mitochondria play a vital role in embryo development. They are the principal site of energy production and have various other critical cellular functions. Despite the importance of this organelle, little is known about the extent of variation in mitochondrial DNA (mtDNA) between individual human embryos prior to implantation. This study investigated the biological and clinical relevance of the quantity of mtDNA in 379 embryos. These were examined via a combination of microarray comparative genomic hybridisation (aCGH), quantitative PCR and next generation sequencing (NGS), providing information on chromosomal status, amount of mtDNA, and presence of mutations in the mitochondrial genome. The quantity of mtDNA was significantly higher in embryos from older women (P=0.003). Additionally, mtDNA levels were elevated in aneuploid embryos, independent of age (P=0.025). Assessment of clinical outcomes after transfer of euploid embryos to the uterus revealed that blastocysts that successfully implanted tended to contain lower mtDNA quantities than those failing to implant (P=0.007). Importantly, an mtDNA quantity threshold was established, above which implantation was never observed. Subsequently, the predictive value of this threshold was confirmed in an independent blinded prospective study, indicating that abnormal mtDNA levels are present in 30% of non-implanting euploid embryos, but are not seen in embryos forming a viable pregnancy. NGS did not reveal any increase in mutation in blastocysts with elevated mtDNA levels. The results of this study suggest that increased mtDNA may be related to elevated metabolism and are associated with reduced viability, a possibility consistent with the ‘quiet embryo’ hypothesis. Importantly, the findings suggest a potential role for mitochondria in female reproductive aging and the genesis of aneuploidy. 
Of clinical significance, we propose that mtDNA content represents a novel biomarker with potential value for in vitro fertilisation (IVF) treatment, revealing chromosomally normal blastocysts incapable of producing a viable pregnancy.

Journal ArticleDOI
TL;DR: This work reveals that EIN3, ORE1 and CCGs constitute a coherent feed-forward loop involved in the robust regulation of ethylene-mediated chlorophyll degradation during leaf senescence in Arabidopsis.
Abstract: Degreening, caused by chlorophyll degradation, is the most obvious symptom of senescing leaves. Chlorophyll degradation can be triggered by endogenous and environmental cues, and ethylene is one of the major inducers. ETHYLENE INSENSITIVE3 (EIN3) is a key transcription factor in the ethylene signaling pathway. It was previously reported that EIN3, miR164, and a NAC (NAM, ATAF, and CUC) transcription factor ORE1/NAC2 constitute a regulatory network mediating leaf senescence. However, how this network regulates chlorophyll degradation at molecular level is not yet elucidated. Here we report a feed-forward regulation of chlorophyll degradation that involves EIN3, ORE1, and chlorophyll catabolic genes (CCGs). Gene expression analysis showed that the induction of three major CCGs, NYE1, NYC1 and PAO, by ethylene was largely repressed in ein3 eil1 double mutant. Dual-luciferase assay revealed that EIN3 significantly enhanced the promoter activity of NYE1, NYC1 and PAO in Arabidopsis protoplasts. Furthermore, Electrophoretic mobility shift assay (EMSA) indicated that EIN3 could directly bind to NYE1, NYC1 and PAO promoters. These results reveal that EIN3 functions as a positive regulator of CCG expression during ethylene-mediated chlorophyll degradation. Interestingly, ORE1, a senescence regulator which is a downstream target of EIN3, could also activate the expression of NYE1, NYC1 and PAO by directly binding to their promoters in EMSA and chromatin immunoprecipitation (ChIP) assays. In addition, EIN3 and ORE1 promoted NYE1 and NYC1 transcriptions in an additive manner. These results suggest that ORE1 is also involved in the direct regulation of CCG transcription. Moreover, ORE1 activated the expression of ACS2, a major ethylene biosynthesis gene, and subsequently promoted ethylene production. 
Collectively, our work reveals that EIN3, ORE1 and CCGs constitute a coherent feed-forward loop involved in the robust regulation of ethylene-mediated chlorophyll degradation during leaf senescence in Arabidopsis.

Journal ArticleDOI
TL;DR: It is concluded that co-transcriptional R loops and R loop-mediated DNA damage greatly contribute to genome instability and that one major function of the FA pathway is to protect cells from R loops.
Abstract: Co-transcriptional RNA-DNA hybrids (R loops) cause genome instability. To prevent harmful R loop accumulation, cells have evolved specific eukaryotic factors, one being the BRCA2 double-strand break repair protein. As BRCA2 also protects stalled replication forks and is the FANCD1 member of the Fanconi Anemia (FA) pathway, we investigated the FA role in R loop-dependent genome instability. Using human and murine cells defective in FANCD2 or FANCA and primary bone marrow cells from FANCD2 deficient mice, we show that the FA pathway removes R loops, and that many DNA breaks accumulated in FA cells are R loop-dependent. Importantly, FANCD2 foci in untreated and MMC-treated cells are largely R loop dependent, suggesting that the FA functions at R loop-containing sites. We conclude that co-transcriptional R loops and R loop-mediated DNA damage greatly contribute to genome instability and that one major function of the FA pathway is to protect cells from R loops.

Journal ArticleDOI
TL;DR: Using a binomial model to assess allelic expression, a continuum between complete silencing and expression from the inactive X (Xi) is demonstrated, and common escape genes and genes with significant differences in XCI status between tissues were identified.
Abstract: X chromosome inactivation (XCI) silences most genes on one X chromosome in female mammals, but some genes escape XCI. To identify escape genes in vivo and to explore molecular mechanisms that regulate this process we analyzed the allele-specific expression and chromatin structure of X-linked genes in mouse tissues and cells with skewed XCI and distinguishable alleles based on single nucleotide polymorphisms. Using a binomial model to assess allelic expression, we demonstrate a continuum between complete silencing and expression from the inactive X (Xi). The validity of the RNA-seq approach was verified using RT-PCR with species-specific primers or Sanger sequencing. Both common escape genes and genes with significant differences in XCI status between tissues were identified. Such genes may be candidates for tissue-specific sex differences. Overall, few genes (3–7%) escape XCI in any of the mouse tissues examined, suggesting stringent silencing and escape controls. In contrast, an in vitro system represented by the embryonic-kidney-derived Patski cell line showed a higher density of escape genes (21%), representing both kidney-specific escape genes and cell-line specific escape genes. Allele-specific RNA polymerase II occupancy and DNase I hypersensitivity at the promoter of genes on the Xi correlated well with levels of escape, consistent with an open chromatin structure at escape genes. Allele-specific CTCF binding on the Xi clustered at escape genes and was denser in brain compared to the Patski cell line, possibly contributing to a more compartmentalized structure of the Xi and fewer escape genes in brain compared to the cell line where larger domains of escape were observed.
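
The binomial treatment of allelic expression mentioned in the abstract above can be sketched as a one-sided exact test on allele-specific read counts. This is a simplified Python illustration, not the authors' model: the background rate of 1% for reads misassigned to the inactive X, the significance level, and the function names `binomial_upper_tail` and `xi_fraction_exceeds` are all assumptions made for the example.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def xi_fraction_exceeds(xi_reads, total_reads, background=0.01, alpha=0.01):
    """Test whether reads from the inactive X exceed a background rate.

    `background` and `alpha` are hypothetical thresholds chosen for this
    sketch. Returns (is_significant, p_value).
    """
    p_value = binomial_upper_tail(xi_reads, total_reads, background)
    return p_value < alpha, p_value

# Hypothetical allele-specific read counts at two X-linked genes
print(xi_fraction_exceeds(10, 100))  # many Xi reads relative to background
print(xi_fraction_exceeds(1, 100))   # consistent with background
```

Because the test returns a p-value rather than a yes/no label by itself, genes can be ranked along the continuum between silencing and escape that the abstract describes.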

Journal ArticleDOI
TL;DR: BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations, is introduced.
Abstract: Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.

Journal ArticleDOI
TL;DR: The results reveal the molecular and genetic basis of fish adaptation and response to hypoxia and air exposure, including new aspects of neuro-endocrine-immune/metabolism regulatory networks that may help the fish to avoid cerebral inflammatory injury and maintain energy balance under hypoxia.
Abstract: The large yellow croaker Larimichthys crocea (L. crocea) is one of the most economically important marine fish in China and East Asian countries. It also exhibits peculiar behavioral and physiological characteristics and is especially sensitive to various environmental stresses, such as hypoxia and air exposure. These traits may render L. crocea a good model for investigating the response mechanisms to environmental stress. To understand the molecular and genetic mechanisms underlying the adaptation and response of L. crocea to environmental stress, we sequenced and assembled the genome of L. crocea using a bacterial artificial chromosome and whole-genome shotgun hierarchical strategy. The final genome assembly was 679 Mb, with a contig N50 of 63.11 kb and a scaffold N50 of 1.03 Mb, containing 25,401 protein-coding genes. Gene families underlying adaptive behaviours, such as vision-related crystallins, olfactory receptors, and auditory sense-related genes, were significantly expanded in the genome of L. crocea relative to those of other vertebrates. Transcriptome analyses of the hypoxia-exposed L. crocea brain revealed new aspects of neuro-endocrine-immune/metabolism regulatory networks that may help the fish to avoid cerebral inflammatory injury and maintain energy balance under hypoxia. Proteomics data demonstrate that skin mucus of the air-exposed L. crocea had a complex composition, with an unexpectedly high number of proteins (3,209), suggesting its multiple protective mechanisms involved in antioxidant functions, oxygen transport, immune defence, and osmotic and ionic regulation. Our results reveal the molecular and genetic basis of fish adaptation and response to hypoxia and air exposure. The data generated by this study will provide valuable resources for the genetic improvement of stress resistance and yield potential in L. crocea.
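
The contig and scaffold N50 values quoted in the abstract above are assembly contiguity summaries. As a minimal Python sketch of the usual definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly), the following uses made-up lengths; the function name `n50` is illustrative and the numbers are not from the L. crocea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Hypothetical contig lengths in kb
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because scaffolds join contigs across gaps, a scaffold N50 is typically larger than the contig N50 of the same assembly, as in the figures reported above.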

Journal ArticleDOI
TL;DR: The results support an APA code in which an APA event in a given cellular context is regulated by a number of parameters, including relative location to the TSS, splicing context, distance between competing pAs, surrounding cis elements and concentrations of core C/P factors.
Abstract: Alternative cleavage and polyadenylation (APA) results in mRNA isoforms containing different 3’ untranslated regions (3’UTRs) and/or coding sequences. How core cleavage/polyadenylation (C/P) factors regulate APA is not well understood. Using siRNA knockdown coupled with deep sequencing, we found that several C/P factors can play significant roles in 3’UTR-APA. Whereas Pcf11 and Fip1 enhance usage of proximal poly(A) sites (pAs), CFI-25/68, PABPN1 and PABPC1 promote usage of distal pAs. Strong cis element biases were found for pAs regulated by CFI-25/68 or Fip1, and the distance between pAs plays an important role in APA regulation. In addition, intronic pAs are substantially regulated by splicing factors, with U1 mostly inhibiting C/P events in introns near the 5’ end of gene and U2 suppressing those in introns with features for efficient splicing. Furthermore, PABPN1 inhibits expression of transcripts with pAs near the transcription start site (TSS), a property possibly related to its role in RNA degradation. Finally, we found that groups of APA events regulated by C/P factors are also modulated in cell differentiation and development with distinct trends. Together, our results support an APA code where an APA event in a given cellular context is regulated by a number of parameters, including relative location to the TSS, splicing context, distance between competing pAs, surrounding cis elements and concentrations of core C/P factors.

Journal ArticleDOI
TL;DR: It is shown that in all such experiments, codons decoded by less abundant tRNAs were in fact being translated more slowly before the addition of CHX disrupted these dynamics, suggesting that conclusions from experiments in yeast using CHX may need reexamination.
Abstract: Ribosome profiling produces snapshots of the locations of actively translating ribosomes on messenger RNAs. These snapshots can be used to make inferences about translation dynamics. Recent ribosome profiling studies in yeast, however, have reached contradictory conclusions regarding the average translation rate of each codon. Some experiments have used cycloheximide (CHX) to stabilize ribosomes before measuring their positions, and these studies all counterintuitively report a weak negative correlation between the translation rate of a codon and the abundance of its cognate tRNA. In contrast, some experiments performed without CHX report strong positive correlations. To explain this contradiction, we identify unexpected patterns in ribosome density downstream of each type of codon in experiments that use CHX. These patterns are evidence that elongation continues to occur in the presence of CHX but with dramatically altered codon-specific elongation rates. The measured positions of ribosomes in these experiments therefore do not reflect the amounts of time ribosomes spend at each position in vivo. These results suggest that conclusions from experiments in yeast using CHX may need reexamination. In particular, we show that in all such experiments, codons decoded by less abundant tRNAs were in fact being translated more slowly before the addition of CHX disrupted these dynamics.

Journal ArticleDOI
TL;DR: A strong gradient in the Native American ancestry component of South American Latinos associated with country of origin and the geography of local indigenous populations is found, which can impact the understanding of population-level differences in biomedical traits and, thus, inform future medical genetic studies in the region.
Abstract: South America has a complex demographic history shaped by multiple migration and admixture events in pre- and post-colonial times. Settled over 14,000 years ago by Native Americans, South America has experienced migrations of European and African individuals, similar to other regions in the Americas. However, the timing and magnitude of these events resulted in markedly different patterns of admixture throughout Latin America. We use genome-wide SNP data for 437 admixed individuals from 5 countries (Colombia, Ecuador, Peru, Chile, and Argentina) to explore the population structure and demographic history of South American Latinos. We combined these data with population reference panels from Africa, Asia, Europe and the Americas to perform global ancestry analysis and infer the subcontinental origin of the European and Native American ancestry components of the admixed individuals. By applying ancestry-specific PCA analyses we find that most of the European ancestry in South American Latinos is from the Iberian Peninsula; however, many individuals trace their ancestry back to Italy, especially within Argentina. We find a strong gradient in the Native American ancestry component of South American Latinos associated with country of origin and the geography of local indigenous populations. For example, Native American genomic segments in Peruvians show greater affinities with Andean indigenous peoples like Quechua and Aymara, whereas Native American haplotypes from Colombians tend to cluster with Amazonian and coastal tribes from northern South America. Using ancestry tract length analysis we modeled post-colonial South American migration history as the youngest in Latin America during European colonization (9–14 generations ago), with an additional strong pulse of European migration occurring between 3 and 9 generations ago. These genetic footprints can impact our understanding of population-level differences in biomedical traits and, thus, inform future medical genetic studies in the region.

Journal ArticleDOI
TL;DR: The findings illuminate the extent of mosaicism in TSC, indicate the importance of full gene coverage and next generation sequencing for mutation detection, show that analysis of TSC-related tumors can increase the mutation detection rate, and indicate that it is not likely that a third TSC gene exists.
Abstract: Tuberous sclerosis complex (TSC) is an autosomal dominant tumor suppressor gene syndrome due to germline mutations in either TSC1 or TSC2. 10-15% of TSC individuals have no mutation identified (NMI) after thorough conventional molecular diagnostic assessment. 53 TSC subjects who were NMI were studied using next generation sequencing to search for mutations in these genes. Blood/saliva DNA including parental samples were available from all subjects, and skin tumor biopsy DNA was available from six subjects. We identified mutations in 45 of 53 subjects (85%). Mosaicism was observed in the majority (26 of 45, 58%), and intronic mutations were also unusually common, seen in 18 of 45 subjects (40%). Seventeen (38%) mutations were seen at an allele frequency < 5%, five at an allele frequency < 1%, and two were identified in skin tumor biopsies only, and were not seen at appreciable frequency in blood or saliva DNA. These findings illuminate the extent of mosaicism in TSC, indicate the importance of full gene coverage and next generation sequencing for mutation detection, show that analysis of TSC-related tumors can increase the mutation detection rate, indicate that it is not likely that a third TSC gene exists, and enable provision of genetic counseling to the substantial population of TSC individuals who are currently NMI.

Journal ArticleDOI
TL;DR: High-resolution analyses are provided showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific.
Abstract: Understanding how genetic variation affects distinct cellular phenotypes, such as gene expression levels, alternative splicing and DNA methylation levels, is essential for better understanding of complex diseases and traits. Furthermore, how inter-individual variation of DNA methylation is associated to gene expression is just starting to be studied. In this study, we use the GenCord cohort of 204 newborn Europeans' lymphoblastoid cell lines, T-cells and fibroblasts derived from umbilical cords. The samples were previously genotyped for 2.5 million SNPs, mRNA-sequenced, and assayed for methylation levels in 482,421 CpG sites. We observe that methylation sites associated to expression levels are enriched in enhancers, gene bodies and CpG island shores. We show that while the correlation between DNA methylation and gene expression can be positive or negative, it is very consistent across cell-types. However, this epigenetic association to gene expression appears more tissue-specific than the genetic effects on gene expression or DNA methylation (observed in both sharing estimations based on P-values and effect size correlations between cell-types). This predominance of genetic effects can also be reflected by the observation that allele specific expression differences between individuals dominate over tissue-specific effects. Additionally, we discover genetic effects on alternative splicing and interestingly, a large amount of DNA methylation correlating to alternative splicing, both in a tissue-specific manner. The locations of the SNPs and methylation sites involved in these associations highlight the participation of promoter proximal and distant regulatory regions on alternative splicing. Overall, our results provide high-resolution analyses showing how genome sequence variation has a broad effect on cellular phenotypes across cell-types, whereas epigenetic factors provide a secondary layer of variation that is more tissue-specific. Furthermore, the details of how this tissue-specificity may vary across inter-relations of molecular traits, and where these are occurring, can yield further insights into gene regulation and cellular biology as a whole.

Journal ArticleDOI
TL;DR: The ROS/JNK/p38/Upd stress-responsive module restores tissue homeostasis; it is activated not only after cell death induction but also after physical damage, and reveals one of the earliest responses in imaginal disc regeneration.
Abstract: Upon apoptotic stimuli, epithelial cells compensate the gaps left by dead cells by activating proliferation. This has led to the proposal that dying cells signal to surrounding living cells to maintain homeostasis. Although the nature of these signals is not clear, reactive oxygen species (ROS) could act as a signaling mechanism as they can trigger pro-inflammatory responses to protect epithelia from environmental insults. Whether ROS emerge from dead cells and what is the genetic response triggered by ROS is pivotal to understand regeneration of Drosophila imaginal discs. We genetically induced cell death in wing imaginal discs, monitored the production of ROS and analyzed the signals required for repair. We found that cell death generates a burst of ROS that propagate to the nearby surviving cells. Propagated ROS activate p38 and induce tolerable levels of JNK. The activation of JNK and p38 results in the expression of the cytokines Unpaired (Upd), which triggers the JAK/STAT signaling pathway required for regeneration. Our findings demonstrate that this ROS/JNK/p38/Upd stress responsive module restores tissue homeostasis. This module is not only activated after cell death induction but also after physical damage and reveals one of the earliest responses for imaginal disc regeneration.

Journal ArticleDOI
TL;DR: A crucial role for ALP and ABCC genes in field-evolved resistance to Cry1Ac is highlighted and a novel trans-regulatory signaling mechanism responsible for modulating the expression of these pivotal genes in P. xylostella is revealed.
Abstract: Insecticidal crystal toxins derived from the soil bacterium Bacillus thuringiensis (Bt) are widely used as biopesticide sprays or expressed in transgenic crops to control insect pests. However, large-scale use of Bt has led to field-evolved resistance in several lepidopteran pests. Resistance to Bt Cry1Ac toxin in the diamondback moth, Plutella xylostella (L.), was previously mapped to a multigenic resistance locus (BtR-1). Here, we assembled the 3.15 Mb BtR-1 locus and found high-level resistance to Cry1Ac and Bt biopesticide in four independent P. xylostella strains were all associated with differential expression of a midgut membrane-bound alkaline phosphatase (ALP) outside this locus and a suite of ATP-binding cassette transporter subfamily C (ABCC) genes inside this locus. The interplay between these resistance genes is controlled by a previously uncharacterized trans-regulatory mechanism via the mitogen-activated protein kinase (MAPK) signaling pathway. Molecular, biochemical, and functional analyses have established ALP as a functional Cry1Ac receptor. Phenotypic association experiments revealed that the recessive Cry1Ac resistance was tightly linked to down-regulation of ALP, ABCC2 and ABCC3, whereas it was not linked to up-regulation of ABCC1. Silencing of ABCC2 and ABCC3 in susceptible larvae reduced their susceptibility to Cry1Ac but did not affect the expression of ALP, whereas suppression of MAP4K4, a constitutively transcriptionally-activated MAPK upstream gene within the BtR-1 locus, led to a transient recovery of gene expression thereby restoring the susceptibility in resistant larvae. These results highlight a crucial role for ALP and ABCC genes in field-evolved resistance to Cry1Ac and reveal a novel trans-regulatory signaling mechanism responsible for modulating the expression of these pivotal genes in P. xylostella.

Journal ArticleDOI
TL;DR: The translational-reporter system indicates that mycobacterial leadered translation initiation requires a Shine-Dalgarno site in the 5’ UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation.
Abstract: RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5’ untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5’ end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5’ ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine Dalgarno site in the 5’ UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression.

Journal ArticleDOI
TL;DR: This work finds a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades and shows that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons.
Abstract: The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes.

Journal ArticleDOI
TL;DR: This finding serves as both a cautionary tale about interpreting results from unsupervised clustering algorithms, and suggests that social constructions are contributing directly to genetic differentiation over a relatively short time period among previously genetically similar groups.
Abstract: The Ari peoples of Ethiopia are comprised of different occupational groups that can be distinguished genetically, with Ari Cultivators and the socially marginalised Ari Blacksmiths recently shown to have a similar level of genetic differentiation between them (FST ≈ 0.023 - 0.04) as that observed among multiple ethnic groups sampled throughout Ethiopia. Anthropologists have proposed two competing theories to explain the origins of the Ari Blacksmiths as (i) remnants of a population that inhabited Ethiopia prior to the arrival of agriculturists (e.g. Cultivators), or (ii) relatively recently related to the Cultivators but presently marginalized in the community due to their trade. Two recent studies by different groups analysed genome-wide DNA from samples of Ari Blacksmiths and Cultivators and suggested that genetic patterns between the two groups were more consistent with model (i) and subsequent assimilation of the indigenous peoples into the expanding agriculturalist community. We analysed the same samples using approaches designed to attenuate signals of genetic differentiation that are attributable to allelic drift within a population. By doing so, we provide evidence that the genetic differences between Ari Blacksmiths and Cultivators can be entirely explained by bottleneck effects consistent with hypothesis (ii). This finding serves as both a cautionary tale about interpreting results from unsupervised clustering algorithms, and suggests that social constructions are contributing directly to genetic differentiation over a relatively short time period among previously genetically similar groups.

Journal ArticleDOI
TL;DR: This study sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions.
Abstract: Malassezia is a unique lipophilic genus in class Malasseziomycetes in Ustilaginomycotina, (Basidiomycota, fungi) that otherwise consists almost exclusively of plant pathogens. Malassezia are typically isolated from warm-blooded animals, are dominant members of the human skin mycobiome and are associated with common skin disorders. To characterize the genetic basis of the unique phenotypes of Malassezia spp., we sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions. Our analysis revealed key gene gain events (64) with a single gene conserved across all Malassezia but absent in all other sequenced Basidiomycota. These likely horizontally transferred genes provide intriguing gain-of-function events and prime candidates to explain the emergence of Malassezia. A larger set of genes (741) were lost, with enrichment for glycosyl hydrolases and carbohydrate metabolism, concordant with adaptation to skin's carbohydrate-deficient environment. Gene family analysis revealed extensive turnover and underlined the importance of secretory lipases, phospholipases, aspartyl proteases, and other peptidases. Combining genomic analysis with a re-evaluation of culture characteristics, we establish the likely lipid-dependence of all Malassezia. Our phylogenetic analysis sheds new light on the relationship between Malassezia and other members of Ustilaginomycotina, as well as phylogenetic lineages within the genus. Overall, our study provides a unique genomic resource for understanding Malassezia niche-specificity and potential virulence, as well as their abundance and distribution in the environment and on human skin.

Journal ArticleDOI
TL;DR: It is shown that perturbation of ZMIZ1 expression in human islets and beta-cells influences exocytosis and insulin secretion, highlighting a novel role for ZMIZ1 in the maintenance of glucose homeostasis.
Abstract: The intersection of genome-wide association analyses with physiological and functional data indicates that variants regulating islet gene transcription influence type 2 diabetes (T2D) predisposition and glucose homeostasis. However, the specific genes through which these regulatory variants act remain poorly characterized. We generated expression quantitative trait locus (eQTL) data in 118 human islet samples using RNA-sequencing and high-density genotyping. We identified fourteen loci at which cis-exon-eQTL signals overlapped active islet chromatin signatures and were coincident with established T2D and/or glycemic trait associations. At some, these data provide an experimental link between GWAS signals and biological candidates, such as DGKB and ADCY5. At others, the cis-signals implicate genes with no prior connection to islet biology, including WARS and ZMIZ1. At the ZMIZ1 locus, we show that perturbation of ZMIZ1 expression in human islets and beta-cells influences exocytosis and insulin secretion, highlighting a novel role for ZMIZ1 in the maintenance of glucose homeostasis. Together, these findings provide a significant advance in the mechanistic insights of T2D and glycemic trait association loci.

Journal ArticleDOI
TL;DR: Noise-robust analyses of 24 studies of budding yeast reveal that mRNA levels explain more than 85% of the variation in steady-state protein levels, substantially revise widely credited models of protein-level regulation, and introduce multiple noise-aware approaches essential for proper analysis of many biological phenomena.
Abstract: Cells respond to their environment by modulating protein levels through mRNA transcription and post-transcriptional control. Modest observed correlations between global steady-state mRNA and protein measurements have been interpreted as evidence that mRNA levels determine roughly 40% of the variation in protein levels, indicating dominant post-transcriptional effects. However, the techniques underlying these conclusions, such as correlation and regression, yield biased results when data are noisy, missing systematically, and collinear (properties of mRNA and protein measurements), which motivated us to revisit this subject. Noise-robust analyses of 24 studies of budding yeast reveal that mRNA levels explain more than 85% of the variation in steady-state protein levels. Protein levels are not proportional to mRNA levels, but rise much more rapidly. Regulation of translation suffices to explain this nonlinear effect, revealing post-transcriptional amplification of, rather than competition with, transcriptional signals. These results substantially revise widely credited models of protein-level regulation, and introduce multiple noise-aware approaches essential for proper analysis of many biological phenomena.
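The abstract's point that correlation is biased by measurement noise can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' analysis: it uses Spearman's classical correction for attenuation, r_corrected = r_observed / sqrt(reliability_x × reliability_y), with each reliability estimated from the correlation between two replicate measurements. The function names, noise levels and sample size are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n, seed=0):
    """Simulate noisy replicate measurements of two correlated latent
    quantities (stand-ins for log mRNA and log protein levels).

    Returns (raw, corrected, latent): the correlation between single
    noisy measurements, the attenuation-corrected estimate, and the
    correlation between the noise-free latent values.
    """
    rng = random.Random(seed)
    # Latent (noise-free) values; all parameters are hypothetical.
    mrna = [rng.gauss(0, 1) for _ in range(n)]
    protein = [m + rng.gauss(0, 0.5) for m in mrna]
    # Two independent noisy replicates of each quantity.
    m1 = [m + rng.gauss(0, 1) for m in mrna]
    m2 = [m + rng.gauss(0, 1) for m in mrna]
    p1 = [p + rng.gauss(0, 1) for p in protein]
    p2 = [p + rng.gauss(0, 1) for p in protein]

    raw = pearson(m1, p1)
    # Reliability of each measurement, estimated from its replicates.
    rel_m = pearson(m1, m2)
    rel_p = pearson(p1, p2)
    corrected = raw / math.sqrt(rel_m * rel_p)
    latent = pearson(mrna, protein)
    return raw, corrected, latent


raw, corrected, latent = simulate(20000, seed=1)
print(f"raw={raw:.3f} corrected={corrected:.3f} latent={latent:.3f}")
```

In this toy setting the raw correlation between single noisy measurements understates the latent correlation, and the corrected estimate recovers most of the gap. That is the qualitative effect the abstract describes; the correction shown here is only one of several possible noise-aware approaches.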

Journal ArticleDOI
TL;DR: This work determines for the first time the E. coli RNA structurome, highlighting the contribution of mRNA secondary structure as a direct effector of a variety of processes, including translation and mRNA degradation.
Abstract: Messenger RNA acts as an informational molecule between DNA and translating ribosomes. Emerging evidence places mRNA in central cellular processes beyond its major function as informational entity. Although individual examples show that specific structural features of mRNA regulate translation and transcript stability, their role and function throughout the bacterial transcriptome remains unknown. Combining three sequencing approaches to provide a high resolution view of global mRNA secondary structure, translation efficiency and mRNA abundance, we unraveled structural features in E. coli mRNA with implications in translation and mRNA degradation. A poorly structured site upstream of the coding sequence serves as an additional unspecific binding site of the ribosomes and the degree of its secondary structure propensity negatively correlates with gene expression. Secondary structures within coding sequences are highly dynamic and influence translation only within a very small subset of positions. A secondary structure upstream of the stop codon is enriched in genes terminated by UAA codon with likely implications in translation termination. The global analysis further substantiates a common recognition signature of RNase E to initiate endonucleolytic cleavage. This work determines for the first time the E. coli RNA structurome, highlighting the contribution of mRNA secondary structure as a direct effector of a variety of processes, including translation and mRNA degradation.