
Showing papers in "Nature Genetics in 2015"


Journal ArticleDOI
TL;DR: It is found that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size, and the LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control.
Abstract: Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.

3,708 citations
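
The regression described in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation. It assumes the commonly stated LD Score model, in which the expected chi-square statistic of a SNP is 1 + N·a + N·h²·ℓ/M (N samples, M SNPs, LD Score ℓ, confounding term a). It uses plain unweighted least squares, whereas the published method additionally uses regression weights and resampling-based standard errors. The function names, the noise model and the simulated LD Scores are all assumptions made for illustration.

```python
import numpy as np


def simulate_chi2(ld_scores, n_samples, h2, confounding, rng):
    """Draw toy chi-square statistics whose mean follows the assumed LD Score model."""
    n_snps = ld_scores.size
    expected = 1.0 + n_samples * confounding + n_samples * h2 * ld_scores / n_snps
    # Scaling a 1-df chi-square keeps the mean at `expected` (toy noise model).
    return expected * rng.chisquare(df=1, size=n_snps)


def ld_score_regression(chi2, ld_scores, n_samples):
    """Unweighted least-squares fit of chi-square statistics on LD Score.

    Returns (h2_estimate, intercept). An intercept above 1 points to
    confounding; the slope carries the polygenic signal.
    """
    n_snps = ld_scores.size
    design = np.column_stack([np.ones(n_snps), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical LD Scores; real ones are computed from a reference panel.
    ld = rng.gamma(shape=2.0, scale=50.0, size=200_000)
    stats = simulate_chi2(ld, n_samples=50_000, h2=0.4, confounding=1e-6, rng=rng)
    h2_hat, intercept_hat = ld_score_regression(stats, ld, n_samples=50_000)
    print(f"h2 estimate: {h2_hat:.3f}, intercept: {intercept_hat:.3f}")
```

In this toy setting the mean chi-square is far above 1 even though the intercept stays close to 1, which mirrors the abstract's point that polygenicity, rather than bias, can account for most of the inflation.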


Journal ArticleDOI
TL;DR: This work introduces a technique—cross-trait LD Score regression—for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap, and uses this method to estimate 276 genetic correlations among 24 traits.
Abstract: Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual-level genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique, cross-trait LD Score regression, for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use this method to estimate 276 genetic correlations among 24 traits. The results include genetic correlations between anorexia nervosa and schizophrenia, anorexia and obesity, and educational attainment and several diseases. These results highlight the power of genome-wide analyses, as there currently are no significantly associated SNPs for anorexia nervosa and only three for educational attainment.

2,993 citations


Journal ArticleDOI
TL;DR: The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
Abstract: Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.

2,209 citations


Journal ArticleDOI
TL;DR: A new method is introduced, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers, which is computationally tractable at very large sample sizes and leverages genome-wide information.
Abstract: Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

1,939 citations


Journal ArticleDOI
Majid Nikpay, Anuj Goel, Won H-H., Leanne M. Hall, +164 more authors (60 institutions)
TL;DR: This GWAS meta-analysis of ∼185,000 coronary artery disease (CAD) cases and controls interrogated 6.7 million common (minor allele frequency (MAF) > 0.05) and 2.7 million low-frequency (0.005 < MAF < 0.05) variants.
Abstract: Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association study (GWAS) analysis of common SNPs. Leveraging phased haplotypes from the 1000 Genomes Project, we report a GWAS meta-analysis of ∼185,000 CAD cases and controls, interrogating 6.7 million common (minor allele frequency (MAF) > 0.05) and 2.7 million low-frequency (0.005 < MAF < 0.05) variants. In addition to confirming most known CAD-associated loci, we identified ten new loci (eight additive and two recessive) that contain candidate causal genes newly implicating biological processes in vessel walls. We observed intralocus allelic heterogeneity but little evidence of low-frequency variants with larger effects and no evidence of synthetic association. Our analysis provides a comprehensive survey of the fine genetic architecture of CAD, showing that genetic susceptibility to this common disease is largely determined by common SNPs of small effect size.

1,839 citations


Journal ArticleDOI
TL;DR: The first trans-ancestry association study of IBD is reported, using genome-wide or Immunochip genotype data from an extended cohort of 86,640 European individuals and Immunochip data from 9,846 individuals of East Asian, Indian or Iranian descent; it implicates 38 loci in IBD risk for the first time.
Abstract: Ulcerative colitis and Crohn's disease are the two main forms of inflammatory bowel disease (IBD). Here we report the first trans-ancestry association study of IBD, with genome-wide or Immunochip genotype data from an extended cohort of 86,640 European individuals and Immunochip data from 9,846 individuals of East Asian, Indian or Iranian descent. We implicate 38 loci in IBD risk for the first time. For the majority of the IBD risk loci, the direction and magnitude of effect are consistent in European and non-European cohorts. Nevertheless, we observe genetic heterogeneity between divergent populations at several established risk loci driven by differences in allele frequency (NOD2) or effect size (TNFSF15 and ATG16L1) or a combination of these factors (IL23R and IRGM). Our results provide biological insights into the pathogenesis of IBD and demonstrate the usefulness of trans-ancestry association studies for mapping loci associated with complex diseases and understanding genetic architecture across diverse populations.

1,826 citations


Journal ArticleDOI
TL;DR: This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
Abstract: Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.

1,607 citations
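
The "simple and parsimonious model" mentioned in the abstract above can be made concrete with the classical twin-design decomposition. This is a minimal sketch under the standard ACE assumptions (r_MZ = a² + c², r_DZ = a²/2 + c²), which give Falconer's estimates. It is not the analysis pipeline of the study. The function names and the tolerance used in the consistency check are hypothetical choices for illustration.

```python
def falconer_estimates(r_mz, r_dz):
    """Variance components from twin correlations under the ACE model.

    r_mz: correlation between monozygotic twins
    r_dz: correlation between dizygotic twins
    Returns (h2, c2, e2): additive genetic, shared-environment and
    non-shared-environment proportions of variance.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared-environment share
    e2 = 1.0 - r_mz            # non-shared environment and measurement error
    return h2, c2, e2


def consistent_with_additive_only(r_mz, r_dz, tol=0.05):
    """True when r_mz is close to 2 * r_dz, as an additive-only model predicts.

    `tol` is an arbitrary illustrative threshold, not a value from the study.
    """
    return abs(r_mz - 2.0 * r_dz) <= tol


if __name__ == "__main__":
    # Hypothetical correlations chosen only to illustrate the arithmetic.
    print(falconer_estimates(0.5, 0.25))
    print(consistent_with_additive_only(0.5, 0.25))
```

When r_MZ is roughly twice r_DZ, the shared-environment term is near zero and twin resemblance is attributed to additive genetic variation alone.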


Journal ArticleDOI
TL;DR: The results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
Abstract: Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.

1,372 citations
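
The two steps described in the abstract above (predict the genetically regulated component of expression from genotype, then test it against the phenotype) can be sketched as follows. This is an illustrative toy, not the PrediXcan software. The weights stand in for a prediction model trained on a reference transcriptome data set, and all names and simulated data here are assumptions.

```python
import numpy as np


def predict_expression(genotypes, weights):
    """Impute expression of one gene as a weighted sum of SNP dosages.

    genotypes: (n_individuals, n_snps) array of allele dosages (0, 1 or 2)
    weights:   (n_snps,) array of per-SNP effects on expression
               (hypothetical; in practice these come from a trained model)
    """
    return genotypes @ weights


def association(predicted_expression, phenotype):
    """Regress phenotype on imputed expression; return (slope, correlation)."""
    x = predicted_expression - predicted_expression.mean()
    y = phenotype - phenotype.mean()
    slope = (x @ y) / (x @ x)
    corr = np.corrcoef(x, y)[0, 1]
    return slope, corr


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dosages = rng.binomial(2, 0.3, size=(2000, 20)).astype(float)
    snp_weights = rng.normal(size=20)
    expression = predict_expression(dosages, snp_weights)
    # Simulated phenotype with a true effect of 0.5 per unit of expression.
    trait = 0.5 * expression + rng.normal(size=2000)
    print(association(expression, trait))
```

Because the test is run once per gene rather than once per variant, the number of tests is much smaller, which is the reduced multiple-testing burden the abstract refers to.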


Journal ArticleDOI
TL;DR: By digitally separating tumor, stromal and normal gene expression, two tumor subtypes are identified and validated, including a 'basal-like' subtype that has worse outcome and is molecularly similar to basal tumors in bladder and breast cancers.
Abstract: Pancreatic ductal adenocarcinoma (PDAC) remains a lethal disease with a 5-year survival rate of 4%. A key hallmark of PDAC is extensive stromal involvement, which makes capturing precise tumor-specific molecular information difficult. Here we have overcome this problem by applying blind source separation to a diverse collection of PDAC gene expression microarray data, including data from primary tumor, metastatic and normal samples. By digitally separating tumor, stromal and normal gene expression, we have identified and validated two tumor subtypes, including a 'basal-like' subtype that has worse outcome and is molecularly similar to basal tumors in bladder and breast cancers. Furthermore, we define 'normal' and 'activated' stromal subtypes, which are independently prognostic. Our results provide new insights into the molecular composition of PDAC, which may be used to tailor therapies or provide decision support in a clinical setting where the choice and timing of therapies are critical.

1,333 citations


Journal ArticleDOI
TL;DR: Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1, and defined the extensive landscape of altered genes and pathways in HCC.
Abstract: Genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. Analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)-approved drugs. In conclusion, we identified risk factor-specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.

1,265 citations


Journal ArticleDOI
TL;DR: BOLT-LMM is presented, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes.
Abstract: Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN²) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.

Journal ArticleDOI
TL;DR: It is estimated that selecting genetically supported targets could double the success rate in clinical development, and using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.
Abstract: Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.

Journal ArticleDOI
TL;DR: In this article, the authors use Capture Hi-C (CHi-C) to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types and identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci.
Abstract: Transcriptional control in large genomes often requires looping interactions between distal DNA elements, such as enhancers and target promoters. Current chromosome conformation capture techniques do not offer sufficiently high resolution to interrogate these regulatory interactions on a genomic scale. Here we use Capture Hi-C (CHi-C), an adapted genome conformation assay, to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types. We identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci. Transcriptionally active genes contact enhancer-like elements, whereas transcriptionally inactive genes interact with previously uncharacterized elements marked by repressive features that may act as long-range silencers. Finally, we show that interacting loci are enriched for disease-associated SNPs, suggesting how distal mutations may disrupt the regulation of relevant genes. This study provides new insights and accessible tools to dissect the regulatory interactions that underlie normal and aberrant gene regulation.

Journal ArticleDOI
TL;DR: A 'Big Bang' model is presented, whereby tumors grow predominantly as a single expansion producing numerous intermixed subclones that are not subject to stringent selection and where both public and most detectable private alterations arise early during growth.
Abstract: What happens in early, still undetectable human malignancies is unknown because direct observations are impractical. Here we present and validate a 'Big Bang' model, whereby tumors grow predominantly as a single expansion producing numerous intermixed subclones that are not subject to stringent selection and where both public (clonal) and most detectable private (subclonal) alterations arise early during growth. Genomic profiling of 349 individual glands from 15 colorectal tumors showed an absence of selective sweeps, uniformly high intratumoral heterogeneity (ITH) and subclone mixing in distant regions, as postulated by our model. We also verified the prediction that most detectable ITH originates from early private alterations and not from later clonal expansions, thus exposing the profile of the primordial tumor. Moreover, some tumors appear 'born to be bad', with subclone mixing indicative of early malignant potential. This new model provides a quantitative framework to interpret tumor growth dynamics and the origins of ITH, with important clinical implications.

Journal ArticleDOI
TL;DR: It is shown that the use of TGF-β signaling inhibitors to block the cross-talk between cancer cells and the microenvironment halts disease progression, and all poor-prognosis CRC subtypes share a gene program induced by TGF-β in tumor stromal cells.
Abstract: Recent molecular classifications of colorectal cancer (CRC) based on global gene expression profiles have defined subtypes displaying resistance to therapy and poor prognosis. Upon evaluation of these classification systems, we discovered that their predictive power arises from genes expressed by stromal cells rather than epithelial tumor cells. Bioinformatic and immunohistochemical analyses identify stromal markers that associate robustly with disease relapse across the various classifications. Functional studies indicate that cancer-associated fibroblasts (CAFs) increase the frequency of tumor-initiating cells, an effect that is dramatically enhanced by transforming growth factor (TGF)-β signaling. Likewise, we find that all poor-prognosis CRC subtypes share a gene program induced by TGF-β in tumor stromal cells. Using patient-derived tumor organoids and xenografts, we show that the use of TGF-β signaling inhibitors to block the cross-talk between cancer cells and the microenvironment halts disease progression.

Journal ArticleDOI
TL;DR: The subgroup with the poorest prognosis had significant enrichment of hypermutated tumors and a characteristic elevation in the expression of immune checkpoint molecules, suggesting immune-modulating therapies might also be potentially promising options for these patients.
Abstract: The incidence of biliary tract cancer (BTC), including intrahepatic (ICC) and extrahepatic (ECC) cholangiocarcinoma and gallbladder cancer, has increased globally; however, no effective targeted molecular therapies have been approved at the present time. Here we molecularly characterized 260 BTCs and uncovered spectra of genomic alterations that included new potential therapeutic targets. Gradient spectra of mutational signatures with a higher burden of the APOBEC-associated mutation signature were observed in gallbladder cancer and ECC. Thirty-two significantly altered genes, including ELF3, were identified, and nearly 40% of cases harbored targetable genetic alterations. Gene fusions involving FGFR2 and PRKACA or PRKACB preferentially occurred in ICC and ECC, respectively, and the subtype-associated prevalence of actionable growth factor-mediated signals was noteworthy. The subgroup with the poorest prognosis had significant enrichment of hypermutated tumors and a characteristic elevation in the expression of immune checkpoint molecules. Accordingly, immune-modulating therapies might also be potentially promising options for these patients.

Journal ArticleDOI
TL;DR: A pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas is performed using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches.
Abstract: Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.

Journal ArticleDOI
TL;DR: Convergent evolution of the mycorrhizal habit in fungi occurred via the repeated evolution of a 'symbiosis toolkit', with reduced numbers of PCWDEs and lineage-specific suites of mycorrhiza-induced genes.
Abstract: To elucidate the genetic bases of mycorrhizal lifestyle evolution, we sequenced new fungal genomes, including 13 ectomycorrhizal (ECM), orchid (ORM) and ericoid (ERM) species, and five saprotrophs, which we analyzed along with other fungal genomes. Ectomycorrhizal fungi have a reduced complement of genes encoding plant cell wall-degrading enzymes (PCWDEs), as compared to their ancestral wood decayers. Nevertheless, they have retained a unique array of PCWDEs, thus suggesting that they possess diverse abilities to decompose lignocellulose. Similar functional categories of nonorthologous genes are induced in symbiosis. Of induced genes, 7-38% are orphan genes, including genes that encode secreted effector-like proteins. Convergent evolution of the mycorrhizal habit in fungi occurred via the repeated evolution of a 'symbiosis toolkit', with reduced numbers of PCWDEs and lineage-specific suites of mycorrhiza-induced genes.

Journal ArticleDOI
TL;DR: This study provides the first survey of clock-like mutational processes operating in human somatic cells, using mutations from 10,250 cancer genomes across 36 cancer types to investigate processes that have been operating in normal human cells.
Abstract: During the course of a lifetime, somatic cells acquire mutations. Different mutational processes may contribute to the mutations accumulated in a cell, with each imprinting a mutational signature on the cell's genome. Some processes generate mutations throughout life at a constant rate in all individuals, and the number of mutations in a cell attributable to these processes will be proportional to the chronological age of the person. Using mutations from 10,250 cancer genomes across 36 cancer types, we investigated clock-like mutational processes that have been operating in normal human cells. Two mutational signatures show clock-like properties. Both exhibit different mutation rates in different tissues. However, their mutation rates are not correlated, indicating that the underlying processes are subject to different biological influences. For one signature, the rate of cell division may influence its mutation rate. This study provides the first survey of clock-like mutational processes operating in human somatic cells.

Journal ArticleDOI
TL;DR: It is demonstrated using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation, and evidence that height- and BMI-associated variants have been under natural selection is found.
Abstract: We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.

Journal ArticleDOI
TL;DR: NetWAS is introduced, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone.
Abstract: Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, identify the changing functional roles of genes across tissues and illuminate relationships among diseases. We introduce NetWAS, which combines genes with nominally significant genome-wide association study (GWAS) P values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than a hundred human tissues and cell types.

Journal ArticleDOI
TL;DR: In this article, the authors delineate the entire picture of genetic alterations and affected pathways in grade II and III gliomas, with sensitive detection of driver genes. These gliomas comprise three distinct subtypes characterized by discrete sets of mutations and distinct clinical behaviors, and the correlations and ordering among mutations suggest that there is functional interplay between the mutations that drive clonal selection.
Abstract: Grade II and III gliomas are generally slowly progressing brain cancers, many of which eventually transform into more aggressive tumors. Despite recent findings of frequent mutations in IDH1 and other genes, knowledge about their pathogenesis is still incomplete. Here, combining two large sets of high-throughput sequencing data, we delineate the entire picture of genetic alterations and affected pathways in these glioma types, with sensitive detection of driver genes. Grade II and III gliomas comprise three distinct subtypes characterized by discrete sets of mutations and distinct clinical behaviors. Mutations showed significant positive and negative correlations and a chronological hierarchy, as inferred from different allelic burdens among coexisting mutations, suggesting that there is functional interplay between the mutations that drive clonal selection. Extensive serial and multi-regional sampling analyses further supported this finding and also identified a high degree of temporal and spatial heterogeneity generated during tumor expansion and relapse, which is likely shaped by the complex but ordered processes of multiple clonal selection and evolutionary events.

Journal ArticleDOI
TL;DR: The insights gained from sequencing the whole genomes of Icelanders to a median depth of 20× provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
Abstract: Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.

Journal ArticleDOI
TL;DR: This study comprised 7,219 cases and 15,991 controls of European ancestry, constituting a new genome-wide association study, a meta-analysis with a published GWAS and a replication study, which mapped 43 susceptibility loci, including ten new associations.
Abstract: Systemic lupus erythematosus (SLE) is a genetically complex autoimmune disease characterized by loss of immune tolerance to nuclear and cell surface antigens. Previous genome-wide association studies (GWAS) had modest sample sizes, reducing their scope and reliability. Our study comprised 7,219 cases and 15,991 controls of European ancestry, constituting a new GWAS, a meta-analysis with a published GWAS and a replication study. We have mapped 43 susceptibility loci, including ten new associations. Assisted by dense genome coverage, imputation provided evidence for missense variants underpinning associations in eight genes. Other likely causal genes were established by examining associated alleles for cis-acting eQTL effects in a range of ex vivo immune cells. We found an over-representation (n = 16) of transcription factors among SLE susceptibility genes. This finding supports the view that aberrantly regulated gene expression networks in multiple cell types in both the innate and adaptive immune response contribute to the risk of developing SLE.

Journal ArticleDOI
TL;DR: The identified alterations overlap significantly with the HTLV-1 Tax interactome and are highly enriched for T cell receptor–NF-κB signaling, T cell trafficking and other T cell–related pathways as well as immunosurveillance.
Abstract: Adult T cell leukemia/lymphoma (ATL) is a peripheral T cell neoplasm of largely unknown genetic basis, associated with human T cell leukemia virus type-1 (HTLV-1) infection. Here we describe an integrated molecular study in which we performed whole-genome, exome, transcriptome and targeted resequencing, as well as array-based copy number and methylation analyses, in a total of 426 ATL cases. The identified alterations overlap significantly with the HTLV-1 Tax interactome and are highly enriched for T cell receptor–NF-κB signaling, T cell trafficking and other T cell–related pathways as well as immunosurveillance. Other notable features include a predominance of activating mutations (in PLCG1, PRKCB, CARD11, VAV1, IRF4, FYN, CCR4 and CCR7) and gene fusions (CTLA4-CD28 and ICOS-CD28). We also discovered frequent intragenic deletions involving IKZF2, CARD11 and TP73 and mutations in GATA3, HNRNPA2B1, GPR183, CSNK2A1, CSNK2B and CSNK1A1. Our findings not only provide unique insights into key molecules in T cell signaling but will also guide the development of new diagnostics and therapeutics in this intractable tumor.

Journal ArticleDOI
TL;DR: The authors argue that there is not yet strong evidence that super-enhancers are a novel paradigm in gene regulation and that use of the term in this context is not currently justified; the term nonetheless likely identifies strong enhancers that exhibit behaviors consistent with previous models and concepts of transcriptional regulation.
Abstract: The term 'super-enhancer' has been used to describe groups of putative enhancers in close genomic proximity with unusually high levels of Mediator binding, as measured by chromatin immunoprecipitation and sequencing (ChIP-seq). Here we review the identification and composition of super-enhancers, describe links between super-enhancers, gene regulation and disease, and discuss the functional significance of enhancer clustering. We also provide our perspective regarding the proposition that super-enhancers are a regulatory entity conceptually distinct from what was known before the introduction of the term. Our opinion is that there is not yet strong evidence that super-enhancers are a novel paradigm in gene regulation and that use of the term in this context is not currently justified. However, the term likely identifies strong enhancers that exhibit behaviors consistent with previous models and concepts of transcriptional regulation. In this respect, the super-enhancer definition is useful in identifying regulatory elements likely to control genes important for cell type specification.

Journal ArticleDOI
TL;DR: A Bayesian approach was used to define credible sets for the T1D-associated SNPs, which localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells.
Abstract: Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions, finding major pathways contributing to risk, with some loci shared across immune disorders. To make genetic comparisons across autoimmune disorders as informative as possible, a dense genotyping array, the Immunochip, was developed, from which we identified four new T1D-associated regions (P < 5 × 10⁻⁸). A comparative analysis with 15 immune diseases showed that T1D is more similar genetically to other autoantibody-positive diseases, significantly most similar to juvenile idiopathic arthritis and significantly least similar to ulcerative colitis, and provided support for three additional new T1D risk loci. Using a Bayesian approach, we defined credible sets for the T1D-associated SNPs. The associated SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal.

Journal ArticleDOI
TL;DR: It is found that NRF2 controls the expression of the key serine/glycine biosynthesis enzyme genes PHGDH, PSAT1 and SHMT2 via ATF4 to support glutathione and nucleotide production and it is shown that expression of these genes confers poor prognosis in human NSCLC.
Abstract: Tumors have high energetic and anabolic needs for rapid cell growth and proliferation, and the serine biosynthetic pathway was recently identified as an important source of metabolic intermediates for these processes. We integrated metabolic tracing and transcriptional profiling of a large panel of non-small cell lung cancer (NSCLC) cell lines to characterize the activity and regulation of the serine/glycine biosynthetic pathway in NSCLC. Here we show that the activity of this pathway is highly heterogeneous and is regulated by NRF2, a transcription factor frequently deregulated in NSCLC. We found that NRF2 controls the expression of the key serine/glycine biosynthesis enzyme genes PHGDH, PSAT1 and SHMT2 via ATF4 to support glutathione and nucleotide production. Moreover, we show that expression of these genes confers poor prognosis in human NSCLC. Thus, a substantial fraction of human NSCLCs activates an NRF2-dependent transcriptional program that regulates serine and glycine metabolism and is linked to clinical aggressiveness.

Journal ArticleDOI
TL;DR: This analysis identifies a second class of candidate genes (for example, RIMS1, CUL7 and LZTR1) where transmitted mutations may create a sensitized background but are unlikely to be completely penetrant, and private truncating SNVs and rare, inherited CNVs are statistically independent risk factors for autism.
Abstract: To assess the relative impact of inherited and de novo variants on autism risk, we generated a comprehensive set of exonic single-nucleotide variants (SNVs) and copy number variants (CNVs) from 2,377 families with autism. We find that private, inherited truncating SNVs in conserved genes are enriched in probands (odds ratio = 1.14, P = 0.0002) in comparison to unaffected siblings, an effect involving significant maternal transmission bias to sons. We also observe a bias for inherited CNVs, specifically for small (<100 kb), maternally inherited events (P = 0.01) that are enriched in CHD8 target genes (P = 7.4 × 10⁻³). Using a logistic regression model, we show that private truncating SNVs and rare, inherited CNVs are statistically independent risk factors for autism, with odds ratios of 1.11 (P = 0.0002) and 1.23 (P = 0.01), respectively. This analysis identifies a second class of candidate genes (for example, RIMS1, CUL7 and LZTR1) where transmitted mutations may create a sensitized background but are unlikely to be completely penetrant.

Journal ArticleDOI
TL;DR: Analysis of CRC expression data from patient-derived xenografts shows that the distinctive transcriptional and clinical features of the SSM subtype can be ascribed to its particularly abundant stromal component.
Abstract: Recent studies identified a poor-prognosis stem/serrated/mesenchymal (SSM) transcriptional subtype of colorectal cancer (CRC). We noted that genes upregulated in this subtype are also prominently expressed by stromal cells, suggesting that SSM transcripts could derive from stromal rather than epithelial cancer cells. To test this hypothesis, we analyzed CRC expression data from patient-derived xenografts, where mouse stroma supports human cancer cells. Species-specific expression analysis showed that the mRNA levels of SSM genes were mostly due to stromal expression. Transcriptional signatures built to specifically report the abundance of cancer-associated fibroblasts (CAFs), leukocytes or endothelial cells all had significantly higher expression in human CRC samples of the SSM subtype. High expression of the CAF signature was associated with poor prognosis in untreated CRC, and joint high expression of the stromal signatures predicted resistance to radiotherapy in rectal cancer. These data show that the distinctive transcriptional and clinical features of the SSM subtype can be ascribed to its particularly abundant stromal component.