Journal Article

Genetic association studies of alterations in protein function expose recessive effects on cancer predisposition

21 Jul 2021-Scientific Reports (Nature Publishing Group)-Vol. 11, Iss: 1, pp 14901
TL;DR: In this article, a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable was conducted.
Abstract: The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method for detecting protein-coding genes that are functionally interpretable. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci with cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Based on the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusive recessive inheritance. These results highlight the importance of recessive genetic effects, without relying on familial studies. Finally, we show that many of the detected genes exert substantial cancer risk in the studied cohort determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic consulting.


Citations
Posted Content
TL;DR: In this article, the authors highlight the major open problems that need to be solved to improve our understanding of the genetic variation underlying human traits, and by discussing these challenges provide a primer to the field.
Abstract: Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and opened the door for numerous breakthroughs in biology, medicine and other scientific fields. And yet, the ultimate promise of this area of research is still not fully realized. In this review, we highlight the major open problems that need to be solved to improve our understanding of the genetic variation underlying human traits, and by discussing these challenges provide a primer to the field. Our focus is on concrete analytical problems, both conceptual and technical in nature. We cover general issues in genetic studies such as population structure, epistasis and gene-environment interactions, data-related issues such as ethnic diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies and polygenic risk scores. We emphasize the interconnectedness of these open problems and suggest promising avenues to address them.

17 citations

Journal Article
TL;DR: In this paper, the authors highlight the major open problems that need to be solved in genetic studies of human traits; by discussing these challenges they provide a primer to the field and suggest promising avenues to address them.
Abstract: Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and yet, the genetics of most traits is still poorly understood. In this review, we highlight the major open problems that need to be solved, and by discussing these challenges provide a primer to the field. We cover general issues such as population structure, epistasis and gene-environment interactions, data-related issues such as ancestry diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies, and polygenic risk scores. We emphasize the interconnectedness of these problems and suggest promising avenues to address them.

15 citations

Posted Content
26 Aug 2022-bioRxiv
TL;DR: ESM1b distinguished between pathogenic and benign variants across ∼150K variants annotated in ClinVar and HGMD, outperforming existing state-of-the-art methods, and exceeded the state of the art at predicting the experimental results of deep mutational scans.
Abstract: Distinguishing between damaging and neutral missense variants is an ongoing challenge in human genetics, with profound implications for clinical diagnosis, genetic studies and protein engineering. Recently, deep-learning models have achieved state-of-the-art performance in classifying variants as pathogenic or benign. However, these models are currently unable to provide predictions over all missense variants, either because of dependency on close protein homologs or due to software limitations. Here we leveraged ESM1b, a 650M-parameter protein language model, to predict the functional impact of human coding variation at scale. To overcome existing technical limitations, we developed a modified ESM1b workflow and functionalized, for the first time, all proteins in the human genome, resulting in predictions for all ∼450M possible missense variant effects. ESM1b was able to distinguish between pathogenic and benign variants across ∼150K variants annotated in ClinVar and HGMD, outperforming existing state-of-the-art methods. ESM1b also exceeded the state of the art at predicting the experimental results of deep mutational scans. We further annotated ∼2M variants across ∼9K alternatively-spliced genes as damaging in certain protein isoforms while neutral in others, demonstrating the importance of considering all isoforms when functionalizing variant effects. The complete catalog of variant effect predictions is available at: https://huggingface.co/spaces/ntranoslab/esm_variants.

12 citations

Journal Article
TL;DR: This article proposes a probabilistic approach to infer the parent-of-origin of individual alleles that requires neither parental genomes nor prior knowledge of genealogy; it uses identity-by-descent sharing with second- and third-degree relatives to assign alleles to parental groups and leverages chromosome X data in males to distinguish maternal from paternal groups.
Abstract: Identical genetic variations can have different phenotypic effects depending on their parent of origin. Yet, studies focusing on parent-of-origin effects have been limited in terms of sample size due to the lack of parental genomes or known genealogies. We propose a probabilistic approach to infer the parent-of-origin of individual alleles that does not require parental genomes nor prior knowledge of genealogy. Our model uses Identity-By-Descent sharing with second- and third-degree relatives to assign alleles to parental groups and leverages chromosome X data in males to distinguish maternal from paternal groups. We combine this with robust haplotype inference and haploid imputation to infer the parent-of-origin for 26,393 UK Biobank individuals. We screen 99 phenotypes for parent-of-origin effects and replicate the discoveries of 6 GWAS studies, confirming signals on body mass index, type 2 diabetes, standing height and multiple blood biomarkers, including the known maternal effect at the MEG3/DLK1 locus on platelet phenotypes. We also report a novel maternal effect at the TERT gene on telomere length, thereby providing new insights on the heritability of this phenotype. All our summary statistics are publicly available to help the community to better characterize the molecular mechanisms leading to parent-of-origin effects and their implications for human health.

5 citations

Journal Article
TL;DR: In this paper, a weighted gene co-expression network analysis (WGCNA) was conducted to obtain the expression profiles of 35 breast cell lines from the HMS LINCS Database.
Abstract: Microgravity changes the gene expression pattern in various cell types. This study focuses on the breast cancer cell lines MCF-7 (less invasive) and MDA-MB-231 (triple-negative, highly invasive). The cells were cultured for 14 days under simulated microgravity (s-µg) conditions using a random positioning machine (RPM). We investigated cytoskeletal and extracellular matrix (ECM) factors as well as focal adhesion (FA) and the transmembrane proteins involved in different cellular signaling pathways (MAPK, PAM and VEGF). The mRNA expressions of 24 genes of interest (TUBB, ACTB, COL1A1, COL4A5, LAMA3, ITGB1, CD44, VEGF, FLK1, EGFR, SRC, FAK1, RAF1, AKT1, ERK1, MAPK14, MAP2K1, MTOR, RICTOR, VCL, PXN, CDKN1, CTNNA1 and CTNNB1) were determined by quantitative real-time PCR (qPCR) and studied using STRING interaction analysis. Histochemical staining was carried out to investigate the morphology of the adherent cells (ADs) and the multicellular spheroids (MCSs) after RPM exposure. To better understand this experimental model in the context of breast cancer patients, a weighted gene co-expression network analysis (WGCNA) was conducted to obtain the expression profiles of 35 breast cell lines from the HMS LINCS Database. The qPCR-verified genes were searched in the mammalian phenotype database and the human genome-wide association studies (GWAS) Catalog. The results demonstrated the positive association between the real metastatic microtumor environment and MCSs with respect to the extracellular matrix, cytoskeleton, morphology, different cellular signaling pathway key proteins and several other components. In summary, the microgravity-engineered three-dimensional MCS model can be utilized to study breast cancer cell behavior and to assess the therapeutic efficacies of drugs against breast cancer in the future.

1 citation

References
Journal Article
04 Mar 2011-Cell
TL;DR: Recognition of the widespread applicability of these concepts will increasingly affect the development of new means to treat human cancer.

51,099 citations

Journal Article
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0.
Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations

Journal Article
TL;DR: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility, and for the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
Abstract: Background: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format. Findings: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). Conclusions: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

7,038 citations

Journal Article
TL;DR: The UK Biobank is described, a large population-based prospective study, established to allow investigation of the genetic and non-genetic determinants of the diseases of middle and old age.
Abstract: Cathie Sudlow and colleagues describe the UK Biobank, a large population-based prospective study, established to allow investigation of the genetic and non-genetic determinants of the diseases of middle and old age.

6,114 citations

Journal Article
TL;DR: Physical structure is known to contribute to the appearance of bird plumage through structural color and specular reflection, but a third mechanism, structural absorption, leads to low reflectance and super black color in birds of paradise feathers.
Abstract: Many studies have shown how pigments and internal nanostructures generate color in nature. External surface structures can also influence appearance, such as by causing multiple scattering of light (structural absorption) to produce a velvety, super black appearance. Here we show that feathers from five species of birds of paradise (Aves: Paradisaeidae) structurally absorb incident light to produce extremely low-reflectance, super black plumages. Directional reflectance of these feathers (0.05-0.31%) approaches that of man-made ultra-absorbent materials. SEM, nano-CT, and ray-tracing simulations show that super black feathers have tilted arrays of highly modified barbules, which cause more multiple scattering, resulting in more structural absorption, than normal black feathers. Super black feathers have an extreme directional reflectance bias and appear darkest when viewed from the distal direction. We hypothesize that structurally absorbing, super black plumage evolved through sensory bias to enhance the perceived brilliance of adjacent color patches during courtship display.

5,916 citations
