Cross-species identification of cancer-resistance associated genes uncovers their relevance to human cancer risk

21 May 2021 - bioRxiv (Cold Spring Harbor Laboratory)

Abstract: Cancer is an evolutionarily conserved disease that occurs in a wide variety of species. We applied a comparative genomics approach to systematically characterize the genes whose conservation levels correlate significantly, either positively (PC) or negatively (NC), with a broad spectrum of cancer-resistance estimates computed across almost 200 vertebrate species. PC genes are enriched in pathways relevant to tumor suppression, including cell cycle, DNA repair, and immune response, while NC genes are enriched in a host of metabolic pathways. The conservation levels of the PC and NC genes in a species serve to build the first genomics-based predictor of its cancer resistance score. We find that PC genes are less tolerant to loss of function (LoF) mutations, are enriched in cancer driver genes, and are associated with germline mutations that increase human cancer risk. Furthermore, their expression levels are associated with lifetime cancer risk across human tissues. Finally, their knockout in mice results in increased cancer incidence. In sum, we find that many genes associated with cancer resistance across species are implicated in human cancers, pointing to several additional candidate genes that may have a functional role in human cancer.

Nishanth Ulhas Nair 1,*,#; Kuoyuan Cheng 1,2,*,#; Lamis Naddaf 3,*; Elad Sharon 3; Lipika R. Pal 1; Padma S. Rajagopal 4; Irene Unterman 3; Kenneth Aldape 5; Sridhar Hannenhalli 1; Chi-Ping Day 6; Yuval Tabach 3,#; Eytan Ruppin 1,#
1. Cancer Data Science Laboratory (CDSL), National Cancer Institute (NCI), National Institutes of
Health (NIH), Bethesda, MD, USA.
2. Center for Bioinformatics and Computational Biology, University of Maryland, College Park,
MD, USA.
3. Department of Developmental Biology and Cancer Research, Institute of Medical Research -
Israel-Canada, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel.
4. Section of Hematology/Oncology, Department of Medicine, The University of Chicago,
Chicago, IL, USA.
5. Laboratory of Pathology, National Cancer Institute (NCI), National Institutes of Health (NIH),
Bethesda, MD, USA.
6. Laboratory of Cancer Biology and Genetics, National Cancer Institute (NCI), National
Institutes of Health (NIH), Bethesda, MD, USA.
* These authors contributed equally to this work as co-first authors.
# co-corresponding authors (nishanth.nair@nih.gov, kycheng@terpmail.umd.edu,
tabachy@gmail.com, eytan.ruppin@nih.gov)
bioRxiv preprint doi: https://doi.org/10.1101/2021.05.19.444895; this version posted May 21, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

INTRODUCTION
Animal species are known to have dramatic differences in their cancer rates and lifespans, and several animals are considered cancer resistant while others are considered to be cancer prone (Gorbunova et al. 2014; Albuquerque et al. 2018). Studying the genomic underpinnings of these differences across various branches of life may provide insights into cancer development and cancer prevention/treatment options in humans (Seluanov et al. 2018).
The multistage carcinogenesis model states that “individual cells become cancerous after accumulating a specific number of mutational hits” (Seluanov et al. 2018; Nordling, 1953). Based on this model, larger (and longer-living) animals are expected to have higher cancer incidence as they have more stem cell divisions overall, resulting in a higher likelihood of producing and propagating carcinogenic mutations. For humans, it has been shown that the
risks of cancer development across different tissue types are correlated with their corresponding estimated number of lifetime stem cell divisions (Tomasetti et al. 2015 and 2017); consistent with that, human cancer risk is indeed correlated with body height (Khankari et al. 2016). However, cancer risk does not correlate with body size across species, a contradiction known as Peto’s paradox (Peto, 1947; Tollis et al. 2017; Seluanov et al. 2018). For example, humans do not have higher cancer risk than mice despite having thousands of times more cells (Lipman et al. 2004; Szymanska et al. 2014; Ikeno et al. 2009). More drastically, the cancer-resistant bowhead whale (Keane et al. 2016) can weigh 100 tons, live for over 200 years (George et al. 1999), and have millions of times more cells than mice. It follows that different species must have evolved different cancer resistance mechanisms to fit their lifestyles, modifying the “baseline” probability of malignant transformation determined by body size, lifespan, and tissue stem cell division (see Supp. Note for a short review of such mechanisms).
Numerous studies have adopted comparative genomics approaches to understand the evolution of cancer resistance mechanisms across mammals. Some have focused on known human cancer genes and their homologs. For example, Vicens and Posada (2018) found that genes related to DNA repair and T cell proliferation have evolved under positive selection in mammals. Tollis et al. (2020) found that the number of paralogs of human cancer genes across mammals is positively correlated with the species’ lifespan, but not body size. Vazquez and Lynch (2021) reported widespread tumor suppressor gene (TSG) duplications across both large and small Afrotherian species. Other studies focused on body size and longevity, yielding some insights into Peto’s paradox. Kowalczyk et al. (2020) analyzed genes whose evolutionary rates across mammals correlate with body size and lifespan and discovered cancer resistance-related genes that are under increased evolutionary constraints in larger and longer-living mammals. Ferris et al. (2018) identified regions with accelerated evolution in specific mammals, including several cancer resistant species, which provided some insights into the cancer resistance mechanisms they have developed.
Unlike previous studies that focused exclusively on mammals, here we perform a comprehensive genome-wide comparative study aimed at identifying genes related to cancer
resistance across a wide range of vertebrate species. To this end, we estimated the protein conservation scores across species including mammals, birds and fish, identifying genes whose conservation levels are associated with cancer resistance estimates. We then use these cancer-resistance associated genes to build the first genomics-based predictor of cancer resistance for any species. We show that the biological processes associated with cancer resistance vary across taxonomic groups (classes and orders of species), pointing to the diversity in the evolutionary paths and mechanisms for resisting cancer. Finally, the genes identified from this phylogenetic analysis are enriched for cancer driver genes and for genes associated with cancer risk in humans. These results show that a comparative genomic approach can help identify genes involved in human cancers.
RESULTS
Computing gene conservation and species cancer-resistance estimates
We computed a matrix (Tabach et al. Nature 2013; Tabach et al. MSB 2013) of gene conservation scores (phylogenetic profiles) across 240 species for which we had phenotypic information in the AnAge database (Tacutu et al. 2018) and sequence information from the UniProt (UniProt Consortium, 2021), RefSeq (O’Leary et al. 2016), Keane et al. (2015), and NCBI (Sayers et al. 2021) databases. To do this, the protein sequence similarity between each gene in the genome of a reference species and its orthologs in each of the other species (termed phylogenetic profiling; Pellegrini et al. 1999) was measured using the bit score computed with BLASTP (Altschul et al. 1990). The BLASTP bit scores were normalized by their gene length (Tabach et al. Nature 2013; Sherill-Rofe et al. 2019) and then rank-normalized across all genes within each species to control for the evolutionary distance between the reference and each species (Methods). These rank-normalized values range from 0 to 1, with higher values corresponding to higher conservation levels. This method is termed rank-based phylogenetic profiling. We primarily used human as the reference species (Braun et al. 2020) as we are interested in making our findings relevant to human cancers. However, we demonstrated that our conclusions are robust to the choice of reference (Methods, Supp.
Note), largely because the normalization effectively removes dependency on phylogenetic
distance.
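The two normalization steps described above (dividing each bit score by gene length, then ranking genes within each species) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the array layout (genes as rows, species as columns), the tie-handling rule, and the scaling of ranks by `n_genes - 1` are all assumptions made here so that the output spans 0 to 1 as the text states.

```python
import numpy as np

def rank_normalize_profiles(bit_scores, gene_lengths):
    """Length-normalize bit scores, then rank genes within each species.

    bit_scores   : (n_genes, n_species) bit scores of each reference gene
                   against its ortholog in each species (hypothetical layout).
    gene_lengths : (n_genes,) lengths of the reference proteins.
    Returns an (n_genes, n_species) matrix with values in [0, 1];
    higher values mean the gene is more conserved relative to the other
    genes in that species.
    """
    scores = np.asarray(bit_scores, dtype=float)
    lengths = np.asarray(gene_lengths, dtype=float)
    n_genes = scores.shape[0]
    if n_genes < 2:
        raise ValueError("need at least two genes to rank")

    # Step 1: normalize each gene's bit scores by its length.
    per_length = scores / lengths[:, None]

    # Step 2: rank genes within each species (column). A double argsort
    # yields 0-based ranks; ties are broken by row order (an assumption,
    # average ranks would be another reasonable choice).
    order = per_length.argsort(axis=0, kind="stable")
    ranks = order.argsort(axis=0, kind="stable")

    # Scale ranks to the 0..1 range.
    return ranks / (n_genes - 1)


# Toy input: 3 genes x 2 species, values invented for illustration.
profiles = rank_normalize_profiles(
    [[100, 50], [300, 30], [90, 120]],
    [100, 100, 300],
)
print(profiles)
```

Because ranks are computed separately per species, a distant species with uniformly low bit scores and a close species with uniformly high ones end up on the same 0 to 1 scale, which is the property the text attributes to the normalization.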
Since the cancer incidence rates of most species are largely unknown, we used two proxy cancer-resistance estimates that have been proposed in the literature: MLTAW and MLCAW. MLTAW assumes that the level of cancer resistance in a given species needs to roughly counteract its risk of cancer development due to cell division, which is proportional to $\mathrm{ML}^{6} \times \mathrm{AW}$, where ML denotes the species' maximum longevity and AW denotes its adult weight (Peto et al. 1977, 2015; Vazquez et al. 2021; Methods). MLCAW considers the well-established correlation between lifespan and body weight (AW) across many species (Speakman, 2005) and thus regresses out the species AW from its ML (Methods). We computed MLTAW and MLCAW for the 193 of the 240 species for which both ML and AW data were publicly available (Table S1, Methods). These 193 species are from multiple Vertebrata classes, including Mammalia (mammals, n=108), Aves (birds, n=55), Teleostei (teleost fishes, n=18), and Reptilia (reptiles, n=7).
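To make the two estimates concrete, the sketch below computes a quantity proportional to ML^6 × AW and the residual of ML after regressing out AW. The details are assumptions rather than the authors' Methods: both estimates are computed on a log10 scale, the regression is an ordinary least-squares line fitted with `numpy.polyfit`, and the function names and input values are hypothetical.

```python
import numpy as np

def mltaw(max_longevity, adult_weight):
    """log10(ML^6 * AW) = 6*log10(ML) + log10(AW).

    The log scale is an assumption made here to keep values in a
    manageable range; the source only states proportionality to ML^6 x AW.
    """
    ml = np.asarray(max_longevity, dtype=float)
    aw = np.asarray(adult_weight, dtype=float)
    return 6.0 * np.log10(ml) + np.log10(aw)

def mlcaw(max_longevity, adult_weight):
    """Residual of log10(ML) after regressing out log10(AW).

    Fits log10(ML) = slope * log10(AW) + intercept across species and
    returns what is left over: positive values mark species that live
    longer than their body weight would predict.
    """
    log_ml = np.log10(np.asarray(max_longevity, dtype=float))
    log_aw = np.log10(np.asarray(adult_weight, dtype=float))
    slope, intercept = np.polyfit(log_aw, log_ml, deg=1)
    return log_ml - (slope * log_aw + intercept)


# Invented values for four hypothetical species (years, grams).
ml_years = [3.0, 20.0, 30.0, 122.0]
aw_grams = [20.0, 4000.0, 300.0, 70000.0]
print(mltaw(ml_years, aw_grams))
print(mlcaw(ml_years, aw_grams))
```

The two functions capture different notions of resistance: the first grows with both lifespan and body mass, while the second is by construction uncorrelated with body mass and isolates longevity beyond what size alone predicts.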
Genes associated with cancer resistance are enriched in cell cycle, DNA repair, immune response, and different metabolic pathways
For each gene, we computed the Pearson correlation coefficient between its conservation scores and the cancer-resistance estimates (MLTAW and MLCAW) across all species (Tables S2A,B; Methods). We then computed the pathway enrichment of the positively and the negatively correlated genes (termed PC and NC genes, respectively) (Tables S3A,B; Methods). PC genes correlated with either the MLCAW (Fig. 1) or the MLTAW (Fig. S1) measure are enriched for cell cycle, immune response, DNA repair, and transcription regulation pathways (FDR<0.1), indicating that many genes in these pathways are more conserved in the relatively long-lived cancer-resistant species. NC genes are enriched for a diverse range of metabolic pathways (FDR<0.1, Figs. 1, S1).
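The per-gene correlation step can be illustrated with a short sketch that computes the Pearson coefficient between every gene's conservation profile and a cancer-resistance estimate, then splits genes by the sign of the correlation. The function names and the fixed cutoff on the correlation coefficient are assumptions for illustration only; the text indicates that significance-based criteria were used, and those are described in the Methods rather than here.

```python
import numpy as np

def gene_trait_correlations(conservation, trait):
    """Pearson r between each gene's conservation profile and a trait.

    conservation : (n_genes, n_species) rank-normalized conservation scores.
    trait        : (n_species,) cancer-resistance estimate per species.
    Returns an (n_genes,) array of correlation coefficients. Genes with
    constant conservation across species have zero variance and yield NaN.
    """
    x = np.asarray(conservation, dtype=float)
    y = np.asarray(trait, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)   # center each gene profile
    yc = y - y.mean()                        # center the trait
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    return (xc @ yc) / denom

def split_pc_nc(r, cutoff):
    """Indices of positively (PC) and negatively (NC) correlated genes.

    `cutoff` is a hypothetical threshold on |r| standing in for the
    significance criterion used in the paper.
    """
    r = np.asarray(r, dtype=float)
    return np.flatnonzero(r >= cutoff), np.flatnonzero(r <= -cutoff)


# Toy data: 3 genes x 4 species, values invented for illustration.
cons = [[0.1, 0.2, 0.3, 0.4],
        [0.4, 0.3, 0.2, 0.1],
        [0.2, 0.1, 0.4, 0.3]]
resistance = [1.0, 2.0, 3.0, 4.0]
r = gene_trait_correlations(cons, resistance)
pc_idx, nc_idx = split_pc_nc(r, cutoff=0.8)
print(r, pc_idx, nc_idx)
```

The resulting PC and NC index sets are what would then be passed to a pathway enrichment test.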