Journal Article

CancerTracer: a curated database for intrapatient tumor heterogeneity.

08 Nov 2019 - Nucleic Acids Research (Oxford University Press (OUP)) - Vol. 48
TL;DR: CancerTracer is a manually curated database designed to track and characterize the evolutionary trajectories of tumor growth in individual patients; the authors hope it will significantly improve the understanding of the evolutionary histories of tumors and may facilitate the identification of predictive biomarkers for personalized cancer therapies.
Abstract: Comprehensive genomic analyses of cancers have revealed substantial intrapatient molecular heterogeneities that may explain some instances of drug resistance and treatment failures. Examination of the clonal composition of an individual tumor and its evolution through disease progression and treatment may enable identification of precise therapeutic targets for drug design. Multi-region and single-cell sequencing are powerful tools that can be used to capture intratumor heterogeneity. Here, we present a database we've named CancerTracer (http://cailab.labshare.cn/cancertracer): a manually curated database designed to track and characterize the evolutionary trajectories of tumor growth in individual patients. We collected over 6000 tumor samples from 1548 patients corresponding to 45 different types of cancer. Patient-specific tumor phylogenetic trees were constructed based on somatic mutations or copy number alterations identified in multiple biopsies. Using the structured heterogeneity data, researchers can identify common driver events shared by all tumor regions, and the heterogeneous somatic events present in different regions of a tumor of interest. The database can also be used to investigate the phylogenetic relationships between primary and metastatic tumors. It is our hope that CancerTracer will significantly improve our understanding of the evolutionary histories of tumors, and may facilitate the identification of predictive biomarkers for personalized cancer therapies.
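The abstract distinguishes driver events shared by all tumor regions from heterogeneous events present only in some regions. A minimal sketch of that distinction, assuming per-region somatic mutation calls are already available as sets; the region and mutation IDs are placeholders, and this is an illustration, not CancerTracer's own pipeline:

```python
from typing import Dict, Set


def classify_mutations(regions: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each somatic mutation by how many sampled regions carry it.

    regions maps a region (biopsy) ID to the set of mutation IDs called there.
    Returns mutation ID -> "truncal" (in every region), "shared" (in more than
    one but not all regions) or "private" (in exactly one region).
    """
    n_regions = len(regions)
    counts: Dict[str, int] = {}
    for mutations in regions.values():
        for m in mutations:
            counts[m] = counts.get(m, 0) + 1

    labels: Dict[str, str] = {}
    for m, c in counts.items():
        if c == n_regions:
            labels[m] = "truncal"   # present in every sampled region
        elif c == 1:
            labels[m] = "private"   # present in a single region only
        else:
            labels[m] = "shared"    # present in some, but not all, regions
    return labels


# Hypothetical three-region example (placeholder IDs, not real data).
regions = {
    "R1": {"mutA", "mutB", "mutC"},
    "R2": {"mutA", "mutB", "mutD"},
    "R3": {"mutA", "mutB", "mutC", "mutE"},
}
labels = classify_mutations(regions)
print(sorted(labels.items()))
# [('mutA', 'truncal'), ('mutB', 'truncal'), ('mutC', 'shared'), ('mutD', 'private'), ('mutE', 'private')]
```

Truncal mutations correspond to the trunk of a patient-specific phylogenetic tree, while shared and private mutations mark its branches and leaves; a real analysis would also have to account for sequencing depth and copy number before calling a mutation absent from a region.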
Citations
25 May 2011
TL;DR: A quantitative analysis of the timing of the genetic evolution of pancreatic cancer was performed, indicating at least a decade between the occurrence of the initiating mutation and the birth of the parental, non-metastatic founder cell.
Abstract: Metastasis, the dissemination and growth of neoplastic cells in an organ distinct from that in which they originated, is the most common cause of death in cancer patients. This is particularly true for pancreatic cancers, where most patients are diagnosed with metastatic disease and few show a sustained response to chemotherapy or radiation therapy. Whether the dismal prognosis of patients with pancreatic cancer compared to patients with other types of cancer is a result of late diagnosis or early dissemination of disease to distant organs is not known. Here we rely on data generated by sequencing the genomes of seven pancreatic cancer metastases to evaluate the clonal relationships among primary and metastatic cancers. We find that clonal populations that give rise to distant metastases are represented within the primary carcinoma, but these clones are genetically evolved from the original parental, non-metastatic clone. Thus, genetic heterogeneity of metastases reflects that within the primary carcinoma. A quantitative analysis of the timing of the genetic evolution of pancreatic cancer was performed, indicating at least a decade between the occurrence of the initiating mutation and the birth of the parental, non-metastatic founder cell. At least five more years are required for the acquisition of metastatic ability and patients die an average of two years thereafter. These data provide novel insights into the genetic features underlying pancreatic cancer progression and define a broad time window of opportunity for early detection to prevent deaths from metastatic disease.

2,019 citations

Journal Article
TL;DR: A primary focus of this update was integration with crowdsourced efforts, leveraging the Drug Target Commons for community-contributed interaction data, Wikidata to facilitate term normalization, and export to NDEx for drug-gene interaction network representations.
Abstract: The Drug-Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that provides information on drug-gene interactions and druggable genes from publications, databases, and other web-based sources. Drug, gene, and interaction data are normalized and merged into conceptual groups. The information contained in this resource is available to users through a straightforward search interface, an application programming interface (API), and TSV data downloads. DGIdb 4.0 is the latest major version release of this database. A primary focus of this update was integration with crowdsourced efforts, leveraging the Drug Target Commons for community-contributed interaction data, Wikidata to facilitate term normalization, and export to NDEx for drug-gene interaction network representations. Seven new sources have been added since the last major version release, bringing the total number of sources included to 41. Of the previously aggregated sources, 15 have been updated. DGIdb 4.0 also includes improvements to the process of drug normalization and grouping of imported sources. Other notable updates include the introduction of a more sophisticated Query Score for interaction search results, an updated Interaction Score, the inclusion of interaction directionality, and several additional improvements to search features, data releases, licensing documentation and the application framework.

318 citations


Cites background from "CancerTracer: a curated database fo..."

  • ...Wang,C., Yang,J., Luo,H., Wang,K., Wang,Y., Xiao,Z.-X., Tao,X., Jiang,H. and Cai,H. (2020) CancerTracer: a curated database for intrapatient tumor heterogeneity....

  • ...Existing data clients include GeneCards (8), BioGPS (9), CancerTracer (10), Gene4Denovo (11), SLBioDP (12), TargetDB (13) and OncoGemini (14), among others....

Journal Article
TL;DR: This issue contains three breakthrough articles: AntiBodies Chemically Defined curates antibody sequences and their cognate antigens; SCOP returns with a new schema and breaks away from a purely hierarchical structure; while the new Alliance of Genome Resources brings together a number of Model Organism databases to pool knowledge and tools.
Abstract: The 2020 Nucleic Acids Research Database Issue contains 148 papers spanning molecular biology. They include 59 papers reporting on new databases and 79 covering recent changes to resources previously published in the issue. A further ten papers are updates on databases most recently published elsewhere. This issue contains three breakthrough articles: AntiBodies Chemically Defined (ABCD) curates antibody sequences and their cognate antigens; SCOP returns with a new schema and breaks away from a purely hierarchical structure; while the new Alliance of Genome Resources brings together a number of Model Organism databases to pool knowledge and tools. Major returning nucleic acid databases include miRDB and miRTarBase. Databases for protein sequence analysis include CDD, DisProt and ELM, alongside no fewer than four newcomers covering proteins involved in liquid-liquid phase separation. In metabolism and signaling, Pathway Commons, Reactome and Metabolights all contribute papers. PATRIC and MicroScope update in microbial genomes while human and model organism genomics resources include Ensembl, Ensembl genomes and UCSC Genome Browser. Immune-related proteins are covered by updates from IPD-IMGT/HLA and AFND, as well as newcomers VDJbase and OGRDB. Drug design is catered for by updates from the IUPHAR/BPS Guide to Pharmacology and the Therapeutic Target Database. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been revised, updating 305 entries, adding 65 new resources and eliminating 125 discontinued URLs; so bringing the current total to 1637 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.

67 citations


Cites background from "CancerTracer: a curated database fo..."

  • ...As ever, cancer databases are well-represented with new contributions to the field including CancerTracer (81), a resource for studying intrapatient tumor heterogeneity that features data from 1500 patients, including patient-specific tumor phylogenetic trees, and DNMIVD (82) which has a wide range of functions regarding links between DNA methylation and cancer....

Journal Article
TL;DR: FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions; its efficiency should allow its application to large-scale and clinical data and facilitate personalized medicine in cancers.
Abstract: Dissecting tumor heterogeneity is a key to understanding the complex mechanisms underlying drug resistance in cancers. The rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study that compares diverse modeling algorithms. Here we present FastClone, a top-performing algorithm in accuracy in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves approximately 100-fold acceleration in computation for both simulated and patient data. The efficacy of FastClone will allow its application to large-scale data and clinical data, and facilitate personalized medicine in cancers. Multiple algorithms exist for predicting heterogeneity and clonal architecture from the bulk sequencing of tumor tissue. Here, the authors report on an algorithm, FastClone, which was developed from a DREAM challenge and show that FastClone can accurately predict clonality in simulated data and data from colon cancer.

32 citations

Journal Article
TL;DR: In this article, the authors discuss how combinations and fusions of different -omic workflows on a single cell level can be used to examine cellular phenotypes, immune effector functions, and even dynamic changes, such as metabolomic state of different cells in a sample or even in a defined tissue location.
Abstract: High throughput single cell multi-omics platforms, such as mass cytometry (cytometry by time-of-flight; CyTOF), high dimensional imaging (>6 markers; Hyperion, MIBIscope, CODEX, MACSima) and the recently evolved genomic cytometry (Citeseq or REAPseq) have enabled unprecedented insights into many biological and clinical questions, such as hematopoiesis, transplantation, cancer, and autoimmunity. In synergy with constantly adapting new single-cell analysis approaches and the resulting big data collections from these platforms, whole atlases of cell types and of cellular and sub-cellular interaction networks are being created. These atlases provide an ideal scientific discovery environment for reference and data mining approaches, which often reveal new cellular disease networks. In this review we will discuss how combinations and fusions of different -omic workflows on a single cell level can be used to examine cellular phenotypes, immune effector functions, and even dynamic changes, such as the metabolomic state of different cells in a sample or even in a defined tissue location. We will touch on how pre-print platforms help in the optimization and reproducibility of workflows, as well as in community outreach. We will also briefly discuss how single cell multi-omic approaches can be leveraged to accelerate cellular biomarker discovery during clinical trials to predict response to therapy, follow responsive cell types, and define novel druggable target pathways. Single cell proteome approaches have already changed how we explore cellular mechanisms in disease and during therapy. Current challenges in the field are how to share these disruptive technologies with the scientific community while still including new approaches, such as genomic cytometry and single cell metabolomics.

26 citations

References
Journal Article
04 Mar 2011 - Cell
TL;DR: Recognition of the widespread applicability of the hallmarks of cancer will increasingly affect the development of new means to treat human cancer.

51,099 citations


"CancerTracer: a curated database fo..." refers background in this paper

  • ...The process of conversion from a normal to a malignant cell is known to occur through the sequential accumulation of alterations in oncogenes and tumor suppressor genes (4,5)....

Journal Article
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

32,980 citations


Additional excerpts

  • ...All term pairs with Kappa similarity >0.3 were connected and visualized by Cytoscape (60)....

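The excerpt above connects enrichment terms whose kappa similarity exceeds 0.3. A minimal sketch of building such an edge list, assuming the kappa is Cohen's kappa computed over the gene memberships of two terms within a gene universe (the DAVID-style definition; the excerpt does not state which variant was used). The term and gene names are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def kappa(genes_a: Set[str], genes_b: Set[str], universe: Set[str]) -> float:
    """Cohen's kappa for agreement of two terms' gene memberships over a universe."""
    n = len(universe)
    both = len(genes_a & genes_b)
    a_only = len(genes_a - genes_b)
    b_only = len(genes_b - genes_a)
    neither = n - both - a_only - b_only
    observed = (both + neither) / n
    # Chance agreement from the marginal "in term" / "not in term" rates.
    expected = ((both + a_only) * (both + b_only)
                + (b_only + neither) * (a_only + neither)) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa undefined (degenerate margins); treat as no similarity
    return (observed - expected) / (1.0 - expected)


def kappa_edges(terms: Dict[str, Set[str]], universe: Set[str],
                threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Return (term1, term2, kappa) for every term pair with kappa > threshold."""
    edges = []
    for (t1, g1), (t2, g2) in combinations(sorted(terms.items()), 2):
        k = kappa(g1, g2, universe)
        if k > threshold:
            edges.append((t1, t2, k))
    return edges


# Hypothetical example: ten genes, three terms.
universe = {f"g{i}" for i in range(1, 11)}
terms = {
    "T1": {"g1", "g2", "g3", "g4"},
    "T2": {"g1", "g2", "g3", "g5"},
    "T3": {"g8", "g9", "g10"},
}
edges = kappa_edges(terms, universe)
for t1, t2, k in edges:
    print(f"{t1}\t{t2}\t{k:.3f}")  # T1	T2	0.583
```

Printed as tab-separated source/target/weight rows, such an edge list is the kind of table Cytoscape can import as a network; T3 shares no genes with the other terms, so its kappa is negative and it stays unconnected.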
Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to:
  • produce handsome, publication-quality plots, with automatic legends created from the plot specification;
  • superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales;
  • add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression;
  • save any ggplot2 plot (or part thereof) for later modification or reuse;
  • create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots;
  • approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot.
This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

29,504 citations

Journal Article
TL;DR: A new method and the corresponding software tool, PolyPhen-2, is presented; it differs from the earlier tool PolyPhen in its set of predictive features, its alignment pipeline, and its method of classification, and its performance, as shown by receiver operating characteristic curves, was consistently superior.
Abstract: To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which is different from the early tool PolyPhen (1) in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1) which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site (2). The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
Figure 1 PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar (3) (light green). UniRef100 (solid ...
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar (3), consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior compared to PolyPhen (Fig. 1b) and it also compared favorably with the three other popular prediction tools (4–6) (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves the rate of true positive predictions of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
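The accuracy figure quoted above (92% true positives at a 20% false positive rate on HumDiv) is a single point on an ROC curve. A minimal sketch of reading such a point off a set of scored predictions; the scores and labels are made up, and this is generic ROC arithmetic, not PolyPhen-2's evaluation code:

```python
from typing import List


def tpr_at_fpr(scores: List[float], labels: List[int], max_fpr: float) -> float:
    """Highest true positive rate reachable with false positive rate <= max_fpr.

    labels: 1 = damaging (positive), 0 = non-damaging (negative).
    A variant is called damaging when its score >= the chosen threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    # Try every observed score as a threshold, from strictest to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


# Hypothetical scores for 5 damaging and 5 non-damaging variants.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(tpr_at_fpr(scores, labels, max_fpr=0.2))  # 0.8
```

Here at most one of the five negatives may be called damaging, and the best threshold (0.70) then recovers four of the five positives.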
For a mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that this mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnosis of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
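The abstract says PolyPhen-2 reports a Naive Bayes posterior probability that a mutation is damaging. A minimal sketch of how such a posterior is computed, assuming two discretized toy features ("conservation" and "burial", both hypothetical) and Laplace smoothing; PolyPhen-2's actual eleven features, discretization, and training data differ:

```python
from collections import Counter
from typing import Dict, List, Tuple

Features = Tuple[str, ...]


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with Laplace (add-one) smoothing."""

    def fit(self, X: List[Features], y: List[str]) -> "CategoricalNaiveBayes":
        self.classes = sorted(set(y))
        self.n_features = len(X[0])
        self.class_counts = Counter(y)
        # counts[c][j][v] = training rows of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(self.n_features)] for c in self.classes}
        self.values = [set() for _ in range(self.n_features)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, row: Features) -> Dict[str, float]:
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            p = self.class_counts[c] / total  # prior P(c)
            for j, v in enumerate(row):
                # Smoothed likelihood P(feature_j = v | c), features assumed independent
                p *= (self.counts[c][j][v] + 1) / (self.class_counts[c] + len(self.values[j]))
            scores[c] = p
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}


# Hypothetical training data: (conservation, burial) -> label.
X = [("conserved", "buried"), ("conserved", "exposed"), ("conserved", "buried"),
     ("variable", "exposed"), ("variable", "exposed"), ("variable", "buried")]
y = ["damaging", "damaging", "damaging", "neutral", "neutral", "neutral"]

model = CategoricalNaiveBayes().fit(X, y)
post = model.posterior(("conserved", "buried"))
print(round(post["damaging"], 3))  # 0.857
```

A qualitative call such as "probably damaging" would then come from thresholding this posterior; the source does not give PolyPhen-2's cut-offs, so none are assumed here.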

11,571 citations


"CancerTracer: a curated database fo..." refers methods in this paper

  • ...To systematically annotate the functional impact of somatic mutations in proteins, we employed ANNOVAR (46) and PolyPhen2 (47) to perform the analysis....

  • ...The annotations from SIFT, PolyPhen2, etc., as well as protein change, were displayed in the table....

  • ...About 88.78% of mutations were successfully annotated by ANNOVAR or PolyPhen2, and 84.62% of non-synonymous point mutations were annotated by SIFT and/or PolyPhen2....

Journal Article
TL;DR: The ANNOVAR tool to annotate single nucleotide variants and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP is developed.
Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
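The abstract describes a stepwise "variants reduction" that discards variants unlikely to be causal. A minimal sketch of that filtering pattern, assuming variants have already been annotated into simple records; the `Variant` fields, filter steps, and data are hypothetical and do not reproduce ANNOVAR's output format or its actual protocol:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Variant:
    # Hypothetical, simplified annotation record (not ANNOVAR's output format).
    gene: str
    region: str        # e.g. "exonic" or "intronic"
    in_dbsnp: bool
    in_1000g: bool
    conserved: bool


Step = Tuple[str, Callable[[Variant], bool]]


def reduce_variants(variants: List[Variant],
                    steps: List[Step]) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply keep-filters in order, recording how many variants survive each step."""
    log = [("input", len(variants))]
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        log.append((name, len(variants)))
    return variants, log


variants = [
    Variant("G1", "exonic", False, False, True),
    Variant("G2", "exonic", True, False, True),
    Variant("G3", "intronic", False, False, True),
    Variant("G4", "exonic", False, True, False),
    Variant("G5", "exonic", False, False, False),
]
steps: List[Step] = [
    ("exonic only", lambda v: v.region == "exonic"),
    ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ("absent from 1000 Genomes", lambda v: not v.in_1000g),
    ("in conserved region", lambda v: v.conserved),
]
survivors, log = reduce_variants(variants, steps)
for name, n in log:
    print(f"{name}: {n}")
print(sorted({v.gene for v in survivors}))  # ['G1']
```

Keeping a per-step count makes it easy to see which filter removes the most candidates, which matters when a filter (for example, excluding all dbSNP entries) risks discarding a true causal variant.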

10,461 citations


"CancerTracer: a curated database fo..." refers methods in this paper

  • ...To systematically annotate the functional impact of somatic mutations in proteins, we employed ANNOVAR (46) and PolyPhen2 (47) to perform the analysis....

  • ...We also provide systematic annotation on the curated somatic mutations by ANNOVAR....

  • ...About 88.78% of mutations were successfully annotated by ANNOVAR or PolyPhen2, and 84.62% of non-synonymous point mutations were annotated by SIFT and/or PolyPhen2....
