Author
Emily S. Boja
Other affiliations: Case Western Reserve University
Bio: Emily S. Boja is an academic researcher from the National Institutes of Health. The author has contributed to research in the topics of proteomics and proteogenomics. The author has an h-index of 35 and has co-authored 72 publications receiving 6,098 citations. Previous affiliations of Emily S. Boja include Case Western Reserve University.
Papers
Vanderbilt University, Pacific Northwest National Laboratory, Washington University in St. Louis, Fred Hutchinson Cancer Research Center, Icahn School of Medicine at Mount Sinai, National Institutes of Health, Massachusetts Institute of Technology, Harvard University, Georgetown University, Johns Hopkins University, Leidos, Memorial Sloan Kettering Cancer Center, National Institute of Standards and Technology, New York University, Stanford University, University of Chicago, University of North Carolina at Chapel Hill, University of Washington, Virginia Tech
TL;DR: Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
Abstract: Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. Messenger RNA transcript abundance did not reliably predict protein abundance differences between tumours. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA 'microsatellite instability/CpG island methylation phenotype' transcriptomic subtype, but had distinct mutation, methylation and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extended to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34) and SRC (SRC proto-oncogene, non-receptor tyrosine kinase). Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
1,183 citations
TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.
728 citations
ETH Zurich, Northeastern University, University of Georgia, Macquarie University, Stanford University, National Institutes of Health, Boston University, Scripps Research Institute, University of Maryland, College Park, University of Pennsylvania, University of Wisconsin-Madison, Harvard University, Memorial Sloan Kettering Cancer Center, University of Illinois at Urbana–Champaign, University of Salzburg, University of Southern Denmark, Northwestern University, Massachusetts Institute of Technology, University of California, San Francisco, University of California, Los Angeles, Royal Institute of Technology, University of Washington, Princeton University, Saint Mary's College of California, Salk Institute for Biological Studies, Genentech, University of Hamburg, Yale University, Cedars-Sinai Medical Center, University of California, Berkeley, Ohio State University, University of Pittsburgh, Baylor College of Medicine
TL;DR: This work frames central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today, and uses this framework to assess existing data and ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?"
Abstract: Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. …
516 citations
TL;DR: The first proteogenomic characterization of hepatitis B virus-related hepatocellular carcinoma using paired tumor and adjacent liver tissues from 159 patients provides a valuable resource that significantly expands the knowledge of HBV-related HCC and may eventually benefit clinical practice.
509 citations
Broad Institute, Eli Lilly and Company, University of Victoria, Institute for Systems Biology, University of Washington, ETH Zurich, University of California, San Francisco, University of South Florida, Vanderbilt University, Pacific Northwest National Laboratory, Food and Drug Administration, Pfizer, Fred Hutchinson Cancer Research Center, Purdue University, National Institutes of Health, Centers for Disease Control and Prevention, Johns Hopkins University, Korea University, Harvard University, Emory University, Washington University in St. Louis, Leidos, University of Texas Health Science Center at San Antonio
TL;DR: A workshop was held at the National Institutes of Health with representatives from the multiple communities developing and employing targeted MS assays and defined three tiers of assays distinguished by their performance and extent of analytical characterization.
476 citations
Cited by
Sage Bionetworks, Autonomous University of Barcelona, University of Amsterdam, City University of Hong Kong, Netherlands Cancer Institute, Swiss Institute of Bioinformatics, University of Texas MD Anderson Cancer Center, École Polytechnique Fédérale de Lausanne, Institut Gustave Roussy, Hospital Clínico San Carlos, Oregon Health & Science University, Paris Descartes University, University of Lausanne, Katholieke Universiteit Leuven
TL;DR: An international consortium dedicated to large-scale data sharing and analytics across expert groups is formed, showing marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features.
Abstract: Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-β activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC (with clear biological interpretability) and the basis for future clinical stratification and subtype-based targeted interventions.
3,351 citations
01 Jan 2011
TL;DR: The sheer volume and scope of the flood of genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools that can scale to very large data sets and flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
2,187 citations
TL;DR: It is concluded that transcript levels by themselves are not sufficient to predict protein levels in many scenarios, and thus to explain genotype-phenotype relationships, and that high-quality data quantifying different levels of gene expression are indispensable for the complete understanding of biological processes.
1,996 citations
TL;DR: In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155,175 functional categories, as well as user-uploaded functional databases, and has completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures.
Abstract: WebGestalt is a popular tool for the interpretation of gene lists derived from large-scale -omics studies. In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155,175 functional categories, as well as user-uploaded functional databases. To address the growing and unique need for phosphoproteomics data interpretation, we have implemented phosphosite set analysis to identify important kinases from phosphoproteomics data. We have completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures. To facilitate comprehension of the enrichment results, we have implemented two methods to reduce redundancy between enriched gene sets. We introduced a web API for other applications to get data programmatically from the WebGestalt server or pass data to WebGestalt for analysis. We also wrapped the core computation into an R package called WebGestaltR for users to perform analysis locally or in third-party workflows. WebGestalt can be freely accessed at http://www.webgestalt.org.
1,789 citations