scispace - formally typeset
Author

Sunghee Woo

Bio: Sunghee Woo is an academic researcher from the University of California, San Diego, who has contributed to research on the topics Proteogenomics and Proteome. The author has an h-index of 7 and has co-authored 7 publications receiving 876 citations.

Papers
Journal Article
28 Jul 2016-Cell
TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations

01 Jun 2016
TL;DR: In this article, a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer was provided, including how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones associated with short overall survival.
Abstract: To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

160 citations

Journal Article
TL;DR: This paper presents the construction of a compact database that contains all useful information expressed in RNA-seq reads, highlighting the usefulness of transcript + proteomic integration for improved genome annotations.
Abstract: The advent of inexpensive RNA-seq technologies and other deep sequencing technologies for RNA has the promise to radically improve genomic annotation, providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, the integration of large amounts of redundant RNA-seq data and mass spectrometry data poses a challenging problem. Our paper addresses this by construction of a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to 410 MB of splice graph database written in FASTA format. This corresponds to 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study using the custom data set, using a completely automated pipeline, and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicings, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of transcript + proteomic integration for improved genome annotations.
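The headline figures in this abstract can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB) and copies the category counts from the abstract: 496.2 GB over 410 MB is roughly a 1,210-fold reduction, in line with the stated 1000× order of magnitude, and the eight event categories sum to the reported 4044 novel events.

```python
# Reported sizes from the abstract; decimal units (1 GB = 1000 MB) are an assumption.
sam_size_mb = 496.2 * 1000
fasta_size_mb = 410

compression = sam_size_mb / fasta_size_mb
print(f"compression ratio ~ {compression:.0f}x")

# Novel-event categories exactly as listed in the abstract.
novel_events = {
    "novel genes": 215,
    "novel exons": 808,
    "alternative splicings": 12,
    "gene-boundary corrections": 618,
    "exon-boundary changes": 245,
    "frame shifts": 938,
    "reverse strands": 1166,
    "translated UTRs": 42,
}
total = sum(novel_events.values())
print(f"total novel events: {total}")
```

With binary units (1 GB = 1024 MB) the ratio rises to about 1,240-fold, so the "1000×" figure holds either way as an order of magnitude.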

116 citations

Journal Article
TL;DR: This paper discusses different strategies for large database search and FDR (false discovery rate)-based error control, along with their implications for cancer proteogenomics, and extends and develops the idea of a unified genomic variant database that can be searched by any MS sample.
Abstract: Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular subtyping of cancers, understanding cancer progression, and the discovery of novel biomarkers. Advances in genomics technologies (whole-genome, exome, and transcript sequencing, collectively referred to as NGS (next-generation sequencing)) have fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the translated portion of aberrant genes at the proteome level using only genomic approaches. Combinations of proteomic and genomic technologies are increasingly being employed, and various strategies have been developed to allow large-scale NGS data to be used in conventional MS/MS searches. This paper provides a discussion of applying different strategies relating to large database search and FDR (false discovery rate)-based error control, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any MS sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database that contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and nonsample-recruited mutations, which emphasize the strength of our approach.
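The abstract does not spell out which FDR measure it applied. As a generic illustration of FDR-based error control, the sketch below implements the widely used target-decoy approach: peptide-spectrum matches (PSMs) are ranked by score, the FDR at each cutoff is estimated as decoys divided by targets, and each PSM's q-value is the smallest FDR of any cutoff that still accepts it. The `PSM` type, the scores, and the simple decoys/targets estimator are assumptions for illustration, not the paper's method, and score ties are not handled specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match; higher score means a better match (hypothetical scoring)."""
    score: float
    is_decoy: bool


def q_values(psms):
    """Return (psm, q) pairs, sorted by descending score, using target-decoy FDR."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score cutoff were placed at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: minimum estimated FDR over this cutoff and every looser one.
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return list(zip(ranked, qs))


def accept(psms, threshold=0.01):
    """Target PSMs whose q-value is at or below the threshold (e.g. 1% FDR)."""
    return [p for p, q in q_values(psms) if q <= threshold and not p.is_decoy]


# Toy example with made-up scores: 200 target PSMs and 2 decoys.
psms = [PSM(float(s), False) for s in range(1, 201)]
psms += [PSM(150.5, True), PSM(0.5, True)]
print(len(accept(psms, 0.01)))
```

One common way to be more conservative in proteogenomics is to estimate the FDR separately for novel and known peptides rather than pooling them, because novel identifications tend to be less reliable; whether this corresponds to the "most conservative FDR measure" in the abstract is not stated there.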

62 citations


Cited by
Journal Article
Klaus F. X. Mayer, Jane Rogers, Jaroslav Doležel, Curtis J. Pozniak, Kellye Eversole, Catherine Feuillet, Bikram S. Gill, Bernd Friebe, Adam J. Lukaszewski, Pierre Sourdille, Takashi R. Endo, M. Kubaláková, Jarmila Číhalíková, Zdeňka Dubská, Jan Vrána, Romana Šperková, Hana Šimková, Melanie Febrer, Leah Clissold, Kirsten McLay, Kuldeep Singh, Parveen Chhuneja, Nagendra K. Singh, Jitendra P. Khurana, Eduard Akhunov, Frédéric Choulet, Adriana Alberti, Valérie Barbe, Patrick Wincker, Hiroyuki Kanamori, Fuminori Kobayashi, Takeshi Itoh, Takashi Matsumoto, Hiroaki Sakai, Tsuyoshi Tanaka, Jianzhong Wu, Yasunari Ogihara, Hirokazu Handa, P. Ron Maclachlan, Andrew G. Sharpe, Darrin Klassen, David Edwards, Jacqueline Batley, Odd-Arne Olsen, Simen Rød Sandve, Sigbjørn Lien, Burkhard Steuernagel, Brande B. H. Wulff, Mario Caccamo, Sarah Ayling, Ricardo H. Ramirez-Gonzalez, Bernardo J. Clavijo, Jonathan M. Wright, Matthias Pfeifer, Manuel Spannagl, Mihaela Martis, Martin Mascher, Jarrod Chapman, Jesse Poland, Uwe Scholz, Kerrie Barry, Robbie Waugh, Daniel S. Rokhsar, Gary J. Muehlbauer, Nils Stein, Heidrun Gundlach, Matthias Zytnicki, Véronique Jamilloux, Hadi Quesneville, Thomas Wicker, Primetta Faccioli, Moreno Colaiacovo, Antonio Michele Stanca, Hikmet Budak, Luigi Cattivelli, Natasha Glover, Lise Pingault, Etienne Paux, Sapna Sharma, Rudi Appels, Matthew I. Bellgard, Brett Chapman, Thomas Nussbaumer, Kai Christian Bader, Hélène Rimbert, Shichen Wang, Ron Knox, Andrzej Kilian, Michael Alaux, Françoise Alfama, Loïc Couderc, Nicolas Guilhot, Claire Viseux, Mikaël Loaec, Beat Keller, Sébastien Praud
18 Jul 2014-Science
TL;DR: Insight into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
Abstract: An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.

1,421 citations

Journal Article
TL;DR: It is demonstrated that LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types.
Abstract: The LinkedOmics database contains multi-omics data and clinical data for 32 cancer types and a total of 11 158 patients from The Cancer Genome Atlas (TCGA) project. It is also the first multi-omics database that integrates mass spectrometry (MS)-based global proteomics data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) on selected TCGA tumor samples. In total, LinkedOmics has more than a billion data points. To allow comprehensive analysis of these data, we developed three analysis modules in the LinkedOmics web application. The LinkFinder module allows flexible exploration of associations between a molecular or clinical attribute of interest and all other attributes, providing the opportunity to analyze and visualize associations between billions of attribute pairs for each cancer cohort. The LinkCompare module enables easy comparison of the associations identified by LinkFinder, which is particularly useful in multi-omics and pan-cancer analyses. The LinkInterpreter module transforms identified associations into biological understanding through pathway and network analysis. Using five case studies, we demonstrate that LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types. LinkedOmics is freely available at http://www.linkedomics.org.
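The LinkFinder idea of testing one attribute of interest against all other attributes in a cohort can be sketched in a few lines. This is a hypothetical illustration, not LinkedOmics' implementation or API: `link_finder` ranks attributes by the absolute Pearson correlation with a query attribute across samples, constant attributes are skipped because their correlation is undefined, and all attribute names and values are made up. A real analysis would also compute p-values and correct for multiple testing across the very large number of attribute pairs.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant attribute
    return sxy / math.sqrt(sxx * syy)


def link_finder(query, attributes):
    """Rank attributes by |r| with the query attribute (hypothetical helper)."""
    scores = {}
    for name, values in attributes.items():
        r = pearson(query, values)
        if r is not None:
            scores[name] = r
    # Strongest associations first, whether positive or negative.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical cohort of five samples: copy number of one gene vs. protein levels.
copy_number = [1, 2, 3, 4, 5]
proteins = {
    "protein_A": [2, 4, 6, 8, 10],
    "protein_B": [5, 4, 3, 2, 1],
    "protein_C": [1, 3, 2, 5, 4],
    "protein_D": [3, 3, 3, 3, 3],
}
for name, r in link_finder(copy_number, proteins):
    print(f"{name}: r = {r:+.2f}")
```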

1,256 citations

Journal Article
TL;DR: The ProteomeXchange Consortium of proteomics resources was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide and is supporting a change in culture of the proteomics field.
Abstract: The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted standard, supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.

754 citations

Journal Article
28 Jul 2016-Cell
TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations

Journal Article
TL;DR: The current state of proteogenomic methods and applications is reviewed, including computational strategies for building and using customized protein sequence databases, and attention is drawn to the challenge of false-positive identifications.
Abstract: A proteogenomic approach to analyzing mass spectrometry–based proteomic data enables the discovery of novel peptides, provides peptide-level evidence of gene expression, and assists in refining gene models. Strategies for building custom sequence databases, applications benefitting from a proteogenomic approach, and challenges in interpreting data are discussed in this Review. Also in this issue, Alfaro et al. discuss the use of proteogenomic approaches for studying cancer biology.
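One basic step in building a custom sequence database is turning a genomic variant into the peptides an MS search could actually detect. The sketch below applies a single amino-acid substitution to a protein, digests the reference and variant sequences in silico with a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), and keeps only the peptides found in the variant. The protein sequence, the `A13V` variant, and the minimum peptide length are hypothetical choices for illustration; real database builders also handle missed cleavages, indels, frame-shifts, and splice junctions.

```python
import re


def tryptic_digest(seq, min_len=6):
    """Fully tryptic peptides: cut after K or R unless the next residue is P.

    Simplified rule with no missed cleavages; peptides shorter than min_len are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return {p for p in pieces if len(p) >= min_len}


def apply_substitution(seq, pos, ref, alt):
    """Apply a 1-based single amino-acid substitution, e.g. A13V."""
    if seq[pos - 1] != ref:
        raise ValueError(f"reference mismatch at {pos}: found {seq[pos - 1]}, expected {ref}")
    return seq[:pos - 1] + alt + seq[pos:]


def variant_peptides(seq, pos, ref, alt, min_len=6):
    """Peptides present only in the variant protein: candidate custom-database entries."""
    reference = tryptic_digest(seq, min_len)
    variant = tryptic_digest(apply_substitution(seq, pos, ref, alt), min_len)
    return sorted(variant - reference)


# Hypothetical protein and variant, for illustration only.
protein = "MAGSTELLKDVEAIVSGRWTTPEQK"
print(sorted(tryptic_digest(protein)))
print(variant_peptides(protein, 13, "A", "V"))
```

Keeping only variant-specific peptides keeps such a database compact, which matters for the false-positive problem the review highlights: every extra sequence searched raises the chance of spurious matches.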

617 citations