Author

Sunghee Woo

Bio: Sunghee Woo is an academic researcher from the University of California, San Diego. The author has contributed to research in topics including Proteogenomics and Proteome. The author has an h-index of 7 and has co-authored 7 publications receiving 876 citations.

Topics: Proteogenomics, Proteome, Genome, Shotgun sequencing, Exome

Papers

Journal Article•DOI•

Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer

Hui Zhang¹, Tao Liu², Zhen Zhang¹, Samuel H. Payne², Bai Zhang¹, Jason E. McDermott², Jian-Ying Zhou¹, Vladislav A. Petyuk², Li Chen¹, Debjit Ray², Shisheng Sun¹, Feng Yang², Lijun Chen¹, Jing Wang³, Punit Shah¹, Seong Won Cha⁴, Paul Aiyetan¹, Sunghee Woo⁴, Yuan Tian¹, Marina A. Gritsenko², Therese R. W. Clauss², Caitlin H. Choi¹, Matthew E. Monroe², Stefani N. Thomas¹, Song Nie², Chaochao Wu², Ronald J. Moore², Kun-Hsing Yu⁵, David L. Tabb³, David Fenyö⁶, Vineet Bafna⁴, Yue Wang⁷, Henry Rodriguez, Emily S. Boja, Tara Hiltke, Robert Rivers, Lori J. Sokoll¹, Heng Zhu¹, Ie Ming Shih¹, Leslie Cope¹, Akhilesh Pandey¹, Bing Zhang³, Michael Snyder⁵, Douglas A. Levine⁶, Richard D. Smith², Daniel W. Chan¹, Karin D. Rodland², Steven A. Carr, Michael A. Gillette, Karl R. Klauser, Eric Kuhn, D. R. Mani, Philipp Mertins, Karen A. Ketchum, Ratna R. Thangudu, Shuang Cai, Mauricio Oberti, Amanda G. Paulovich, Jeffrey R. Whiteaker, Nathan Edwards, Peter B. McGarvey, Subha Madhavan, Pei Wang, Gordon Whiteley, Steven J. Skates, Forest M. White, Christopher R. Kinsinger, Mehdi Mesri, Kenna M. Shaw, Stephen E. Stein, Paul A. Rudnick, Michael Snyder⁵, Yingming Zhao, Xian Chen, David F. Ransohoff, Andrew N. Hoofnagle, Daniel C. Liebler, Melinda E. Sanders, Zhiao Shi, Robbert J.C. Slebos, Lisa J. Zimmerman, Sherri R. Davies, Li Ding, Matthew J. Ellis, R. Reid Townsend

Johns Hopkins University¹, Pacific Northwest National Laboratory², Vanderbilt University³, University of California, San Diego⁴, Stanford University⁵, New York University⁶, Virginia Tech⁷

28 Jul 2016-Cell

TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations

Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer

Johns Hopkins University¹, Pacific Northwest National Laboratory², Vanderbilt University³, University of California, San Diego⁴, Stanford University⁵, New York University⁶, Virginia Tech⁷

01 Jun 2016

TL;DR: This article provides a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, including how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones associated with short overall survival.

Abstract: To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.

160 citations

Journal Article•DOI•

Proteogenomic database construction driven from large scale RNA-seq data.

Sunghee Woo¹, Seong Won Cha¹, Gennifer E. Merrihew², Yupeng He¹, Natalie Castellana¹, Clark C. Guest¹, Michael J. MacCoss², Vineet Bafna¹

University of California, San Diego¹, University of Washington²

03 Jan 2014-Journal of Proteome Research

TL;DR: This paper presents the construction of a compact database that contains all useful information expressed in RNA-seq reads, highlighting the usefulness of transcript + proteomic integration for improved genome annotations.

Abstract: The advent of inexpensive RNA-seq technologies and other deep sequencing technologies for RNA has the promise to radically improve genomic annotation, providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, the integration of large amounts of redundant RNA-seq data and mass spectrometry data poses a challenging problem. Our paper addresses this by construction of a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to 410 MB of splice graph database written in FASTA format. This corresponds to 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study using the custom data set, using a completely automated pipeline, and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicings, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of transcript + proteomic integration for improved genome annotations.

116 citations

Journal Article•DOI•

Proteogenomic strategies for identification of aberrant cancer peptides using large‐scale next‐generation sequencing data

Sunghee Woo¹, Seong Won Cha¹, Seungjin Na¹, Clark C. Guest¹, Tao Liu², Richard D. Smith², Karin D. Rodland², Samuel H. Payne², Vineet Bafna¹

University of California, San Diego¹, Pacific Northwest National Laboratory²

01 Dec 2014-Proteomics

TL;DR: This paper discusses strategies for large database search and FDR (false discovery rate)-based error control and their implications for cancer proteogenomics, and it extends and develops the idea of a unified genomic variant database that can be searched by any MS sample.

Abstract: Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular subtyping of cancers, understanding cancer progression, and the discovery of novel biomarkers. The advances of genomics technologies (whole-genome, exome, and transcript sequencing, collectively referred to as NGS (next-generation sequencing)) have fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and the difficulty of investigating the proteome translated portion of aberrant genes using only genomic approaches. Combinations of proteomic and genomic technologies are increasingly being employed. Various strategies have been employed to allow the usage of large-scale NGS data for conventional MS/MS searches. This paper provides a discussion of applying different strategies relating to large database search and FDR (false discovery rate)-based error control, and their implication to cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any MS sample. A total of 879 BAM files downloaded from TCGA repository were used to create a 4.34 GB unified FASTA database that contained 2 787 062 novel splice junctions, 38 464 deletions, 1 105 insertions, and 182 302 substitutions. Proteomic data from a single ovarian carcinoma sample (439 858 spectra) was searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65 578 known peptides at 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and nonsample-recruited mutations, which emphasize the strength of our approach.

62 citations

Journal Article•DOI•

Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis

Dhanashree S. Kelkar¹, Elayne Provost², Raghothama Chaerkady², Babylakshmi Muthusamy³, Srikanth S. Manda⁴, Srikanth S. Manda³, Tejaswini Subbannayya¹, Lakshmi Dhevi N. Selvan¹, Chieh-Huei Wang², Keshava K. Datta⁵, Sunghee Woo⁶, Sutopa B. Dwivedi¹, Santosh Renuse¹, Derese Getnet², Tai-Chung Huang², Min-Sik Kim², Min-Sik Kim⁴, Sneha M. Pinto², Sneha M. Pinto⁷, Chris J. Mitchell², Anil K. Madugundu, Praveen Kumar, Jyoti Sharma⁷, Jayshree Advani, Gourav Dey⁷, Lavanya Balakrishnan⁸, Nazia Syed³, Vishalakshi Nanjappa¹, Yashwanth Subbannayya, Renu Goel, T. S. Keshava Prasad, Vineet Bafna⁶, Ravi Sirdeshmukh, Harsha Gowda, Charles Wang⁹, Steven D. Leach², Akhilesh Pandey

Amrita Vishwa Vidyapeetham¹, Johns Hopkins University², Pondicherry University³, Johns Hopkins University School of Medicine⁴, KIIT University⁵, University of California, San Diego⁶, Manipal University⁷, Kuvempu University⁸, Loma Linda University⁹

01 Nov 2014-Molecular & Cellular Proteomics

TL;DR: This study uses an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation, and reports the identification of 157 novel protein-coding genes.

46 citations

Cited by

Journal Article•DOI•

A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome

Klaus F. X. Mayer, Jane Rogers, Jaroslav Doležel¹, Curtis J. Pozniak², Kellye Eversole, Catherine Feuillet³, Bikram S. Gill⁴, Bernd Friebe⁴, Adam J. Lukaszewski⁵, Pierre Sourdille⁶, Takashi R. Endo⁷, M. Kubaláková¹, Jarmila Číhalíková¹, Zdeňka Dubská¹, Jan Vrána¹, Romana Šperková¹, Hana Šimková¹, Melanie Febrer⁸, Leah Clissold, Kirsten McLay, Kuldeep Singh⁹, Parveen Chhuneja⁹, Nagendra K. Singh¹⁰, Jitendra P. Khurana¹¹, Eduard Akhunov⁴, Frédéric Choulet⁶, Adriana Alberti, Valérie Barbe, Patrick Wincker, Hiroyuki Kanamori¹², Fuminori Kobayashi¹², Takeshi Itoh¹², Takashi Matsumoto¹², Hiroaki Sakai¹², Tsuyoshi Tanaka¹², Jianzhong Wu¹², Yasunari Ogihara¹³, Hirokazu Handa¹², P. Ron Maclachlan², Andrew G. Sharpe¹⁴, Darrin Klassen¹⁴, David Edwards, Jacqueline Batley, Odd-Arne Olsen, Simen Rød Sandve¹⁵, Sigbjørn Lien¹⁵, Burkhard Steuernagel¹⁶, Brande B. H. Wulff¹⁶, Mario Caccamo, Sarah Ayling, Ricardo H. Ramirez-Gonzalez, Bernardo J. Clavijo, Jonathan M. Wright, Matthias Pfeifer, Manuel Spannagl, Mihaela Martis, Martin Mascher¹⁷, Jarrod Chapman¹⁸, Jesse Poland⁴, Uwe Scholz¹⁷, Kerrie Barry¹⁸, Robbie Waugh¹⁹, Daniel S. Rokhsar¹⁸, Gary J. Muehlbauer, Nils Stein¹⁷, Heidrun Gundlach, Matthias Zytnicki²⁰, Véronique Jamilloux²⁰, Hadi Quesneville²⁰, Thomas Wicker²¹, Primetta Faccioli, Moreno Colaiacovo, Antonio Michele Stanca, Hikmet Budak²², Luigi Cattivelli, Natasha Glover⁶, Lise Pingault⁶, Etienne Paux⁶, Sapna Sharma, Rudi Appels²³, Matthew I. Bellgard²³, Brett Chapman²³, Thomas Nussbaumer, Kai Christian Bader, Hélène Rimbert, Shichen Wang⁴, Ron Knox, Andrzej Kilian, Michael Alaux²⁰, Françoise Alfama²⁰, Loïc Couderc²⁰, Nicolas Guilhot⁶, Claire Viseux²⁰, Mikaël Loaec²⁰, Beat Keller²¹, Sébastien Praud

Academy of Sciences of the Czech Republic¹, University of Saskatchewan², Bayer³, Kansas State University⁴, University of California, Riverside⁵, Blaise Pascal University⁶, Kyoto University⁷, University of Dundee⁸, Punjab Agricultural University⁹, Indian Agricultural Research Institute¹⁰, University of Delhi¹¹, University of Tsukuba¹², Yokohama City University¹³, National Research Council¹⁴, Norwegian University of Life Sciences¹⁵, Sainsbury Laboratory¹⁶, Leibniz Association¹⁷, United States Department of Energy¹⁸, James Hutton Institute¹⁹, Institut national de la recherche agronomique²⁰, University of Zurich²¹, Sabancı University²², Murdoch University²³

18 Jul 2014-Science

TL;DR: Insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.

Abstract: An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.

1,421 citations

Journal Article•DOI•

LinkedOmics: analyzing multi-omics data within and across 32 cancer types.

Suhas Vasaikar¹, Peter Straub², Jing Wang¹, Bing Zhang¹

Baylor College of Medicine¹, Vanderbilt University Medical Center²

04 Jan 2018-Nucleic Acids Research

TL;DR: It is demonstrated that LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types.

Abstract: The LinkedOmics database contains multi-omics data and clinical data for 32 cancer types and a total of 11 158 patients from The Cancer Genome Atlas (TCGA) project. It is also the first multi-omics database that integrates mass spectrometry (MS)-based global proteomics data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) on selected TCGA tumor samples. In total, LinkedOmics has more than a billion data points. To allow comprehensive analysis of these data, we developed three analysis modules in the LinkedOmics web application. The LinkFinder module allows flexible exploration of associations between a molecular or clinical attribute of interest and all other attributes, providing the opportunity to analyze and visualize associations between billions of attribute pairs for each cancer cohort. The LinkCompare module enables easy comparison of the associations identified by LinkFinder, which is particularly useful in multi-omics and pan-cancer analyses. The LinkInterpreter module transforms identified associations into biological understanding through pathway and network analysis. Using five case studies, we demonstrate that LinkedOmics provides a unique platform for biologists and clinicians to access, analyze and compare cancer multi-omics data within and across tumor types. LinkedOmics is freely available at http://www.linkedomics.org.

1,256 citations

Journal Article•DOI•

The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition

Eric W. Deutsch¹, Attila Csordas², Zhi Sun¹, Andrew F. Jarnuczak², Yasset Perez-Riverol², Tobias Ternent², David S. Campbell¹, Manuel Bernal-Llinares², Shujiro Okuda³, Shin Kawano, Robert L. Moritz¹, Jeremy Carver⁴, Mingxun Wang⁵, Mingxun Wang⁴, Yasushi Ishihama⁶, Nuno Bandeira⁴, Nuno Bandeira⁵, Henning Hermjakob², Henning Hermjakob⁷, Juan Antonio Vizcaíno²

Institute for Systems Biology¹, European Bioinformatics Institute², Niigata University³, University of California, San Diego⁴, University of Montana⁵, Kyoto University⁶, Protein Sciences⁷

04 Jan 2017-Nucleic Acids Research

TL;DR: The ProteomeXchange Consortium of proteomics resources was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide and is supporting a change in culture of the proteomics field.

Abstract: The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted standard, supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.

754 citations

Journal Article•DOI•

Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer

Johns Hopkins University¹, Pacific Northwest National Laboratory², Vanderbilt University³, University of California, San Diego⁴, Stanford University⁵, New York University⁶, Virginia Tech⁷

28 Jul 2016-Cell

TL;DR: A view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC is provided.

728 citations

Journal Article•DOI•

Proteogenomics: concepts, applications and computational strategies

Alexey I. Nesvizhskii¹

University of Michigan¹

01 Nov 2014-Nature Methods

TL;DR: The current state of proteogenomic methods and applications is reviewed, including computational strategies for building and using customized protein sequence databases, and attention is drawn to the challenge of false-positive identifications.

Abstract: A proteogenomic approach to analyzing mass spectrometry–based proteomic data enables the discovery of novel peptides, provides peptide-level evidence of gene expression, and assists in refining gene models. Strategies for building custom sequence databases, applications benefitting from a proteogenomic approach, and challenges in interpreting data are discussed in this Review. Also in this issue, Alfaro et al. discuss the use of proteogenomic approaches for studying cancer biology.

617 citations
