Author

Wendy Wu

Other affiliations: University of California, Santa Cruz

Bio: Wendy Wu is an academic researcher at the National Institutes of Health. The author has contributed to research on the topics RefSeq and Reference genome. The author has an h-index of 4 and has co-authored 4 publications receiving 4017 citations. Previous affiliations of Wendy Wu include University of California, Santa Cruz.

Papers

Journal Article

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations

Journal Article

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt¹, Garth Brown¹, Susan M. Hiatt¹, Françoise Thibaud-Nissen¹, Alexander Astashyn¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Jennifer Hart¹, Melissa J. Landrum¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Nuala A. O'Leary¹, Shashikant Pujar¹, Bhanu Rajput¹, Sanjida H. Rangwala¹, Lillian D. Riddick¹, Andrei Shkeda¹, Hanzhen Sun¹, Pamela Tamez¹, Raymond E. Tully¹, Craig Wallin¹, David Webb¹, Janet Weber¹, Wendy Wu¹, Michael DiCuccio¹, Paul Kitts¹, Donna Maglott¹, Terence Murphy¹, James Ostell¹

National Institutes of Health¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration.

Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

949 citations

Journal Article

Current status and new features of the Consensus Coding Sequence database

Catherine M. Farrell¹, Nuala A. O'Leary¹, Rachel A. Harte¹, Jane E. Loveland¹, Laurens G. Wilming¹, Craig Wallin¹, Mark Diekhans¹, Daniel Barrell¹, Stephen M. J. Searle¹, Bronwen Aken¹, Susan M. Hiatt¹, Adam Frankish¹, Marie-Marthe Suner¹, Bhanu Rajput¹, Charles A. Steward¹, Garth Brown¹, Ruth Bennett¹, Michael R. Murphy¹, Wendy Wu¹, M. Kay¹, Jennifer Hart¹, Jeena Rajan¹, Janet Weber¹, Catherine Snow¹, Lillian D. Riddick¹, Toby Hunt¹, David Webb¹, Mark G. Thomas¹, Pamela Tamez¹, Sanjida H. Rangwala¹, Kelly M. McGarvey¹, Shashikant Pujar¹, Andrei Shkeda¹, Jonathan M. Mudge¹, José M. González¹, James G. R. Gilbert¹, Stephen J. Trevanion¹, Robert Baertsch¹, Jennifer Harrow¹, Tim Hubbard¹, James Ostell¹, David Haussler¹, Kim D. Pruitt¹

University of California, Santa Cruz¹

01 Jan 2014-Nucleic Acids Research

TL;DR: The current status and recent growth in the CCDS dataset is described, as well as recent changes to the web and FTP sites, which include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and the approach to representing genes for which support evidence is incomplete.

Abstract: The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

157 citations

Journal Article

The completion of the Mammalian Gene Collection (MGC).

Gary F. Temple¹, Daniela S. Gerhard¹, Rebekah S. Rasooly¹, Elise A. Feingold¹, Peter J. Good¹, Cristen Robinson¹, Allison Mandich¹, Jeffrey G. Derge², Jeanne Lewis², Debonny Shoaf², Francis S. Collins¹, Wonhee Jang¹, Lukas Wagner¹, Carolyn M. Shenmen¹, Leonie Misquitta¹, Carl F. Schaefer¹, Kenneth H. Buetow¹, Tom I. Bonner¹, Linda Yankie¹, Ming Ward¹, Lon Phan¹, Alex Astashyn¹, Garth Brown¹, Catherine M. Farrell¹, Jennifer Hart¹, Melissa J. Landrum¹, Bonnie L. Maidak¹, Michael R. Murphy¹, Terence Murphy¹, Bhanu Rajput¹, Lillian D. Riddick¹, David Webb¹, Janet Weber¹, Wendy Wu¹, Kim D. Pruitt¹, Donna Maglott¹, Adam Siepel³, Brona Brejova⁴, Brona Brejova³, Mark Diekhans⁵, Rachel A. Harte⁵, Robert Baertsch⁵, Jim Kent⁵, David Haussler⁵, Michael R. Brent⁶, Laura Langton⁶, Charles L.G. Comstock⁶, Michael Stevens⁶, Chaochun Wei⁶, Chaochun Wei⁷, Marijke J. van Baren⁶, Kourosh Salehi-Ashtiani⁸, Ryan R. Murray⁸, Lila Ghamsari⁸, Elizabeth Mello⁸, Chenwei Lin⁸, Chenwei Lin⁹, Christa Pennacchio¹⁰, Christa Pennacchio¹¹, Kirsten Schreiber¹¹, Nicole Shapiro¹¹, Nicole Shapiro¹², Amber Marsh¹¹, Elizabeth Pardes¹¹, Troy Moore, Anita Lebeau, Mike Muratet, Blake A. Simmons, David Kloske, Stephanie Sieja, James R. Hudson, Praveen Sethupathy¹, Michael J. Brownstein¹, Narayan K. Bhat¹³, Narayan K. Bhat¹, Joseph Lazar¹⁴, Howard J. Jacob¹⁴, Chris E. Gruber, Mark R. Smith, John Douglas Mcpherson¹⁵, Angela M. Garcia¹⁵, Preethi H. Gunaratne¹⁵, Preethi H. Gunaratne¹⁶, Jia Qian Wu¹⁵, Jia Qian Wu¹⁷, Donna M. Muzny¹⁵, Richard A. Gibbs¹⁵, Alice C. Young¹, Gerard G. Bouffard¹, Robert W. Blakesley¹, Jim C. Mullikin¹, Eric D. Green¹, Mark Dickson⁹, Alex Rodriguez⁹, Alex Rodriguez¹⁸, Jane Grimwood⁹, Jeremy Schmutz⁹, Richard M. Myers⁹, Martin Hirst¹⁹, Thomas Zeng¹⁹, Kane Tse¹⁹, Michelle Moksa¹⁹, Merinda Deng¹⁹, Kevin Ma¹⁹, Diana Mah¹⁹, Johnson Pang¹⁹, Greg Taylor¹⁹, Eric Chuah¹⁹, Athena Deng¹⁹, Keith Fichter¹⁹, Anne Go¹⁹, Stephanie Lee¹⁹, Jing Wang¹⁹, Malachi Griffith¹⁹, Ryan D. Morin¹⁹, Richard A. Moore¹⁹, Michael Mayo¹⁹, Sarah Munro¹⁹, Susan Wagner¹⁹, Steven J.M. Jones¹⁹, Robert A. Holt¹⁹, Marco A. Marra¹⁹, Sun Lu, Shuwei Yang, James Hartigan²⁰, Marcus Graf, Ralf Wagner, Stanley Letovksy²¹, Jacqueline C. Pulido, Keith Robison, Dominic Esposito¹, James L. Hartley¹, Vanessa Wall¹, Ralph F. Hopkins¹, Osamu Ohara, Stefan Wiemann²²

National Institutes of Health¹, Science Applications International Corporation², Cornell University³, Comenius University in Bratislava⁴, University of California, Santa Cruz⁵, Washington University in St. Louis⁶, Shanghai Jiao Tong University⁷, Harvard University⁸, Stanford University⁹, Lawrence Berkeley National Laboratory¹⁰, Lawrence Livermore National Laboratory¹¹, United States Department of Energy¹², United States Patent and Trademark Office¹³, Medical College of Wisconsin¹⁴, Baylor College of Medicine¹⁵, University of Houston¹⁶, Yale University¹⁷, Baxter International¹⁸, University of British Columbia¹⁹, Beckman Coulter²⁰, Helicos BioSciences²¹, German Cancer Research Center²²

01 Dec 2009-Genome Research

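TL;DR: The Mammalian Gene Collection now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications.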

Abstract: Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

140 citations

Cited by

Journal Article

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴, and 514 more authors (90 institutions)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article

KEGG: new perspectives on genomes, pathways, diseases and drugs

Minoru Kanehisa¹, Miho Furumichi¹, Mao Tanabe¹, Yoko Sato², Kanae Morishima¹

Kyoto University¹, Fujitsu²

04 Jan 2017-Nucleic Acids Research

TL;DR: The content of the KO (KEGG Orthology) database has been expanded and its quality improved irrespective of whether or not the KOs appear in the three molecular network databases, and the newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined.

Abstract: KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.

5,741 citations

Journal Article

The Ensembl Variant Effect Predictor.

William M. McLaren¹, Laurent Gil¹, Sarah E. Hunt¹, Harpreet Singh Riat¹, Graham R. S. Ritchie¹, Anja Thormann¹, Paul Flicek¹, Fiona Cunningham¹

European Bioinformatics Institute¹

06 Jun 2016-Genome Biology

TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations

Journal Article

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

4,104 citations

Journal Article

Genome Regulation by Long Noncoding RNAs

John L. Rinn¹, Howard Y. Chang²

Harvard University¹, Stanford University²

04 Jun 2012-Annual Review of Biochemistry

TL;DR: Long noncoding RNAs (lncRNAs) form extensive networks of ribonucleoprotein (RNP) complexes with numerous chromatin regulators and then target these enzymatic activities to appropriate locations in the genome.

Abstract: The central dogma of gene expression is that DNA is transcribed into messenger RNAs, which in turn serve as the template for protein synthesis. The discovery of extensive transcription of large RNA transcripts that do not code for proteins, termed long noncoding RNAs (lncRNAs), provides an important new perspective on the centrality of RNA in gene regulation. Here, we discuss genome-scale strategies to discover and characterize lncRNAs. An emerging theme from multiple model systems is that lncRNAs form extensive networks of ribonucleoprotein (RNP) complexes with numerous chromatin regulators and then target these enzymatic activities to appropriate locations in the genome. Consistent with this notion, lncRNAs can function as modular scaffolds to specify higher-order organization in RNP complexes and in chromatin states. The importance of these modes of regulation is underscored by the newly recognized roles of long RNAs for proper gene control across all kingdoms of life.

3,075 citations
