Author

Kenneth S. Katz

Bio: Kenneth S. Katz is an academic researcher at the National Institutes of Health. The author has contributed to research on topics including RefSeq and Entrez. The author has an h-index of 13 and has co-authored 17 publications that have received 4,668 citations.

Topics: RefSeq, Entrez, Metadata, MinHash, Entrez Gene, ...

Papers

Journal Article•DOI•

ClinVar: improving access to variant interpretations and supporting evidence.

Melissa J. Landrum¹, Jennifer M. Lee¹, Mark L. Benson¹, Garth Brown¹, Chen Chao¹, Shanmuga Chitipiralla¹, Baoshan Gu¹, Jennifer Hart¹, Douglas W. Hoffman¹, Wonhee Jang¹, Karen Karapetyan¹, Kenneth S. Katz¹, Chunlei Liu¹, Zenith Maddipatla¹, Malheiro Aj¹, Kurt McDaniel¹, Michael Ovetsky¹, George R. Riley¹, George Zhou¹, J. Bradley Holmes¹, Brandi L. Kattman¹, Donna Maglott¹•Institutions (1)

National Institutes of Health¹

04 Jan 2018-Nucleic Acids Research

TL;DR: ClinVar continues to make improvements to its search and retrieval functions.

Abstract: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease, maintained at the National Institutes of Health. Interpretations of the clinical significance of variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups. ClinVar aggregates data by variant-disease pairs, and by variant (or set of variants). Data aggregated by variant are accessible on the website, in an improved set of variant call format files and as a new comprehensive XML report. ClinVar recently started accepting submissions that are focused primarily on providing phenotypic information for individuals who have had genetic testing. Submissions may come from clinical providers providing their own interpretation of the variant ('provider interpretation') or from groups such as patient registries that primarily provide phenotypic information from patients ('phenotyping only'). ClinVar continues to make improvements to its search and retrieval functions. Several new fields are now indexed for more precise searching, and filters allow the user to narrow down a large set of search results.

2,345 citations

Journal Article•DOI•

ClinVar: public archive of interpretations of clinically relevant variants.

Melissa J. Landrum¹, Jennifer M. Lee¹, Mark L. Benson¹, Garth Brown¹, Chen Chao¹, Shanmuga Chitipiralla¹, Baoshan Gu¹, Jennifer Hart¹, Douglas W. Hoffman¹, Jeffrey Hoover¹, Wonhee Jang¹, Kenneth S. Katz¹, Michael Ovetsky¹, George R. Riley¹, Amanjeev Sethi¹, Raymond E. Tully¹, Ricardo Villamarin-Salomon¹, Wendy S. Rubinstein¹, Donna Maglott¹•Institutions (1)

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: ClinVar at the National Center for Biotechnology Information (NCBI) is a freely available archive for interpretations of clinical significance of variants for reported conditions, which includes germline and somatic variants of any size, type or genomic location.

Abstract: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) at the National Center for Biotechnology Information (NCBI) is a freely available archive for interpretations of clinical significance of variants for reported conditions. The database includes germline and somatic variants of any size, type or genomic location. Interpretations are submitted by clinical testing laboratories, research laboratories, locus-specific databases, OMIM®, GeneReviews™, UniProt, expert panels and practice guidelines. In NCBI's Variation submission portal, submitters upload batch submissions or use the Submission Wizard for single submissions. Each submitted interpretation is assigned an accession number prefixed with SCV. ClinVar staff review validation reports with data types such as HGVS (Human Genome Variation Society) expressions; however, clinical significance is reported directly from submitters. Interpretations are aggregated by variant-condition combination and assigned an accession number prefixed with RCV. Clinical significance is calculated for the aggregate record, indicating consensus or conflict in the submitted interpretations. ClinVar uses data standards, such as HGVS nomenclature for variants and MedGen identifiers for conditions. The data are available on the web as variant-specific views; the entire data set can be downloaded via ftp. Programmatic access for ClinVar records is available through NCBI's E-utilities. Future development includes providing a variant-centric XML archive and a web page for details of SCV submissions.

2,094 citations

Journal Article•DOI•

Gene: a gene-centered information resource at NCBI

Garth Brown¹, Vichet Hem¹, Kenneth S. Katz¹, Michael Ovetsky¹, Craig Wallin¹, Olga Ermolaeva¹, Igor Tolstoy¹, Tatiana Tatusova¹, Kim D. Pruitt¹, Donna Maglott¹, Terence Murphy¹•Institutions (1)

National Institutes of Health¹

28 Jan 2015-Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information's (NCBI) Gene database integrates gene-specific information from multiple data sources and represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI.

Abstract: The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.

489 citations

Journal Article•DOI•

Introducing RefSeq and LocusLink: curated human genome resources at the NCBI

Kim D. Pruitt¹, Kenneth S. Katz¹, Hugues Sicotte¹, Donna Maglott¹•Institutions (1)

National Institutes of Health¹

01 Jan 2000-Trends in Genetics

TL;DR: The goal of LocusLink and RefSeq is to include all known genes and their major products and to encourage collaborations with the scientific community to ensure that these resources are as comprehensive and accurate as possible.

258 citations

Journal Article•DOI•

Human immunodeficiency virus type 1, human protein interaction database at NCBI.

William Fu¹, Brigitte E. Sanders-Beer², Kenneth S. Katz², Donna Maglott², Kim D. Pruitt², Roger G. Ptak²•Institutions (2)

Southern Research Institute¹, National Institutes of Health²

01 Jan 2009-Nucleic Acids Research

TL;DR: The ‘Human Immunodeficiency Virus Type 1 (HIV-1), Human Protein Interaction Database’, available through the National Library of Medicine at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions, was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature.

Abstract: The ‘Human Immunodeficiency Virus Type 1 (HIV-1), Human Protein Interaction Database’, available through the National Library of Medicine at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions, was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature. The database serves the scientific community exploring the discovery of novel HIV vaccine candidates and therapeutic targets. To facilitate this discovery approach, the following information for each HIV-1 human protein interaction is provided and can be retrieved without restriction by web-based downloads and ftp protocols: Reference Sequence (RefSeq) protein accession numbers, Entrez Gene identification numbers, brief descriptions of the interactions, searchable keywords for interactions and PubMed identification numbers (PMIDs) of journal articles describing the interactions. Currently, 2589 unique HIV-1 to human protein interactions and 5135 brief descriptions of the interactions, with a total of 14 312 PMID references to the original articles reporting the interactions, are stored in this growing database. In addition, all protein–protein interactions documented in the database are integrated into Entrez Gene records and listed in the ‘HIV-1 protein interactions’ section of Entrez Gene reports. The database is also tightly linked to other databases through Entrez Gene, enabling users to search for an abundance of information related to HIV pathogenesis and replication.

249 citations

Cited by

Journal Article•DOI•

The sequence of the human genome.

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article•DOI•

The Human Genome Browser at UCSC

W. James Kent¹, Charles W. Sugnet¹, Terrence S. Furey¹, Krishna M. Roskin¹, Tom H. Pringle, Alan M. Zahler¹, David Haussler¹•Institutions (1)

University of California, Santa Cruz¹

01 Jun 2002-Genome Research

TL;DR: A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu.

Abstract: As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.

9,605 citations

Journal Article•DOI•

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V. Kuleshov¹, Matthew R. Jones¹, Andrew D. Rouillard¹, Nicolas F. Fernandez¹, Qiaonan Duan¹, Zichen Wang¹, Simon Koplev¹, Sherry L. Jenkins¹, Kathleen M. Jagodnik², Alexander Lachmann¹, Michael G. McDermott¹, Caroline D. Monteiro¹, Gregory W. Gundersen¹, Avi Ma'ayan¹•Institutions (2)

Icahn School of Medicine at Mount Sinai¹, Glenn Research Center²

08 Jul 2016-Nucleic Acids Research

TL;DR: A significant update to one of the tools in this domain called Enrichr, a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries is presented.

Abstract: Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

6,201 citations

Journal Article•DOI•

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹•Institutions (1)

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations

Journal Article•DOI•

The impact of microRNAs on protein output

Daehyun Baek¹, Judit Villén², Chanseok Shin¹, Fernando D. Camargo¹, Steven P. Gygi², David P. Bartel¹•Institutions (2)

Massachusetts Institute of Technology¹, Harvard University²

04 Sep 2008-Nature

TL;DR: The impact of microRNAs on the proteome indicated that for most interactions microRNAs act as rheostats to make fine-scale adjustments to protein output.

Abstract: MicroRNAs are endogenous ∼23-nucleotide RNAs that can pair to sites in the messenger RNAs of protein-coding genes to downregulate the expression from these messages. MicroRNAs are known to influence the evolution and stability of many mRNAs, but their global impact on protein output had not been examined. Here we use quantitative mass spectrometry to measure the response of thousands of proteins after introducing microRNAs into cultured cells and after deleting mir-223 in mouse neutrophils. The identities of the responsive proteins indicate that targeting is primarily through seed-matched sites located within favourable predicted contexts in 3′ untranslated regions. Hundreds of genes were directly repressed, albeit each to a modest degree, by individual microRNAs. Although some targets were repressed without detectable changes in mRNA levels, those translationally repressed by more than a third also displayed detectable mRNA destabilization, and, for the more highly repressed targets, mRNA destabilization usually comprised the major component of repression. The impact of microRNAs on the proteome indicated that for most interactions microRNAs act as rheostats to make fine-scale adjustments to protein output. MicroRNAs can regulate gene expression by either inhibiting translation of a messenger RNA, or inducing its degradation. While previous studies have measured regulation at the mRNA level, it was unknown how much regulation occurred at the protein level. Now two groups led by David Bartel and Nikolaus Rajewsky have used variants of the technique known as SILAC (stable isotope labelling with amino acids in cell culture) to measure proteome-wide changes in protein level as a function of expression of endogenous and exogenous microRNAs. They find that while microRNAs can directly repress the translation of hundreds of genes, additional indirect effects result in changes in expression of thousands of genes. 
Many of the changes observed are less than twofold in magnitude, however, indicating either directly or indirectly, microRNAs can act as rheostats to fine-tune protein synthesis to match the needs of the cell at any given time. In one of two studies, a technique known as SILAC is used to measure, on a large scale, changes in protein level as a function of expression of endogenous and exogenous miRNAs. It is found that although miRNAs directly repress the translation of hundreds of genes, additional indirect effects result in changes in expression of thousands of genes.

3,562 citations
