Aron Marchler-Bauer

Other affiliations: Research Institute of Molecular Pathology, University of California, San Francisco, French Institute of Health and Medical Research

Bio: Aron Marchler-Bauer is an academic researcher at the National Institutes of Health. The author has contributed to research on the Conserved Domain Database and Entrez, has an h-index of 35, and has co-authored 61 publications that have received 20,666 citations. Previous affiliations of Aron Marchler-Bauer include the Research Institute of Molecular Pathology and the University of California, San Francisco.

Papers published on a yearly basis (chart not reproduced). Years shown: 1996–2000, 2002–2004, 2007–2012, 2014–2023.

Papers

Journal Article

CDD: a Conserved Domain Database for the functional annotation of proteins

Aron Marchler-Bauer¹, Shennan Lu¹, John B. Anderson¹, Farideh Chitsaz¹, Myra K. Derbyshire¹, Carol DeWeese-Scott¹, Jessica H. Fong¹, Lewis Y. Geer¹, Renata C. Geer¹, Noreen R. Gonzales¹, Marc Gwadz¹, David I. Hurwitz¹, John D. Jackson¹, Zhaoxi Ke¹, Christopher J. Lanczycki¹, Fu-Ping Lu¹, Gabriele H. Marchler¹, Mikhail Mullokandov¹, Marina V. Omelchenko¹, Cynthia L. Robertson¹, James S. Song¹, Narmada Thanki¹, Roxanne A. Yamashita¹, Dachuan Zhang¹, Naigong Zhang¹, Chanjuan Zheng¹, Stephen H. Bryant¹

National Institutes of Health¹

01 Jan 2011-Nucleic Acids Research

TL;DR: NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints.

Abstract: NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,934 citations

Journal Article

CDD: NCBI's conserved domain database

Aron Marchler-Bauer¹, Myra K. Derbyshire¹, Noreen R. Gonzales¹, Shennan Lu¹, Farideh Chitsaz¹, Lewis Y. Geer¹, Renata C. Geer¹, Jane He¹, Marc Gwadz¹, David I. Hurwitz¹, Christopher J. Lanczycki¹, Fu Lu¹, Gabriele H. Marchler¹, James S. Song¹, Narmada Thanki¹, Zhouxi Wang¹, Roxanne A. Yamashita¹, Dachuan Zhang¹, Chanjuan Zheng¹, Stephen H. Bryant¹

National Institutes of Health¹

28 Jan 2015-Nucleic Acids Research

TL;DR: NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints and aims at increasing coverage and providing finer-grained classifications of common protein domains.

Abstract: NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints. Going forward, we strive to improve the coverage and consistency of domain annotation provided by CDD. We maintain a live search system as well as an archive of pre-computed domain annotation for sequences tracked in NCBI's Entrez protein database, which can be retrieved for single sequences or in bulk. We also maintain import procedures so that CDD contains domain models and domain definitions provided by several collections available in the public domain, as well as those produced by an in-house curation effort. The curation effort aims at increasing coverage and providing finer-grained classifications of common protein domains, for which a wealth of functional and structural data has become available. CDD curation generates alignment models of representative sequence fragments, which are in agreement with domain boundaries as observed in protein 3D structure, and which model the structurally conserved cores of domain families as well as annotate conserved features. CDD can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,821 citations

Journal Article

CDD/SPARCLE: functional classification of proteins via subfamily domain architectures.

Aron Marchler-Bauer¹, Yu Bo¹, Lianyi Han¹, Jane He¹, Christopher J. Lanczycki¹, Shennan Lu¹, Farideh Chitsaz¹, Myra K. Derbyshire¹, Renata C. Geer¹, Noreen R. Gonzales¹, Marc Gwadz¹, David I. Hurwitz¹, Fu Lu¹, Gabriele H. Marchler¹, James S. Song¹, Narmada Thanki¹, Zhouxi Wang¹, Roxanne A. Yamashita¹, Dachuan Zhang¹, Chanjuan Zheng¹, Lewis Y. Geer¹, Stephen H. Bryant¹

National Institutes of Health¹

04 Jan 2017-Nucleic Acids Research

TL;DR: NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints.

Abstract: NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,052 citations

Journal Article

CD-Search: protein domain annotations on the fly

Aron Marchler-Bauer¹, Stephen H. Bryant¹

National Institutes of Health¹

01 Jul 2004-Nucleic Acids Research

TL;DR: The Conserved Domain Search service (CD-Search), a web-based tool for the detection of structural and functional domains in protein sequences, uses BLAST® heuristics to provide a fast, interactive service, and searches a comprehensive collection of domain models.

Abstract: We describe the Conserved Domain Search service (CD-Search), a web-based tool for the detection of structural and functional domains in protein sequences. CD-Search uses BLAST® heuristics to provide a fast, interactive service, and searches a comprehensive collection of domain models. Search results are displayed as domain architecture cartoons and pairwise alignments between the query and domain-model consensus sequences. Search results may be visualized in further detail by embedding the query sequence into multiple alignment displays and by mapping onto three-dimensional molecular graphic displays of known structures within the domain family. CD-Search can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi.

1,882 citations

Journal Article

CDD/SPARCLE: the conserved domain database in 2020

Shennan Lu¹, Jiyao Wang¹, Farideh Chitsaz¹, Myra K. Derbyshire¹, Renata C. Geer¹, Noreen R. Gonzales¹, Marc Gwadz¹, David I. Hurwitz¹, Gabriele H. Marchler¹, James S. Song¹, Narmada Thanki¹, Roxanne A. Yamashita¹, Mingzhang Yang¹, Dachuan Zhang¹, Chanjuan Zheng¹, Christopher J. Lanczycki¹, Aron Marchler-Bauer¹

National Institutes of Health¹

08 Jan 2020-Nucleic Acids Research

TL;DR: As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research.

Abstract: As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers both an archive of pre-computed domain annotations as well as live search services for both single protein or nucleotide queries and larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

1,515 citations

Cited by

Journal Article

Database resources of the National Center for Biotechnology Information

David L. Wheeler¹, Deanna M. Church¹, Ron Edgar¹, Scott Federhen¹, Wolfgang Helmberg¹, Thomas L. Madden¹, Joan Pontius¹, Gregory D. Schuler¹, Lynn M. Schriml¹, Edwin Sequeira¹, Tugba O. Suzek¹, Tatiana Tatusova¹, Lukas Wagner¹

National Institutes of Health¹

01 Jan 2004-Nucleic Acids Research

TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website.

Abstract: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.

9,604 citations

Journal Article

Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.

Roujian Lu¹, Xiang Zhao¹, Juan Li², Peihua Niu¹, Bo Yang³, Honglong Wu, Wenling Wang¹, Hao Song⁴, Baoying Huang¹, Na Zhu¹, Yuhai Bi⁴, Xuejun Ma¹, Faxian Zhan³, Liang Wang⁴, Tao Hu², Hong Zhou², Zhenhong Hu, Weimin Zhou¹, Li Zhao¹, Jing Chen⁵, Yao Meng¹, Ji Wang¹, Yang Lin, Jianying Yuan, Zhihao Xie, Jinmin Ma, William J. Liu¹, Dayan Wang¹, Wenbo Xu¹, Edward C. Holmes⁶, George F. Gao¹,⁴, Guizhen Wu¹, Weijun Chen, Weifeng Shi², Wenjie Tan¹,⁴

Chinese Center for Disease Control and Prevention¹, Peking Union Medical College², Centers for Disease Control and Prevention³, Chinese Academy of Sciences⁴, Wenzhou Medical College⁵, University of Sydney⁶

22 Feb 2020-The Lancet

TL;DR: The phylogenetic analysis suggests that bats might be the original host of this virus, while an animal sold at the seafood market in Wuhan might represent an intermediate host facilitating the emergence of the virus in humans.

9,474 citations

Journal Article

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Weizhong Li¹, Adam Godzik¹

Sanford-Burnham Institute for Medical Research¹

01 Jul 2006-Bioinformatics

TL;DR: Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database and cd-hit-est-2d compares two nucleotide datasets.

Abstract: Motivation: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282–283; Bioinformatics, 18, 77–82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to only protein sequences clustering, here we present several new programs using the same algorithm including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST. Availability: http://cd-hit.org Contact: [email protected]

8,306 citations

Journal Article

The Phyre2 web portal for protein modeling, prediction and analysis

Lawrence A. Kelley¹, Stefans Mezulis¹, Christopher M. Yates¹,², Mark N. Wass¹,³, Michael J.E. Sternberg¹

Imperial College London¹, University College London², University of Kent³

07 May 2015-Nature Protocols

TL;DR: An updated protocol for Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants for a user's protein sequence.

Abstract: Phyre2 is a web-based tool for predicting and analyzing protein structure and function. Phyre2 uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyze amino acid variants in a protein sequence. Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2 . A typical structure prediction will be returned between 30 min and 2 h after submission.

7,941 citations

Journal Article

NCBI GEO: archive for functional genomics data sets—update

Tanya Barrett¹, Stephen E. Wilhite¹, Pierre Ledoux¹, Carlos Evangelista¹, Irene F. Kim¹, Maxim Tomashevsky¹, Kimberly A. Marshall¹, Katherine Phillippy¹, Patti M. Sherman¹, Michelle Holko¹, Andrey Yefanov¹, Hye Seung Lee¹, Naigong Zhang¹, Cynthia L. Robertson¹, Nadezhda Serova¹, Sean Davis¹, Alexandra Soboleva¹

National Institutes of Health¹

27 Nov 2012-Nucleic Acids Research

TL;DR: The Gene Expression Omnibus is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community and supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable.

Abstract: The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.

6,683 citations
