scispace - formally typeset
Search or ask a question
Topic

Protein Data Bank

About: Protein Data Bank is a research topic. Over the lifetime, 2265 publications have been published within this topic receiving 174153 citations. The topic is also known as: PDB.


Papers
More filters
Journal ArticleDOI
TL;DR: A series of databases that run parallel to the Protein Data Bank, used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design, are presented.
Abstract: The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.

1,024 citations

Journal ArticleDOI
TL;DR: STRIDE is a software tool for secondary structure assignment from atomic resolution protein structures that makes combined use of hydrogen bond energy and statistically derived backbone torsional angle information and is optimized to return resulting assignments in maximal agreement with crystallographers' designations.
Abstract: STRIDE is a software tool for secondary structure assignment from atomic resolution protein structures. It implements a knowledge-based algorithm that makes combined use of hydrogen bond energy and statistically derived backbone torsional angle information and is optimized to return resulting assignments in maximal agreement with crystallographers' designations. The STRIDE web server provides access to this tool and allows visualization of the secondary structure, as well as contact and Ramachandran maps for any file uploaded by the user with atomic coordinates in the Protein Data Bank (PDB) format. A searchable database of STRIDE assignments for the latest PDB release is also provided. The STRIDE server is accessible from http://webclu.bio.wzw.tum.de/stride/.

861 citations

Journal ArticleDOI
TL;DR: Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSb PDB web portals in support of a ‘Structural View of Biology.’
Abstract: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a 'Structural View of Biology.'

827 citations

Journal ArticleDOI
TL;DR: Two algorithms are developed to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy and are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory.
Abstract: The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.

817 citations

Journal ArticleDOI
TL;DR: To reduce redundancy in the Protein Data Bank of 3D protein structures, which is caused by many homologous proteins in the data bank, a representative set of structures is selected to reduce time and effort in statistical analyses.
Abstract: To reduce redundancy in the Protein Data Bank of 3D protein structures, which is caused by many homologous proteins in the data bank, we have selected a representative set of structures. The selection algorithm was designed to (1) select as many nonhomologous structures as possible, and (2) to select structures of good quality. The representative set may reduce time and effort in statistical analyses.

800 citations


Network Information
Related Topics (5)
Protein structure
42.3K papers, 3M citations
91% related
Binding site
48.1K papers, 2.5M citations
86% related
Active site
28.6K papers, 1.1M citations
86% related
Peptide sequence
84.1K papers, 4.3M citations
82% related
Membrane protein
30.2K papers, 1.7M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202345
2022130
2021136
2020130
2019112
201894