Showing papers by "Philip E. Bourne" published in 2003


Book Chapter
TL;DR: This short introductory chapter is intended simply to introduce a sense of the progress, limitations, challenges, and likely future developments in the field of protein structure prediction through what seems to be a unique scientific process.
Abstract: This short introductory chapter is intended simply to introduce a sense of the progress, limitations, challenges, and likely future developments in the field of protein structure prediction through what seems to be a unique scientific process. CASP and CAFASP represent a direct challenge and careful assessment of a field of study that has captured the interest of many scientists. Three of the best scientists in the field and their colleagues provide a more detailed description of the field and how it is developing in Chapters 25, 26, and 27. As prediction methods have advanced, the distinction between comparative modeling, fold recognition, and novel fold recognition has blurred somewhat. It is a testament to the community that as the knowledge of the algorithms evolved, World Wide Web servers providing access to these algorithms appeared, making it relatively straightforward for any investigator to apply a melting pot of methods to the prediction process. What all approaches need are more targets and a continued refinement of the evaluation process. The first need is being met in part by the PDB, which is, with depositors' approval, releasing sequences ahead of structure release (see http://www.rcsb.org/pdb/status.html). Further, the structural genomics projects are reporting their progress for all targets on a weekly basis (see http://targetdb.pdb.org/). While there is no indication that the sequences of the latter will lead to a structure, it is a rich source of targets (17,000 in October 2002). Not only do CASP and CAFASP measure progress, they help define where efforts should be directed to move the field forward. It is a testament to how far the field has come that investigators are now turning to the unknown. Although attempting to predict a structure that will appear experimentally helps improve the methods applied to structure prediction, it does not further our understanding of living systems directly. Attempts at defining "The Most Wanted" (Abbott, 2001), the structures most in need of prediction to help further our understanding of the biology, and the efforts to make those predictions, speak to a healthy future for the field of protein structure prediction. To the many individuals who help define the CASP and CAFASP processes, serve the community as assessors, and compete in the experiments, this is a tribute.

35 citations


Journal Article
TL;DR: The apoptosis database provides functional annotation, literature references, diagrams/images, and alternative nomenclatures on a set of proteins having ‘apoptotic domains’, the distinctive domains that are often, if not exclusively, found in proteins involved in apoptosis.
Abstract: The apoptosis database is a public resource for researchers and students interested in the molecular biology of apoptosis. The resource provides functional annotation, literature references, diagrams/images, and alternative nomenclatures on a set of proteins having 'apoptotic domains'. These are the distinctive domains that are often, if not exclusively, found in proteins involved in apoptosis. The initial choice of proteins to be included is defined by apoptosis experts and bioinformatics tools. Users can browse through the web-accessible lists of domains, proteins containing these domains and their associated homologs. The database can also be searched by sequence homology using the Basic Local Alignment Search Tool (BLAST), by text word matches of the annotation, and by identifiers for specific records. The resource is available at http://www.apoptosis-db.org and is updated on a regular basis.

31 citations


Journal Article
TL;DR: All the proteins of Arabidopsis thaliana are analyzed and three-dimensional structures at the level of the domain are assigned by fold recognition and threading based on a novel fold library that extends common domain classifications.
Abstract: Using an integrative genome annotation pipeline (iGAP) for proteome-wide protein structure and functional domain assignment, we analyzed all the proteins of Arabidopsis thaliana. Three-dimensional structures at the level of the domain are assigned by fold recognition and threading based on a novel fold library that extends common domain classifications. iGAP is being applied to proteins from all available proteomes as part of a comparative proteomics resource. The database is accessible from the web.

24 citations


Journal Article
TL;DR: BioEditor is an application to enable scientists and educators to prepare and present structure annotations containing formatted text, graphics, sequence data, and interactive molecular views that bridge the gap between printed journal articles and Internet presentation formats.
Abstract: Summary: BioEditor is an application to enable scientists and educators to prepare and present structure annotations containing formatted text, graphics, sequence data, and interactive molecular views. It is intended to bridge the gap between printed journal articles and Internet presentation formats. BioEditor is relevant in the era of structural genomics, where annotation and publication could become the rate-determining step in structure determination. Availability: BioEditor is available at http://bioeditor.sdsc.edu. The Web site includes the latest version of the software for Microsoft Windows, including documentation, the opportunity to submit bug reports and suggestions, example documentaries prepared with BioEditor and a repository where users can submit documentaries for posting to the site.

15 citations


Book Chapter
30 May 2003
TL;DR: This chapter is written from the perspective of bioinformatics specialists who seek to fully capitalize on the promise of the Grid and who are working with computer scientists and technologists developing biological applications for the Grid.
Abstract: Computational biology is undergoing a revolution from a traditionally compute-intensive science conducted by individuals and small research groups to a high-throughput, data-driven science conducted by teams working in both academia and industry. It is this new biology as a data-driven science in the era of Grid Computing that is the subject of this chapter. This chapter is written from the perspective of bioinformatics specialists who seek to fully capitalize on the promise of the Grid and who are working with computer scientists and technologists developing biological applications for the Grid. To understand what has been developed and what is proposed for utilizing the Grid in the new biology era, it is useful to review the ‘first wave’ of computational biology application models. In the next section, we describe the first wave of computational models used for computational biology and computational chemistry to date.

13 citations


Proceedings Article
01 Dec 2003
TL;DR: This analysis focuses on the temporal characteristics of the structural genomics project, the predicted functions of the proteins it targets, and how biased the target set is when compared to the PDB and to predictions across complete genomes.
Abstract: Structural genomics--large-scale macromolecular 3-dimensional structure determination--is unique in that major participants report scientific progress on a weekly basis. The target database (TargetDB) maintained by the Protein Data Bank (http://targetdb.pdb.org) reports this progress through the status of each protein sequence (target) under consideration by the major structural genomics centers worldwide. Hence, TargetDB provides a unique opportunity to analyze the potential impact that this major initiative provides to scientists interested in the sequence-structure-function-disease paradigm. Here we report such an analysis with a focus on: (i) temporal characteristics--how is the project doing and what can we expect in the future? (ii) target characteristics--what are the predicted functions of the proteins targeted by structural genomics and how biased is the target set when compared to the PDB and to predictions across complete genomes? (iii) structures solved--what are the characteristics of structures solved thus far and what do they contribute? The analysis required a more extensive database of structure predictions using different methods integrated with data from other sources. This database, associated tools and related data sources are available from http://spam.sdsc.edu.

10 citations


Book Chapter
TL;DR: The SCOP database classifies proteins based on evolutionary relationships and on the principles that govern their three-dimensional structure; these relationships will play an important role in the interpretation of sequences produced by genome projects.
Abstract: The structure of a protein can elucidate its function and its evolutionary history (see Chapters 18, 21 and 23). Extracting this information requires knowledge of the structure and its relationships with other proteins. These in turn require a general knowledge of the folds that proteins adopt and detailed information about the structure of many proteins. Nearly all proteins have structural similarities with other proteins, and in many cases, share a common evolutionary origin. The knowledge of these relationships makes important contributions to structural bioinformatics and other related areas of science. Further, these relationships will play an important role in the interpretation of sequences produced by genome projects. To facilitate understanding and access to the information available for known protein structures, Murzin, Brenner, Hubbard, and Chothia (1995) have constructed a structural classification of proteins (SCOP) database. The SCOP database is based on evolutionary relationships and on the principles that govern their three-dimensional structure. It provides for each entry links to coordinates, images of the structure, interactive viewers, sequence data, and literature references. The database is freely accessible on the World Wide Web (http://scop.mrc-lmb.cam.ac.uk/scop). To understand the rationale behind SCOP, we begin with a discussion of protein evolution from a sequence, structure and functional perspective.

6 citations


Book Chapter
TL;DR: The single repository for experimentally derived macromolecular structures is the Protein Data Bank (PDB), which currently releases primary structure data once per week as requested by the depositor, whereupon a number of sites worldwide acquire these data via the Internet, derive additional information, and constitute a set of secondary resources.
Abstract: The single repository for experimentally derived macromolecular structures is the Protein Data Bank (PDB) (Bernstein et al., 1977; Berman et al., 2000; Berman et al., 2007) described in Chapter 11. The primary data provided by the PDB are the Cartesian coordinates, occupancies, and temperature factors for the atoms in these structures. Additional information given includes literature references, author names, experimental details, links to the sequence in the sequence databases, and some limited annotation of the biological function (Chapter 10). Collated into a single entry, due to the restrictions of the PDB format, or into multiple entries for very large X-ray structures and large NMR ensembles, these data constitute a concise description of the three-dimensional form of a molecule. The PDB currently releases the primary structure data once per week as requested by the depositor, whereupon a number of sites worldwide acquire these data via the Internet, derive additional information, and constitute a set of secondary resources. Secondary resources cover features such as stereochemical quality (Table 13.1), protein structure classification (Table 13.2), protein–protein interaction data (Table 13.3), structure visualization (Table 13.4), and data on specific protein families. The secondary resources described in this chapter can be viewed as downstream of the PDB in an information flow diagram (Figure 13.1). The number of these secondary resources is growing every year and no attempt is made at a complete overview, but rather to give a synopsis from several classes of resource (Figure 13.1) of what is available. A current compendium of secondary resources is maintained by the PDB at http://www.pdb.org/pdb/static.do?p=general_information/web_links/index.html. More details on popular, well-established, structure-based databases are available in other chapters. Chapter 5 includes a description of the NMR-specific BioMagResBank resource; the Nucleic Acid Database (NDB) is described in Chapter 12; the comparative fold classification databases SCOP and CATH are described in Chapters 17 and 18, respectively; Chapter 14 includes brief descriptions of stereochemical quality-oriented resources and

4 citations


Journal Article
01 Oct 2003-Targets
TL;DR: Structural genomics is interpreted as a large-scale project with scientific, engineering and technological components and the potential to have a large impact on the life sciences: a project that goes beyond the blueprint for life to the buildings defined by that blueprint, the three-dimensional protein structures.
Abstract: Structural genomics has been heralded as the follow-on to the human genome project. We interpret that to mean a large-scale project with scientific, engineering and technological components and the potential to have a large impact on the life sciences: a project that goes beyond the blueprint for life to the buildings defined by that blueprint, the three-dimensional protein structures. Whereas the human genome project was relatively well defined, with the aim to sequence the three billion nucleotides comprising the human genome, what constitutes structural genomics is open to different interpretations. To some, it aims to characterize all the protein structures in a given genome; Arabidopsis thaliana, Thermotoga maritima and Mycobacterium tuberculosis are examples under scrutiny. To others, the goal of structural genomics is to provide sufficient coverage of fold space to facilitate accurate homology modeling of the majority of proteins of biological interest. As structure determination has already taught us so much about biological function when undertaken as a functionally driven initiative, undertaking structure determination in a broader genomic sense is also likely to bring significant new understanding of living systems. Further, it is likely to lead to advances in the process of structure determination, whether by X-ray crystallography or NMR. With such promise, and with some projects already in their third or fourth year, an obvious question is, how are we doing?

3 citations


Proceedings Article
11 Aug 2003
TL;DR: This work performs a comprehensive statistical analysis of PMR mutants and makes available morph movies from PMR structure pairs, allowing visual analysis of conformational change and the ability to distinguish visually between conformational change due to motions and conformational change due to mutations.
Abstract: The relationship between protein mutations and conformational change can potentially decipher the language relating sequence to structure. Elsewhere, we presented the protein mutant resource (PMR), an online tool that systematically identified related mutants in the Protein Data Bank (PDB), inferred mutant Gene Ontology classifications using data mining, and allowed intuitive exploration of relationships between mutant structures. Here, we perform a comprehensive statistical analysis of PMR mutants. Although the PMR contains spectacular conformational changes, generally there is a counter-intuitive inverse relationship between conformational change and the number of mutations. That is, PDB mutations contrast with naturally evolved mutations. We compare the frequencies of mutations in the PMR/PDB datasets against the PAM250 natural mutation frequencies to confirm this. We make available morph movies from PMR structure pairs, allowing visual analysis of conformational change and the ability to distinguish visually between conformational change due to motions (e.g., ligand binding) and mutations. The PMR is at http://pmr.sdsc.edu.

1 citation