Showing papers by "Philip E. Bourne" published in 1997


Book Chapter
TL;DR: The approach described in this chapter is to extend the Crystallographic Information File (CIF) data representation, used for describing small-molecule structures and associated diffraction experiments, to macromolecules; the extension is referred to as the “macromolecular Crystallographic Information File” (mmCIF).
Abstract: A variety of approaches for improved scientific data representation are being explored. The approach described in this chapter, which has been developed under the auspices of the International Union of Crystallography (IUCr), is to extend the Crystallographic Information File (CIF) data representation used for describing small-molecule structures and associated diffraction experiments. This extension is referred to as the “macromolecular Crystallographic Information File” (mmCIF) and is discussed in the chapter. The chapter covers the history of mmCIF, its similarities to and differences from the Protein Data Bank (PDB) format, the contents of the mmCIF dictionary, and the way to represent structures using mmCIF. The mmCIF home page contains a historical description of the development of the dictionary, current versions of the dictionary in text and HyperText Markup Language (HTML) formats, software tools, archives of the mmCIF discussion list, and a detailed online tutorial. It is anticipated that full use of the expressive power of mmCIF will be made only when existing structure solution and refinement programs are modified to maintain mmCIF data items and software tools are developed to help prepare and use an mmCIF file effectively.

138 citations



Journal Article
TL;DR: The initial phase of the work, the data representation and query of all available macromolecular structure data, including real-time access to complex property patterns based on the amino acid sequence, is reported.
Abstract: Motivation: To provide data management tools to maintain and query efficiently experimental and derived protein data with the goal of providing new insights into structure-function relationships. The tools should be portable, extensible, and accessible locally or via the World Wide Web, providing data that would not otherwise be available. Results: The initial phase of the work, the data representation and query of all available macromolecular structure data, including real-time access to complex property patterns based on the amino acid sequence, is reported. Protein structure data taken from the Protein Data Bank (PDB) are decomposed into native and derived elementary properties, and represented as compact indexed objects, minimizing storage requirements and query time for select types of query. In addition, collections of indices representing a particular property are maintained and can be queried for specific property patterns found across the whole database. The approach is proving applicable to a wide variety of data available on specific protein families. Availability: Three resources are available using this approach. (i) The query of basic structural components and property patterns of the complete PDB is available via the World Wide Web at the URL http://www.sdsc.edu/moose. (ii) WPDB, a PC-based compressed macromolecular structure database and loader with a Microsoft Windows interface, is available from ftp://ftp.sdsc.edu/pub/sdsc/biology/WPDB/. (iii) A database supporting real-time three-dimensional substructure searching will be reported elsewhere. Source code is available by contacting the authors. Contact: E-mail: {shindyal,bourne}@sdsc.edu

9 citations


Proceedings Article
21 Jun 1997
TL;DR: The solution has been to define a simple domain specific language (DSL) which is added to the extensive annotation already found in the mmCIF dictionary, and the conversion process becomes part of the global dictionary and is not open to a variety of interpretations by different research groups writing code based on dictionary contents.
Abstract: The maintenance of software which uses a rapidly evolving data annotation scheme is time consuming and expensive. At the same time, without current software the annotation scheme itself becomes limited and is less likely to be widely adopted. A solution to this problem has been developed for the macromolecular Crystallographic Information File (mmCIF) annotation scheme. The approach could be generalized for a variety of annotation schemes used or proposed for molecular biology data. mmCIF provides a highly structured and complete annotation for describing NMR and X-ray crystallographic data and the resulting macromolecular structures. This annotation is maintained in the mmCIF dictionary, which currently contains over 3,200 terms. A major challenge is to maintain code for converting between mmCIF and Protein Data Bank (PDB) annotations while both continue to evolve. The solution has been to define a simple domain specific language (DSL) which is added to the extensive annotation already found in the mmCIF dictionary. The DSL calls specific mapping modules for each category of data item in the mmCIF dictionary. Adding or changing the mapping between PDB and mmCIF items of data is straightforward since data categories (and hence mapping modules) correspond to elements of macromolecular structure familiar to the experimentalist. Each time a change is made to the macromolecular annotation, the appropriate change is made to the easily located and modifiable mapping modules. A code generator is then called which reads the mapping modules and creates a new executable for performing the data conversion. In this way code is easily kept current by individuals with limited programming skill, but who have an understanding of macromolecular structure and details of the annotation scheme. Most important, the conversion process becomes part of the global dictionary and is not open to a variety of interpretations by different research groups writing code based on dictionary contents. Details of the DSL and code generator are provided.

2 citations
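
The mapping-module and code-generator idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's DSL: the rule syntax, the names `CELL_MAPPING`, `generate_converter`, and `convert_cell`, and the choice of Python are all assumptions made for illustration. The only borrowed facts are the mmCIF `_cell` item names and the fixed column positions of the PDB `CRYST1` record.

```python
# Hypothetical mapping module for the mmCIF "cell" category.
# Rule syntax (an assumption, not the paper's DSL):
#   <mmCIF item>  <PDB record>  <first column>  <last column>   (1-based, inclusive)
CELL_MAPPING = """
_cell.length_a     CRYST1  7 15
_cell.length_b     CRYST1 16 24
_cell.length_c     CRYST1 25 33
_cell.angle_alpha  CRYST1 34 40
_cell.angle_beta   CRYST1 41 47
_cell.angle_gamma  CRYST1 48 54
"""


def generate_converter(mapping_text, func_name):
    """Code generator: turn mapping rules into Python source for a converter."""
    src = [
        f"def {func_name}(pdb_lines):",
        "    items = {}",
        "    for line in pdb_lines:",
    ]
    for rule in mapping_text.strip().splitlines():
        item, record, start, end = rule.split()
        # Convert 1-based inclusive columns to a Python slice.
        src.append(f"        if line.startswith({record!r}):")
        src.append(
            f"            items[{item!r}] = line[{int(start) - 1}:{int(end)}].strip()"
        )
    src.append("    return items")
    return "\n".join(src) + "\n"


# Regenerate the converter whenever the mapping module changes.
source = generate_converter(CELL_MAPPING, "convert_cell")
namespace = {}
exec(source, namespace)
convert_cell = namespace["convert_cell"]

# Build a CRYST1 record with the standard fixed-width layout.
cryst1 = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f P 21 21 21" % (
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0
)
result = convert_cell([cryst1])
for name, value in result.items():
    print(name, value)
```

In this sketch, changing a mapping means editing one line of `CELL_MAPPING` and regenerating, which mirrors the abstract's point that the mapping, rather than hand-written conversion code, is the thing being maintained.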