
Showing papers on "Munich Information Center for Protein Sequences" published in 2002


Proceedings Article
14 Aug 2002
TL;DR: A novel approach applies the theory of Markov random fields to infer a protein's functions from protein-protein interaction data and the functional annotations of its interaction partners; it outperforms other available methods for function prediction based on protein interaction data.
Abstract: Assigning functions to novel proteins is one of the most important problems in the post-genomic era. We develop a novel approach that applies the theory of Markov random fields to infer a protein's functions using protein-protein interaction data and the functional annotations of its interaction protein partners. For each function of interest and a protein, we predict the probability that the protein has that function using Bayesian approaches. Unlike in other available approaches for protein annotation where a protein has or does not have a function of interest, we give a probability for having the function. This probability indicates how confident we are about the prediction. We apply our method to predict cellular functions (43 categories including a category "others") for yeast proteins defined in the Yeast Proteome Database, using the protein-protein interaction data from the Munich Information Center for Protein Sequences. We show that our approach outperforms other available methods for function prediction based on protein interaction data.

270 citations


Journal Article
TL;DR: A bibliography submission system is developed for scientists to submit, categorize and retrieve literature information, and a non-redundant reference protein database, PIR-NREF is introduced.
Abstract: The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).

223 citations


Journal Article
TL;DR: The process of annotating a previously annotated genome sequence is defined as 're-annotation', and the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches are examined.
Abstract: Annotation, the process by which structural or functional information is inferred for genes or proteins, is crucial for obtaining value from genome sequences. We define the process of annotating a previously annotated genome sequence as 're-annotation', and examine the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches.

98 citations


Journal Article
TL;DR: The results indicate that post-genomic technologies are providing rich new information for nearly all yeast genes, but data from these experiments is scattered across many Web sites and the results from these experiments are poorly integrated with other forms of yeast knowledge.
Abstract: Since the completion of the yeast genome sequence in 1996, three genomic databases, the Saccharomyces Genome Database, the Yeast Proteome Database, and MIPS (produced by the Munich Information Center for Protein Sequences), have organized published knowledge of yeast genes and proteins onto the framework of the genome. Now, post-genomic technologies are producing large-scale datasets of many types, and these pose new challenges for knowledge integration. This review first examines the structure and content of the three genomic databases, and then draws from them and other resources to examine the ways knowledge from the literature, genome, and post-genomic experiments is stored, integrated, and disseminated. To better understand the impact of post-genomic technologies, 20 collections of post-genomic data were analyzed relative to a set of 243 previously uncharacterized genes. The results indicate that post-genomic technologies are providing rich new information for nearly all yeast genes, but data from these experiments is scattered across many Web sites and the results from these experiments are poorly integrated with other forms of yeast knowledge. Goals for the next generation of databases are set forth which could lead to better access to yeast knowledge for yeast researchers and the entire scientific community.

24 citations


Journal Article
TL;DR: The review takes an insider look at one typical on-line genomic resource, the yeast genome database hosted at the Munich Information Center for Protein Sequences, and explains how and why it has evolved from a basic sequence repository to a multidomain knowledge base.
Abstract: The review begins by providing a brief typology of biological databases on the Internet, illustrated by examples of the most influential resources of each kind. We then take an insider look at one typical on-line genomic resource – the yeast genome database hosted at the Munich Information Center for Protein Sequences (MIPS) – and explain how and why it has evolved from a basic sequence repository to a multidomain knowledge base. The role of community efforts in curating and annotating genome data is discussed. The crucial role of data integration and interoperability in developing next-generation genomic facilities is underscored.

8 citations


Book Chapter
01 Jan 2002
TL;DR: I introduce and review proposals for the experimental design and pattern recognition problems of gene expression experiments, the supervised learning or classification problem, the unsupervised learning or clustering problem and the potential of improving prognostic models, and suggest bootstrap methods to estimate the stability of the hierarchical clustering used.
Abstract: The availability of microarray data caused a new interest in clustering and classification methods. DNA microarrays are likely to play an important role for diagnosis and prognosis in clinical practice. Using the example of gene expression of diffuse large B-cell lymphoma, I introduce and review proposals for the experimental design and pattern recognition problems of gene expression experiments, the supervised learning or classification problem, the unsupervised learning or clustering problem and the potential of improving prognostic models. Moreover, I suggest bootstrap methods to estimate the stability of the hierarchical clustering used. The proposal is applied to prognostic factors by microarray data for censored survival data.

2 citations