Topic

Munich Information Center for Protein Sequences

About: Munich Information Center for Protein Sequences is a research topic. Over its lifetime, 79 publications have been published within this topic, receiving 6,967 citations.


Papers
01 Jan 2000
TL;DR: Improvements in genome, gene expression and proteome database mining algorithms will enable the prediction of protein function in the context of higher order processes such as the regulation of gene expression, metabolic pathways and signalling cascades and the elucidation of high-resolution structural and functional maps of the human genome.
Abstract: The Human Genome Initiative is an international research program for the creation of detailed genetic and physical maps of the human genome. Genome research projects generate enormous quantities of data. Database mining is the process of finding and extracting useful information from raw datasets. Computational genomics has identified a classification of three successive levels for the management and analysis of genetic data in scientific databases: 1. Genomics. 2. Gene expression. 3. Proteomics. Genome database mining is the identification of the protein-encoding regions of a genome and the assignment of functions to these genes on the basis of sequence similarity to other genes of known function. Gene expression database mining is the identification of intrinsic patterns and relationships in transcriptional expression data generated by large-scale gene expression experiments. Proteome database mining is the identification of intrinsic patterns and relationships in translational expression data generated by large-scale proteomics experiments. Improvements in genome, gene expression and proteome database mining algorithms will enable the prediction of protein function in the context of higher order processes such as the regulation of gene expression, metabolic pathways and signalling cascades. Thus, the final objective of such higher-level functional analysis will be the elucidation of high-resolution structural and functional maps of the human genome.

22 citations

Book Chapter
01 Jan 2003
TL;DR: The performance of manual, automated and mixed approaches in genome annotation and ways to avoid some common pitfalls are discussed, including an issue of quantity turning into quality.
Abstract: In the preceding chapter, we gave a brief overview of the methods that are commonly used for identification of protein-coding genes and analysis of protein sequences. Here, we turn to one of the main subjects of this book, namely how these methods are applied to the task of primary analysis of genomes, which often goes under the name of “genome annotation”. Many researchers still view genome annotation as a notoriously unreliable and inaccurate process. There are excellent reasons for this opinion: genome annotation produces a considerable number of errors and some outright ridiculous “identifications” (see ♦3.1.3 and further discussion in this chapter). These errors are highly visible, even when the error rate is quite low: because of the large number of genes in most genomes, the errors are also rather numerous. Some of the problems and challenges faced by genome annotation are an issue of quantity turning into quality: an analysis that can be easily and reliably done by a qualified researcher for one or ten protein sequences becomes difficult and error-prone for the same scientist and much more so for an automated tool when the task is scaled up to 10,000 sequences. We discuss here the performance of manual, automated and mixed approaches in genome annotation and ways to avoid some common pitfalls.

21 citations

Journal Article
TL;DR: To develop an integrated cross-species plant genome resource, this work maintains comprehensive databases for model plant genomes, including Arabidopsis (Arabidopsis thaliana), maize (Zea mays), Medicago truncatula, and rice (Oryza sativa).
Abstract: With several plant genomes sequenced, the power of comparative genome analysis can now be applied. However, genome-scale cross-species analyses are limited by the effort for data integration. To develop an integrated cross-species plant genome resource, we maintain comprehensive databases for model plant genomes, including Arabidopsis (Arabidopsis thaliana), maize (Zea mays), Medicago truncatula, and rice (Oryza sativa). Integration of data and resources is emphasized, both in house as well as with external partners and databases. Manual curation and state-of-the-art bioinformatic analysis are combined to achieve quality data. Easy access to the data is provided through Web interfaces and visualization tools, bulk downloads, and Web services for application-level access. This allows a consistent view of the model plant genomes for comparative and evolutionary studies, the transfer of knowledge between species, and the integration with functional genomics data.

21 citations

Journal Article
TL;DR: The Seoul National University Genome Browser (SNUGB) integrates various types of genomic information derived from 98 fungal/oomycete and 34 plant and animal species, graphically presents germane features and properties of each genome, and supports comparison between genomes.
Abstract: Background: Since the full genome sequences of Saccharomyces cerevisiae were released in 1996, genome sequences of over 90 fungal species have become publicly available. The heterogeneous formats of genome sequences archived in different sequencing centers hampered the integration of the data for efficient and comprehensive comparative analyses. The Comparative Fungal Genomics Platform (CFGP) was developed to archive these data via a single standardized format that can support multifaceted and integrated analyses of the data. To facilitate efficient data visualization and utilization within and across species based on the architecture of CFGP and associated databases, a new genome browser was needed.

20 citations

Journal Article
TL;DR: It has been found that even a small proportion of annotated genes can provide improvements in finding true positive gene pairs using BS, and the results indicate that considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS.
Abstract: Motivation: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on multi-data-source integration for gene function prediction, there is hardly any similar work in the framework of data source weighting using functional annotations of classified genes. In this investigation, we propose a new scoring framework, called biological score (BS), which incorporates data source weighting, for predicting the function of some of the unclassified yeast genes. Methods: The BS is computed by first evaluating the similarities between genes, arising from different data sources, in a common framework, and then integrating them as a weighted linear combination. The relative weight of each data source is determined adaptively by utilizing the information on yeast gene ontology (GO)-slim process annotations of classified genes, available from the Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS, where, for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performances of BS and K-BS are evaluated with gene annotations available from the Munich Information Center for Protein Sequences (MIPS). Results: We predict the functional categories of 417 classified genes from 417 clusters with 0.98 positive predictive value using K-BS. The functional categories of 12 unclassified yeast genes are also predicted. Conclusion: Our experimental results indicate that considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS. It has been found that even a small proportion of annotated genes can provide improvements in finding true positive gene pairs using BS.
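
The scoring idea in this abstract (per-source gene similarities combined linearly through weights, with the weights estimated from annotations of classified genes, followed by K-nearest-neighbor clusters) might be sketched roughly as follows. This is a minimal illustration and not the authors' method: the abstract does not give the weighting formula, so the rule used here (how much higher a source scores co-annotated gene pairs than differently annotated pairs) is an assumption, and the names `source_weights`, `combined_score`, `knn_cluster` and the toy matrices are hypothetical.

```python
from itertools import combinations


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def source_weights(sims, labels):
    """Estimate one weight per data source from classified genes.

    sims   : list of square similarity matrices (lists of lists), one per source
    labels : dict mapping gene index -> functional category (classified genes only)

    ASSUMPTION: a source's raw weight is the gap between its mean similarity for
    same-category pairs and for different-category pairs, clipped at zero.
    """
    raw = []
    for sim in sims:
        same, diff = [], []
        for i, j in combinations(sorted(labels), 2):
            (same if labels[i] == labels[j] else diff).append(sim[i][j])
        raw.append(max(_mean(same) - _mean(diff), 0.0))
    total = sum(raw)
    if total == 0:
        # No source separates the categories: fall back to equal weights.
        return [1.0 / len(sims)] * len(sims)
    return [r / total for r in raw]


def combined_score(sims, weights):
    """Weighted linear combination of the per-source similarity matrices."""
    n = len(sims[0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, sims)) for j in range(n)]
        for i in range(n)
    ]


def knn_cluster(score, gene, k):
    """Cluster made of `gene` and its k highest-scoring neighbors."""
    others = [j for j in range(len(score)) if j != gene]
    others.sort(key=lambda j: score[gene][j], reverse=True)
    return [gene] + others[:k]


# Toy data (hypothetical): genes 0-2 are classified, gene 3 is unclassified.
labels = {0: "A", 1: "A", 2: "B"}
informative = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
uninformative = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]

weights = source_weights([informative, uninformative], labels)
score = combined_score([informative, uninformative], weights)
cluster = knn_cluster(score, gene=3, k=1)
```

In this toy setup the uninformative source receives zero weight, so the cluster for the unclassified gene 3 is driven entirely by the informative source. A category for gene 3 could then be proposed from the annotations of the classified genes in its cluster.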

19 citations


Network Information
Related Topics (5)
Genomics
15.4K papers, 1M citations
80% related
Genome
74.2K papers, 3.8M citations
80% related
Human genome
11.5K papers, 1M citations
78% related
Conserved sequence
12.4K papers, 887K citations
76% related
Phylogenetic tree
26.6K papers, 1.3M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2017    1
2016    1
2015    1
2014    4
2013    4
2012    1