
Showing papers on "Munich Information Center for Protein Sequences" published in 2009


Journal Article
TL;DR: Even a small proportion of annotated genes can improve the detection of true positive gene pairs using BS, and considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS.
Abstract: Motivation: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on integrating multiple data sources for gene function prediction, there is hardly any comparable work on weighting data sources using the functional annotations of classified genes. In this investigation, we propose a new scoring framework, called the biological score (BS), which incorporates data source weighting, for predicting the function of some of the unclassified yeast genes. Methods: The BS is computed by first evaluating the similarities between genes, arising from different data sources, in a common framework, and then integrating them in a weighted linear combination. The relative weight of each data source is determined adaptively by utilizing the yeast gene ontology (GO)-slim process annotations of classified genes, available from the Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS, where, for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performances of BS and K-BS are evaluated with gene annotations available from the Munich Information Center for Protein Sequences (MIPS). Results: We predict the functional categories of 417 classified genes from 417 clusters with a positive predictive value of 0.98 using K-BS. The functional categories of 12 unclassified yeast genes are also predicted. Conclusion: Our experimental results indicate that considering multiple data sources and estimating their weights with annotations of classified genes can considerably enhance the performance of BS. Even a small proportion of annotated genes can improve the detection of true positive gene pairs using BS.

19 citations
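
The scoring and clustering steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy similarity matrices, and the normalization by the sum of the weights are all hypothetical, and the weights are simply passed in because the abstract does not spell out how they are estimated from the GO-slim annotations.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_similarities(sims: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted linear combination of per-source gene-gene similarity matrices.

    Dividing by the sum of the weights is an assumption made here so that the
    combined score stays on the same scale as the inputs.
    """
    if len(sims) != len(weights):
        raise ValueError("one weight per data source is required")
    total = sum(weights)
    n = len(sims[0])
    return [
        [sum(w * s[i][j] for w, s in zip(weights, sims)) / total for j in range(n)]
        for i in range(n)
    ]


def k_nearest_cluster(score: Matrix, gene: int, k: int) -> List[int]:
    """Return the gene followed by its k highest-scoring neighbours."""
    others = [j for j in range(len(score)) if j != gene]
    others.sort(key=lambda j: score[gene][j], reverse=True)
    return [gene] + others[:k]


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Toy data: two hypothetical data sources over three genes.
expression_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
interaction_sim = [[1.0, 0.3, 0.7], [0.3, 1.0, 0.4], [0.7, 0.4, 1.0]]

bs = combine_similarities([expression_sim, interaction_sim], weights=[0.75, 0.25])
cluster = k_nearest_cluster(bs, gene=0, k=1)
```

With these toy inputs, the first source dominates the combined score, so gene 0 is clustered with gene 1 even though the second source alone would pair it with gene 2.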


Journal Article
TL;DR: A new non-hierarchical clustering procedure characterized by a stringent metric which ensures a reliable transfer of function between related proteins, even in the case of multidomain and distantly related proteins.
Abstract: Protein sequence annotation is a major challenge in the postgenomic era. Thanks to the availability of complete genomes and proteomes, protein annotation has recently benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric which ensures a reliable transfer of function between related proteins, even in the case of multidomain and distantly related proteins. The method takes advantage of the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and of a GO and PDB/SCOP mapping over the clusters. A statistical validation of our method demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. In this way, uncharacterized proteins can be safely annotated by inheriting the annotation of the cluster. We validate our method by blindly annotating a further 201 genomes, and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).

18 citations


Journal Article
TL;DR: WeGAS, a Web-based microbial Genome Annotation System, which provides features that include gene prediction, homology search, promoter/motif analysis, genome browsing, gene ontology analysis based on the COGs and GO, and metabolic pathway analysis with web-based interfaces.
Abstract: We have developed WeGAS, a Web-based microbial Genome Annotation System, which provides features that include gene prediction, homology search, promoter/motif analysis, genome browsing, gene ontology analysis based on the COGs and GO, and metabolic pathway analysis with web-based interfaces. Most raw and intermediate data from genome projects can be managed with the WeGAS database system, and analysis results, including information on each gene and final genome maps, are provided by its visualization modules. In particular, a pie-view browser displaying circular maps of contigs and a COG-GO combination browser are very helpful for obtaining an overview of projects. Major public microbial genome databases can be imported, searched, and browsed through the WeGAS modules. WeGAS is freely accessible via the web site http://ns.smallsoft.co.kr:8051.

13 citations


Book Chapter
TL;DR: With the increasing availability of plant genome sequence data, the value of comparative annotation will increase, and the methodologies for genome annotation are still evolving and will improve in the future.
Abstract: Annotation of plant genomic sequences can be separated into structural and functional annotation. Structural annotation is the foundation of all genomics, because without accurate gene models, understanding gene function or the evolution of genes across taxa can be impeded. Structural annotation depends on sensitive, specific computational programs and deep experimental evidence to identify gene features within genomic DNA. Functional annotation is highly dependent on sequence similarity to other known genes or proteins, as the majority of initial "first-pass" functional annotation on a genomic scale is transitive. Coupling structural and functional annotation across genomes in a comparative manner promotes more accurate annotation as well as an understanding of gene and genome evolution. With the increasing availability of plant genome sequence data, the value of comparative annotation will increase. As with any new field, methodologies for genome annotation are evolving and will improve in the future.

6 citations



Book Chapter
19 Mar 2009
TL;DR: This chapter discusses the automatic annotation and the generation of high quality gene models, the setup and execution of global proteomics experiments that are quantitative and statistically rigorous, and finally the addition of biological context to proteomics.
Author(s): Panisko, Ellen A.; Grigoriev, Igor; Daly, Don S.; Webb-Robertson, Bobbie-Jo; Baker, Scott E.
Abstract: Biologists are awash with genomic sequence data. In large part, this is due to the rapid acceleration in the generation of DNA sequence that occurred as public and private research institutes raced to sequence the human genome. In parallel with the large human genome effort, the mostly smaller genomes of other important model organisms were sequenced. Projects following on these initial efforts have made use of technological advances and the DNA sequencing infrastructure that was built for the human and other organism genome projects. As a result, the genome sequences of many organisms are available in high quality draft form. While in many ways this is good news, there are limitations to the biological insights that can be gleaned from DNA sequences alone; genome sequences offer only a bird's eye view of the biological processes endemic to an organism or community. Fortunately, the genome sequences now being produced at such a high rate can serve as the foundation for other global experimental platforms such as proteomics. Proteomic methods offer a snapshot of the proteins present at a point in time for a given biological sample. Current global proteomics methods combine enzymatic digestion, separations, mass spectrometry and database searching for peptide identification. One key aspect of proteomics is the prediction of peptide sequences from mass spectrometry data. Global proteomic analysis uses computational matching of experimental mass spectra with predicted spectra based on databases of gene models that are often generated computationally. Thus, the quality of gene models predicted from a genome sequence is crucial in the generation of high quality peptide identifications. Once peptides are identified, they can be assigned to their parent protein.
Proteins identified as expressed in a given experiment are most useful when compared to other expressed proteins in a larger biological context or biochemical pathway. In this chapter we will discuss the automatic annotation and the generation of high quality gene models, the setup and execution of global proteomics experiments that are quantitative and statistically rigorous, and finally the addition of biological context to proteomics.

1 citation
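
The dependence of peptide identification on gene models, as described in the abstract above, can be illustrated with a small sketch. Everything here is an assumption for illustration only: the abstract does not name the enzyme, so the commonly used trypsin rule (cleave after K or R unless the next residue is P) is swapped in, exact sequence lookup stands in for real spectrum matching, and the protein identifiers and sequences are made up.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tryptic_digest(protein: str, min_length: int = 1) -> List[str]:
    """In-silico digestion: split after K or R, except when followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop the empty piece produced when the sequence itself ends in K or R.
    return [p for p in pieces if len(p) >= min_length]


def build_peptide_index(proteins: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each predicted peptide to the set of gene models that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(protein_id)
    return dict(index)


# Hypothetical gene models (protein sequences predicted from a genome).
gene_models = {"protA": "MKRPAKG", "protB": "MKTTLR"}
index = build_peptide_index(gene_models)

# Assign an identified peptide to its parent protein(s).
parents = index.get("RPAK", set())
```

A peptide shared by several gene models, such as "MK" here, maps to more than one parent, and a peptide from a gene that was mispredicted or missed is absent from the index altogether, which is one way to see why the quality of the gene models matters.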