Open Access Journal Article

Combining Multisource Information Through Functional-Annotation-Based Weighting: Gene Function Prediction in Yeast

TL;DR
Even a small proportion of annotated genes can improve the identification of true positive gene pairs using the biological score (BS), and the results indicate that considering multiple data sources and estimating their weights from annotations of classified genes can considerably enhance the performance of BS.
Abstract
Motivation: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on integrating multiple data sources for gene function prediction, there is hardly any comparable work on weighting data sources using the functional annotations of classified genes. In this investigation, we propose a new scoring framework, called the biological score (BS), which incorporates data source weighting, for predicting the function of some of the unclassified yeast genes.

Methods: The BS is computed by first evaluating the similarities between genes, arising from different data sources, in a common framework, and then integrating them as a weighted linear combination. The relative weight of each data source is determined adaptively from the yeast gene ontology (GO)-slim process annotations of classified genes, available from the Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS, in which, for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performance of BS and K-BS is evaluated with gene annotations available from the Munich Information Center for Protein Sequences (MIPS).

Results: Using K-BS, we predict the functional categories of 417 classified genes from 417 clusters with a positive predictive value of 0.98. The functional categories of 12 unclassified yeast genes are also predicted.

Conclusion: Our experimental results indicate that considering multiple data sources and estimating their weights from annotations of classified genes can considerably enhance the performance of BS. Even a small proportion of annotated genes can improve the identification of true positive gene pairs using BS.
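The Methods description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the weights are derived from annotations, so the rule in `source_weights` (the fraction of high-similarity classified pairs that share an annotation, normalized across sources), the `threshold` parameter, the toy genes, the two example sources, and all function names are assumptions made purely for illustration.

```python
from itertools import combinations


def symmetric(pairs, genes):
    """Build a symmetric similarity lookup sim[a][b] from a dict of gene pairs."""
    sim = {g: {h: 0.0 for h in genes} for g in genes}
    for (a, b), value in pairs.items():
        sim[a][b] = sim[b][a] = value
    return sim


def source_weights(similarities, annotations, threshold=0.5):
    """ASSUMED weighting rule (not taken from the paper): score each source by
    the fraction of classified gene pairs with similarity >= threshold that
    share at least one annotation, then normalize the scores to sum to 1."""
    classified = sorted(annotations)
    raw = []
    for sim in similarities:
        hits = total = 0
        for a, b in combinations(classified, 2):
            if sim[a][b] >= threshold:
                total += 1
                hits += bool(annotations[a] & annotations[b])
        raw.append(hits / total if total else 0.0)
    norm = sum(raw)
    if norm == 0:
        # No source is informative: fall back to equal weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / norm for r in raw]


def biological_score(similarities, weights, a, b):
    """Weighted linear combination of the per-source similarities of a and b."""
    return sum(w * sim[a][b] for w, sim in zip(weights, similarities))


def k_bs_cluster(gene, genes, similarities, weights, k):
    """Cluster made of `gene` and its k nearest neighbors under the score."""
    others = [g for g in genes if g != gene]
    ranked = sorted(
        others,
        key=lambda g: biological_score(similarities, weights, gene, g),
        reverse=True,
    )
    return [gene] + ranked[:k]


# Toy data (hypothetical): three classified genes and one unclassified gene.
genes = ["g1", "g2", "g3", "g4"]
expression = symmetric(
    {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.1,
     ("g1", "g4"): 0.8, ("g2", "g4"): 0.7, ("g3", "g4"): 0.1},
    genes,
)
interaction = symmetric(
    {("g1", "g2"): 0.1, ("g1", "g3"): 0.9, ("g2", "g3"): 0.2,
     ("g1", "g4"): 0.1, ("g2", "g4"): 0.2, ("g3", "g4"): 0.9},
    genes,
)
annotations = {"g1": {"transport"}, "g2": {"transport"}, "g3": {"cell cycle"}}

sources = [expression, interaction]
weights = source_weights(sources, annotations)
cluster = k_bs_cluster("g4", genes, sources, weights, k=2)
print(weights, cluster)
```

In this toy setting the first source agrees with the known annotations and the second does not, so the assumed rule puts all the weight on the first source, and the unclassified gene `g4` is grouped with `g1` and `g2`. A prediction step would then assign `g4` the annotation shared by its classified neighbors.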


Citations
Journal Article

A feature selection technique for inference of graphs from their known topological properties: Revealing scale-free gene regulatory networks

TL;DR: A novel methodology that aggregates scale-free properties to a classical low-cost feature selection method, known as Sequential Floating Forward Selection (SFFS), for guiding the inference task and provides smaller estimation errors than those obtained without guiding the SFFS application by the scale-free model, thus maintaining the robustness of the SFFS method.
Journal Article

Entropic Biological Score: a cell cycle investigation for GRNs inference.

TL;DR: A new GRNs inference methodology, called Entropic Biological Score (EBS), which linearly combines the mean conditional entropy from expression levels and a Biological Score, obtained by integrating different biological data sources, is proposed.
Proceedings Article

Secure data integration systems

TL;DR: This research proposes a novel framework, called SecureDIS, to mitigate data leakage threats in Data Integration Systems (DIS), and helps software engineers to lessen data leakage threats during the early phases of DIS development.
Journal Article

Assessing the gain of biological data integration in gene networks inference

TL;DR: A first comparison of the gain from using prior biological information in the inference of GNs, considering the eukaryotic organism P. falciparum, shows that information based on direct interaction can produce a greater improvement than data about a less specific relationship, such as GO or KEGG.
Journal Article

A Weighted Power Framework for Integrating Multisource Information: Gene Function Prediction in Yeast

TL;DR: This study proposes a weighted power scoring framework, called weighted power biological score (WPBS), for combining different biological data sources and predicting the function of some of the unclassified yeast Saccharomyces cerevisiae genes.