Combining Multisource Information Through Functional-Annotation-Based Weighting: Gene Function Prediction in Yeast
TLDR
Even a small proportion of annotated genes improves the detection of true positive gene pairs using BS, and the results indicate that considering multiple data sources and estimating their weights from annotations of classified genes can considerably enhance the performance of BS.
Abstract:
Motivation: One of the important goals of biological investigation is to predict the function of unclassified genes. Although there is a rich literature on integrating multiple data sources for gene function prediction, hardly any of that work weights the data sources using functional annotations of classified genes. In this investigation, we propose a new scoring framework, called the biological score (BS), that incorporates data source weighting, and we use it to predict the function of some of the unclassified yeast genes.
Methods: The BS is computed by first evaluating the similarities between genes arising from different data sources in a common framework, and then integrating them as a weighted linear combination. The relative weight of each data source is determined adaptively from the yeast gene ontology (GO)-slim process annotations of classified genes, available from the Saccharomyces Genome Database (SGD). Genes are clustered by a method called K-BS: for each gene, a cluster comprising that gene and its K nearest neighbors is computed using the proposed score (BS). The performance of BS and K-BS is evaluated with gene annotations available from the Munich Information Center for Protein Sequences (MIPS).
Results: Using K-BS, we predict the functional categories of 417 classified genes from 417 clusters with a positive predictive value of 0.98. The functional categories of 12 unclassified yeast genes are also predicted.
Conclusion: Our experimental results indicate that considering multiple data sources and estimating their weights from annotations of classified genes can considerably enhance the performance of BS. We also find that even a small proportion of annotated genes improves the detection of true positive gene pairs using BS.
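The two steps described under Methods, a weighted linear combination of per-source similarities followed by a K-nearest-neighbor cluster around each gene, can be sketched as below. This is only an illustration under stated assumptions: the function names, the toy similarity matrices and the weights are hypothetical, the weights are taken as given rather than estimated from GO-slim annotations (the abstract does not describe that procedure), and normalizing the weights to sum to 1 is our choice, not something the abstract states.

```python
import numpy as np


def combine_similarities(sims, weights):
    """Weighted linear combination of per-source gene-gene similarity matrices.

    Assumption: weights are supplied by the caller and rescaled to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * np.asarray(s, dtype=float) for wi, s in zip(w, sims))


def k_nearest_cluster(score, gene, k):
    """Return the gene followed by its k highest-scoring neighbors."""
    row = np.array(score[gene], dtype=float)
    row[gene] = -np.inf  # a gene is not its own neighbor
    neighbors = np.argsort(-row, kind="stable")[:k]
    return [gene] + neighbors.tolist()


def positive_predictive_value(true_pos, false_pos):
    """PPV = TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# Toy data for 4 genes and 2 hypothetical sources (e.g. expression, interactions).
expr = [[1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.3, 0.1],
        [0.1, 0.3, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0]]
ppi = [[1.0, 0.5, 0.0, 0.4],
       [0.5, 1.0, 0.2, 0.0],
       [0.0, 0.2, 1.0, 0.6],
       [0.4, 0.0, 0.6, 1.0]]

bs = combine_similarities([expr, ppi], weights=[3, 1])  # hypothetical weights
cluster_0 = k_nearest_cluster(bs, gene=0, k=2)
cluster_2 = k_nearest_cluster(bs, gene=2, k=2)
print(cluster_0, cluster_2)
```

With these toy inputs, gene 0 is grouped with genes 1 and 3, and gene 2 with genes 3 and 1. A predicted function for each cluster's central gene could then be compared with a reference annotation to count true and false positives for the positive predictive value.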