Journal Article

True Path Rule Hierarchical Ensembles for Genome-Wide Gene Function Prediction

Giorgio Valentini
01 May 2011, Vol. 8, Iss. 3, pp. 832-847
Abstract
Gene function prediction is a complex computational problem, characterized by several items: the number of functional classes is large, and a gene may belong to multiple classes; functional classes are structured according to a hierarchy; classes are usually unbalanced, with more negative than positive examples; class labels can be uncertain and the annotations largely incomplete; to improve the predictions, multiple sources of data need to be properly integrated. In this contribution, we focus on the first three items and, in particular, on the development of a new method for hierarchical genome-wide and ontology-wide gene function prediction. The proposed algorithm is inspired by the “true path rule” (TPR) that governs both the Gene Ontology and FunCat taxonomies. According to this rule, the proposed TPR ensemble method is characterized by a two-way asymmetric flow of information that traverses the graph-structured ensemble: positive predictions for a node recursively influence its ancestors, while negative predictions influence its descendants. Cross-validated results with the model organism S. cerevisiae, using seven different sources of biomolecular data, and a theoretical analysis of the TPR algorithm show the effectiveness and the drawbacks of the proposed approach.
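The two-way asymmetric flow described in the abstract can be illustrated with a minimal sketch. It is only one plausible reading of the abstract, not the paper's stated algorithm. It assumes a tree-shaped taxonomy (as in FunCat), one local classifier score per node, a decision threshold of 0.5, averaging as the bottom-up combination rule, and capping a child's score at its negative parent's score as the top-down rule. The name `tpr_consensus` and all example values are hypothetical.

```python
from typing import Dict, List


def tpr_consensus(
    children: Dict[str, List[str]],
    local_probs: Dict[str, float],
    root: str,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Illustrative two-pass propagation over a tree of classes.

    Assumptions (not taken from the source): averaging for the
    bottom-up step, min-capping for the top-down step, threshold 0.5.
    """
    consensus: Dict[str, float] = {}

    def bottom_up(node: str) -> float:
        # Only children predicted positive (score above the threshold)
        # contribute to the score of their parent.
        positive = []
        for child in children.get(node, []):
            score = bottom_up(child)
            if score > threshold:
                positive.append(score)
        consensus[node] = (local_probs[node] + sum(positive)) / (1 + len(positive))
        return consensus[node]

    def top_down(node: str) -> None:
        # A negative node pulls every descendant down to at most its own
        # score, so no class is positive below a negative ancestor.
        for child in children.get(node, []):
            if consensus[node] <= threshold:
                consensus[child] = min(consensus[child], consensus[node])
            top_down(child)

    bottom_up(root)
    top_down(root)
    return consensus


# Hypothetical toy taxonomy and local classifier outputs.
example_children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
example_local = {"root": 1.0, "A": 0.4, "A1": 0.9, "A2": 0.2, "B": 0.1, "B1": 0.6}
example_scores = tpr_consensus(example_children, example_local, "root")
```

In this toy example, the positive leaf `A1` lifts its parent `A` above the threshold, while `B` stays negative despite its positive child `B1`, and the top-down pass then turns `B1` negative as well.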
