Open Access Proceedings Article

Maximal Quasi-Bicliques with Balanced Noise Tolerance: Concepts and Co-clustering Applications.

TLDR
The noise tolerance of maximal quasi-bicliques is improved by allowing every vertex to tolerate up to the same number, or the same percentage, of missing edges. This leads to a more natural interaction between the two vertex sets: a balanced most-versus-most adjacency.
Abstract
The rigid all-versus-all adjacency required by a maximal biclique for its two vertex sets is extremely vulnerable to missing data. In the past, several types of quasi-bicliques have been proposed to tackle this problem; however, their noise tolerance is usually unbalanced and can be very skewed. In this paper, we improve the noise tolerance of maximal quasi-bicliques by allowing every vertex to tolerate up to the same number, or the same percentage, of missing edges. This idea leads to a more natural interaction between the two vertex sets: a balanced most-versus-most adjacency. This generalization is also non-trivial, as many large maximal quasi-biclique subgraphs do not contain any maximal bicliques. This observation implies that direct expansion from maximal bicliques may not guarantee a complete enumeration of all maximal quasi-bicliques. We present important properties of maximal quasi-bicliques, such as a bounded closure property and a fixed point property, and use them to design efficient algorithms. Maximal quasi-bicliques are closely related to co-clustering problems such as co-clustering of documents and words, of images and features, and of stocks and financial ratios. Here, we demonstrate the usefulness of our concepts using a new application, a bioinformatics example in which the prediction of true protein interactions is investigated.
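The balanced tolerance condition described in the abstract can be illustrated with a small membership check. The sketch below is only one possible reading of that condition, not the paper's algorithm: it tests whether a given pair of vertex sets satisfies the tolerance, and it does not test maximality or enumerate anything. The function names, the parameter names `eps` (absolute tolerance) and `delta` (relative tolerance), and the adjacency-dictionary representation are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

# Hypothetical representation: each left vertex maps to the set of right
# vertices it is adjacent to in the bipartite graph.
Adjacency = Dict[Hashable, Set[Hashable]]


def missing_counts(
    adj: Adjacency, left: Iterable[Hashable], right: Iterable[Hashable]
) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Count, for every vertex, the vertices on the other side it is NOT adjacent to."""
    left_set, right_set = set(left), set(right)
    miss_left = {x: len(right_set - adj.get(x, set())) for x in left_set}
    miss_right = {
        y: sum(1 for x in left_set if y not in adj.get(x, set()))
        for y in right_set
    }
    return miss_left, miss_right


def is_balanced_quasi_biclique(
    adj: Adjacency,
    left: Iterable[Hashable],
    right: Iterable[Hashable],
    eps: Optional[int] = None,
    delta: Optional[float] = None,
) -> bool:
    """Check the balanced tolerance condition (assumed reading of the abstract).

    Exactly one of `eps` or `delta` must be given:
      - eps:   every vertex may miss at most `eps` edges to the other side;
      - delta: every vertex may miss at most a fraction `delta` of the other side.
    """
    if (eps is None) == (delta is None):
        raise ValueError("give exactly one of eps or delta")
    left_set, right_set = set(left), set(right)
    miss_left, miss_right = missing_counts(adj, left_set, right_set)
    if eps is not None:
        limit_left = limit_right = eps
    else:
        limit_left = delta * len(right_set)   # budget for each left vertex
        limit_right = delta * len(left_set)   # budget for each right vertex
    return all(m <= limit_left for m in miss_left.values()) and all(
        m <= limit_right for m in miss_right.values()
    )


# Toy graph: "b" lacks the edge to 3 and "c" lacks the edge to 1.
toy = {"a": {1, 2, 3}, "b": {1, 2}, "c": {2, 3}}
print(is_balanced_quasi_biclique(toy, "abc", {1, 2, 3}, eps=0))  # exact biclique test
print(is_balanced_quasi_biclique(toy, "abc", {1, 2, 3}, eps=1))  # one missing edge per vertex
```

With `eps=0` the check reduces to the all-versus-all biclique condition, so the toy pair fails; with `eps=1` every vertex on both sides stays within the same budget, which is the "balanced" aspect the abstract emphasizes.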

