Enhancing the stability and efficiency of spectral ordering with partial supervision and feature selection

doi:10.1007/S10115-009-0215-1

Journal Article

Dimitrios Mavroeidis, +1 more

Knowledge and Information Systems, Vol. 23, Iss. 2, pp. 243-265 (01 May 2010)

TLDR

This work proposes a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix to take domain knowledge into account, and it demonstrates the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages.

Abstract:

Several studies have demonstrated the prospects of spectral ordering for data mining. One successful application is seriation of paleontological findings, i.e. ordering the sites of excavation, using data on mammal co-occurrences only. However, spectral ordering ignores the background knowledge that is naturally present in the domain: paleontologists can derive the ages of the sites within some accuracy. On the other hand, the age information is uncertain, so the best approach would be to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision we propose a novel semi-supervised spectral ordering algorithm that modifies the Laplacian matrix such that domain knowledge is taken into account. Also, it performs feature selection by discarding features that contribute most to the unwanted variability of the data in bootstrap sampling. Moreover, we demonstrate the effectiveness of the proposed framework on the seriation of Usenet newsgroup messages, where the task is to find out the underlying flow of discussion. The theoretical properties of our algorithm are thoroughly analyzed and it is demonstrated that the proposed framework enhances the stability of the spectral ordering output and induces computational gains.
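
As a rough illustration of the idea described in the abstract, the sketch below shows plain spectral ordering (sorting items by the Fiedler vector of a graph Laplacian) together with one possible way to inject uncertain prior positions. The abstract does not spell out how the paper modifies the Laplacian, so `add_prior_similarity`, its `weight` and `scale` parameters, and the toy occurrence matrix are all assumptions made for illustration, not the authors' method. The bootstrap-based feature selection step is not covered here.

```python
import numpy as np


def spectral_order(similarity):
    """Order items by the Fiedler vector of the unnormalized graph Laplacian."""
    W = np.asarray(similarity, dtype=float)
    W = (W + W.T) / 2.0                      # enforce symmetry
    np.fill_diagonal(W, 0.0)                 # self-similarity does not affect L
    L = np.diag(W.sum(axis=1)) - W           # L = D - W
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                  # second-smallest eigenvalue
    return np.argsort(fiedler), fiedler


def add_prior_similarity(similarity, prior_positions, weight=1.0, scale=1.0):
    """Hypothetical supervision step (an assumption, not the paper's formula).

    Items with known, nearby prior positions (e.g. estimated site ages) get
    extra similarity, which changes the Laplacian built from the result.
    Use NaN for items without prior knowledge.
    """
    W = np.array(similarity, dtype=float)
    p = np.asarray(prior_positions, dtype=float)
    idx = np.flatnonzero(~np.isnan(p))
    diff = p[idx, None] - p[None, idx]
    bonus = weight * np.exp(-(diff / scale) ** 2)
    np.fill_diagonal(bonus, 0.0)
    W[np.ix_(idx, idx)] += bonus
    return W


# Toy data: 8 sites x 6 species, each species present at 3 consecutive sites.
X = np.zeros((8, 6))
for j in range(6):
    X[j:j + 3, j] = 1.0

rng = np.random.default_rng(0)
perm = rng.permutation(8)                    # hide the true site order
X_shuffled = X[perm]

W = X_shuffled @ X_shuffled.T                # co-occurrence similarity
priors = np.full(8, np.nan)
priors[:3] = perm[:3]                        # pretend 3 sites have known ages
W_sup = add_prior_similarity(W, priors, weight=0.5, scale=2.0)

order, _ = spectral_order(W_sup)
print("true positions in recovered order:", perm[order])
```

The ordering is only defined up to reversal, since the sign of an eigenvector is arbitrary. Adding similarity between items with close prior positions is just one simple way to bias the Laplacian toward background knowledge; the `weight` parameter controls how strongly the prior is trusted relative to the co-occurrence data.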

Citations

Accelerating spectral clustering with partial supervision

Live and learn from mistakes: A lightweight system for document classification

Feature selection for k-means clustering stability: theoretical analysis and an algorithm

Mind the eigen-gap, or how to accelerate semi-supervised spectral learning algorithms

Combinatorial algorithms for the seriation problem

Related Papers (5)

Learning Spectral Embedding for Semi-supervised Clustering

Spectral clustering: A semi-supervised approach

Analysis of spectral clustering algorithms for community detection: the general bipartite setting

Fast semi-supervised clustering with enhanced spectral embedding

A Sampling Theory Perspective of Graph-based Semi-supervised Learning