Proceedings ArticleDOI

Sparse nonnegative matrix factorization using ℓ0-constraints

07 Oct 2010 - pp. 83-88
TL;DR: In this article, the authors propose two NMF algorithms with ℓ0-sparseness constraints on the bases and the coefficient matrices, respectively, and compare their results to sparse NMF and nonnegative K-SVD.
Abstract: Although nonnegative matrix factorization (NMF) favors a part-based and sparse representation of its input, there is no guarantee for this behavior. Several extensions to NMF have been proposed in order to introduce sparseness via the ℓ1-norm, while little work has been done using the more natural sparseness measure, the ℓ0-pseudo-norm. In this work we propose two NMF algorithms with ℓ0-sparseness constraints on the bases and the coefficient matrices, respectively. We show that classic NMF [1] is a well-suited tool for ℓ0-sparse NMF algorithms, due to a property we call sparseness maintenance. We apply our algorithms to synthetic and real-world data and compare our results to sparse NMF [2] and nonnegative K-SVD [3].
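
The "sparseness maintenance" observation has a concrete algorithmic reading: multiplicative NMF updates only rescale entries and never revive an exact zero, so a hard ℓ0 projection of the coefficient matrix survives subsequent updates. Below is a minimal Python (numpy) sketch of that general scheme; it is an illustration, not the authors' exact algorithm, and all names and parameter choices are ours.

```python
import numpy as np

def project_l0_columns(H, k):
    """Hard ℓ0 projection: keep the k largest entries per column, zero the rest."""
    P = np.zeros_like(H)
    top = np.argsort(H, axis=0)[-k:, :]            # row indices of the k largest per column
    cols = np.arange(H.shape[1])
    P[top, cols] = H[top, cols]
    return P

def l0_sparse_nmf(V, r, k, warmup=50, refine=150, eps=1e-9):
    """Sketch of NMF with an ℓ0 constraint on the columns of H:
    1) run unconstrained multiplicative updates (warm-up),
    2) project each column of H onto its k largest entries,
    3) keep updating: multiplicative updates never revive a zero,
       so the sparsity pattern is maintained while W and H refine."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    def step():
        nonlocal W, H
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # coefficient update
        W *= (V @ H.T) / (W @ H @ H.T + eps)       # basis update
    for _ in range(warmup):
        step()
    H = project_l0_columns(H, k)                   # enforce ||h_j||_0 <= k
    for _ in range(refine):                        # zeros stay zero: sparseness maintenance
        step()
    return W, H
```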
Citations
Journal ArticleDOI
TL;DR: This paper proposes a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix or the coefficient matrix, respectively, and demonstrates the benefits of these methods, which compare favorably to, or outperform, existing approaches.

158 citations

Posted Content
TL;DR: In this paper, the authors consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which they give explicit convergence rates and demonstrate excellent empirical performance.
Abstract: Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.

18 citations
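
As a rough illustration of greedy optimization over a conic hull, here is a hedged sketch of a non-negative matching-pursuit step: pick the atom most positively correlated with the residual and take a step that keeps all coefficients nonnegative. This is not the paper's algorithm (which comes with convergence guarantees and several variants); the function name, stopping rule, and step rule are illustrative only.

```python
import numpy as np

def nonneg_matching_pursuit(y, A, n_steps=10):
    """Greedy sketch: approximate y by a conic (nonnegative) combination
    of the columns of A. Columns are assumed to have unit ℓ2 norm."""
    x = np.zeros(A.shape[1])
    r = y.astype(float).copy()
    for _ in range(n_steps):
        corr = A.T @ r
        j = int(np.argmax(corr))       # most positively correlated atom
        if corr[j] <= 0:               # no atom reduces the residual while
            break                      # staying inside the nonnegative cone
        x[j] += corr[j]                # standard MP step; stays >= 0 since corr[j] > 0
        r -= corr[j] * A[:, j]         # update the residual
    return x, r
```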

Journal ArticleDOI
TL;DR: A novel approach is presented that differs from classical clustering methods, such as semi-nonnegative matrix factorization, K-EVD, or k-means clustering, yet combines some aspects of all these, and is shown here to outperform common competitors in terms of clustering performance and/or computation speed.
Abstract: This paper deals with a clustering problem where feature vectors are clustered depending on the angle between feature vectors, that is, feature vectors are grouped together if they point roughly in the same direction. This directional distance measure arises in several applications, including document classification and human brain imaging. Using ideas from the field of constrained low-rank matrix factorization and sparse approximation, a novel approach is presented that differs from classical clustering methods, such as semi-nonnegative matrix factorization, K-EVD, or k-means clustering, yet combines some aspects of all these. As in nonnegative matrix factorization and K-EVD, the matrix decomposition is iteratively refined to optimize a data fidelity term; however, no positivity constraint is enforced directly, nor do we need to explicitly compute eigenvectors. As in k-means and K-EVD, each optimization step is followed by a hard cluster assignment. This leads to an efficient algorithm that is shown here to outperform common competitors in terms of clustering performance and/or computation speed. In addition to a detailed theoretical analysis of some of the algorithm's main properties, the approach is empirically evaluated on a range of toy problems, several standard text clustering data sets, and a high-dimensional problem in brain imaging, where functional magnetic resonance imaging data are used to partition the human cerebral cortex into distinct functional regions.

13 citations
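
The directional idea described above, hard assignment by angle plus iterative refinement of cluster directions, can be sketched in the style of spherical k-means. The following is that generic sketch, not the paper's algorithm (which differs in its update and avoids explicit eigenvector computation); all names are ours.

```python
import numpy as np

def directional_kmeans(X, k, n_iter=50, seed=0):
    """Cluster rows of X by angle: alternate hard assignment to the most
    similar direction (cosine similarity) and re-estimation of each
    cluster's direction as the renormalized mean of its members."""
    rng = np.random.default_rng(seed)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm feature vectors
    C = U[rng.choice(len(U), size=k, replace=False)]   # initial cluster directions
    for _ in range(n_iter):
        labels = np.argmax(U @ C.T, axis=1)            # hard cluster assignment
        for j in range(k):
            members = U[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)           # renormalized mean direction
    return labels, C
```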

Dissertation
04 Jun 2014
TL;DR: This thesis designs novel SR algorithms, with new upscaling and dictionary construction procedures, and compares them to state-of-the-art methods; it also shows that, in specific cases, SR can be an effective tool for video compression, opening interesting new perspectives.
Abstract: With super-resolution (SR) we refer to a class of techniques that enhance the spatial resolution of images and videos. SR algorithms can be of two kinds: multi-frame methods, where multiple low-resolution images are aggregated to form a unique high-resolution image, and single-image methods, which aim at upscaling a single image. This thesis focuses on developing theory and algorithms for the single-image SR problem. In particular, we adopt the so-called example-based approach, where the output image is estimated with machine learning techniques, using the information contained in a dictionary of image "examples". The examples consist of image patches, which are either extracted from external images or derived from the input image itself. For both kinds of dictionary, we design novel SR algorithms, with new upscaling and dictionary construction procedures, and compare them to state-of-the-art methods. The results achieved are shown to be very competitive both in terms of visual quality of the super-resolved images and computational complexity. We then apply our algorithms to the video upscaling case, where the goal is to enlarge the resolution of an entire video sequence. The algorithms, suitably adapted to this case, are also analyzed in the coding context. This analysis shows that, in specific cases, SR can also be an effective tool for video compression, thus opening new and interesting perspectives.

11 citations
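
For intuition about the example-based approach, here is a deliberately naive nearest-neighbour sketch: build a dictionary of (low-res, high-res) patch pairs from an example image, then upscale by replacing each input patch with the high-res counterpart of its closest example. Real methods, including those in the thesis, use learned mappings, overlapping patches, and blending; every name and parameter below is illustrative.

```python
import numpy as np

def build_patch_dictionary(hr_example, scale=2, p=4):
    """Collect (low-res, high-res) example patch pairs from one image.
    Block-averaging stands in for a real downsampling pipeline;
    image sides are assumed divisible by `scale`."""
    h, w = hr_example.shape
    lr = hr_example.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    lo, hi = [], []
    for i in range(lr.shape[0] - p + 1):
        for j in range(lr.shape[1] - p + 1):
            lo.append(lr[i:i+p, j:j+p].ravel())
            hi.append(hr_example[i*scale:(i+p)*scale, j*scale:(j+p)*scale].ravel())
    return np.array(lo), np.array(hi)

def upscale(lr_img, lo, hi, scale=2, p=4):
    """Nearest-neighbour example lookup: replace each low-res patch by the
    high-res counterpart of its closest dictionary example.
    Non-overlapping patches for brevity; borders may stay unfilled."""
    H, W = lr_img.shape
    out = np.zeros((H * scale, W * scale))
    for i in range(0, H - p + 1, p):
        for j in range(0, W - p + 1, p):
            q = lr_img[i:i+p, j:j+p].ravel()
            k = int(np.argmin(((lo - q) ** 2).sum(axis=1)))   # closest example
            out[i*scale:(i+p)*scale, j*scale:(j+p)*scale] = hi[k].reshape(p*scale, p*scale)
    return out
```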

Journal ArticleDOI
TL;DR: This paper develops a parallel coordinate descent dictionary learning (PCDDL) algorithm, structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals.
Abstract: Sparse representation of signals via an overcomplete dictionary has recently received much attention as it has produced promising results in various applications. Since nonnegativity of the signals and the dictionary is required in some applications, for example, multispectral data analysis, conventional dictionary learning methods simply imposed with nonnegativity may become inapplicable. In this paper, we propose a novel method for learning a nonnegative, overcomplete dictionary for such a case. This is accomplished by posing the sparse representation of nonnegative signals as a problem of nonnegative matrix factorization (NMF) with a sparsity constraint. By employing the coordinate descent strategy for optimization and extending it to the multivariable case for parallel processing, we develop a so-called parallel coordinate descent dictionary learning (PCDDL) algorithm, which is structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals. Numerical experiments demonstrate that the proposed algorithm performs better than the conventional nonnegative K-SVD (NN-KSVD) algorithm and several other algorithms used for comparison. Moreover, its computational cost is remarkably lower than that of the compared algorithms.

10 citations
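
The coordinate-descent strategy mentioned above can be illustrated with a standard column-wise (HALS-style) scheme for sparsity-penalized NMF, in which each row of H and each column of W has a closed-form nonnegative update given the others. This is a generic sketch under an ℓ1 penalty, not the PCDDL algorithm itself; names and parameters are ours.

```python
import numpy as np

def cd_sparse_nmf(V, r, lam=0.1, n_iter=200, eps=1e-9):
    """Column-wise coordinate descent for V ≈ W H with W, H >= 0 and an
    ℓ1 sparsity penalty lam on H. Given the other factor, each block has
    a closed-form nonnegative (clipped) least-squares update."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        WtV, WtW = W.T @ V, W.T @ W
        for k in range(r):   # update row k of H, others held fixed
            g = WtV[k] - WtW[k] @ H + WtW[k, k] * H[k] - lam
            H[k] = np.maximum(g / (WtW[k, k] + eps), 0)
        VHt, HHt = V @ H.T, H @ H.T
        for k in range(r):   # update column k of W, others held fixed
            g = VHt[:, k] - W @ HHt[:, k] + HHt[k, k] * W[:, k]
            W[:, k] = np.maximum(g / (HHt[k, k] + eps), 0)
    return W, H
```

The inner row/column updates are independent of each other given the fixed factor, which is what makes a parallel (rather than sequential) sweep plausible for this family of methods.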

References
Journal ArticleDOI
21 Oct 1999 - Nature
TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
Abstract: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

11,500 citations
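
For reference, the factorization described here is V ≈ WH with V, W, H elementwise nonnegative; the widely used multiplicative update rules (from Lee and Seung's companion algorithms paper, for the squared-error objective) are:

```latex
H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^{\top}V)_{a\mu}}{(W^{\top}WH)_{a\mu}},
\qquad
W_{ia} \leftarrow W_{ia}\,\frac{(VH^{\top})_{ia}}{(WHH^{\top})_{ia}} .
```

These updates preserve nonnegativity automatically, since each entry is only multiplied by a nonnegative factor.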

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

9,950 citations
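
In symbols, Basis Pursuit solves the problem below; the standard variable split α = u − v with u, v ≥ 0 turns it into a linear program, consistent with the quoted LP size (the signal length gives the 8192 equality constraints, and 2 × 106,496 atoms gives the 212,992 variables):

```latex
\min_{\alpha}\ \|\alpha\|_{1}\ \ \text{s.t.}\ \ \Phi\alpha = s
\quad\Longleftrightarrow\quad
\min_{u,v \ge 0}\ \mathbf{1}^{\top}(u+v)\ \ \text{s.t.}\ \ \Phi(u-v) = s .
```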

Journal ArticleDOI
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures for computing adaptive signal representations. With a dictionary of Gabor functions, a matching pursuit defines an adaptive time-frequency transform. The authors derive a signal energy distribution in the time-frequency plane which, unlike Wigner and Cohen class distributions, does not include interference terms. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. The authors compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (IEEE Trans. Inform. Theory, vol. 38, Mar. 1992).

9,380 citations
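
The core loop of matching pursuit is short enough to state directly. Here is a minimal numpy sketch; unit-norm atoms are assumed, and the function name and stopping rule are ours.

```python
import numpy as np

def matching_pursuit(s, D, n_atoms=20, tol=1e-6):
    """Classic matching pursuit: repeatedly pick the dictionary atom most
    correlated with the residual and subtract its projection.
    Columns of D are assumed to have unit ℓ2 norm."""
    coeffs = np.zeros(D.shape[1])
    r = s.astype(float).copy()
    for _ in range(n_atoms):
        c = D.T @ r
        j = int(np.argmax(np.abs(c)))   # best-matching atom
        coeffs[j] += c[j]               # accumulate its coefficient
        r -= c[j] * D[:, j]             # update the residual
        if np.linalg.norm(r) < tol:
            break
    return coeffs, r
```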

Journal ArticleDOI
TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.

8,905 citations
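
The alternation described in this abstract can be made concrete for the dictionary-update half: for each atom, K-SVD restricts attention to the signals that currently use it and refits the atom and its coefficients by a rank-1 SVD of the corresponding residual, so existing sparsity patterns are preserved. A minimal sketch of that single step (array layout and names ours):

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """One K-SVD dictionary-update step: refit atom k of D and its row of
    coefficients X[k] via a rank-1 SVD of the residual, restricted to the
    signals whose current sparse code uses atom k."""
    omega = np.nonzero(X[k])[0]            # signals that use atom k
    if omega.size == 0:
        return D, X                        # unused atom: nothing to refit
    # Residual over omega with atom k's contribution removed
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                      # updated atom (unit norm)
    X[k, omega] = s[0] * Vt[0]             # updated coefficients on the same support
    return D, X
```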