Proceedings ArticleDOI

Sparse nonnegative matrix factorization using ℓ0-constraints

07 Oct 2010 - pp. 83-88
TL;DR: In this article, the authors propose two NMF algorithms with ℓ0-sparseness constraints on the bases and the coefficient matrices, respectively, and compare their results to sparse NMF and nonnegative K-SVD.
Abstract: Although nonnegative matrix factorization (NMF) favors a part-based and sparse representation of its input, there is no guarantee for this behavior. Several extensions to NMF have been proposed in order to introduce sparseness via the ℓ1-norm, while little work has been done using the more natural sparseness measure, the ℓ0-pseudo-norm. In this work we propose two NMF algorithms with ℓ0-sparseness constraints on the bases and the coefficient matrices, respectively. We show that classic NMF [1] is a well-suited tool for ℓ0-sparse NMF algorithms, due to a property we call sparseness maintenance. We apply our algorithms to synthetic and real-world data and compare our results to sparse NMF [2] and nonnegative K-SVD [3].
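
The "sparseness maintenance" observation has a concrete algorithmic reading: multiplicative NMF updates only rescale entries and never revive an exact zero, so a hard ℓ0 projection of the coefficient matrix survives subsequent updates. Below is a minimal Python (numpy) sketch of that general scheme; it is an illustration, not the authors' exact algorithm, and all names and parameter choices are ours.

```python
import numpy as np

def project_l0_columns(H, k):
    """Hard ℓ0 projection: keep the k largest entries per column, zero the rest."""
    P = np.zeros_like(H)
    top = np.argsort(H, axis=0)[-k:, :]            # row indices of the k largest per column
    cols = np.arange(H.shape[1])
    P[top, cols] = H[top, cols]
    return P

def l0_sparse_nmf(V, r, k, warmup=50, refine=150, eps=1e-9):
    """Sketch of NMF with an ℓ0 constraint on the columns of H:
    1) run unconstrained multiplicative updates (warm-up),
    2) project each column of H onto its k largest entries,
    3) keep updating: multiplicative updates never revive a zero,
       so the sparsity pattern is maintained while W and H refine."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    def step():
        nonlocal W, H
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # coefficient update
        W *= (V @ H.T) / (W @ H @ H.T + eps)       # basis update
    for _ in range(warmup):
        step()
    H = project_l0_columns(H, k)                   # enforce ||h_j||_0 <= k
    for _ in range(refine):                        # zeros stay zero: sparseness maintenance
        step()
    return W, H
```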
Citations
Journal ArticleDOI
TL;DR: This paper proposes a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix or the coefficient matrix, respectively, and demonstrates the benefits of these methods, which compare favorably to, or outperform, existing approaches.

158 citations

Posted Content
TL;DR: In this paper, the authors consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which they give explicit convergence rates and demonstrate excellent empirical performance.
Abstract: Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.

18 citations
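
As a rough illustration of greedy optimization over a conic hull, here is a hedged sketch of a non-negative matching-pursuit step: pick the atom most positively correlated with the residual and take a step that keeps all coefficients nonnegative. This is not the paper's algorithm (which comes with convergence guarantees and several variants); the function name, stopping rule, and step rule are illustrative only.

```python
import numpy as np

def nonneg_matching_pursuit(y, A, n_steps=10):
    """Greedy sketch: approximate y by a conic (nonnegative) combination
    of the columns of A. Columns are assumed to have unit ℓ2 norm."""
    x = np.zeros(A.shape[1])
    r = y.astype(float).copy()
    for _ in range(n_steps):
        corr = A.T @ r
        j = int(np.argmax(corr))       # most positively correlated atom
        if corr[j] <= 0:               # no atom reduces the residual while
            break                      # staying inside the nonnegative cone
        x[j] += corr[j]                # standard MP step; stays >= 0 since corr[j] > 0
        r -= corr[j] * A[:, j]         # update the residual
    return x, r
```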

Journal ArticleDOI
TL;DR: A novel approach is presented that differs from classical clustering methods, such as semi-nonnegative matrix factorization, K-EVD, or k-means clustering, yet combines some aspects of all these, and is shown here to outperform common competitors in terms of clustering performance and/or computation speed.
Abstract: This paper deals with a clustering problem where feature vectors are clustered depending on the angle between feature vectors, that is, feature vectors are grouped together if they point roughly in the same direction. This directional distance measure arises in several applications, including document classification and human brain imaging. Using ideas from the field of constrained low-rank matrix factorization and sparse approximation, a novel approach is presented that differs from classical clustering methods, such as semi-nonnegative matrix factorization, K-EVD, or k-means clustering, yet combines some aspects of all these. As in nonnegative matrix factorization and K-EVD, the matrix decomposition is iteratively refined to optimize a data fidelity term; however, no positivity constraint is enforced directly, nor do we need to explicitly compute eigenvectors. As in k-means and K-EVD, each optimization step is followed by a hard cluster assignment. This leads to an efficient algorithm that is shown here to outperform common competitors in terms of clustering performance and/or computation speed. In addition to a detailed theoretical analysis of some of the algorithm's main properties, the approach is empirically evaluated on a range of toy problems, several standard text clustering data sets, and a high-dimensional problem in brain imaging, where functional magnetic resonance imaging data are used to partition the human cerebral cortex into distinct functional regions.

13 citations
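
The directional idea described above, hard assignment by angle plus iterative refinement of cluster directions, can be sketched in the style of spherical k-means. The following is that generic sketch, not the paper's algorithm (which differs in its update and avoids explicit eigenvector computation); all names are ours.

```python
import numpy as np

def directional_kmeans(X, k, n_iter=50, seed=0):
    """Cluster rows of X by angle: alternate hard assignment to the most
    similar direction (cosine similarity) and re-estimation of each
    cluster's direction as the renormalized mean of its members."""
    rng = np.random.default_rng(seed)
    U = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm feature vectors
    C = U[rng.choice(len(U), size=k, replace=False)]   # initial cluster directions
    for _ in range(n_iter):
        labels = np.argmax(U @ C.T, axis=1)            # hard cluster assignment
        for j in range(k):
            members = U[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)           # renormalized mean direction
    return labels, C
```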

Dissertation
04 Jun 2014
TL;DR: This thesis designs novel SR algorithms, with new upscaling and dictionary construction procedures, and compares them to state-of-the-art methods; it also shows that, in specific cases, SR can be an effective tool for video compression, opening interesting new perspectives.
Abstract: With super-resolution (SR) we refer to a class of techniques that enhance the spatial resolution of images and videos. SR algorithms can be of two kinds: multi-frame methods, where multiple low-resolution images are aggregated to form a unique high-resolution image, and single-image methods, which aim at upscaling a single image. This thesis focuses on developing theory and algorithms for the single-image SR problem. In particular, we adopt the so-called example-based approach, where the output image is estimated with machine learning techniques, using the information contained in a dictionary of image "examples". The examples consist of image patches, which are either extracted from external images or derived from the input image itself. For both kinds of dictionary, we design novel SR algorithms, with new upscaling and dictionary construction procedures, and compare them to state-of-the-art methods. The results achieved are shown to be very competitive both in terms of visual quality of the super-resolved images and computational complexity. We then apply our algorithms to the video upscaling case, where the goal is to enlarge the resolution of an entire video sequence. The algorithms, suitably adapted to this case, are also analyzed in the coding context. This analysis shows that, in specific cases, SR can also be an effective tool for video compression, thus opening new and interesting perspectives.

11 citations
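
For intuition about the example-based approach, here is a deliberately naive nearest-neighbour sketch: build a dictionary of (low-res, high-res) patch pairs from an example image, then upscale by replacing each input patch with the high-res counterpart of its closest example. Real methods, including those in the thesis, use learned mappings, overlapping patches, and blending; every name and parameter below is illustrative.

```python
import numpy as np

def build_patch_dictionary(hr_example, scale=2, p=4):
    """Collect (low-res, high-res) example patch pairs from one image.
    Block-averaging stands in for a real downsampling pipeline;
    image sides are assumed divisible by `scale`."""
    h, w = hr_example.shape
    lr = hr_example.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    lo, hi = [], []
    for i in range(lr.shape[0] - p + 1):
        for j in range(lr.shape[1] - p + 1):
            lo.append(lr[i:i+p, j:j+p].ravel())
            hi.append(hr_example[i*scale:(i+p)*scale, j*scale:(j+p)*scale].ravel())
    return np.array(lo), np.array(hi)

def upscale(lr_img, lo, hi, scale=2, p=4):
    """Nearest-neighbour example lookup: replace each low-res patch by the
    high-res counterpart of its closest dictionary example.
    Non-overlapping patches for brevity; borders may stay unfilled."""
    H, W = lr_img.shape
    out = np.zeros((H * scale, W * scale))
    for i in range(0, H - p + 1, p):
        for j in range(0, W - p + 1, p):
            q = lr_img[i:i+p, j:j+p].ravel()
            k = int(np.argmin(((lo - q) ** 2).sum(axis=1)))   # closest example
            out[i*scale:(i+p)*scale, j*scale:(j+p)*scale] = hi[k].reshape(p*scale, p*scale)
    return out
```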

Journal ArticleDOI
TL;DR: This paper develops a parallel coordinate descent dictionary learning (PCDDL) algorithm, structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals.
Abstract: Sparse representation of signals via an overcomplete dictionary has recently received much attention as it has produced promising results in various applications. Since nonnegativity of the signals and the dictionary is required in some applications, for example, multispectral data analysis, conventional dictionary learning methods simply imposed with nonnegativity may become inapplicable. In this paper, we propose a novel method for learning a nonnegative, overcomplete dictionary for such a case. This is accomplished by posing the sparse representation of nonnegative signals as a problem of nonnegative matrix factorization (NMF) with a sparsity constraint. By employing the coordinate descent strategy for optimization and extending it to the multivariable case for parallel processing, we develop a so-called parallel coordinate descent dictionary learning (PCDDL) algorithm, which is structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals. Numerical experiments demonstrate that the proposed algorithm performs better than the conventional nonnegative K-SVD (NN-KSVD) algorithm and several other algorithms used for comparison. Moreover, its computational cost is remarkably lower than that of the compared algorithms.

10 citations
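
The coordinate-descent strategy mentioned above can be illustrated with a standard column-wise (HALS-style) scheme for sparsity-penalized NMF, in which each row of H and each column of W has a closed-form nonnegative update given the others. This is a generic sketch under an ℓ1 penalty, not the PCDDL algorithm itself; names and parameters are ours.

```python
import numpy as np

def cd_sparse_nmf(V, r, lam=0.1, n_iter=200, eps=1e-9):
    """Column-wise coordinate descent for V ≈ W H with W, H >= 0 and an
    ℓ1 sparsity penalty lam on H. Given the other factor, each block has
    a closed-form nonnegative (clipped) least-squares update."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        WtV, WtW = W.T @ V, W.T @ W
        for k in range(r):   # update row k of H, others held fixed
            g = WtV[k] - WtW[k] @ H + WtW[k, k] * H[k] - lam
            H[k] = np.maximum(g / (WtW[k, k] + eps), 0)
        VHt, HHt = V @ H.T, H @ H.T
        for k in range(r):   # update column k of W, others held fixed
            g = VHt[:, k] - W @ HHt[:, k] + HHt[k, k] * W[:, k]
            W[:, k] = np.maximum(g / (HHt[k, k] + eps), 0)
    return W, H
```

The inner row/column updates are independent of each other given the fixed factor, which is what makes a parallel (rather than sequential) sweep plausible for this family of methods.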

References
Journal ArticleDOI
21 Oct 1999 - Nature
TL;DR: An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
Abstract: Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

11,500 citations
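
For reference, the factorization described here is V ≈ WH with V, W, H elementwise nonnegative; the widely used multiplicative update rules (from Lee and Seung's companion algorithms paper, for the squared-error objective) are:

```latex
H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^{\top}V)_{a\mu}}{(W^{\top}WH)_{a\mu}},
\qquad
W_{ia} \leftarrow W_{ia}\,\frac{(VH^{\top})_{ia}}{(WHH^{\top})_{ia}} .
```

These updates preserve nonnegativity automatically, since each entry is only multiplied by a nonnegative factor.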

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

9,950 citations
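
In symbols, Basis Pursuit solves the problem below; the standard variable split α = u − v with u, v ≥ 0 turns it into a linear program, consistent with the quoted LP size (the signal length gives the 8192 equality constraints, and 2 × 106,496 atoms gives the 212,992 variables):

```latex
\min_{\alpha}\ \|\alpha\|_{1}\ \ \text{s.t.}\ \ \Phi\alpha = s
\quad\Longleftrightarrow\quad
\min_{u,v \ge 0}\ \mathbf{1}^{\top}(u+v)\ \ \text{s.t.}\ \ \Phi(u-v) = s .
```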

Journal ArticleDOI
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures for computing adaptive signal representations. With a dictionary of Gabor functions, a matching pursuit defines an adaptive time-frequency transform. The authors derive a signal energy distribution in the time-frequency plane which, unlike Wigner and Cohen class distributions, does not include interference terms. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. The authors compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (IEEE Trans. Inform. Theory, vol. 38, Mar. 1992).

9,380 citations
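
The core loop of matching pursuit is short enough to state directly. Here is a minimal numpy sketch; unit-norm atoms are assumed, and the function name and stopping rule are ours.

```python
import numpy as np

def matching_pursuit(s, D, n_atoms=20, tol=1e-6):
    """Classic matching pursuit: repeatedly pick the dictionary atom most
    correlated with the residual and subtract its projection.
    Columns of D are assumed to have unit ℓ2 norm."""
    coeffs = np.zeros(D.shape[1])
    r = s.astype(float).copy()
    for _ in range(n_atoms):
        c = D.T @ r
        j = int(np.argmax(np.abs(c)))   # best-matching atom
        coeffs[j] += c[j]               # accumulate its coefficient
        r -= c[j] * D[:, j]             # update the residual
        if np.linalg.norm(r) < tol:
            break
    return coeffs, r
```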

Journal ArticleDOI
TL;DR: A novel algorithm for adapting dictionaries in order to achieve sparse signal representations, the K-SVD algorithm, an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.

8,905 citations
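
The alternation described in this abstract can be made concrete for the dictionary-update half: for each atom, K-SVD restricts attention to the signals that currently use it and refits the atom and its coefficients by a rank-1 SVD of the corresponding residual, so existing sparsity patterns are preserved. A minimal sketch of that single step (array layout and names ours):

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """One K-SVD dictionary-update step: refit atom k of D and its row of
    coefficients X[k] via a rank-1 SVD of the residual, restricted to the
    signals whose current sparse code uses atom k."""
    omega = np.nonzero(X[k])[0]            # signals that use atom k
    if omega.size == 0:
        return D, X                        # unused atom: nothing to refit
    # Residual over omega with atom k's contribution removed
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                      # updated atom (unit norm)
    X[k, omega] = s[0] * Vt[0]             # updated coefficients on the same support
    return D, X
```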