Journal ArticleDOI

K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation

01 Nov 2006 - IEEE Transactions on Signal Processing (IEEE) - Vol. 54, Iss. 11, pp. 4311-4322
TL;DR: The K-SVD algorithm, a novel iterative method for adapting dictionaries to achieve sparse signal representations, alternates between sparse coding of the examples based on the current dictionary and updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method-the K-SVD algorithm-generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.
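As a concrete illustration of the alternation the abstract describes, here is a minimal NumPy sketch of the two stages, using orthogonal matching pursuit (an orthogonalized variant of the matching pursuit named above) for the sparse-coding step. The function names, the fixed sparsity level T0, and the random initialization are illustrative choices for this sketch, not the paper's reference implementation.

```python
import numpy as np

def omp(D, x, T0):
    """Orthogonal matching pursuit: greedily select up to T0 atoms for x."""
    residual, support, coef = x.astype(float).copy(), [], np.array([])
    for _ in range(T0):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    gamma = np.zeros(D.shape[1])
    gamma[support] = coef
    return gamma

def ksvd(Y, n_atoms, T0, n_iter=10):
    """Alternate sparse coding and per-atom SVD updates (K-SVD sketch)."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                      # unit-norm atoms
    for _ in range(n_iter):
        # Stage 1: sparse-code every training signal with the current dictionary
        X = np.column_stack([omp(D, y, T0) for y in Y.T])
        # Stage 2: update each atom (and its coefficients) by a rank-1 SVD fit
        for k in range(n_atoms):
            users = np.nonzero(X[k])[0]                 # signals that use atom k
            if users.size == 0:
                continue
            # Representation error with atom k removed, on its users only
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0]                  # coefficients updated too
    return D, X
```

The simultaneous coefficient update in Stage 2 is what the abstract credits with accelerating convergence relative to updating the dictionary alone.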
Citations
Journal ArticleDOI
TL;DR: A deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images, represented as a deep convolutional neural network (CNN).
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.
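For reference, a minimal PyTorch sketch of the three-layer mapping described here; the 9-1-5 kernel sizes and 64/32 channel widths follow a commonly cited SRCNN configuration, but the exact settings should be treated as illustrative rather than the authors' released model.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Patch extraction -> non-linear mapping -> reconstruction (sketch)."""
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        # x is the bicubic-upscaled low-resolution image; the network refines it.
        return self.body(x)

# Usage: y = SRCNN()(torch.randn(1, 1, 33, 33))
```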

6,122 citations

Journal ArticleDOI
TL;DR: This work addresses the image denoising problem, where zero-mean white and homogeneous Gaussian additive noise is to be removed from a given image, and uses the K-SVD algorithm to obtain a dictionary that describes the image content effectively.
Abstract: We address the image denoising problem, where zero-mean white and homogeneous Gaussian additive noise is to be removed from a given image. The approach taken is based on sparse and redundant representations over trained dictionaries. Using the K-SVD algorithm, we obtain a dictionary that describes the image content effectively. Two training options are considered: using the corrupted image itself, or training on a corpus of high-quality images. Since the K-SVD is limited in handling small image patches, we extend its deployment to arbitrary image sizes by defining a global image prior that forces sparsity over patches in every location in the image. We show how such Bayesian treatment leads to a simple and effective denoising algorithm. This leads to a state-of-the-art denoising performance, equivalent and sometimes surpassing recently published leading alternative denoising methods.
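A schematic NumPy version of the patch-averaging step described above: every overlapping patch is sparse-coded over the trained dictionary, the reconstructions are averaged, and the result is blended with the noisy image as the global prior dictates. The sparse_code argument (e.g., an OMP routine) and the blending weight lam are illustrative stand-ins for the paper's tuned choices.

```python
import numpy as np

def denoise(img, D, sparse_code, patch=8, lam=0.5):
    """Sketch of sparse-representation denoising via overlapping patches."""
    acc = lam * img.astype(float)                 # blend with the noisy image
    weight = np.full(img.shape, lam)
    h, w = img.shape
    for i in range(h - patch + 1):
        for j in range(w - patch + 1):
            p = img[i:i + patch, j:j + patch].ravel()
            rec = D @ sparse_code(D, p)           # sparse approximation
            acc[i:i + patch, j:j + patch] += rec.reshape(patch, patch)
            weight[i:i + patch, j:j + patch] += 1.0
    return acc / weight                           # per-pixel weighted average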

5,493 citations


Cites background or methods from "K-SVD: An Algorithm for Desig..."

  • ...Indeed, we might do better by using a redundant version of the DCT, as practiced in [36]....


  • ...The K-SVD proposes an iterative algorithm designed to handle the above task effectively [36], [37]....


  • ...Instead of supplying an artificial set of examples to train on, as proposed above, one could take the patches from the corrupted image, , where . Since the K-SVD dictionary learning process has in it a noise rejection capability (see experiments reported in [36]), this seems like a natural idea....


  • ...K-SVD algorithm [36], [37] because of its simplicity and efficiency for this task....


  • ...This way, the value of is guaranteed to drop per an update of each dictionary atom, and along with this update, the representation coefficients change as well (see [36] and [37] for more details)....


Journal ArticleDOI
TL;DR: This paper presents a new approach to single-image superresolution, based upon sparse signal representation, which generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods.
Abstract: This paper presents a new approach to single-image superresolution, based upon sparse signal representation. Research on image statistics suggests that image patches can be well-represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, we seek a sparse representation for each patch of the low-resolution input, and then use the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that under mild conditions, the sparse representation can be correctly recovered from the downsampled signals. By jointly training two dictionaries for the low- and high-resolution image patches, we can enforce the similarity of sparse representations between the low-resolution and high-resolution image patch pair with respect to their own dictionaries. Therefore, the sparse representation of a low-resolution image patch can be applied with the high-resolution image patch dictionary to generate a high-resolution image patch. The learned dictionary pair is a more compact representation of the patch pairs, compared to previous approaches, which simply sample a large number of image patch pairs, reducing the computational cost substantially. The effectiveness of such a sparsity prior is demonstrated for both general image super-resolution (SR) and the special case of face hallucination. In both cases, our algorithm generates high-resolution images that are competitive or even superior in quality to images produced by other similar SR methods. In addition, the local sparse modeling of our approach is naturally robust to noise, and therefore the proposed algorithm can handle SR with noisy inputs in a more unified framework.
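The core mechanism, reduced to a sketch: because the two dictionaries are trained jointly, the sparse code recovered from a low-resolution patch can be reused directly with the high-resolution dictionary. Here sparse_code stands for any pursuit routine and is an assumption of this sketch, not the paper's specific solver.

```python
import numpy as np

def super_resolve_patch(lr_patch, D_low, D_high, sparse_code):
    """Code the LR patch over D_low, synthesize the HR patch from D_high."""
    alpha = sparse_code(D_low, lr_patch)   # shared sparse representation
    return D_high @ alpha                  # high-resolution patch estimate
```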

4,958 citations

Book ChapterDOI
06 Sep 2014
TL;DR: This work proposes a deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images and shows that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) [15] that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage.

4,445 citations


Cites background from "K-SVD: An Algorithm for Desig..."

  • ..., [1]) is to densely extract patches and then represent them by a set of pre-trained bases such as PCA, DCT, Haar, etc....

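The "pre-trained bases" route that the quote contrasts with dictionary learning can be as simple as a redundant DCT. A hedged NumPy sketch of one standard construction follows; the patch and atom counts are illustrative.

```python
import numpy as np

def overcomplete_dct(patch=8, atoms_1d=11):
    """Redundant 2-D DCT dictionary as a Kronecker product of an
    overcomplete 1-D DCT; returns a (patch**2, atoms_1d**2) matrix."""
    k = np.arange(patch)[:, None]
    D1 = np.cos(np.pi / atoms_1d * k * np.arange(atoms_1d)[None, :])
    D1[:, 1:] -= D1[:, 1:].mean(axis=0)   # zero-mean all non-DC atoms
    D1 /= np.linalg.norm(D1, axis=0)      # unit-norm columns
    return np.kron(D1, D1)
```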

Book ChapterDOI
24 Jun 2010
TL;DR: This paper deals with the single image scale-up problem using sparse-representation modeling, and assumes a local Sparse-Land model on image patches, serving as regularization, to recover an original image from its blurred and down-scaled noisy version.
Abstract: This paper deals with the single image scale-up problem using sparse-representation modeling. The goal is to recover an original image from its blurred and down-scaled noisy version. Since this problem is highly ill-posed, a prior is needed in order to regularize it. The literature offers various ways to address this problem, ranging from simple linear space-invariant interpolation schemes (e.g., bicubic interpolation), to spatially-adaptive and non-linear filters of various sorts. We embark from a recently-proposed successful algorithm by Yang et al. [1,2], and similarly assume a local Sparse-Land model on image patches, serving as regularization. Several important modifications to the above-mentioned solution are introduced, and are shown to lead to improved results. These modifications include a major simplification of the overall process both in terms of the computational complexity and the algorithm architecture, using a different training approach for the dictionary-pair, and introducing the ability to operate without a training-set by boot-strapping the scale-up task from the given low-resolution image. We demonstrate the results on true images, showing both visual and PSNR improvements.
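The PSNR figure used to quantify those improvements is the standard measure below; a minimal NumPy version, assuming 8-bit images (peak = 255).

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```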

2,667 citations

References
Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries --- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
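The reduction to linear programming mentioned at the end works by splitting the coefficients into nonnegative positive and negative parts. A small SciPy sketch follows; the modern LP solver here stands in for the primal-dual log-barrier method the abstract describes.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    """min ||a||_1 subject to D a = x, posed as an LP via a = u - v, u, v >= 0."""
    n = D.shape[1]
    res = linprog(c=np.ones(2 * n),           # minimize 1'(u + v) = ||a||_1
                  A_eq=np.hstack([D, -D]),    # D(u - v) = x
                  b_eq=x,
                  bounds=(0, None))           # u, v nonnegative
    u, v = res.x[:n], res.x[n:]
    return u - v                              # sparse coefficient vector
```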

9,950 citations


"$rm K$ -SVD: An Algorithm for Desig..." refers background in this paper

  • ...It is in this setting that we ask what the proper dictionary is....


Journal ArticleDOI
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures to compute adaptive signal representations. With a dictionary of Gabor functions a matching pursuit defines an adaptive time-frequency transform. They derive a signal energy distribution in the time-frequency plane, which does not include interference terms, unlike Wigner and Cohen class distributions. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. They compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (see IEEE Trans. Inform. Theory, vol. 38, Mar. 1992).
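A compact NumPy sketch of the greedy loop described above, assuming unit-norm dictionary columns; the fixed step count n_steps is an illustrative stopping rule (an energy threshold on the residual is equally common).

```python
import numpy as np

def matching_pursuit(D, x, n_steps):
    """At each step, pick the atom most correlated with the residual and
    subtract its contribution (dictionary columns assumed unit-norm)."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_steps):
        c = D.T @ residual
        k = int(np.argmax(np.abs(c)))
        coeffs[k] += c[k]
        residual -= c[k] * D[:, k]
    return coeffs, residual
```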

9,380 citations

Journal ArticleDOI
TL;DR: A self-organizing learning algorithm that maximizes information transfer in a network of nonlinear units, suggesting that information maximization provides a unifying framework for problems in "blind" signal processing; dependencies of information transfer on time delays are also derived.
Abstract: We derive a new self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximization has extra properties not found in the linear case (Linsker 1989). The nonlinearities in the transfer function are able to pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalization of principal components analysis. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers. We also show that a variant on the network architecture is able to perform blind deconvolution (cancellation of unknown echoes and reverberation in a speech signal). Finally, we derive dependencies of information transfer on time delays. We suggest that information maximization provides a unifying framework for problems in "blind" signal processing.
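A sketch of the information-maximization update for the zero-noise limit, in NumPy. This uses the natural-gradient form of the rule (which avoids the matrix inverse of the original formulation); the logistic nonlinearity matches the super-Gaussian sources the algorithm targets, and the learning rate and iteration count are illustrative.

```python
import numpy as np

def infomax_ica(X, lr=0.01, n_iter=500):
    """Infomax ICA sketch: X is (n_sources, n_samples), assumed zero-mean.
    Returns an unmixing matrix W so that W @ X estimates the sources."""
    n, T = X.shape
    W = np.eye(n)
    for _ in range(n_iter):
        U = W @ X
        Y = 1.0 / (1.0 + np.exp(-U))                  # logistic squashing
        # Natural-gradient information-maximization update
        W += lr * (np.eye(n) + (1.0 - 2.0 * Y) @ U.T / T) @ W
    return W
```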

9,157 citations


"$rm K$ -SVD: An Algorithm for Desig..." refers methods in this paper

  • ...There is an interesting relation between the above method and the independent component analysis (ICA) algorithm [43]....


Book
01 Jan 1991
TL;DR: A comprehensive text on vector quantization and signal compression, covering random processes, sampling, linear prediction, scalar and vector quantization, bit allocation, transform coding, and entropy coding.
Abstract: Contents:
1 Introduction: 1.1 Signals, Coding, and Compression; 1.2 Optimality; 1.3 How to Use this Book; 1.4 Related Reading
Part I, Basic Tools
2 Random Processes and Linear Systems: 2.1 Introduction; 2.2 Probability; 2.3 Random Variables and Vectors; 2.4 Random Processes; 2.5 Expectation; 2.6 Linear Systems; 2.7 Stationary and Ergodic Properties; 2.8 Useful Processes; 2.9 Problems
3 Sampling: 3.1 Introduction; 3.2 Periodic Sampling; 3.3 Noise in Sampling; 3.4 Practical Sampling Schemes; 3.5 Sampling Jitter; 3.6 Multidimensional Sampling; 3.7 Problems
4 Linear Prediction: 4.1 Introduction; 4.2 Elementary Estimation Theory; 4.3 Finite-Memory Linear Prediction; 4.4 Forward and Backward Prediction; 4.5 The Levinson-Durbin Algorithm; 4.6 Linear Predictor Design from Empirical Data; 4.7 Minimum Delay Property; 4.8 Predictability and Determinism; 4.9 Infinite Memory Linear Prediction; 4.10 Simulation of Random Processes; 4.11 Problems
Part II, Scalar Coding
5 Scalar Quantization I: 5.1 Introduction; 5.2 Structure of a Quantizer; 5.3 Measuring Quantizer Performance; 5.4 The Uniform Quantizer; 5.5 Nonuniform Quantization and Companding; 5.6 High Resolution: General Case; 5.7 Problems
6 Scalar Quantization II: 6.1 Introduction; 6.2 Conditions for Optimality; 6.3 High Resolution Optimal Companding; 6.4 Quantizer Design Algorithms; 6.5 Implementation; 6.6 Problems
7 Predictive Quantization: 7.1 Introduction; 7.2 Difference Quantization; 7.3 Closed-Loop Predictive Quantization; 7.4 Delta Modulation; 7.5 Problems
8 Bit Allocation and Transform Coding: 8.1 Introduction; 8.2 The Problem of Bit Allocation; 8.3 Optimal Bit Allocation Results; 8.4 Integer Constrained Allocation Techniques; 8.5 Transform Coding; 8.6 Karhunen-Loeve Transform; 8.7 Performance Gain of Transform Coding; 8.8 Other Transforms; 8.9 Sub-band Coding; 8.10 Problems
9 Entropy Coding: 9.1 Introduction; 9.2 Variable-Length Scalar Noiseless Coding; 9.3 Prefix Codes; 9.4 Huffman Coding; 9.5 Vector Entropy Coding; 9.6 Arithmetic Coding; 9.7 Universal and Adaptive Entropy Coding; 9.8 Ziv-Lempel Coding; 9.9 Quantization and Entropy Coding; 9.10 Problems
Part III, Vector Coding
10 Vector Quantization I: 10.1 Introduction; 10.2 Structural Properties and Characterization; 10.3 Measuring Vector Quantizer Performance; 10.4 Nearest Neighbor Quantizers; 10.5 Lattice Vector Quantizers; 10.6 High Resolution Distortion Approximations; 10.7 Problems
11 Vector Quantization II: 11.1 Introduction; 11.2 Optimality Conditions for VQ; 11.3 Vector Quantizer Design; 11.4 Design Examples; 11.5 Problems
12 Constrained Vector Quantization: 12.1 Introduction; 12.2 Complexity and Storage Limitations; 12.3 Structurally Constrained VQ; 12.4 Tree-Structured VQ; 12.5 Classified VQ; 12.6 Transform VQ; 12.7 Product Code Techniques; 12.8 Partitioned VQ; 12.9 Mean-Removed VQ; 12.10 Shape-Gain VQ; 12.11 Multistage VQ; 12.12 Constrained Storage VQ; 12.13 Hierarchical and Multiresolution VQ; 12.14 Nonlinear Interpolative VQ; 12.15 Lattice Codebook VQ; 12.16 Fast Nearest Neighbor Encoding; 12.17 Problems
13 Predictive Vector Quantization: 13.1 Introduction; 13.2 Predictive Vector Quantization; 13.3 Vector Linear Prediction; 13.4 Predictor Design from Empirical Data; 13.5 Nonlinear Vector Prediction; 13.6 Design Examples; 13.7 Problems
14 Finite-State Vector Quantization: 14.1 Recursive Vector Quantizers; 14.2 Finite-State Vector Quantizers; 14.3 Labeled-States and Labeled-Transitions; 14.4 Encoder/Decoder Design; 14.5 Next-State Function Design; 14.6 Design Examples; 14.7 Problems
15 Tree and Trellis Encoding: 15.1 Delayed Decision Encoder; 15.2 Tree and Trellis Coding; 15.3 Decoder Design; 15.4 Predictive Trellis Encoders; 15.5 Other Design Techniques; 15.6 Problems
16 Adaptive Vector Quantization: 16.1 Introduction; 16.2 Mean Adaptation; 16.3 Gain-Adaptive Vector Quantization; 16.4 Switched Codebook Adaptation; 16.5 Adaptive Bit Allocation; 16.6 Address VQ; 16.7 Progressive Code Vector Updating; 16.8 Adaptive Codebook Generation; 16.9 Vector Excitation Coding; 16.10 Problems
17 Variable Rate Vector Quantization: 17.1 Variable Rate Coding; 17.2 Variable Dimension VQ; 17.3 Alternative Approaches to Variable Rate VQ; 17.4 Pruned Tree-Structured VQ; 17.5 The Generalized BFOS Algorithm; 17.6 Pruned Tree-Structured VQ; 17.7 Entropy Coded VQ; 17.8 Greedy Tree Growing; 17.9 Design Examples; 17.10 Bit Allocation Revisited; 17.11 Design Algorithms; 17.12 Problems

7,015 citations


"$rm K$ -SVD: An Algorithm for Desig..." refers background or methods in this paper

  • ...We then discuss some of the K-SVD properties and implementation issues....


  • ...There is a variant of the vector quantization (VQ) coding method, called gain-shape VQ, where this coefficient is allowed to vary [39]....

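To make the gain-shape variant quoted in the last excerpt concrete, here is a minimal NumPy sketch: the codebook stores unit-norm "shape" codewords, and the scalar gain (the coefficient) is fit per signal rather than being fixed, which is exactly the one-atom, varying-coefficient setting that K-SVD generalizes to multiple atoms. The function name is illustrative.

```python
import numpy as np

def gain_shape_encode(x, codebook):
    """Pick the unit-norm shape codeword best aligned with x and fit its gain.
    Reconstruction is gain * codebook[:, k]; plain VQ would force gain = 1."""
    c = codebook.T @ x                    # correlations with unit-norm columns
    k = int(np.argmax(np.abs(c)))
    return k, c[k]                        # codeword index and least-squares gain
```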