Image compression with Stochastic Winner-Take-All Auto-Encoder
Summary
Introduction
- Keywords: image compression, sparse representations, auto-encoders, Orthogonal Matching Pursuit.
- Auto-encoders are powerful tools for reducing the dimensionality of data.
- But all image patches are coded at the same rate, so their distortions differ because texture complexity varies from patch to patch.
- Therefore, during training, the WTA parameter that controls the rate is stochastically driven.
1.1. Notation
- Vectors are denoted by bold lower case letters and matrices by upper case ones.
- The authors now present their Stochastic Winner-Take-All Auto-Encoder (SWTA AE), whose architecture is shown in Figure 1.
- The authors justify below two of the most critical choices for the SWTA AE architecture.
2.1. Strided convolution
- A compression algorithm must process images of various sizes.
- Fully-connected layers would impose training one architecture per image size, which is why the SWTA AE is built only from convolutions.
- Each layer i ∈ {1, ..., 4} consists of convolving the layer input with the bank of filters W(i), adding the biases b(i), and applying a mapping g(i), producing the layer output.
- For the borders of the layer input, zero-padding of width p(i) is used.
- Indeed, if the encoder contains a maxpooling layer, the locations of maximum activations selected during pooling operations must be recorded and transmitted to the corresponding unpooling layer in the decoder [12, 13].
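To make the role of strided convolution concrete, here is a minimal PyTorch sketch. It is not the paper's architecture: the kernel sizes, strides, filter count of the first layer, and the tanh mapping are illustrative assumptions; only the 64 + 1 = 65 feature maps of the second layer come from Section 2.2.

```python
import torch
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Illustrative two-layer strided-convolution encoder."""
    def __init__(self):
        super().__init__()
        # Stride-2 convolutions downsample like max-pooling would,
        # but no pooling locations need to be sent to the decoder.
        self.conv1 = nn.Conv2d(1, 128, kernel_size=9, stride=2, padding=4)
        self.conv2 = nn.Conv2d(128, 65, kernel_size=5, stride=2, padding=2)

    def forward(self, x):
        x = torch.tanh(self.conv1(x))  # g(1): an illustrative mapping
        return self.conv2(x)           # g_alpha is applied next (Section 2.2)

# Any input size works because the network is fully convolutional.
z = EncoderSketch()(torch.randn(1, 1, 321, 321))
print(z.shape)  # 65 feature maps at a reduced spatial resolution
```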
2.2. Semi-sparse bottleneck
- The authors propose to apply a global sparsity constraint that provides control over the coding cost of Z. The winner-take-all mapping gα only applies to the output of the convolution in the second layer involving the first 64 filters in W(2), producing the first 64 sparse feature maps in Z. Figure 1 displays these sparse feature maps in orange.
- Varying α leads to various coding costs of Z. Note that [14] also uses WTA, but its WTA rule is different and its mapping does not single out specific dimensions of the input tensor, as that constraint is not relevant outside image compression.
- The authors have noticed that, during the training in Section 4.2, SWTA AE learns to store in the last feature map a subsampled version of its input image.
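A minimal NumPy sketch of this semi-sparse bottleneck follows. The exact WTA rule of gα is not spelled out in the summary; keeping the fraction α of coefficients with largest absolute value across the 64 sparse maps is an assumption, by analogy with the f_γ rule of WTA OMP quoted later.

```python
import numpy as np

def g_alpha(z, alpha):
    """Semi-sparse bottleneck sketch: z has shape (65, h, w).

    The first 64 feature maps are made sparse by keeping only the
    fraction alpha of coefficients with largest absolute value
    (assumed rule); the last feature map passes through untouched.
    """
    sparse, dense = z[:64], z[64:]
    flat = sparse.ravel()
    k = max(1, int(alpha * flat.size))           # number of survivors
    thresh = np.partition(np.abs(flat), -k)[-k]  # k-th largest magnitude
    sparse = np.where(np.abs(sparse) >= thresh, sparse, 0.0)
    return np.concatenate([sparse, dense], axis=0)

z = np.random.randn(65, 81, 81)
print(np.count_nonzero(g_alpha(z, 0.05)[:64]))  # about 5% survive
```

The untouched last map is what lets the network store a subsampled version of its input image, as noted above.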
2.3. Bitstream generation
- The coefficients of the non-sparse feature map in Z are uniformly quantized over 8-bits and coded with a Huffman code.
- The position along z is coded with a fixed-length code and, for each pair (x, y), the number of non-zero coefficients along z is coded with a Huffman code.
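A minimal sketch of the first coding step: 8-bit uniform quantization, with the empirical entropy used as a proxy for the Huffman code length (a standard lower bound a Huffman code approaches). The quantizer range is an assumption; the summary only says "uniformly quantized over 8-bits".

```python
import numpy as np

def quantize_8bit(x):
    """Uniform 8-bit quantization sketch; the [min, max] range
    of the map is an assumed choice."""
    lo, hi = x.min(), x.max()
    step = (hi - lo) / 255.0
    symbols = np.round((x - lo) / step).astype(np.uint8)
    return symbols, lo, step

def huffman_rate_proxy(symbols):
    """Empirical entropy in bits/symbol, standing in for a full
    Huffman coder."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

dense_map = np.random.randn(81, 81)
sym, lo, step = quantize_8bit(dense_map)
print(f"{huffman_rate_proxy(sym):.2f} bits/coefficient")
```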
- SWTA AE is similar to Orthogonal Matching Pursuit (OMP); the difference is that SWTA AE computes the sparse representation of an image by alternating convolutions and mappings, whereas OMP runs an iterative decomposition of the image patches over a dictionary.
- For the sake of comparison, the authors build a variant of OMP called Winner-Take-All Orthogonal Matching Pursuit (WTA OMP).
- After the global winner-take-all selection fγ, the support of the sparse representation of each patch may have changed, so the coefficients are recomputed by least squares on the new supports.
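A minimal NumPy sketch of WTA OMP, following equations (1)-(3) quoted in the FAQ below. scikit-learn's orthogonal_mp is assumed as the per-patch OMP solver; the paper does not mandate an implementation.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def wta_omp(X, D, k, gamma):
    """X: m x p patches (one per column); D: m x n dictionary,
    assumed to have unit-norm columns; k: per-patch sparsity;
    gamma: globally kept fraction."""
    n, p = D.shape[1], X.shape[1]
    # (1) per-patch OMP decomposition
    Y = orthogonal_mp(D, X, n_nonzero_coefs=k)
    # (2) f_gamma: keep the gamma*n*p largest-magnitude coefficients
    keep = max(1, int(gamma * n * p))
    thresh = np.partition(np.abs(Y).ravel(), -keep)[-keep]
    I = np.where(np.abs(Y) >= thresh, Y, 0.0)
    # (3) least-squares refit on each patch's (possibly changed) support
    Z = np.zeros_like(I)
    for j in range(p):
        s = np.flatnonzero(I[:, j])
        if s.size:
            Z[s, j], *_ = np.linalg.lstsq(D[:, s], X[:, j], rcond=None)
    return Z
```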
4.1. Training data extraction
- The RGB color space is transformed into YCbCr and the authors only keep the luminance channel.
- For SWTA AE, the luminance images are resized to 321×321. σ ∈ ℝ₊* (a positive scalar) is the mean of the standard deviations over all luminance images.
- The authors remove the DC component from each patch.
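A minimal sketch of these extraction steps. The BT.601 luma weights are an assumption, since the summary only states that RGB is transformed to YCbCr and the luminance channel is kept.

```python
import numpy as np

def luminance(rgb):
    """Luminance extraction; BT.601 weights are an assumed choice."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def remove_dc(patches):
    """Remove the DC component, i.e. the mean of each patch
    (one patch per row), as done for the patch-based pipeline."""
    dc = patches.mean(axis=1, keepdims=True)
    return patches - dc, dc
```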
4.2. SWTA AE training
- If α is fixed during training, all the filters and the biases of SWTA AE are learned for a single rate.
- Instead, α is drawn at random during training, so that a single set of weights covers a whole range of rates. This justifies the prefix “Stochastic” in SWTA AE.
- The training objective is to minimize the mean squared error between these cropped images and their reconstructions, plus an l2-norm weight decay.
- The authors’ implementation is based on Caffe [18].
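The stochastic part can be sketched as follows. The sampling range for α, the crop size, and the stand-in model are assumptions made only so the loop runs; the real model would be the encoder, the gα bottleneck, and the decoder.

```python
import random
import torch
import torch.nn.functional as F

# Stand-in for the full SWTA AE so the loop is runnable.
w = torch.randn(1, requires_grad=True)
model = lambda x, alpha: w * x              # placeholder, NOT the paper's net
opt = torch.optim.SGD([w], lr=1e-3, weight_decay=5e-4)  # l2 weight decay

for step in range(100):
    x = torch.randn(8, 1, 65, 65)           # mini-batch of image crops
    alpha = random.uniform(0.01, 0.2)       # assumed sampling distribution
    loss = F.mse_loss(model(x, alpha), x)   # MSE objective of Section 4.2
    opt.zero_grad(); loss.backward(); opt.step()
```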
4.3. Dictionary learning for WTA OMP
- The dictionary learning problem (4) imposes, in addition to the per-patch sparsity ‖Z_j‖₀ ≤ k, the global constraint ∑_{j=1}^{η} ‖Z_j‖₀ ≤ γ × n × η. (4) is solved by Algorithm 2, which alternates between sparse coding steps that involve WTA OMP and dictionary updates that use stochastic gradient descent.
- This dictionary learning procedure for WTA OMP is referred to as Algorithm 2 (a sketch of its update step follows at the end of this section).
- For SWTA AE, the same values for m and n are used for training D via Algorithm 2.
- K-SVD code: http://www.cs.technion.ac.il/~elad/software/
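The dictionary update of Algorithm 2 can be sketched as a stochastic gradient step on ‖X − D Z‖²_F, whose gradient with respect to D is −2 (X − D Z) Zᵀ. Re-normalizing the columns of D after each step is a standard dictionary-learning convention assumed here, not stated in the summary.

```python
import numpy as np

def dictionary_update(D, X, Z, lr=0.1):
    """One SGD step on ||X - D Z||_F^2 with respect to D.
    D: m x n, X: m x p mini-batch, Z: n x p sparse codes."""
    grad = -2.0 * (X - D @ Z) @ Z.T
    D = D - lr * grad
    # assumed convention: keep dictionary atoms at unit norm
    return D / np.linalg.norm(D, axis=0, keepdims=True)
```

A full pass of Algorithm 2 would alternate the earlier wta_omp sketch with this update over the mini-batches of the training set.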
5. Image compression experiment
- After training in Section 4, the authors compare the rate-distortion curves of OMP, WTA OMP, SWTA AE, JPEG and JPEG2000 on test luminance images.
5.1. Image CODEC for SWTA AE
- Each input test luminance image is pre-processed similarly to the training in Section 4.1.
- The learned mean image M is interpolated to match the size of the input image.
- Then, this interpolated mean image is subtracted from the input image, and the result is divided by the learned σ.
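A sketch of this pre-processing; skimage's resize stands in for the unspecified interpolation method.

```python
import numpy as np
from skimage.transform import resize

def preprocess_for_swta_ae(img, M, sigma):
    """img: input luminance image; M: learned mean image;
    sigma: learned positive scalar. The interpolation method
    is an assumption."""
    M_resized = resize(M, img.shape)  # interpolate M to the input size
    return (img - M_resized) / sigma
```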
5.2. Image CODEC for OMP and WTA OMP
- A luminance image is split into 8×8 non-overlapping patches.
- The DC component is removed from each patch.
- The DC components are uniformly quantized over 8-bits and coded with a fixed-length code.
- OMP (or WTA OMP) finds the coefficients of the sparse decompositions of the image patches over D ′ (or D).
- The non-zero coefficients are uniformly quantized over 8-bits and coded with a Huffman code while their position is coded with a fixed-length code.
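A sketch of the patch extraction used by this codec, and its inverse for reconstruction (image height and width are assumed to be multiples of 8).

```python
import numpy as np

def to_patches(img):
    """Split an image (H, W) into a 64 x p matrix of
    non-overlapping 8x8 patches, one patch per column."""
    H, W = img.shape
    blocks = img.reshape(H // 8, 8, W // 8, 8).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, 64).T

def from_patches(X, H, W):
    """Inverse of to_patches: reassemble the image."""
    blocks = X.T.reshape(H // 8, W // 8, 8, 8).transpose(0, 2, 1, 3)
    return blocks.reshape(H, W)
```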
5.3. Comparison of rate-distortion curves
- In the literature, there is no reference rate-distortion curve for auto-encoders.
- Furthermore, the authors compare SWTA AE with its non-sparse Auto-Encoder counterpart (AE).
- JPEG and JPEG2000 code: http://www.imagemagick.org/script/index.php
Frequently Asked Questions (15)
Q2. What is the coding objective of the algorithm?
Given Γ and p ∈ ℕ*, let φ be a function that randomly partitions Γ into η_p = η/p mini-batches {X⁽¹⁾, ..., X⁽η_p⁾}, where, for i ∈ {1, ..., η_p}, X⁽ⁱ⁾ ∈ ℝ^(m×p).
Q3. What is the code for the last feature map in Z?
The position along z is coded with a fixed-length code and, for each pair (x, y), the number of non-zero coefficients along z is coded with a Huffman code.
Q4. What is the objective of the training?
The training objective is to minimize the mean squared error between these cropped images and their reconstructions, plus an l2-norm weight decay.
Q5. What is the definition of a coding constraint?
Max-pooling is a core component of neural networks [11] that downsamples its input representation by applying a max filter to non-overlapping sub-regions.
Q6. What is the structure of the layer input?
Each layer i ∈ {1, ..., 4} consists of convolving the layer input with the bank of filters W(i), adding the biases b(i), and applying a mapping g(i), producing the layer output.
Q7. What is the solution to the coding problem?
The global constraint ∑_{j=1}^{η} ‖Z_j‖₀ ≤ γ × n × η completes problem (4), which is solved by Algorithm 2; it alternates between sparse coding steps that involve WTA OMP and dictionary updates that use stochastic gradient descent.
Q8. What is the code for the non-zero coefficients?
The non-zero coefficients are uniformly quantized over 8-bits and coded with a Huffman code while their position is coded with a fixed-length code.
Q9. What is the code for the WTA OMP?
For WTA OMP only, the number of non-zero coefficients of the sparse decomposition of each patch over D is coded with a Huffman code.
Q10. What is the simplest way to decompose a matrix?
For each j ∈ {1, ..., p}: Y_j = OMP(X_j, D, k). (1)
I = f_γ(Y). (2)
For each j ∈ {1, ..., p}: Z_j = argmin over z ∈ ℝⁿ of ‖X_j − D z‖²₂ s.t. supp(z) = supp(I_j). (3)
Output: Z ∈ ℝ^(n×p).
Q11. What is the effect of removing a maxpooling layer?
For instance, [20] shows that removing a max-pooling layer and increasing the stride of the previous convolution, as the authors do, does not harm neural networks.
Q12. What is the difference between SWTA AE and WTA OMP?
The authors have shown that SWTA AE is better adapted to image compression than regular auto-encoders, as it performs variable-rate image compression for any image size after a single training and provides better rate-distortion trade-offs.
Q13. Who is the author of this article?
Gary J. Sullivan, Jim M. Boyce, Ying Chen, Jens-Rainer Ohm, C. Andrew Segall, and Anthony Vetro, “Standardized extensions of High Efficiency Video Coding (HEVC),” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1001–1016, December 2013. [8]
Q14. What is the definition of a vector of coefficients?
It keeps the γ × n × p coefficients with largest absolute value for the n-length sparse representations of the p patches and sets the rest to 0; see (2).
Q15. What is the coding objective of the problem?
Given Γ, k < m, and γ ∈ ]0, 1[, the dictionary learning problem is formulated as (4):
minimize over D, Z_1, ..., Z_η: (1/η) ∑_{j=1}^{η} ‖Γ_j − D Z_j‖²₂
s.t. ∀ j ∈ {1, ..., η}, ‖Z_j‖₀ ≤ k
s.t. ∑_{j=1}^{η} ‖Z_j‖₀ ≤ γ × n × η. (4)