TL;DR: The results show that this feedback induces a scale-dependent refinement strategy that gives rise to more robust and meaningful motion estimation, which may facilitate higher level sequence interpretation.
Abstract: In this paper, a multigrid motion compensation video coder based on the current human visual system (HVS) contrast discrimination models is proposed. A novel procedure for the encoding of the prediction errors has been used. This procedure restricts the maximum perceptual distortion in each transform coefficient. This subjective redundancy removal procedure includes the amplitude nonlinearities and some temporal features of human perception. A perceptually weighted control of the adaptive motion estimation algorithm has also been derived from this model. Perceptual feedback in motion estimation ensures a perceptual balance between the motion estimation effort and the redundancy removal process. The results show that this feedback induces a scale-dependent refinement strategy that gives rise to more robust and meaningful motion estimation, which may facilitate higher level sequence interpretation. Perceptually meaningful distortion measures and the reconstructed frames show the subjective improvements of the proposed scheme versus an H.263 scheme with unweighted motion estimation and MPEG-like quantization.
In natural video sequences to be judged by human observers, two kinds of redundancies can be identified: 1) objective redundancies, related to the spatio-temporal correlations among the video samples, and 2) subjective redundancies, which refer to the data that can be safely discarded without perceptual loss.
On the one hand, better motion estimation may lead to better predictions and should alleviate the task of the quantizer.
Most of the recent work on motion estimation for video coding has focused on adapting the motion estimate to a given quantizer to obtain a good balance between these elements.
II. CONVENTIONAL TECHNIQUES FOR TRANSFORM QUANTIZER DESIGN AND MULTIGRID MOTION ESTIMATION
The basic elements of a motion compensated coder are the optical flow estimation and the prediction error quantization.
The optimal quantizers (in an average error sense) may underperform on individual blocks or frames [9] even if the error measure is perceptually weighted [28]: the accumulation of quantization levels in certain regions in order to minimize the average perceptual error does not ensure good behavior on a particular block of the DFD.
The motion estimation starts at a coarse resolution (large blocks).
While the magnitude-based criteria were proposed without a specific relation to the encoding of the DFD, the entropy-based criteria make explicit use of the trade-off between the DVF and the DFD.
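To make the split decision concrete, the sketch below implements a variable-size (quadtree) block-matching step in which a block is refined only when its best residue remains large; the per-pixel SAD threshold used here is only a stand-in for the magnitude- or entropy-based criteria discussed above, and the function names and parameters are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def best_vector(cur, ref, y, x, size, search=8):
    """Exhaustive search for the displacement minimizing the block SAD."""
    target = cur[y:y + size, x:x + size]
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and 0 <= xx and yy + size <= ref.shape[0] and xx + size <= ref.shape[1]:
                cost = sad(target, ref[yy:yy + size, xx:xx + size])
                if cost < best[2]:
                    best = (dy, dx, cost)
    return best

def variable_size_bma(cur, ref, y, x, size, min_size=4, thr=6.0):
    """Coarse-to-fine BMA: a block is split into four sub-blocks when its
    per-pixel residue magnitude exceeds `thr` (an illustrative split test)."""
    dy, dx, cost = best_vector(cur, ref, y, x, size)
    if size > min_size and cost / float(size * size) > thr:
        half = size // 2
        blocks = []
        for oy in (0, half):
            for ox in (0, half):
                blocks += variable_size_bma(cur, ref, y + oy, x + ox, half, min_size, thr)
        return blocks
    return [(y, x, size, dy, dx)]
```

In the perceptually weighted control proposed in the paper, the decision of whether a block is worth refining would instead weight the residue by its perceptual relevance rather than by its raw magnitude.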
The similarity between the impulse responses of the perceptual filters of the transform and the basis functions of the local frequency transforms used in image and video coding has been used to apply the experimental properties of the perceptual transform domain to the block DCT transform as a reasonable approximation [13], [14], [25], [26], [38], [39].
A. Maximum Perceptual Error (MPE) Criterion for Quantizer Design
The natural way of assessing the quality of an encoded picture (or sequence) involves a one-to-one comparison between the original and the encoded version.
The result of this comparison is related to the ability of the observer to notice the particular quantization noise in the presence of the original pattern.
Even if a perceptual weighting is used, the average criteria may bias the results.
This requirement is satisfied by a perceptually uniform distribution of the available quantization levels in the transform domain.
If the perceptual distance between levels is constant, the MPE in each component is bounded regardless of the amplitude of the input.
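As an illustration of this bounded-error property, the sketch below quantizes transform coefficients uniformly in a perceptual domain obtained through a simple power-law amplitude nonlinearity; the exponent and step size are illustrative assumptions, not the HVS contrast-discrimination model used in the paper.

```python
import numpy as np

def perceptual_response(c, exponent=0.6):
    """Toy amplitude nonlinearity (power law); stands in for the HVS
    contrast-discrimination model used in the paper."""
    return np.sign(c) * np.abs(c) ** exponent

def inverse_response(r, exponent=0.6):
    return np.sign(r) * np.abs(r) ** (1.0 / exponent)

def mpe_quantize(coeff, step):
    """Uniform quantization in the perceptual domain: the perceptual error of
    each coefficient is bounded by step/2 regardless of its amplitude."""
    r = perceptual_response(coeff)
    r_hat = step * np.round(r / step)
    return inverse_response(r_hat)

c = np.array([0.5, 3.0, 40.0, 200.0])
c_hat = mpe_quantize(c, step=0.5)
perceptual_err = np.abs(perceptual_response(c) - perceptual_response(c_hat))
assert np.all(perceptual_err <= 0.25 + 1e-9)  # maximum perceptual error is bounded
```

Because the levels are equally spaced after the nonlinearity, the perceptual error of every coefficient stays below half a quantization step, which is the bound the MPE criterion asks for.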
B. Optimal Spatial Quantizers Under the MPE Criterion
The design of a transform quantizer for a given block transform involves finding the optimal number of quantization levels for each coefficient (bit allocation) and the optimal distribution of these quantization levels in each case [27].
The CSF-based (linear MPE) quantizer used in MPEG [1]–[3] and the proposed nonlinear MPE quantizer [13], [14], [26] represent different degrees of approximation to the actual quantization process eventually carried out by the HVS.
The scheme that takes into account the perceptual amplitude nonlinearities will presumably be more efficient in removing the subjective redundancy from the DFD.
It is interesting to note that the reasoning about the perceptual relevance of the motion information presented here leads to (14), which takes into account the quantized DFD entropy, in a natural way.
A. Experiment 1: Different Quantizers with the Same Motion Estimation
The performance of the linear MPE (MPEG-like) quantizer and the proposed nonlinear MPE quantizers was compared using the same (perceptually unweighted H.263-like) motion estimation.
Fig. 6 shows the increase of the perceptual distortion in the different reconstructions of the Rubik sequence.
The temporal filtering smooths the reconstructed sequence and reduces to some extent the remaining blocking effect and busy artifacts of the 2-D nonlinear approach.
The key factor in the improvements of the proposed quantizers is the consideration of the amplitude nonlinearities and the corresponding enlargement of the spatial quantizer bandwidth (Fig. 2).
This enlargement implies that the quantized signal keeps some significant details otherwise discarded, avoiding the rapid degradation of the reconstructed signal.
B. Experiment 2: Different Motion Estimations with the Same Quantizer
The proposed perceptually weighted variable-size BMA and the unweighted variable-size BMA were compared in this experiment using the same linear MPE (MPEG-like) quantizer.
These results show that both variable-size BMAs yield some quality gain over the fixed-size H.261 BMA due to the savings in DVF information.
It has been empirically found that this error measure adequately describes the intuitive quality of the segmentation (Figs. 10 and 11 and their particular errors illustrate this point).
The following trends can be identified from the obtained results: 1) the segmentation is better when the blocks are fairly small compared to the size of the moving regions (see the variable-size BMA error results for Rubik, a large object, and Taxi, small objects); however, 2) the segmentation is very sensitive to the robustness and coherence of the sparse flow.
C. Experiment 3: Relative Relevance of the Improvements in the Motion Estimation and the Quantization
To study the combined effect and the relative advantages of the proposed improvements, the four possible combinations of motion estimation and quantization algorithms were compared at a fixed bit-rate.
Fig. 12 shows an example of a reconstructed frame using the four different approaches considered.
Due to the noisy (high-frequency) nature of the error signal, wide band quantizers may be better than narrower band (CSF-based) quantizers.
The interesting constant distortion result reported in Section V-A (Fig. 6) is also reproduced here: while the perceptual distortion increases quickly when using the linear quantizer, it remains constant with the nonlinear MPE quantizer regardless of the motion estimation algorithm.
D. Experiment 4: The Proposed Scheme Versus Previous Comparable Schemes
The considered elements of the motion compensated video coder were combined to simulate and compare previously reported schemes (MPEG-1 or H.261 and H.263) and the proposed ones.
The current motion compensated video coding standards include very basic perceptual information (linear threshold models) only in the quantizer design.
He received the Licenciado degree in physics (electricity, electronics, and computer science) in 1987 and the Ph.D. degree in pattern recognition in 1993, both from Universitat de València, València, Spain.
TL;DR: A new perceptually-adaptive video coding (PVC) scheme for hybrid video compression is explored, in order to achieve better perceptual coding quality and operational efficiency and to integrate spatial masking factors with the nonlinear additivity model for masking (NAMM).
Abstract: We explore a new perceptually-adaptive video coding (PVC) scheme for hybrid video compression, in order to achieve better perceptual coding quality and operational efficiency. A new just noticeable distortion (JND) estimator for color video is first devised in the image domain. How to efficiently integrate masking effects together is a key issue of JND modelling. We integrate spatial masking factors with the nonlinear additivity model for masking (NAMM). The JND estimator applies to all color components and accounts for the compound impact of luminance masking, texture masking and temporal masking. Extensive subjective viewing confirms that it is capable of determining a more accurate visibility threshold that is close to the actual JND bound in human eyes. Secondly, the image-domain JND profile is incorporated into hybrid video encoding via the JND-adaptive motion estimation and residue filtering process. The scheme works with any prevalent video coding standards and various motion estimation strategies. To demonstrate the effectiveness of the proposed scheme, it has been implemented in the MPEG-2 TM5 coder and demonstrated to achieve an average improvement of over 18% in motion estimation efficiency, 0.6 dB in average peak signal-to-perceptual-noise ratio (PSPNR) and, most remarkably, 0.17 dB in the objective coding quality measure (PSNR) on average. A theoretical explanation is presented for the improvement in the objective coding quality measure. With the JND-based motion estimation and residue filtering process, hybrid video encoding can be more efficient and the use of bits is optimized for visual quality.
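As a hedged sketch of the masking combination described in this abstract, the snippet below implements the commonly cited form of the nonlinear additivity model for masking (NAMM), in which two masking thresholds are summed minus a fraction of their overlap; the overlap gain of 0.3 is an assumed typical value and may not match the cited paper's parameters.

```python
import numpy as np

def namm_jnd(t_luminance, t_texture, gain=0.3):
    """Nonlinear additivity model for masking (NAMM): sum the two masking
    thresholds and subtract a fraction of their overlap so that coincident
    masking is not double-counted. `gain` is an assumed typical constant."""
    t_l = np.asarray(t_luminance, dtype=float)
    t_t = np.asarray(t_texture, dtype=float)
    return t_l + t_t - gain * np.minimum(t_l, t_t)

# Example: combining per-pixel luminance- and texture-masking thresholds
jnd = namm_jnd([3.0, 5.0], [8.0, 2.0])
```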
305 citations
Cites background or methods from "Perceptual feedback in multigrid mo..."
...A few attempts have been made to non-standard video coding [6,26]....
[...]
...In [26], a subband JND model has been used in the quantization process and also in controlling the block splitting process in variable-size motion search....
[...]
..., discrete cosine transform (DCT) or wavelet domain) [1,2,31,33–36,14,15,26,8] and image-domain [6–8]....
TL;DR: A foveation model as well as a foveated JND (FJND) model in which the spatial and temporal JND models are enhanced to account for the relationship between visibility and eccentricity is described.
Abstract: Traditional video compression methods remove spatial and temporal redundancy based on the signal statistical correlation. However, to reach higher compression ratios without perceptually degrading the reconstructed signal, the properties of the human visual system (HVS) need to be better exploited. Research effort has been dedicated to modeling the spatial and temporal just-noticeable-distortion (JND) based on the sensitivity of the HVS to luminance contrast, and accounting for spatial and temporal masking effects. This paper describes a foveation model as well as a foveated JND (FJND) model in which the spatial and temporal JND models are enhanced to account for the relationship between visibility and eccentricity. Since the visual acuity decreases when the distance from the fovea increases, the visibility threshold increases with increased eccentricity. The proposed FJND model is then used for macroblock (MB) quantization adjustment in H.264/advanced video coding (AVC). For each MB, the quantization parameter is optimized based on its FJND information. The Lagrange multiplier in the rate-distortion optimization is adapted so that the MB noticeable distortion is minimized. The performance of the FJND model has been assessed with various comparisons and subjective visual tests. It has been shown that the proposed FJND model can increase the visual quality versus rate performance of the H.264/AVC video coding scheme.
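A rough, hedged sketch of the eccentricity dependence described here: the visibility threshold at a pixel can be scaled by an elevation factor that grows with retinal eccentricity, in the spirit of a Geisler-Perry style contrast-threshold model; the constants below are assumed typical values, and the cited FJND model may differ in detail.

```python
import numpy as np

def eccentricity_deg(px, py, fx, fy, viewing_dist_px):
    """Retinal eccentricity (degrees) of pixel (px, py) for fixation (fx, fy),
    with the viewing distance expressed in pixel units."""
    d = np.hypot(px - fx, py - fy)
    return np.degrees(np.arctan(d / viewing_dist_px))

def foveated_scale(ecc_deg, freq_cpd, alpha=0.106, e2=2.3):
    """Threshold elevation relative to the fovea at a given spatial frequency
    (cycles/degree): thresholds grow exponentially with eccentricity.
    alpha and e2 are assumed typical values."""
    return np.exp(alpha * freq_cpd * ecc_deg / e2)

# A spatial JND at the fovea is enlarged away from the fixation point:
jnd_foveated = 4.0 * foveated_scale(eccentricity_deg(900, 500, 640, 360, 1200), freq_cpd=8.0)
```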
TL;DR: A new numerical measure for visual attention's modulatory aftereffects, perceptual quality significance map (PQSM), is proposed and demonstrates the performance improvement on two PQSM-modulated visual sensitivity models and two PQSM-based visual quality metrics.
Abstract: With the fast development of visual noise-shaping related applications (visual compression, error resilience, watermarking, encryption, and display), there is an increasingly significant demand on incorporating perceptual characteristics into these applications for improved performance. In this paper, a very important mechanism of the human brain, visual attention, is introduced for visual sensitivity and visual quality evaluation. Based upon the analysis, a new numerical measure for visual attention's modulatory aftereffects, perceptual quality significance map (PQSM), is proposed. To a certain extent, the PQSM reflects the processing ability of the human brain on local visual contents statistically. The PQSM is generated with the integration of local perceptual stimuli from color contrast, texture contrast, motion, as well as cognitive features (skin color and face in this study). Experimental results with subjective viewing demonstrate the performance improvement on two PQSM-modulated visual sensitivity models and two PQSM-based visual quality metrics.
TL;DR: A new JND estimator for color video is devised in the image domain with the nonlinear additivity model for masking and is incorporated into a motion-compensated residue signal preprocessor for variance reduction toward coding quality enhancement; both perceptual quality and objective quality are enhanced in coded video at a given bit rate.
Abstract: We present a motion-compensated residue signal preprocessing scheme for video coding based on the just-noticeable-distortion (JND) profile. Human eyes cannot sense any changes below the JND threshold around a pixel due to their underlying spatial/temporal masking properties. An appropriate (even imperfect) JND model can significantly help to improve the performance of video coding algorithms. From the viewpoint of signal compression, smaller variance of the signal results in less objective distortion of the reconstructed signal for a given bit rate. In this paper, a new JND estimator for color video is devised in the image domain with the nonlinear additivity model for masking (NAMM) and is incorporated into a motion-compensated residue signal preprocessor for variance reduction toward coding quality enhancement. As a result, both perceptual quality and objective quality are enhanced in coded video at a given bit rate. A solution for adaptively determining the parameter of the residue preprocessor is also proposed. The devised technique can be applied to any standardized video coding scheme based on motion-compensated prediction. It provides an extra design option for quality control, besides quantization, in contrast with most of the existing perceptually adaptive schemes, which have so far focused on determination of proper quantization steps. As an example for demonstration, the proposed scheme has been implemented in the MPEG-2 TM5 coder and achieved an average peak signal-to-noise ratio (PSNR) increment of 0.505 dB over the twenty video sequences that have been tested. The perceptual quality improvement has been confirmed by the subjective viewing tests conducted.
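The residue preprocessing idea can be sketched as a per-pixel soft threshold driven by the JND profile, as below; the exact filter in the cited paper may differ, so treat this as an assumed illustrative form.

```python
import numpy as np

def jnd_residue_filter(residue, jnd):
    """Suppress motion-compensated residues that fall below the per-pixel JND
    and shrink the rest toward zero by the JND, reducing residue variance
    without visible loss (assumed soft-threshold form)."""
    r = np.asarray(residue, dtype=float)
    t = np.asarray(jnd, dtype=float)
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

filtered = jnd_residue_filter([-12.0, 3.0, 7.0], [5.0, 5.0, 5.0])  # -> [-7., 0., 2.]
```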
191 citations
Cites background or methods from "Perceptual feedback in multigrid mo..."
...A few attempts have been made to nonstandard video coding [14], [ 9 ]....
[...]
...In [ 9 ], subband JND has been used in the quantization process and also in controlling the block splitting process in variable-size motion search....
TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
Abstract: It has long been realized that in pulse-code modulation (PCM), with a given ensemble of signals to handle, the quantum values should be spaced more closely in the voltage regions where the signal amplitude is more likely to fall. It has been shown by Panter and Dite that, in the limit as the number of quanta becomes infinite, the asymptotic fractional density of quanta per unit voltage should vary as the one-third power of the probability density per unit voltage of signal amplitudes. In this paper the corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy. The optimization criterion used is that the average quantization noise power be a minimum. It is shown that the result obtained here goes over into the Panter and Dite result as the number of quanta become large. The optimum quantization schemes for 2^b quanta, b = 1, 2, ..., 7, are given numerically for Gaussian and for Laplacian distributions of signal amplitudes.
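The two necessary conditions derived in this reference (decision boundaries at the midpoints of neighboring reproduction levels, and reproduction levels at the centroids of their cells) lead to the familiar Lloyd-Max iteration; the sketch below applies it to an empirical sample set and is an illustrative implementation rather than the paper's own procedure.

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=100):
    """Lloyd-Max design of a minimum-MSE scalar quantizer for an empirical
    distribution: alternate the midpoint (nearest-neighbor) boundary
    condition and the centroid condition."""
    x = np.sort(np.asarray(samples, dtype=float))
    # start with reproduction levels placed on sample quantiles
    levels = np.quantile(x, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iters):
        boundaries = 0.5 * (levels[:-1] + levels[1:])   # midpoints of neighboring levels
        cells = np.digitize(x, boundaries)
        levels = np.array([x[cells == k].mean() if np.any(cells == k) else levels[k]
                           for k in range(n_levels)])   # centroids of the cells
    return levels, boundaries

rng = np.random.default_rng(0)
levels, bounds = lloyd_max(rng.laplace(size=10000), n_levels=8)
```

For a Laplacian source and 2^b levels this iteration should approach the tabulated optimum quantizers mentioned in the abstract, up to local-optimum and sampling effects.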
9,602 citations
"Perceptual feedback in multigrid mo..." refers background in this paper
...Moreover, the MPE quantizers reduce to the JPEG and MPEG quantizers if a simple (linear) perception model is considered....
TL;DR: A comprehensive reference on scalar and vector quantization and signal compression, covering quantizer structure and optimality conditions, predictive and transform coding, bit allocation, entropy coding, and a broad family of constrained, predictive, finite-state, adaptive, and variable-rate vector quantizer designs.
Abstract: 1 Introduction- 11 Signals, Coding, and Compression- 12 Optimality- 13 How to Use this Book- 14 Related Reading- I Basic Tools- 2 Random Processes and Linear Systems- 21 Introduction- 22 Probability- 23 Random Variables and Vectors- 24 Random Processes- 25 Expectation- 26 Linear Systems- 27 Stationary and Ergodic Properties- 28 Useful Processes- 29 Problems- 3 Sampling- 31 Introduction- 32 Periodic Sampling- 33 Noise in Sampling- 34 Practical Sampling Schemes- 35 Sampling Jitter- 36 Multidimensional Sampling- 37 Problems- 4 Linear Prediction- 41 Introduction- 42 Elementary Estimation Theory- 43 Finite-Memory Linear Prediction- 44 Forward and Backward Prediction- 45 The Levinson-Durbin Algorithm- 46 Linear Predictor Design from Empirical Data- 47 Minimum Delay Property- 48 Predictability and Determinism- 49 Infinite Memory Linear Prediction- 410 Simulation of Random Processes- 411 Problems- II Scalar Coding- 5 Scalar Quantization I- 51 Introduction- 52 Structure of a Quantizer- 53 Measuring Quantizer Performance- 54 The Uniform Quantizer- 55 Nonuniform Quantization and Companding- 56 High Resolution: General Case- 57 Problems- 6 Scalar Quantization II- 61 Introduction- 62 Conditions for Optimality- 63 High Resolution Optimal Companding- 64 Quantizer Design Algorithms- 65 Implementation- 66 Problems- 7 Predictive Quantization- 71 Introduction- 72 Difference Quantization- 73 Closed-Loop Predictive Quantization- 74 Delta Modulation- 75 Problems- 8 Bit Allocation and Transform Coding- 81 Introduction- 82 The Problem of Bit Allocation- 83 Optimal Bit Allocation Results- 84 Integer Constrained Allocation Techniques- 85 Transform Coding- 86 Karhunen-Loeve Transform- 87 Performance Gain of Transform Coding- 88 Other Transforms- 89 Sub-band Coding- 810 Problems- 9 Entropy Coding- 91 Introduction- 92 Variable-Length Scalar Noiseless Coding- 93 Prefix Codes- 94 Huffman Coding- 95 Vector Entropy Coding- 96 Arithmetic Coding- 97 Universal and Adaptive Entropy Coding- 98 Ziv-Lempel Coding- 99 Quantization and Entropy Coding- 910 Problems- III Vector Coding- 10 Vector Quantization I- 101 Introduction- 102 Structural Properties and Characterization- 103 Measuring Vector Quantizer Performance- 104 Nearest Neighbor Quantizers- 105 Lattice Vector Quantizers- 106 High Resolution Distortion Approximations- 107 Problems- 11 Vector Quantization II- 111 Introduction- 112 Optimality Conditions for VQ- 113 Vector Quantizer Design- 114 Design Examples- 115 Problems- 12 Constrained Vector Quantization- 121 Introduction- 122 Complexity and Storage Limitations- 123 Structurally Constrained VQ- 124 Tree-Structured VQ- 125 Classified VQ- 126 Transform VQ- 127 Product Code Techniques- 128 Partitioned VQ- 129 Mean-Removed VQ- 1210 Shape-Gain VQ- 1211 Multistage VQ- 1212 Constrained Storage VQ- 1213 Hierarchical and Multiresolution VQ- 1214 Nonlinear Interpolative VQ- 1215 Lattice Codebook VQ- 1216 Fast Nearest Neighbor Encoding- 1217 Problems- 13 Predictive Vector Quantization- 131 Introduction- 132 Predictive Vector Quantization- 133 Vector Linear Prediction- 134 Predictor Design from Empirical Data- 135 Nonlinear Vector Prediction- 136 Design Examples- 137 Problems- 14 Finite-State Vector Quantization- 141 Recursive Vector Quantizers- 142 Finite-State Vector Quantizers- 143 Labeled-States and Labeled-Transitions- 144 Encoder/Decoder Design- 145 Next-State Function Design- 146 Design Examples- 147 Problems- 15 Tree and Trellis Encoding- 151 Delayed Decision Encoder- 152 Tree and Trellis Coding- 153 Decoder 
Design- 154 Predictive Trellis Encoders- 155 Other Design Techniques- 156 Problems- 16 Adaptive Vector Quantization- 161 Introduction- 162 Mean Adaptation- 163 Gain-Adaptive Vector Quantization- 164 Switched Codebook Adaptation- 165 Adaptive Bit Allocation- 166 Address VQ- 167 Progressive Code Vector Updating- 168 Adaptive Codebook Generation- 169 Vector Excitation Coding- 1610 Problems- 17 Variable Rate Vector Quantization- 171 Variable Rate Coding- 172 Variable Dimension VQ- 173 Alternative Approaches to Variable Rate VQ- 174 Pruned Tree-Structured VQ- 175 The Generalized BFOS Algorithm- 176 Pruned Tree-Structured VQ- 177 Entropy Coded VQ- 178 Greedy Tree Growing- 179 Design Examples- 1710 Bit Allocation Revisited- 1711 Design Algorithms- 1712 Problems
7,015 citations
"Perceptual feedback in multigrid mo..." refers background in this paper
...The design of a transform quantizer for a given block transform involves finding the optimal number of quantization levels for each coefficient (bit allocation) and the optimal distribution of these quantization levels in each case [27]....
[...]
...Moreover, the MPE quantizers reduce to the JPEG and MPEG quantizers if a simple (linear) perception model is considered....
TL;DR: These comparisons are primarily empirical, and concentrate on the accuracy, reliability, and density of the velocity measurements; they show that performance can differ significantly among the techniques the authors implemented.
Abstract: While different optical flow techniques continue to appear, there has been a lack of quantitative evaluation of existing methods. For a common set of real and synthetic image sequences, we report the results of a number of regularly cited optical flow techniques, including instances of differential, matching, energy-based, and phase-based methods. Our comparisons are primarily empirical, and concentrate on the accuracy, reliability, and density of the velocity measurements; they show that performance can differ significantly among the techniques we implemented.
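Accuracy in this comparison is reported mainly as an angular error between estimated and true velocities embedded in space-time; a short sketch of that measure, as commonly defined for such evaluations, follows.

```python
import numpy as np

def angular_error_deg(u_est, v_est, u_true, v_true):
    """Angular error (degrees) between estimated and true flow fields, each
    vector embedded as (u, v, 1) and normalized before taking the angle."""
    a = np.stack([u_est, v_est, np.ones_like(u_est)], axis=-1)
    b = np.stack([u_true, v_true, np.ones_like(u_true)], axis=-1)
    cos = np.sum(a * b, axis=-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# e.g., a flow estimate off by 0.1 pixel/frame in the vertical component:
err = angular_error_deg(np.array([1.0]), np.array([0.0]),
                        np.array([1.0]), np.array([0.1]))
```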
TL;DR: Design of the MPEG algorithm presents a difficult challenge since quality requirements demand high compression that cannot be achieved with only intraframe coding, and the algorithm's random access requirement is best satisfied with pure intraframe coding.
Abstract: The Moving Picture Experts Group (MPEG) standard addresses compression of video signals at approximately 1.5 Mbit/s. MPEG is a generic standard and is independent of any particular application. Applications of compressed video on digital storage media include asymmetric applications such as electronic publishing, games and entertainment. Symmetric applications of digital video include video mail, video conferencing, videotelephone and production of electronic publishing. Design of the MPEG algorithm presents a difficult challenge since quality requirements demand high compression that cannot be achieved with only intraframe coding. The algorithm's random access requirement, however, is best satisfied with pure intraframe coding. MPEG uses predictive and interpolative coding techniques to answer this challenge. Extensive details are presented.
2,447 citations
"Perceptual feedback in multigrid mo..." refers background or methods in this paper
...REFERENCES
[1] D. LeGall, “MPEG: A video compression standard for multimedia applications,”Commun....
[...]
...1 shows the product for the linear (MPEG-like) and the nonlinear MPE quantizers....
[...]
...To achieve this aim, current video coders are based on motion compensation and two-dimensional (2-D) transform coding of the residual error [1]–[4]....
[...]
...• Proposed scheme versus previous comparable schemes (H.263, MPEG-1)....
[...]
...8 shows a representative example of the decoded results using the same MPEG-like quantizer and the different considered motion estimations at a fixed bit-rate (frame 7 of the Taxi sequence)....
Q1. What are the contributions in "Perceptual feedback in multigrid motion estimation using an improved dct quantization" ?
In this paper, a multigrid motion compensation video coder based on the current human visual system (HVS) contrast discrimination models is proposed.