
Perceptual feedback in multigrid motion estimation using an improved DCT quantization

01 Oct 2001-IEEE Transactions on Image Processing (IEEE)-Vol. 10, Iss: 10, pp 1411-1427
TL;DR: The results show that this feedback induces a scale-dependent refinement strategy that gives rise to more robust and meaningful motion estimation, which may facilitate higher level sequence interpretation.
Abstract: In this paper, a multigrid motion compensation video coder based on the current human visual system (HVS) contrast discrimination models is proposed. A novel procedure for the encoding of the prediction errors has been used. This procedure restricts the maximum perceptual distortion in each transform coefficient. This subjective redundancy removal procedure includes the amplitude nonlinearities and some temporal features of human perception. A perceptually weighted control of the adaptive motion estimation algorithm has also been derived from this model. Perceptual feedback in motion estimation ensures a perceptual balance between the motion estimation effort and the redundancy removal process. The results show that this feedback induces a scale-dependent refinement strategy that gives rise to more robust and meaningful motion estimation, which may facilitate higher level sequence interpretation. Perceptually meaningful distortion measures and the reconstructed frames show the subjective improvements of the proposed scheme versus an H.263 scheme with unweighted motion estimation and MPEG-like quantization.

Summary (3 min read)

Introduction

  • In natural video sequences to be judged by human observers, two kinds of redundancies can be identified: 1) objective redundancies, related to the spatio-temporal correlations among the video samples, and 2) subjective redundancies, which refer to the data that can be safely discarded without perceptual loss.
  • On the one hand, better motion estimation may lead to better predictions and should alleviate the task of the quantizer.
  • Most of the recent work on motion estimation for video coding has focused on adapting the motion estimate to a given quantizer to obtain a good balance between these elements.

II. CONVENTIONAL TECHNIQUES FOR TRANSFORM QUANTIZER DESIGN AND MULTIGRID MOTION ESTIMATION

  • The basic elements of a motion compensated coder are the optical flow estimation and the prediction error quantization.
  • The optimal quantizers (in an average error sense) may underperform on individual blocks or frames [9] even if the error measure is perceptually weighted [28]: the accumulation of quantization levels in certain regions in order to minimize the average perceptual error does not ensure good behavior on a particular block of the DFD.
  • The motion estimation starts at a coarse resolution (large blocks).
  • While the magnitude-based criteria were proposed without a specific relation to the encoding of the DFD, the entropy-based criteria make explicit use of the trade off between DVF and DFD.
  • The similarity between the impulse responses of the perceptual filters of the transform and the basis functions of the local frequency transforms used in image and video coding has been used to apply the experimental properties of the perceptual transform domain to the block DCT transform as a reasonable approximation [13], [14], [25], [26], [38], [39].

A. Maximum Perceptual Error (MPE) Criterion for Quantizer Design

  • The natural way of assessing the quality of an encoded picture (or sequence) involves a one-to-one comparison between the original and the encoded version.
  • The result of this comparison is related to the ability of the observer to notice the particular quantization noise in the presence of the original pattern.
  • Even if a perceptual weighting is used, the average criteria may bias the results.
  • This requirement is satisfied by a perceptually uniform distribution of the available quantization levels in the transform domain.
  • If the perceptual distance between levels is constant, the MPE in each component is bounded regardless of the amplitude of the input.

B. Optimal Spatial Quantizers Under the MPE Criterion

  • The design of a transform quantizer for a given block transform involves finding the optimal number of quantization levels for each coefficient (bit allocation) and the optimal distribution of these quantization levels in each case [27].
  • The CSF-based (linear MPE) quantizer used in MPEG [1]–[3] and the proposed nonlinear MPE quantizer [13], [14], [26] represent different degrees of approximation to the actual quantization process eventually carried out by the HVS.
  • The scheme that takes into account the perceptual amplitude nonlinearities will presumably be more efficient in removing the subjective redundancy from the DFD.
  • It is interesting to note that the reasoning about the perceptual relevance of the motion information presented here leads to (14), which takes into account the quantized DFD entropy, in a natural way.

A. Experiment 1: Different Quantizers with the Same Motion Estimation

  • The performance of the linear MPE (MPEG-like) quantizer and the proposed nonlinear MPE quantizers was compared using the same (perceptually unweighted H.263-like) motion estimation.
  • Fig. 6 shows the increase of the perceptual distortion in the different reconstructions of the Rubik sequence.
  • The temporal filtering smooths the reconstructed sequence and reduces to some extent the remaining blocking effect and busy artifacts of the 2-D nonlinear approach.
  • The key factor in the improvements of the proposed quantizers is the consideration of the amplitude nonlinearities and the corresponding enlargement of the spatial quantizer bandwidth (Fig. 2).
  • This enlargement implies that the quantized signal keeps some significant details otherwise discarded, avoiding the rapid degradation of the reconstructed signal.

B. Experiment 2: Different Motion Estimations with the Same Quantizer

  • The proposed perceptually weighted variable-size BMA and the unweighted variable-size BMA were compared in this experiment using the same linear MPE (MPEG-like) quantizer.
  • These results show that both variable-size BMAs make some difference in quality with regard to the fixed-size H.261 BMA due to the savings in DVF information.
  • It has been empirically found that this error measure adequately describes the intuitive quality of the segmentation (Figs. 10 and 11 and their particular errors illustrate this point).
  • The following trends can be identified from the obtained results: 1) the segmentation is better when the blocks are fairly small compared to the size of the moving regions (see the variable-size BMA error results for Rubik, with a large object, and Taxi, with small objects); however, 2) the segmentation is very sensitive to the robustness and coherence of the sparse flow.

C. Experiment 3: Relative Relevance of the Improvements in the Motion Estimation and the Quantization

  • To study the combined effect and the relative advantages of the proposed improvements, the four possible combinations of motion estimation and quantization algorithms were compared at a fixed bit-rate.
  • Fig. 12 shows an example of a reconstructed frame using the four different approaches considered.
  • Due to the noisy (high-frequency) nature of the error signal, wide band quantizers may be better than narrower band (CSF-based) quantizers.
  • The interesting constant-distortion result reported in Section V-A (Fig. 6) is also reproduced here: while the perceptual distortion increases quickly when using the linear quantizer, it remains constant with the nonlinear MPE quantizer regardless of the motion estimation algorithm.

D. Experiment 4: The Proposed Scheme Versus Previous Comparable Schemes

  • The considered elements of the motion compensated video coder were combined to simulate and compare previously reported schemes (MPEG-1 or H.261 and H.263) and the proposed ones.
  • The current motion compensated video coding standards include very basic perceptual information (linear threshold models) only in the quantizer design.
  • He received the Licenciado degree in physics (electricity, electronics, and computer science) in 1987 and the Ph.D. degree in pattern recognition in 1993, both from Universitat de València, València, Spain.


IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 10, OCTOBER 2001 1411
Perceptual Feedback in Multigrid Motion Estimation
Using an Improved DCT Quantization
Jesús Malo, Juan Gutiérrez, I. Epifanio, Francesc J. Ferri, and José M. Artigas
Abstract—In this paper, a multigrid motion compensation video coder based on the current human visual system (HVS) contrast discrimination models is proposed. A novel procedure for the encoding of the prediction errors has been used. This procedure restricts the maximum perceptual distortion in each transform coefficient. This subjective redundancy removal procedure includes the amplitude nonlinearities and some temporal features of human perception. A perceptually weighted control of the adaptive motion estimation algorithm has also been derived from this model. Perceptual feedback in motion estimation ensures a perceptual balance between the motion estimation effort and the redundancy removal process. The results show that this feedback induces a scale-dependent refinement strategy that gives rise to more robust and meaningful motion estimation, which may facilitate higher level sequence interpretation. Perceptually meaningful distortion measures and the reconstructed frames show the subjective improvements of the proposed scheme versus an H.263 scheme with unweighted motion estimation and MPEG-like quantization.
Index Terms—Entropy constrained motion estimation, nonlinear human vision model, perceptual quantization, video coding.
I. INTRODUCTION

IN natural video sequences to be judged by human observers, two kinds of redundancies can be identified: 1) objective redundancies, related to the spatio-temporal correlations among the video samples, and 2) subjective redundancies, which refer to the data that can be safely discarded without perceptual loss. The aim of any video coding scheme is to remove both kinds of redundancy. To achieve this aim, current video coders are based on motion compensation and two-dimensional (2-D) transform coding of the residual error [1]–[4]. The original video signal is split into motion information and prediction errors. These two lower complexity sub-sources of information are usually referred to as displacement vector field (DVF) and displaced frame difference (DFD), respectively.
In the most recent standards, H.263 and MPEG-4 [4], [5], the fixed-resolution motion estimation algorithm used in H.261 and MPEG-1 has been replaced by an adaptive, variable-size block matching algorithm (BMA) to obtain improved motion estimates [6]. Spatial subjective redundancy is commonly reduced through a perceptually weighted quantization of a transform of the DFD. The bit allocation among the transform coefficients is based on the spatial frequency response of simple (linear and threshold) perception models [1]–[3].

Manuscript received May 6, 1999; revised June 18, 2001. This work was supported in part by CICYT Projects TIC 1FD97-0279 and TIC 1FD97-1910. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Rashid Ansari.
J. Malo and J. M. Artigas are with the Departament d'Òptica, Universitat de València, 46100 Burjassot, València, Spain (e-mail: Jesus.Malo@uv.es; http://taz.uv.es/~jmalo).
J. Gutiérrez, I. Epifanio, and F. J. Ferri are with the Departament d'Informàtica, Universitat de València, 46100 Burjassot, València, Spain.
Publisher Item Identifier S 1057-7149(01)08211-2.
In this context, there is a clear tradeoff between the effort devoted to motion compensation and transform redundancy removal. On the one hand, better motion estimation may lead to better predictions and should alleviate the task of the quantizer. On the other hand, better quantization techniques may be able to remove more redundancy, thereby reducing the predictive power needed in the motion estimate. Most of the recent work on motion estimation for video coding has been focused on the adaptation of the motion estimate to a given quantizer to obtain a good balance between these elements. Since the introduction of the intuitive (suboptimal) entropy-constrained motion estimation of Dufaux et al. [7], [8], several optimal, variable-size BMAs have been proposed [9]–[12]. These approaches put forward their intrinsic optimality, but the corresponding visual effect and the relative importance of the motion improvements versus the quantizer improvements have not been deeply explored, mainly because of their subjective nature.
This paper addresses the problem of the tradeoff between multigrid motion estimation and error quantization in a different way. An improved (nonlinear) perception model inspires the whole design to obtain a coder that preserves no more than the subjectively significant information. The role of the perceptual model in the proposed video coder scheme is twofold. First, it is used to simulate the redundancy removal in the human visual system (HVS) through an appropriate perceptually matched quantizer. Second, this perceptual quantizer is used to control the adaptive motion estimation. This control introduces a perceptual feedback in the motion estimation stage. This perceptual feedback limits the motion estimation effort, avoiding superfluous prediction of details that are perceptually negligible and will be discarded by the quantizer. The bandpass shape of the perceptual constraint to the motion estimation gives a scale-dependent control criterion that may be useful for discriminating between significant and noisy motions. Therefore, the benefits of including the properties of the biological filters in the design may go beyond a better rate-distortion performance and also improve the meaningfulness of the motion estimates. This fact may be important for next generation coders that build models of the scene from the low-level information used in the current standards.
In this paper, a novel subjective redundancy removal procedure [13], [14] and a novel perceptually weighted motion estimation algorithm [12] are jointly considered to present a fully perceptual motion compensated video coder. The aim of the paper is to assess the relative relevance of optimal variable-size BMAs and quantizer improvements. To this end, the decoded frames are explicitly compared and analyzed in terms of perceptually meaningful distortion measures [15], [16]. The meaningfulness of the motion information is tested by using it as input for a well established motion-based segmentation algorithm used in model-based video coding [17], [18].

The paper is organized as follows. In Section II, the current methods for quantizer design and variable-size BMA for motion compensation are briefly reviewed. The proposed improvements in the quantizer design, along with their perceptual foundations, are detailed in Section III. In Section IV, the proposed motion refinement criterion is obtained from the requirement of a monotonic reduction of the significant (perceptual) entropy of DFD and DVF. The comparison experiments are presented and discussed in Section V. Some final remarks are given in Section VI.

1057–7149/01$10.00 © 2001 IEEE
II. CONVENTIONAL TECHNIQUES FOR TRANSFORM QUANTIZER DESIGN AND MULTIGRID MOTION ESTIMATION
The basic elements of a motion compensated coder are the optical flow estimation and the prediction error quantization. The optical flow information is used to reduce the objective temporal redundancy, while the quantization of the transformed error signal [usually a 2-D discrete cosine transform (DCT)] reduces the remaining (objective and subjective) redundancy to a certain extent [1]–[4].

Signal independent JPEG-like uniform quantizers are employed in the commonly used standards [1]–[4]. In this case, bit allocation in the 2-D DCT domain is heuristically based on the threshold detection properties of the HVS [2], [3], but neither amplitude nonlinearities [19] nor temporal properties of the HVS [20]–[22] are taken into account. The effect of these properties is not negligible [23], [24]. In particular, the nonlinearities of the HVS may have significant effects on bit allocation and improve the subjective results of the JPEG-like quantizers [13], [14], [25], [26].
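The threshold-based bit allocation described above can be sketched numerically: one quantization step per DCT coefficient, set inversely proportional to a CSF sampled at the coefficient's radial frequency. The CSF shape, the frequency range `fmax`, and the `base` and clipping constants below are illustrative assumptions, not the standards' actual tables.

```python
import numpy as np

def csf(f):
    # Illustrative bandpass contrast-sensitivity curve (an assumed shape,
    # not the exact threshold model used by the standards).
    return (0.31 + 0.69 * f) * np.exp(-0.29 * f)

def csf_quant_steps(block=8, fmax=28.0, base=16.0):
    # Step size for each 2-D DCT coefficient inversely proportional to the
    # CSF at its radial frequency: sensitive frequencies get finer steps.
    u = np.arange(block) * fmax / block          # per-axis frequencies
    fx, fy = np.meshgrid(u, u)
    s = csf(np.hypot(fx, fy))                    # sensitivity per coefficient
    steps = np.clip(base * s.max() / np.maximum(s, 1e-6), 1, 255)
    return np.round(steps).astype(int)

Q = csf_quant_steps()   # coarse steps where the HVS is least sensitive
```

The resulting table reproduces the qualitative behavior only: mid frequencies near the CSF peak get the finest steps, and high frequencies are quantized coarsely.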
The conventional design of a generic transform quantizer is based on the minimization of the average quantization error over a training set [27]. However, the techniques based on average error minimization have some subjective drawbacks in image coding applications. The optimal quantizers (in an average error sense) may underperform on individual blocks or frames [9] even if the error measure is perceptually weighted [28]: the accumulation of quantization levels in certain regions in order to minimize the average perceptual error does not ensure good behavior on a particular block of the DFD. This suggests that the subjective problems of the conventional approach are not only due to the use of perceptually unsuitable metrics, as usually claimed, but are also due to the use of an inappropriate average error criterion. In addition to this, quantizer designs that depend on the statistics of the input have to be re-computed as the input signal changes. These factors favor the use of quantizers based on the threshold frequency response of the HVS instead of the conventional, average error-based quantizers.
Multigrid motion estimation techniques are based on matching between variable-size blocks of consecutive frames of the sequence [6]. The motion estimation starts at a coarse resolution (large blocks). At a given resolution, the best displacement for each block is computed. The resolution of the motion estimate is locally increased (a block of the quadtree is split) according to some refinement criterion. The process ends when no block of the quadtree can be split further.

The splitting criterion is the most important part of the algorithm because it controls the local refinement of the motion estimate. The splitting criterion has effects on the relative volumes of DVF and DFD [7]–[12], and may give rise to unstable motion estimates due to an excessive refinement of the quadtree structure [12], [29]. The usefulness of the motion information for higher-level purposes (as in model-based video coding [5], [30], [31]) highly depends on its robustness (absence of false alarms) and hence on the splitting criterion. Motion-based segmentation algorithms [17], [18] require reliable initial motion information, especially when using sparse (nondense) flows such as those given by variable-size BMA. Two kinds of splitting criteria have already been used: 1) the magnitude of the prediction error, e.g., energy, mean-square error or mean-absolute error [6], [29], [32], [33], and 2) the complexity of the prediction error. In this case, the zeroth-order spatial entropy [7], [8] and the entropy of the encoded DFD [9]–[12] have been reported. While the magnitude-based criteria were proposed without a specific relation to the encoding of the DFD, the entropy-based criteria make explicit use of the tradeoff between DVF and DFD.
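The coarse-to-fine procedure above can be sketched as a toy quadtree BMA. The exhaustive search, the mean-absolute-error split criterion, and the threshold `thr` are simplifications standing in for the magnitude- and entropy-based criteria just discussed; this is not the paper's algorithm.

```python
import numpy as np

def best_match(prev, cur, y, x, size, search=4):
    # Exhaustive block matching: displacement minimizing the mean absolute DFD.
    block = cur[y:y + size, x:x + size]
    best, err = (0, 0), np.inf
    H, W = prev.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and 0 <= xx and yy + size <= H and xx + size <= W:
                e = np.abs(prev[yy:yy + size, xx:xx + size] - block).mean()
                if e < err:
                    err, best = e, (dy, dx)
    return best, err

def quadtree_bma(prev, cur, y=0, x=0, size=None, min_size=4, thr=4.0):
    # Coarse-to-fine variable-size BMA on square frames: a block is split
    # when the residual-magnitude criterion exceeds `thr`.
    if size is None:
        size = cur.shape[0]
    (dy, dx), err = best_match(prev, cur, y, x, size)
    if err <= thr or size <= min_size:
        return [(y, x, size, dy, dx)]
    h = size // 2
    out = []
    for oy in (0, h):
        for ox in (0, h):
            out += quadtree_bma(prev, cur, y + oy, x + ox, h, min_size, thr)
    return out
```

For a simple translated frame, well-predicted regions stay as large blocks while the quadtree refines only where the residual criterion fails, mirroring the locally adaptive refinement described above.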
Since the first entropy-constrained approach was introduced [7], [8], great effort has been devoted to obtaining analytical [9]–[11] or numerical [12] optimal entropy-constrained quadtree DVF decompositions. These approaches criticize the (faster) entropy measure of the DFD in the spatial domain of Dufaux et al. because it does not take into account the effect of the selective DCT quantizer. This necessarily implies a suboptimal bit allocation between DVF and DFD. The literature [9]–[12] reports the optimality of the proposed methods, but the practical (subjective) effect of this gain on the reconstructed sequence is not analyzed. In particular, only perceptually unweighted SNR or MSE distortion measures are given and no explicit comparison of the decoded sequences is shown.
III. PERCEPTUALLY UNIFORM DCT QUANTIZATION
Splitting the original signal into two lower complexity signals (DVF and DFD) does reduce their redundancy to a certain extent. However, the enabling fact behind very-low-bit-rate coding is that not all the remaining data are significant to the human observer. This is why more than just the strictly predictable data can be safely discarded in the DFD quantization.

According to the current models of human contrast processing and discrimination [34], [35], the input spatial patterns are first mapped onto a local frequency domain through a set of bandpass filters with different relative gains. After that, a log-like nonlinearity is applied to each transform coefficient to obtain the response representation. Let us describe this two-step process as

    r = g(T · x)    (1)

where
    x          input image vector;
    T          filter bank matrix;
    a = T · x  local frequency transform vector;
    g          nonlinearity;
    r          response vector to the input.

The components of the image vector x represent the samples of the input luminance at the discrete positions. T is a matrix constituted by the impulse responses of the bandpass filters. The local frequency transform is a = T · x. Each coefficient a_f of the transform represents the output of the local filter tuned to a certain frequency f. In general [34], each coefficient r_f of the response will depend on several transform coefficients a_f'. However, at a first approximation [19], the contributions of a_f' with f' ≠ f can be neglected.
The effect of the response g in the transform domain can be conveniently modeled by a nonuniform perceptual quantizer Q. This interpretation as a quantizer is based on the limited resolution of the HVS. If the amplitude of a basis function of the transform is modified, the induced perception will remain constant until the just noticeable difference (JND) is reached. In this case, as in quantization, a continuous range of amplitudes gives rise to a single perception [36], [37]. This perceptual quantizer has to be nonuniform because the empirical JNDs are nonuniform [19], [21], [22], [34]. The similarity between the impulse responses of the perceptual filters of the transform T and the basis functions of the local frequency transforms used in image and video coding has been used to apply the experimental properties of the perceptual transform domain to the block DCT transform as a reasonable approximation [13], [14], [25], [26], [38], [39]. In this paper, Q is formulated in the DCT domain through an explicit design criterion based on a distortion metric that includes the HVS nonlinearities [15], [16] and some temporal perceptual features [20]–[22].
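A minimal numerical sketch of this two-step model, using an orthonormal 8×8 DCT as the linear stage (a stand-in for the perceptual filter bank T) and an assumed power-law compressive nonlinearity in place of the fitted log-like response:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis; its rows play the role of the filter bank T.
    j = np.arange(n)
    C = np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    C[0] /= np.sqrt(n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def response(x, exponent=0.6):
    # Two-step model: a = T x (local frequency transform), then a point
    # compressive nonlinearity applied independently to each coefficient
    # (the first-approximation regime where cross-coefficient terms are
    # neglected). The exponent is illustrative, not a fitted HVS parameter.
    C = dct_matrix(x.shape[0])
    a = C @ x @ C.T
    r = np.sign(a) * np.abs(a) ** exponent
    return a, r
```

The point nonlinearity preserves the sign of each coefficient and compresses large amplitudes, which is what makes the effective resolution of the response amplitude-dependent.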
A. Maximum Perceptual Error (MPE) Criterion for Quantizer Design
The natural way of assessing the quality of an encoded picture (or sequence) involves a one-to-one comparison between the original and the encoded version. The result of this comparison is related to the ability of the observer to notice the particular quantization noise in the presence of the original (masking) pattern. This one-to-one noise detection or assessment is clearly related to the tasks behind the standard pattern discrimination models [34], [35], in which an observer has to evaluate the distortion from a masking stimulus. In contrast, a hypothetical request of assessing the global performance of a quantizer over a set of images or sequences would involve a sort of averaging of each one-to-one comparison. It is unclear how a human observer does this kind of averaging to obtain a global feeling of performance, and the task itself is far from the natural one-to-one comparison that arises when one looks at a particular picture.

The conventional techniques of transform quantizer design use average design criteria in such a way that the final quantizer achieves the minimum average error over the training set (sum of the one-to-one distortions weighted by their probability) [27]. However, the minimization of an average error measure does not guarantee a satisfactory subjective performance on individual comparisons [9]. Even if a perceptual weighting is used, the average criteria may bias the results. For instance, Macq [28] used uniform quantizers instead of the optimal Lloyd-Max quantizers [27], [40], due to the perceptual artifacts caused by the outliers on individual images.
To prevent large perceptual distortions on individual images arising from outlier coefficients, the coder should restrict the maximum perceptual error (MPE) in each coefficient and amplitude [13], [14]. This requirement is satisfied by a perceptually uniform distribution of the available quantization levels in the transform domain. If the perceptual distance between levels is constant, the MPE in each component is bounded regardless of the amplitude of the input.

In this paper, the restriction of the MPE will be used as a design criterion. This criterion can be seen as a perceptual version of the minimum maximum error criterion [9]. This idea has been implicitly used in still image compression [25], [26] to achieve a constant error contribution from each frequency component on an individual image. It has been shown that bounding the perceptual distortion in each DCT coefficient may be subjectively more effective than minimizing the average perceptual error [13], [14]. Moreover, the MPE quantizers reduce to the JPEG and MPEG quantizers if a simple (linear) perception model is considered.
B. Optimal Spatial Quantizers Under the MPE Criterion
The design of a transform quantizer for a given block transform involves finding the optimal number of quantization levels for each coefficient (bit allocation) and the optimal distribution of these quantization levels in each case [27].

Let us assume that the squared perceptual distance between two similar patterns in the transform domain, a and a + Δa, is given by a weighted sum of the distortion in each coefficient

    d(a, a + Δa)² = Σ_f W(f, a_f) · Δa_f²    (2)

where W(f, a) is a frequency and amplitude-dependent perceptual metric.

In order to prevent large perceptual errors on individual images coming from outlier coefficient values, the coder should be designed to bound the MPE for every frequency f and amplitude a.

If a given coefficient (at frequency f) is represented by N_f quantization levels distributed according to a density λ_f(a), the maximum Euclidean quantization error at an amplitude a will be bounded by half the Euclidean distance between two levels

    |Δa_f| ≤ 1 / (2 · N_f · λ_f(a))    (3)

The MPE for that frequency and amplitude will be related to the metric and the density of levels:

    MPE(f, a) = W(f, a) / (2 · N_f · λ_f(a))²    (4)

The only density of quantization levels that gives a constant MPE bound over the amplitude range is the one that varies as the square root of the metric

    λ_f(a) ∝ W(f, a)^(1/2)    (5)

With these optimal densities, the MPE in each coefficient will depend on the number of allocated levels and on the integrated value of the metric

    MPE_f = ( ∫ W(f, a')^(1/2) da' )² / (2 · N_f)²    (6)

Fixing the same maximum distortion for each coefficient, MPE_f = D for all f, and solving for N_f, the optimal number of quantization levels is obtained

    N_f ∝ ∫ W(f, a')^(1/2) da'    (7)

The general form of the optimal MPE quantizer is given by (5) and (7) as a function of the perceptual metric. Thus, the behavior of the MPE quantizer will depend on the accuracy of the selected W(f, a). Here, a perceptual metric related to the gradient of the nonlinear response g and to the amplitude JNDs has been considered [15], [16]

    W(f, a)^(1/2) = 1 / JND(f, a) = CSF(f) + h_f(a)    (8)

where the JNDs depend on the local mean luminance, CSF is the contrast sensitivity function (the bandpass linear filter which characterizes the HVS performance for low amplitudes [20], [24], [28], [41]), and h_f(a) are empirical monotonically increasing functions of amplitude for each spatial frequency fitted to the amplitude JND data [15]. In particular, we have used the CSF of Ngan et al. [41]

    CSF(f) = (0.31 + 0.69 f) · e^(−0.29 f)    (9)

and the corresponding nonlinear functions h_f(a), expressed in [15] in terms of the CSF (frequency f in cycles/degree) (10).

It is important to note that the metric weight for each coefficient in (8) has two contributions: one constant term (the CSF) and one amplitude-dependent term that vanishes for low amplitudes. This second term comes from a nonlinear correction to the linear threshold response described by the CSF. These two terms in the metric give two interesting particular cases of the MPE formulation. First, if a simple linear perception model is assumed, a CSF-based MPEG-like quantizer is obtained. If the nonlinear correction in (8) is neglected, uniform quantizers are obtained for each coefficient and N_f becomes proportional to the CSF, which is one of the recommended options in the JPEG and MPEG standards [1]–[3]. Second, if both factors of the metric are taken into account, the algorithm of [13], [14], [26] is obtained: the quantization step size is input-dependent and proportional to the JNDs, and bit allocation is proportional to the integral of the inverse of the JNDs. From now on, these two cases will be referred to as linear and nonlinear MPE, respectively.
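The level-placement rule of (5) and the MPE bound of (4) can be checked numerically. The amplitude metric W(a) = (1 + 3a)² below is an assumed toy metric (monotonically increasing, as the amplitude JND corrections are), not the paper's fitted W(f, a); levels are placed via the inverse CDF of the √W density.

```python
import numpy as np

def levels_from_density(amps, W_vals, n_levels):
    # Eq. (5): place quantization levels with density proportional to
    # sqrt(W(a)), via the inverse CDF of the normalized density.
    dens = np.sqrt(W_vals)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    targets = (np.arange(n_levels) + 0.5) / n_levels
    return np.interp(targets, cdf, amps)

def max_perceptual_error(levels, W_fun):
    # First-order version of eq. (4): per-gap error is half the gap times
    # sqrt(W) at the gap midpoint; the worst gap gives the MPE.
    mids = 0.5 * (levels[1:] + levels[:-1])
    return (0.5 * np.diff(levels) * np.sqrt(W_fun(mids))).max()

# Toy amplitude metric (assumed, monotonically increasing with amplitude).
W = lambda a: (1.0 + 3.0 * a) ** 2
amps = np.linspace(0.0, 1.0, 2001)
opt = levels_from_density(amps, W(amps), 32)       # sqrt(W)-spaced levels
uni = (np.arange(32) + 0.5) / 32                   # uniform levels
```

With the √W spacing, the per-gap perceptual error is approximately constant across the amplitude range, and the resulting MPE is lower than that of a uniform quantizer with the same number of levels.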
The CSF-based (linear MPE) quantizer used in MPEG [1]–[3] and the proposed nonlinear MPE quantizer [13], [14], [26] represent different degrees of approximation to the actual quantization process, Q, eventually carried out by the HVS. The scheme that takes into account the perceptual amplitude nonlinearities will presumably be more efficient in removing the subjective redundancy from the DFD.

Fig. 1 shows the product N_f · λ_f(a) for the linear (MPEG-like) and the nonlinear MPE quantizers. This product represents the number of quantization levels per unit of area in the frequency and amplitude plane. This surface is a useful description of where a quantizer concentrates the encoding effort [14]. Fig. 2 shows the bit allocation solutions (number of quantization levels per coefficient, N_f) in the linear and the nonlinear MPE cases. Note how the amplitude nonlinearities enlarge the bandwidth of the quantizer in comparison to the CSF-based case. This enlargement will make a difference when dealing with wide spectrum signals like the DFD.
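The bandwidth enlargement of Fig. 2 can be reproduced qualitatively from (7). The CSF expression and the amplitude correction h(a) = 0.5·a below are assumed stand-ins for the fitted functions of (8)–(10):

```python
import numpy as np

def csf(f):
    # Illustrative bandpass CSF (stand-in for the threshold model).
    return (0.31 + 0.69 * f) * np.exp(-0.29 * f)

def n_levels(f, nonlinear, amps=np.linspace(0.0, 1.0, 512)):
    # Eq. (7): N_f proportional to the amplitude integral of sqrt(W(f, a)).
    # Linear MPE:    sqrt(W) = CSF(f)         ->  N_f ∝ CSF(f).
    # Nonlinear MPE: sqrt(W) = CSF(f) + h(a), with h(a) = 0.5 * a an
    # assumed amplitude-dependent correction.
    w_sqrt = csf(f) + (0.5 * amps if nonlinear else np.zeros_like(amps))
    return w_sqrt.sum() * (amps[1] - amps[0])

freqs = np.linspace(0.5, 25.0, 50)
lin = np.array([n_levels(f, False) for f in freqs])
non = np.array([n_levels(f, True) for f in freqs])
lin /= lin.sum()   # normalize: same total number of levels, as in Fig. 2
non /= non.sum()
```

After normalizing both allocations to the same total number of levels, the nonlinear metric assigns a relatively larger share to high frequencies, i.e., a wider quantizer bandpass.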
C. Introducing HVS Temporal Properties in the Prediction Loop
The previous considerations about optimal MPE 2-D transform quantizers can be extended to three-dimensional (3-D) spatio-temporal transforms. The HVS motion perception models extend the 2-D spatial filter bank to nonzero temporal frequencies [42], [43]. The CSF filter is also defined for moving gratings [20] and the contrast discrimination curves for spatio-temporal gratings show roughly the same shape as the curves for still stimuli [21], [22]. By using the 3-D CSF and similar nonlinear corrections for high amplitudes, the expression of (8) could be employed to measure differences between local moving patterns. In this way, optimal MPE quantizers could be defined in a spatio-temporal frequency transform domain. However, the frame-by-frame nature of any motion compensated scheme makes the implementation of a 3-D transform quantizer in the prediction loop more difficult.

In order to exploit the subjective temporal redundancy removal to some extent, the proposed 2-D MPE quantizer can be complemented with one-dimensional (1-D) temporal filtering based on the perceptual bit allocation in the temporal dimension. This temporal filter can be implemented by a simple finite impulse response weighting of the incoming error frames. The temporal frequency response of the proposed 1-D filter is set proportional to the number of quantization levels that should be allocated in each temporal frequency frame of a 3-D MPE optimal quantizer. For each spatio-temporal coefficient, the optimal number of quantization levels is given
MALO et al.: PERCEPTUAL FEEDBACK IN MULTIGRID MOTION ESTIMATION 1415
Fig. 1. Relative number of quantization levels allocated in the frequency and amplitude plane for (a) nonlinear MPE and (b) linear MPE quantizers. The surfaces are scaled to have unit integral (the same total number of quantization levels). The distribution of the quantization levels in amplitude for a certain coefficient is just the corresponding slice of the surface at the desired frequency. The MPE design [(5) and (7)] implies that this surface is proportional to the square root of the perceptual metric, $W^{1/2}(a)$, so different perception models (different metrics) give rise to different distributions of quantization levels. Note that the distribution is uniform for every frequency in the linear MPE (MPEG-like) case and nonuniform (peaked at low amplitudes) in the nonlinear MPE case.
Fig. 2. Bit allocation results (relative number of quantization levels per coefficient) for the linear MPE (MPEG-like) case and for the nonlinear MPE case. The curves are scaled to have unit integral (the same total number of quantization levels). In the linear case, the metric is just the square of the CSF, so $N \propto \mathrm{CSF}$, as recommended by JPEG and MPEG. A more complex (nonlinear) model gives rise to a wider quantizer bandpass.
by (7). Integrating over the spatial frequency, the number of
quantization levels for that temporal frequency is

$$N(f_t) = \int N(f, f_t)\, df \qquad (11)$$
Fig. 3(a) shows the number of quantization levels for each
spatio-temporal frequency of a 3-D nonlinear MPE quantizer.
This is the 3-D version of the 2-D nonlinear bit allocation of
Fig. 2. Note that (except for a scale factor) the spatial frequency
curve for the zero temporal frequency is just the solid curve of
Fig. 2. Fig. 3(b) shows the temporal frequency response that is
obtained by integrating over the spatial frequencies.
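This temporal weighting can be sketched under illustrative assumptions: the separable 3-D allocation `alloc3d`, its decay constants, and the 8-bin frequency grid below are all hypothetical stand-ins for the optimal allocation, not the paper's numbers. The structure follows the text: integrate the allocation over spatial frequency as in (11), normalize the result as a desired temporal frequency response, and obtain FIR taps by frequency sampling (an inverse DFT of a symmetric response).

```python
import cmath
import math

def alloc3d(fs, ft):
    # Hypothetical 3-D allocation surface: band-pass in spatial frequency,
    # low-pass in temporal frequency (a stand-in for the optimal N(f, ft)).
    return fs * math.exp(-0.4 * fs) * math.exp(-0.15 * ft)

spatial = [0.5 * k for k in range(1, 33)]   # spatial frequencies (cycles/deg)

def n_temporal(ft):
    # Eq. (11) as a Riemann sum: levels for temporal frequency ft,
    # integrated over spatial frequency (step 0.5).
    return sum(alloc3d(fs, ft) for fs in spatial) * 0.5

def fir_from_response(resp):
    # Frequency-sampling design: resp holds the desired gain at DFT bins
    # 0..N-1; a symmetric real response yields (numerically) real taps.
    N = len(resp)
    taps = []
    for n in range(N):
        acc = sum(resp[k] * cmath.exp(2j * math.pi * k * n / N)
                  for k in range(N))
        taps.append((acc / N).real)
    return taps

# Sample the temporal response at a few frequencies; mirror the bins above
# N/2 so the DFT-domain response is symmetric and the taps come out real.
half = [n_temporal(ft) for ft in (0.0, 3.0, 6.0, 9.0, 12.0)]
half = [v / half[0] for v in half]         # normalize to unit DC gain
resp = half + half[3:0:-1]                 # [r0 r1 r2 r3 r4 r3 r2 r1]
taps = fir_from_response(resp)
```

The incoming error frames would then be weighted by convolving along time with `taps` (suitably delayed for causality). The DC gain of the filter equals `sum(taps)`, which the frequency-sampling construction pins to the normalized zero-frequency response.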
IV. PERCEPTUAL FEEDBACK IN THE MOTION ESTIMATION
Any approximation to the actual perceptual quantization
process has an obvious application in the DFD quantizer
design, but it may also have interesting effects on the
computation of the DVF if the proper feedback from the DFD
quantization is established in the prediction loop.
If all the details of the DFD are considered to be of equal
importance, we would have an unweighted splitting criterion, as in
the difference-based criteria [6], [29], [32], [33] or in the
spatial entropy-based criterion of Dufaux et al. [7], [8]. However,
as the DFD is going to be simplified by some nontrivial quantizer,
which represents the selective bottleneck of early perception,
not every additional detail predicted by a better motion
compensation will be significant to the quantizer. The motion
estimation effort therefore has to be focused on the moving
regions that contain perceptually significant motion information.
In order to formalize the concept of perceptually significant
motion information, the work of Watson [36] and Daugman
[37] on entropy reduction in the HVS should be taken into
account. They assume a model of early contrast processing based
on a transform-and-quantizer pair and suggest that the entropy of
the cortical scene representation (a measure of the perceptual
entropy of the signal) is just the entropy of the quantized
version of the transformed image. Therefore, a measure of the
perceptual entropy of a signal $a$ is

$$H_p(a) = H\big(Q(T\,a)\big) \qquad (12)$$

where $T$ is the transform, $Q$ is the quantizer, and $H(\cdot)$ denotes the entropy.
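The structure of (12) — transform, quantize, then measure the entropy of the resulting symbols — can be sketched directly. Everything concrete below is a hypothetical stand-in: a 1-D 8-point DCT instead of the paper's transform, a toy test signal, and per-coefficient quantization steps that simply grow with frequency (a crude proxy for lower assumed visibility). Only the composition H(Q(T a)) follows the definition.

```python
import math
from collections import Counter

def dct(block):
    # Naive orthonormal 1-D DCT-II of a length-N signal.
    N = len(block)
    out = []
    for k in range(N):
        s = sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n, x in enumerate(block))
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * s)
    return out

def quantize(coeffs, steps):
    # Uniform quantization with a per-coefficient step size
    # (coarser where the assumed visibility is lower).
    return [round(c / q) for c, q in zip(coeffs, steps)]

def entropy(symbols):
    # Shannon entropy (bits/symbol) of the empirical symbol distribution.
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical signal: 32 blocks of 8 samples, slow oscillation plus texture.
blocks = [[math.sin(0.3 * (8 * b + n)) + 0.1 * math.sin(2.9 * n)
           for n in range(8)] for b in range(32)]

steps_fine = [0.05] * 8                                # near-lossless reference
steps_perceptual = [0.05 * (1 + k) for k in range(8)]  # coarser at high frequency

sym_fine, sym_perc = [], []
for blk in blocks:
    c = dct(blk)
    sym_fine.extend(quantize(c, steps_fine))
    sym_perc.extend(quantize(c, steps_perceptual))

H_fine, H_perc = entropy(sym_fine), entropy(sym_perc)
```

`H_perc` is the perceptual entropy in the sense of (12): the coarser, visibility-driven quantizer discards detail the model deems insignificant, so its symbol stream carries no more entropy than the finely quantized one.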
Using this perceptual entropy measure (which is simply the
entropy of the output of an MPE quantizer), we can propose an
explicit definition of what perceptually significant motion
information is. Let us motivate the definition as follows. Given a cer-

References

S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. IT-28, no. 2, pp. 129-137, Mar. 1982.


"Perceptual feedback in multigrid mo..." refers background in this paper

  • ...Moreover, the MPE quantizers reduce to the JPEG and MPEG quantizers if a simple (linear) perception model is considered....

    [...]

A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1992.


"Perceptual feedback in multigrid mo..." refers background in this paper

  • ...The design of a transform quantizer for a given block transform involves finding the optimal number of quantization levels for each coefficient (bit allocation) and the optimal distribution of these quantization levels in each case [27]....

    [...]

  • ...Moreover, the MPE quantizers reduce to the JPEG and MPEG quantizers if a simple (linear) perception model is considered....

    [...]

J. L. Barron, D. J. Fleet, and S. S. Beauchemin, "Performance of optical flow techniques," Int. J. Comput. Vision, vol. 12, no. 1, pp. 43-77, 1994.

D. Le Gall, "MPEG: A video compression standard for multimedia applications," Commun. ACM, vol. 34, no. 4, pp. 46-58, Apr. 1991.


"Perceptual feedback in multigrid mo..." refers background or methods in this paper

  • ...REFERENCES [1] D. LeGall, “MPEG: A video compression standard for multimedia applications,”Commun....

    [...]

  • ...1 shows the product for the linear (MPEGlike) and the nonlinear MPE quantizers....

    [...]

  • ...To achieve this aim, current video coders are based on motion compensation and two-dimensional (2-D) transform coding of the residual error [1]–[4]....

    [...]

  • ...• Proposed scheme versus previous comparable schemes (H.263, MPEG-1)....

    [...]

  • ...8 shows a representative example of the decoded results using the same MPEG-like quantizer and the different considered motion estimations at a fixed bit-rate (frame 7 of theTaxi sequence)....

    [...]
