688 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 7, JULY 2003
Rate-Constrained Coder Control and Comparison
of Video Coding Standards
Thomas Wiegand, Heiko Schwarz, Anthony Joch, Faouzi Kossentini, Senior Member, IEEE, and
Gary J. Sullivan, Senior Member, IEEE
Abstract—A unified approach to the coder control of video
coding standards such as MPEG-2, H.263, MPEG-4, and the draft
video coding standard H.264/AVC is presented. The performance
of the various standards is compared by means of PSNR and
subjective testing results. The results indicate that H.264/AVC
compliant encoders typically achieve essentially the same repro-
duction quality as encoders that are compliant with the previous
standards while typically requiring 60% or less of the bit rate.
Index Terms—Coder control, Lagrangian, H.263, H.264/AVC,
MPEG-2, MPEG-4, rate-constrained, standards, video.
I. INTRODUCTION
THE specifications of most video coding standards, including MPEG-2 Visual [1], H.263 [2], MPEG-4 Visual [3], and H.264/AVC [4], provide only the bit-stream syntax and
the decoding process in order to enable interoperability. The
encoding process is left out of the scope to permit flexible
implementations. However, the operational control of the
source encoder is a key problem in video compression. For the
encoding of a video source, many coding parameters such as
macroblock modes, motion vectors, and transform coefficient
levels have to be determined. The chosen values determine the
rate-distortion efficiency of the produced bitstream of a given
encoder.
In this paper, the operational control of MPEG-2, H.263,
MPEG-4, and H.264/AVC encoders is optimized with respect
to their rate-distortion efficiency using Lagrangian optimization
techniques. The optimization is based on [5] and [6], where
the encoder control for the ITU-T Recommendation H.263
[2] is addressed. The Lagrangian coder control as described
in this paper was also integrated into the test models TMN-10
[7] and JM-2 [8] for the ITU-T Recommendation H.263 and
H.264/AVC, respectively. The same Lagrangian coder control
method was also applied to the MPEG-4 verification model
VM-18 [9] and the MPEG-2 test model TM-5 [10]. In addition
to achieving performance gains, the use of similar rate-dis-
Manuscript received December 12, 2001; revised May 10, 2003.
T. Wiegand and H. Schwarz are with the Fraunhofer-Institute for Telecom-
munications, Heinrich-Hertz Institute, 10587 Berlin, Germany (e-mail:
wiegand@hhi.de; hschwarz@hhi.de).
A. Joch is with UB Video Inc., Vancouver, BC V6B 2R9, Canada (e-mail:
anthony@ubvideo.com).
F. Kossentini is with the Department of Electrical and Computer Engi-
neering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
(e-mail: faouzi@ece.ubc.ca).
G. J. Sullivan is with the Microsoft Corporation, Redmond, WA 98052 USA
(e-mail: garysull@microsoft.com).
Digital Object Identifier 10.1109/TCSVT.2003.815168
tortion optimization methods in all encoders allows a useful
comparison between the encoders in terms of coding efficiency.
This paper is organized as follows. Section II gives an
overview of the syntax features of MPEG-2 Video, H.263,
MPEG-4 Visual, and H.264/AVC. The rate-distortion-optimized coder control is described in Section III, and experimental results are presented in Section IV.
II. STANDARD SYNTAX AND DECODERS
All ITU-T and ISO/IEC JTC1 standards since H.261 [11]
have in common that they are based on the so-called block-based
hybrid video coding approach. The basic source-coding algo-
rithm is a hybrid of inter-picture prediction to utilize temporal
redundancy and transform coding of the prediction error signal
to reduce spatial redundancy. Each picture of a video signal
is partitioned into fixed-size macroblocks of 16×16 samples,
which can be transmitted in one of several coding modes de-
pending on the picture or slice coding type. Common to all
standards is the definition of INTRA coded pictures or I-pic-
tures. In I-pictures, all macroblocks are coded without refer-
ring to other pictures in the video sequence. Also common is
the definition of predictive-coded pictures, so-called P-pictures
and B-pictures, with the latter being extended conceptually in
H.264/AVC coding. In predictive-coded pictures, typically one
of a variety of INTER coding modes can be chosen to encode
each macroblock.
In order to manage the large number of coding tools included
in standards and the broad range of formats and bit rates sup-
ported, the concept of profiles and levels is typically employed
to define a set of conformance points, each targeting a specific
class of applications. These conformance points are designed
to facilitate interoperability between various applications of the
standard that have similar functional requirements. A profile de-
fines a set of coding tools or algorithms that can be used in gen-
erating a compliant bitstream, whereas a level places constraints
on certain key parameters of the bitstream, such as the picture
resolution and bit rate.
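Conceptually, a level check is a table lookup against key bitstream parameters. The following Python sketch illustrates the idea; the level names and limit values are illustrative placeholders, not normative figures from any of the standards discussed here.

```python
# Sketch: how a level places constraints on key bitstream parameters.
# The limits below are illustrative placeholders, not normative values.
LEVEL_LIMITS = {
    # level: (max_width, max_height, max_bit_rate_bps)
    "ML": (720, 576, 15_000_000),    # an SDTV-class level
    "HL": (1920, 1152, 80_000_000),  # an HDTV-class level
}

def conforms(width: int, height: int, bit_rate: int, level: str) -> bool:
    """Check picture resolution and bit rate against a level's limits."""
    max_w, max_h, max_rate = LEVEL_LIMITS[level]
    return width <= max_w and height <= max_h and bit_rate <= max_rate

print(conforms(720, 576, 6_000_000, "ML"))    # True: within the SD-class limits
print(conforms(1920, 1080, 20_000_000, "ML")) # False: exceeds the resolution limit
```

A real conformance point constrains many more parameters (buffer sizes, macroblock rates, tool sets), but the lookup structure is the same.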
Although MPEG-2, H.263, MPEG-4, and H.264/AVC de-
fine similar coding algorithms, they contain features and en-
hancements that make them differ. These differences mainly
involve the formation of the prediction signal, the block sizes
used for transform coding, and the entropy coding methods. In
the following, the description of the various standards is limited
to those features relevant to the comparisons described in this
paper.
1051-8215/03$17.00 © 2003 IEEE

A. ISO/IEC Standard 13818-2/ITU-T Recommendation
H.262: MPEG-2
MPEG-2 forms the heart of broadcast-quality digital tele-
vision for both standard-definition and high-definition televi-
sion (SDTV and HDTV) [1], [12], [13]. MPEG-2 video (IS 13818-2/ITU-T Recommendation H.262) was designed to encompass MPEG-1 [14] and to also provide high quality with
interlaced video sources at bit rates in the range of 4–30 Mbit/s.
Although usually thought of as an ISO standard, MPEG-2 video
was developed as an official joint project of both the ISO/IEC
JTC1 and ITU-T organizations, and was completed in late 1994.
MPEG-2 incorporates various features from H.261 and
MPEG-1. It uses the basic coding structure that is still predominant today. For each macroblock, which consists of one 16×16 luminance block and two 8×8 chrominance blocks for 4:2:0-formatted video sequences, a syntax element indicating the
macroblock coding mode (and signalling a quantizer change)
is transmitted. While all macroblocks of I-pictures are coded
in INTRA mode, macroblocks of P-pictures can be coded in
INTRA, INTER-16×16, or SKIP mode. For the SKIP mode,
runs of consecutive skipped macroblocks are transmitted, and the picture in the skipped region is represented using INTER prediction without adding any residual
difference representation. In B-pictures, the prediction signal
for the motion-compensated INTER-16×16 mode can be
formed by forward, backward, or bidirectionally interpolated
prediction. The motion compensation is generally based on
16×16 blocks and utilizes half-pixel accurate motion vectors,
with bilinear interpolation of half-pixel positions. The motion
vectors are predicted from a single previously encoded motion
vector in the same slice.
Texture coding is conducted using a DCT on blocks of 8×8
samples, and uniform scalar quantization (with the exception
of the central dead-zone) is applied that can be adjusted using
quantization values from 2 to 62. Additionally, a perceptually
weighted matrix based on the frequency of each transform co-
efficient (except the Intra DC coefficient) can be used. The en-
tropy coding is performed using zig-zag scanning and two-di-
mensional run-level variable-length coding (VLC). There are
two available VLC tables for transmitting the transform coef-
ficient levels, of which one must be used for predictive-coded
macroblocks and either can be used for INTRA macroblocks,
as selected by the encoder on the picture level.
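The zig-zag scan and two-dimensional run-level coding described above can be sketched as follows. This is an illustrative Python rendering of the general technique, not the normative MPEG-2 VLC tables, which additionally map each (run, level) pair to a specific codeword.

```python
def zigzag_order(n=8):
    """Zig-zag scan order for an n x n coefficient block: traverse the
    anti-diagonals from DC to highest frequency, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level(scanned):
    """Represent a scanned coefficient list as (run, level) pairs:
    'run' zeros followed by a nonzero 'level'."""
    pairs, run = [], 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs

print(zigzag_order()[:6])             # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_level([5, 0, 0, 3, 0, 1]))  # [(0, 5), (2, 3), (1, 1)]
```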
For the coding of interlaced video sources, MPEG-2 pro-
vides the concept of field pictures and field-coded macroblocks
in frame pictures. The top and bottom field of an interlaced
frame can be coded together as frame picture or as two sepa-
rate field pictures. In addition to the macroblock coding modes
described above, field-picture macroblocks can also be coded in
INTER-16×8 prediction mode, in which two different prediction signals are used, one for the upper and one for the lower half
of a macroblock. For macroblocks in frame pictures, a similar
coding mode is provided that uses different prediction signals
for the top and bottom field lines of a macroblock. Macroblocks
of both field and frame pictures can also be transmitted in dual
prime mode. In this coding mode, the final prediction for each
field is formed by averaging two prediction signals, of which
one is obtained by referencing the field with the same parity
and the other is obtained by referencing the field with the oppo-
site parity as the current field. For coding of the residual data,
MPEG-2 provides the possibility to use an alternative scanning
pattern, which can be selected on picture level, and to choose
between a frame- and field-based DCT coding of the prediction
error signal.
The most widely implemented conformance point in the
MPEG-2 standard is the Main profile at the Main Level
(MP@ML). MPEG-2 MP@ML compliant encoders find
application in DVD-video, digital cable television, terrestrial
broadcast of standard definition television, and direct-broadcast
satellite (DBS) systems. This conformance point supports
coding of CCIR 601 content at bit rates up to 15 Mbit/s and
permits use of B-pictures and interlaced prediction modes. In
this work, an MPEG-2 encoder is included in the comparisons
of video encoders for streaming and entertainment applications.
The MPEG-2 bitstreams generated for our comparisons are
compliant with the popular MP@ML conformance point, with the exception of the HDTV bitstreams, which are compliant with
the MP@HL conformance point.
B. ITU-T Recommendation H.263
The first version of ITU-T Recommendation H.263 [2]
defines a basic source-coding algorithm similar to that of
MPEG-2, utilizing the INTER-16×16, INTRA, and SKIP
coding modes. But H.263 Baseline contains significant changes
that make it more efficient at lower bit rates including median
motion vector prediction and three-dimensional run-level-last
VLC with tables optimized for lower bit rates.
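Median motion vector prediction takes the componentwise median of three previously coded neighboring motion vectors (typically left, above, and above-right), and only the difference from that predictor is coded. A minimal sketch, with hypothetical neighbor values:

```python
def median_mv(left, above, above_right):
    """H.263-style median prediction: componentwise median of three
    neighboring motion vectors, each given as an (mvx, mvy) pair."""
    mvx = sorted([left[0], above[0], above_right[0]])[1]
    mvy = sorted([left[1], above[1], above_right[1]])[1]
    return (mvx, mvy)

# The motion vector is then coded as a difference from the predictor.
mv = (3, -1)                                # vector chosen by motion search
pred = median_mv((2, 0), (4, -2), (1, 1))   # -> (2, 0)
mvd = (mv[0] - pred[0], mv[1] - pred[1])    # -> (1, -1), the value transmitted
```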
Moreover, version 1 of H.263 contains eight Annexes (An-
nexes A–G) including four Annexes permitting source coding
options (Annexes D, E, F, and G) for improved compression performance. Annexes D and F are in frequent use today. Annex D
specifies the option for motion vectors to point outside the ref-
erence picture and to have longer motion vectors than H.263
Baseline. Annex F specifies the use of overlapped block mo-
tion compensation and four motion vectors per macroblock with
each motion vector assigned to an 8×8 subblock, i.e., the use of variable block sizes. Hence, an INTER-8×8 coding mode is
added to the set of possible macroblock modes.
H.263+ is the second version of H.263 [2], [15], where sev-
eral optional features are added to H.263 as Annexes I through
T. Annex J of H.263+ specifies a deblocking filter that is ap-
plied inside the motion prediction loop and is used together with
the variable block-size feature of Annex F. H.263+ also adds
some improvements in compression efficiency for the INTRA
macroblock mode through prediction of intra-DCT transform
coefficients from neighboring blocks and specialized quantiza-
tion and VLC coding methods for intra coefficients. This ad-
vanced syntax is described in Annex I of the ITU-T Recom-
mendation H.263+. Annex I provides significant rate-distortion
improvements between 1 and 2 dB compared to the H.263 Baseline INTRA macroblock coding mode when utilizing the same amount of bits for both codecs [15]. Annex T of H.263+ removes
some limitations of the Baseline syntax in terms of quantization
and also improves chrominance fidelity by specifying a smaller

step size for chrominance coefficients than for luminance. The
remaining Annexes contain additional functionalities including
specifications for custom and flexible video formats, scalability,
and backward-compatible supplemental enhancement informa-
tion.
A second set of extensions that adds three more optional
modes to H.263 [2] was completed and approved late in the
year 2000. This version is often referred to as H.263++. The
data partitioned slice mode (Annex V) can provide enhanced
resilience to bit-stream corruption, which typically occurs
during transmission over wireless channels, by separating
header and motion vector information from transform coef-
ficients. Annex W specifies additional backward-compatible
supplemental enhancement information including interlaced
field indications, repeated picture headers, and the indication
of the use of a specific fixed-point inverse DCT. Compression
efficiency and robustness to packet loss can be improved by
using the enhanced reference picture selection mode (Annex
U), which enables long-term memory motion compensation
[22], [23]. In this mode, the spatial displacement vectors that
indicate motion-compensated prediction blocks are extended
by variable time delay, permitting the predictions to originate
from reference pictures other than the most recently decoded
reference picture. Motion-compensation performance is im-
proved because of the larger number of possible predictions
that are available by including more reference frames in the
motion search. In Annex U, two modes are available for the
buffering of reference pictures. The sliding-window mode—in
which only the most recent reference pictures are stored—is
the simplest and most commonly implemented mode. In the
more flexible adaptive buffering mode, buffer management
commands can be inserted into the bitstream as side informa-
tion, permitting an encoder to specify how long each reference
picture remains available for prediction, with a constraint on
the total size of the picture buffer. The maximum number of
reference pictures is typically 5 or 10 when conforming to one
of H.263’s normative profiles, which are discussed next.
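The sliding-window buffering mode of Annex U can be sketched with a bounded queue: storing a newly decoded picture automatically evicts the oldest once the buffer holds the maximum number of reference pictures. The picture identifiers below are illustrative.

```python
from collections import deque

class SlidingWindowBuffer:
    """Sketch of sliding-window reference buffering: only the most
    recent max_refs decoded pictures remain available for prediction."""
    def __init__(self, max_refs=5):
        self.pictures = deque(maxlen=max_refs)

    def store(self, picture_id):
        # Appending beyond maxlen silently evicts the oldest picture.
        self.pictures.append(picture_id)

    def available_references(self):
        return list(self.pictures)

buf = SlidingWindowBuffer(max_refs=5)
for n in range(8):          # decode pictures 0..7
    buf.store(n)
print(buf.available_references())  # [3, 4, 5, 6, 7]
```

The adaptive buffering mode would replace the implicit eviction with explicit buffer-management commands carried as side information.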
The ITU-T has recently approved Annex X of H.263, which
provides a normative definition of profiles, or preferred combi-
nations of optional modes, and levels, which specify maximum
values for several key parameters of an H.263 bitstream. Similar
to their use in MPEG-2, each profile is designed to target a spe-
cific key application, or group of applications that require sim-
ilar functionality. In this work, the rate-distortion capabilities
of the Baseline profile and the Conversational High Compres-
sion (CHC) profile are compared to other standards for use in
videoconferencing applications. The Baseline profile supports
only Baseline H.263 syntax (i.e., no optional modes) and exists
to provide a profile designation to the minimal capability that
all compliant decoders must support. The CHC profile includes
most of the optional modes that provide enhanced coding effi-
ciency without the added delay that is introduced by B-pictures
and without any optional error resilience features. Hence, it is
the best profile to demonstrate the optimal rate-distortion capa-
bilities of the H.263 standard for use in interactive video appli-
cations. Additionally, the High-Latency (HL) profile of H.263,
which adds support for B-pictures to the coding efficiency tools
of the CHC profile, is included in the comparison of encoders
for streaming applications, in which the added delay introduced
by B-pictures is acceptable.
C. ISO/IEC Standard 14496-2: MPEG-4
MPEG-4 Visual [3] standardizes efficient coding methods for
many types of audiovisual data, including natural video con-
tent. For this purpose, MPEG-4 Visual uses the Baseline H.263
algorithm as a starting point so that all compliant MPEG-4 de-
coders must be able to decode any valid Baseline H.263 bit-
stream. However, MPEG-4 includes several additional features
that can improve coding efficiency.
While spatial coding in MPEG-4 uses the 8×8 DCT and
scalar quantization, MPEG-4 supports two different scalar
quantization methods that are referred to as MPEG-style and
H.263-style. In the MPEG-style quantization, perceptually weighted matrices, similar to those used in MPEG-2, assign a specific quantizer to each coefficient in a block, whereas in the H.263 method, the same quantizer is used for all AC coefficients. Quantization of DC coefficients uses a special
nonlinear scale that is a function of the quantization parameter.
Quantized coefficients are scanned in a zig-zag pattern and
assigned run-length codes, as in H.263. MPEG-4 also includes
alternate scan patterns for horizontally and vertically predicted
INTRA blocks and the use of a separate VLC table for INTRA
coefficients. These techniques are similar to those defined in
Annex I of H.263.
Motion compensation in MPEG-4 is based on 16×16 blocks
and supports variable block sizes, as in Annex F of H.263,
so that one motion vector can be specified for each of the
8×8 subblocks of a macroblock, permitting the use of the INTER-8×8 mode. Version 1 of MPEG-4 supports only motion
compensation at half-pixel accuracy, with bilinear interpolation
used to generate values at half-pixel positions. Version 2 of
MPEG-4 additionally supports the use of quarter-pixel accurate
motion compensation, with a windowed 8-tap sinc function used
to generate half-pixel positions and bilinear interpolation for
quarter-pixel positions. Motion vectors are permitted to point
outside the reference picture and are encoded differentially
after median prediction, according to H.263. MPEG-4 does
not include a normative de-blocking filter inside the motion
compensation loop, as in Annex J of H.263, but post filters
may be applied to the reconstructed output at the decoder
to improve visual quality.
The MPEG-4 Simple profile includes all features mentioned
above, with the exception of the MPEG-style quantization
method and quarter-pixel motion compensation. The Advanced
Simple profile adds these two features, plus B-pictures, global
motion compensation (GMC) and special tools for efficient
coding of interlaced video. A video coder compliant with the
Simple profile and the Advanced Simple profile will be used
in our experiments.
D. ITU-T Recommendation H.264/ISO/IEC Standard
14496-10 AVC: H.264/AVC
H.264/AVC [4] is the latest joint project of the ITU-T VCEG
and ISO/IEC MPEG. The H.264/AVC design covers a video
coding layer (VCL) and a Network Adaptation Layer (NAL).

Although the VCL design basically follows the design of prior
video coding standards such as MPEG-2, H.263, and MPEG-4,
it contains new features that enable it to achieve a significant improvement in compression efficiency in relation to prior coding
standards. For details, please refer to [16]. Here, we will give a
very brief description of the necessary parts of H.264/AVC in
order to make the paper more self-contained.
In H.264/AVC, blocks of 4×4 samples are used for transform coding, and thus a macroblock consists of 16 luminance and 8 chrominance blocks. Similar to the I-, P-, and B-pictures defined
for MPEG-2, H.263, and MPEG-4, the H.264/AVC syntax sup-
ports I-, P-, and B-slices. A macroblock can always be coded in
one of several INTRA coding modes. There are two classes of
INTRA coding modes, which are denoted as INTRA-16×16 and INTRA-4×4 in the following. In contrast to previous standards, where only some of the DCT coefficients can be predicted
from neighboring INTRA-blocks, in H.264/AVC, prediction is
always utilized in the spatial domain by referring to neighboring
samples of already coded blocks. When using the INTRA-4×4 mode, each 4×4 block of the luminance component utilizes one of nine prediction modes. The chosen modes are transmitted as side information. With the INTRA-16×16 mode, a
uniform prediction is performed for the whole luminance com-
ponent of a macroblock. Four prediction modes are supported in
the INTRA-16×16 mode. For both classes of INTRA coding
modes, the chrominance components are predicted using one of
four possible prediction modes.
In addition to the INTRA modes, H.264/AVC provides var-
ious other motion-compensated coding modes for macroblocks
in P-slices. Each motion-compensated mode corresponds to
a specific partition of the macroblock into fixed size blocks
used for motion description. Macroblock partitions with block
sizes of 16×16, 16×8, 8×16, and 8×8 luminance samples are supported by the syntax, corresponding to the INTER-16×16, INTER-16×8, INTER-8×16, and INTER-8×8 macroblock modes, respectively. In case the INTER-8×8 macroblock mode is chosen, each of the 8×8 submacroblocks can be further partitioned into blocks of 8×8, 8×4, 4×8, or 4×4 luminance samples. H.264/AVC generally supports multi-frame
motion-compensated prediction. That is, similar to Annex
U of H.263, more than one prior coded picture can be used
as reference for the motion compensation. In H.264/AVC,
motion compensation is performed with quarter-pixel accurate
motion vectors. Prediction values at half-pixel locations are
obtained by applying a one-dimensional six-tap finite impulse
response (FIR) filter in each direction requiring a half-sample
offset (horizontal or vertical or both, depending on the value
of the motion vector), and prediction values at quarter-pixel
locations are generated by averaging samples at the integer-
and half-pixel positions. The motion vector components are
differentially coded using either median or directional predic-
tion from neighboring blocks.
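The luminance interpolation described above can be sketched in one dimension: the half-sample filter uses the taps (1, −5, 20, 20, −5, 1) with rounding and a right shift by 5 (division by 32), and quarter-sample values are rounded averages of neighboring integer- and half-sample values. This simplified sketch omits the ordering and intermediate-precision details of the full two-dimensional normative process.

```python
def half_pel(samples, i):
    """Half-sample value between samples[i] and samples[i+1], using the
    six-tap FIR filter (1, -5, 20, 20, -5, 1) / 32, rounded and clipped."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * samples[i - 2 + k] for k, t in enumerate(taps))
    return min(255, max(0, (acc + 16) >> 5))

def quarter_pel(a, b):
    """Quarter-sample value: rounded average of two neighboring
    integer- or half-sample values."""
    return (a + b + 1) >> 1

row = [10, 12, 40, 44, 41, 39, 12, 10]
h = half_pel(row, 3)        # half-pel between row[3]=44 and row[4]=41 -> 42
q = quarter_pel(row[3], h)  # quarter-pel between 44 and 42 -> 43
```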
In comparison to MPEG-2, H.263, and MPEG-4, the concept
of B-slices is generalized in H.264/AVC. For details please refer
to [17]. B-slices utilize two distinct reference picture lists, and
four different types of INTER prediction are supported: list 0,
list 1, bi-predictive, and direct prediction. While list 0 predic-
tion indicates that the prediction signal is formed by motion
compensation from a picture of the first reference picture list,
a picture of the second reference picture list is used for building
the prediction signal if list 1 prediction is used. In the bi-pre-
dictive mode, the prediction signal is formed by a weighted
average of a motion-compensated list 0 and list 1 prediction
signal. The direct prediction mode differs from the one used in
H.263 andMPEG-4 in that no delta motion vectoris transmitted.
Furthermore, there are two methods for obtaining the predic-
tion signal, referred to as temporal and spatial direct prediction,
which can be selected by an encoder on the slice level. B-slices
utilize a similar macroblock partitioning to P-slices. Besides the
INTER-16×16, INTER-16×8, INTER-8×16, INTER-8×8,
and the INTRA modes, a macroblock mode that utilizes direct
prediction, the DIRECT mode, is provided. Additionally, for
each 16×16, 16×8, 8×16, and 8×8 partition, the prediction
method (list 0, list 1, bi-predictive) can be chosen separately.
An 8×8 partition of a B-slice macroblock can also be coded in DIRECT-8×8 mode. If no prediction error signal is transmitted
for a DIRECT macroblock mode, it is also referred to as B-slice
SKIP mode.
H.264/AVC is basically similar to prior coding standards in
that it utilizes transform coding of the prediction error signal.
However, in H.264/AVC the transformation is applied to 4×4 blocks and, instead of the DCT, H.264/AVC uses a separable integer transform with basically the same properties as a 4×4
DCT. Since the inverse transform is defined by exact integer
operations, inverse-transform mismatches are avoided. An additional 2×2 transform is applied to the four DC coefficients of each chrominance component. If the INTRA-16×16 mode is in use, a similar operation extending the length of the transform basis functions is performed on the 4×4 DC coefficients of the luminance signal.
For the quantization of transform coefficients, H.264/AVC uses scalar quantization, but without an extra-wide dead-zone around zero as found in H.263 and MPEG-4. One of 52 quantizers is selected for each macroblock by the quantization parameter QP. The quantizers are arranged such that the quantization step size increases by approximately 12.5% when incrementing QP by one. The transform coefficient levels are scanned in a zig-zag fashion if the block is part of a macroblock coded in frame mode; for field-mode macroblocks, an alternative scanning pattern is used. The 2×2 DC coefficients of the chrominance components are scanned in raster-scan order.
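The roughly 12.5% growth per quantization-parameter increment compounds to a doubling of the step size every six values. A small numeric sketch (the base step size of 0.625 is the commonly cited value for H.264/AVC; the exact normative step sizes are tabulated in the standard):

```python
def q_step(qp):
    """Approximate H.264/AVC quantization step size for a given QP,
    assuming the commonly cited base step size of 0.625 at QP = 0."""
    return 0.625 * 2 ** (qp / 6)

print(q_step(6) / q_step(0))  # 2.0: the step size doubles every 6 QP values
print(q_step(1) / q_step(0))  # ~1.122: roughly the 12.5% increase per QP step
```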
All syntax elements of a macroblock including the vectors of
scanned transform coefficient levels are transmitted by entropy
coding methods.
Two methods of entropy coding are supported by
H.264/AVC. The default entropy coding method uses a
single infinite-extent codeword set for all syntax elements
except the residual data. The vectors of scanned transform
coefficient levels are transmitted using a more sophisticated
method called context-adaptive VLC (CAVLC). This scheme
basically uses the concept of run-length coding as it is found
in MPEG-2, H.263, and MPEG-4; however, VLC tables for
various syntax elements are switched depending on the values
of previously transmitted syntax elements. Since the VLC
tables are well designed to match the corresponding conditional
statistics, the entropy coding performance is improved in com-

parison to schemes using a single VLC table. The efficiency of
entropy coding can be improved further if the context-adaptive
binary arithmetic coding (CABAC) is used. On the one hand,
the usage of arithmetic coding allows the assignment of a
noninteger number of bits to each symbol of an alphabet,
which is extremely beneficial for symbol probabilities much
greater than 0.5. On the other hand, the usage of adaptive codes
permits adaptation to nonstationary symbol statistics. Another
important property of CABAC is its context modeling. The
statistics of already coded syntax elements are used to estimate
conditional probabilities of coding symbols. Inter-symbol
redundancies are exploited by switching several estimated
probability models according to already coded symbols in
the neighborhood of the symbol to encode. For details about
CABAC, please refer to [18].
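The benefit of assigning a noninteger number of bits can be made concrete: the ideal code length of a symbol with probability p is −log2(p) bits, which drops well below one bit for very likely symbols, whereas any VLC must spend at least one whole bit per symbol. A short numeric sketch:

```python
import math

# Ideal code length -log2(p) versus the one-bit-per-symbol floor of a VLC.
for p in (0.5, 0.9, 0.99):
    print(f"p = {p}: ideal {-math.log2(p):.3f} bits/symbol, VLC >= 1 bit")
```

An arithmetic coder approaches the ideal code length, which is why CABAC gains most on highly skewed symbol statistics.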
For removing block-edge artifacts, the H.264/AVC design in-
cludes a de-blocking filter, which is applied inside the motion
prediction loop. The strength of filtering is adaptively controlled
by the values of several syntax elements.
Similar to MPEG-2, a frame of interlaced video can be coded
as a single frame picture or two separate field pictures. Addi-
tionally, H.264/AVC supports a macroblock-adaptive switching
between frame and field coding. For this purpose, a pair of vertically adjacent macroblocks is considered as a coding unit, which can
be either transmitted as two frame macroblocks or a top and a
bottom field macroblock.
In H.264/AVC, three profiles are defined. The Baseline pro-
file includes all described features except B-slices, CABAC,
and the interlaced coding tools. Since the main target appli-
cation area of the Baseline profile is the interactive transmis-
sion of video, it is used in the comparison of video encoders
for videoconferencing applications. In the comparison for video
streaming and entertainment applications, which allow a larger
delay, the Main profile of H.264/AVC is used. The Main profile
adds support for B-slices, the highly efficient CABAC entropy
coding method, as well as the interlaced coding tools.
III. VIDEO CODER CONTROL
One key problem in video compression is the operational
control of the source encoder. This problem is compounded
because typical video sequences contain widely varying con-
tent and motion, necessitating the selection between different
coding options with varying rate-distortion efficiency for dif-
ferent parts of the image. The task of coder control is to deter-
mine a set of coding parameters, and thereby the bitstream, such
that a certain rate-distortion trade-off is achieved for a given de-
coder. This article focuses on coder control algorithms for the
case of error-free transmission of the bitstream. For a discus-
sion of the application of coder control algorithms in the case of
error-prone transmission, please refer to [19]. A particular em-
phasis is on Lagrangian bit-allocation techniques, which have
emerged to form the most widely accepted approach in recent
standard development. The popularity of this approach is due
to its effectiveness and simplicity. For completeness, we briefly
review the Lagrangian optimization techniques and their appli-
cation to video coding.
A. Optimization Using Lagrangian Techniques
Consider K source samples that are collected in the K-tuple S = (S_1, ..., S_K). A source sample S_k can be a scalar or vector. Each source sample S_k can be quantized using several possible coding options that are indicated by an index out of the set O_k. Let I_k be the selected index to code S_k. Then the coding options assigned to the elements in S are given by the components in the K-tuple I = (I_1, ..., I_K). The problem of finding the combination of coding options that minimizes the distortion for the given sequence of source samples subject to a given rate constraint R_c can be formulated as

   min_I D(S, I)   subject to   R(S, I) ≤ R_c.   (1)

Here, D(S, I) and R(S, I) represent the total distortion and rate, respectively, resulting from the quantization of S with a particular combination of coding options I. In practice, rather than solving the constrained problem in (1), an unconstrained formulation is employed, that is

   I* = argmin_I [ D(S, I) + λ · R(S, I) ]   (2)

with λ ≥ 0 being the Lagrange parameter. This unconstrained solution to a discrete optimization problem was introduced by Everett [20]. The solution I* to (2) is optimal in the sense that if a rate constraint R_c corresponds to λ, then the total distortion D(S, I*) is minimum for all combinations of coding options with bit rate less than or equal to R_c.

We can assume additive distortion and rate measures, and let these two quantities depend only on the choice of the parameter corresponding to each sample. Then, a simplified Lagrangian cost function can be computed using

   J(S_k, I_k | λ) = D(S_k, I_k) + λ · R(S_k, I_k).   (3)

In this case, the optimization problem in (2) reduces to

   min_I Σ_k J(S_k, I_k | λ) = Σ_k min_{I_k} J(S_k, I_k | λ)   (4)

and can be easily solved by independently selecting the coding option for each sample S_k. For this particular scenario, the problem formulation is equivalent to the bit-allocation problem for an arbitrary set of quantizers proposed by Shoham and Gersho [21].
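For additive measures, the independent per-sample minimization amounts to evaluating J = D + λ·R for every coding option of a sample and keeping the cheapest. A minimal sketch with illustrative (not measured) distortion/rate values for three hypothetical macroblock modes:

```python
def select_option(options, lam):
    """Pick the coding option minimizing the Lagrangian cost J = D + lam * R."""
    return min(options, key=lambda o: o["D"] + lam * o["R"])

# Illustrative distortion/rate values for three hypothetical modes.
options = [
    {"name": "SKIP",  "D": 90.0, "R": 1},
    {"name": "INTER", "D": 25.0, "R": 30},
    {"name": "INTRA", "D": 10.0, "R": 90},
]

print(select_option(options, lam=0.1)["name"])  # small lambda favors low distortion: INTRA
print(select_option(options, lam=5.0)["name"])  # large lambda favors low rate: SKIP
```

Sweeping λ from small to large traces out the operational rate-distortion curve of the available options.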
B. Lagrangian Optimization in Hybrid Video Coding
The application of Lagrangian techniques to control a hybrid video coder is not straightforward because of temporal and spatial dependencies of the rate-distortion costs. Consider a block-based hybrid video codec such as H.261, H.263, H.264/AVC, or MPEG-1/2/4. Let the image sequence be partitioned into K distinct blocks S_k, and let the associated pixels be given as A_k. The options O_k to encode each block are categorized into INTRA and INTER, i.e., predictive coding modes with associated parameters. The parameters are transform coefficients and the quantizer value Q for both modes, plus one or more motion vectors for the INTER mode. The parameters for both modes