IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 5, OCTOBER 2005 793
Video Transcoding: An Overview of Various
Techniques and Research Issues
Ishfaq Ahmad, Senior Member, IEEE, Xiaohui Wei, Student Member, IEEE, Yu Sun, Student Member, IEEE, and
Ya-Qin Zhang, Fellow, IEEE
Abstract—One of the fundamental challenges in deploying multimedia systems, such as telemedicine, education, space endeavors, marketing, crisis management, transportation, and military, is to deliver smooth and uninterruptible flow of audio-visual information, anytime and anywhere. A multimedia system may consist of various devices (PCs, laptops, PDAs, smart phones, etc.) interconnected via heterogeneous wireline and wireless networks. In such systems, multimedia content originally authored and compressed with a certain format may need bit rate adjustment and format conversion in order to allow access by receiving devices with diverse capabilities (display, memory, processing, decoder). Thus, a transcoding mechanism is required to make the content adaptive to the capabilities of diverse networks and client devices. A video transcoder can perform several additional functions. For example, if the bandwidth required for a particular video is fluctuating due to congestion or other causes, a transcoder can provide fine and dynamic adjustments in the bit rate of the video bitstream in the compressed domain without imposing additional functional requirements in the decoder. In addition, a video transcoder can change the coding parameters of the compressed video, adjust spatial and temporal resolution, and modify the video content and/or the coding standard used. This paper provides an overview of several video transcoding techniques and some of the related research issues. We introduce some of the basic concepts of video transcoding, and then review and contrast various approaches while highlighting critical research issues. We propose solutions to some of these research issues, and identify possible research directions.
Index Terms—Frequency domain, heterogeneous video systems,
H.26X, MPEG-X, motion vector refinement, spatial domain, video
transcoding.
I. INTRODUCTION
VIDEO transcoding performs one or more operations, such as bit rate and format conversions, to transform one compressed video stream into another. Transcoding can enable multimedia devices of diverse capabilities and formats to exchange video content on heterogeneous network platforms such as the Internet. One scenario is delivering a high-quality multimedia source (such as a DVD or HDTV) to various receivers (such as PDAs, Pocket PCs, and fast desktop PCs) on wireless and wireline networks. Here, a transcoder (placed at the transmitter, at the receiver, or somewhere in the network) can generate appropriate bitstreams directly from the original bitstream without having to decode and re-encode. To suit the available network bandwidth, a video transcoder can dynamically adjust the bit rate of the video bitstream without imposing additional functional requirements on the decoder. Another scenario is a video conferencing system on the Internet in which the participants may be using different terminals. Here, a video transcoder can offer dual functionality: it can convert the video format to enable content exchange, and it can dynamically adjust the bit rate to facilitate proper scheduling of network resources. Thus, video transcoding is one of the essential components of current and future multimedia systems that aim to provide universal access [13].
Manuscript received April 17, 2003; revised April 3, 2004. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Suh-Yin Lee.
I. Ahmad and X. Wei are with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019 USA (e-mail: iahmad@cse.uta.edu; xhwei@cse.uta.edu).
Y. Sun was with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019 USA. She is now with the Department of Computer Science, University of Central Arkansas, Conway, AR 72035 USA (e-mail: yusun@mail.uca.edu).
Y.-Q. Zhang is with the Mobile and Embedded Devices Division, Microsoft Corporation, Seattle, WA 98052 USA (e-mail: yzhang@microsoft.com).
Digital Object Identifier 10.1109/TMM.2005.854472
Fig. 1. Video transcoding operations.
Currently, several video compression standards exist for different multimedia applications. Each standard may be used in a range of applications but is optimized for a limited range. H.261, H.263, and H.263+, designed by the ITU (International Telecommunication Union), are aimed at low-bit-rate video applications such as videophone and videoconferencing. MPEG standards are defined by ISO (International Organization for Standardization). MPEG-2 is aimed at high-bit-rate, high-quality applications such as digital TV broadcasting and DVD, and MPEG-4 is aimed at multimedia applications including streaming video applications on mobile devices. As the number of applications increases and various networks such as wireline and wireless integrate with each other, inter-compatibility between different systems and different platforms is becoming highly desirable. Transcoding is needed both within and across different standards to allow the interoperation of multimedia streams. As shown in Fig. 1, adjustment of coding parameters of compressed video, spatial and temporal resolution conversions, insertion of new information such as digital watermarks or company logos, and enhanced error resilience can also be done through transcoding.
Scalable coding is another approach to enable bit-rate ad-
justment. Traditional scalability in video compression can be
of three types: SNR scalability, spatial scalability, and temporal
scalability. To achieve different levels of video quality, the video
source is first encoded with low PSNR, low spatial resolution, or low frame rate to form a base layer. The residual information between the base layer and the original input is then encoded to form one or more enhancement layers, each of which improves the quality by adding residual information. However, if pre-encoded video is used, scalable coding is inflexible, since the number of different predefined layers is limited¹ and the bit rate of the target video cannot be reduced below the bit rate of the base layer. Thus, scalability alone does not solve the bit-rate adjustment problem.
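As a rough illustration of this limitation, the Python sketch below models a pre-encoded scalable stream as a base layer plus incremental enhancement layers. The layer rates are hypothetical values chosen for illustration, and `select_layers` and `delivered_rate` are names introduced here. Because each enhancement layer depends on the layers beneath it, a sender can only drop whole layers from the top, so the delivered rate moves in steps, and nothing at all can be sent when the bandwidth falls below the base-layer rate.

```python
from typing import List, Optional

# Illustrative sketch; function names and layer rates are hypothetical.


def select_layers(layer_rates_kbps: List[float], bandwidth_kbps: float) -> Optional[int]:
    """Return how many layers (base first, then enhancements) fit into the bandwidth.

    layer_rates_kbps[0] is the base layer; later entries are the incremental
    rates of successive enhancement layers. Returns None when even the base
    layer does not fit, the case that scalability alone cannot handle.
    """
    if not layer_rates_kbps:
        raise ValueError("at least a base layer is required")
    total = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > bandwidth_kbps:
            break  # a layer is useless without all layers below it
        total += rate
        count += 1
    return count if count > 0 else None


def delivered_rate(layer_rates_kbps: List[float], bandwidth_kbps: float) -> float:
    """Bit rate actually sent: the sum of the layers that fit (0 if none fit)."""
    n = select_layers(layer_rates_kbps, bandwidth_kbps)
    return sum(layer_rates_kbps[:n]) if n else 0.0


if __name__ == "__main__":
    # Hypothetical stream: 256 kbit/s base layer plus three enhancement layers.
    layers = [256.0, 256.0, 512.0, 1024.0]
    for bw in (200.0, 300.0, 700.0, 1500.0, 3000.0):
        print(bw, select_layers(layers, bw), delivered_rate(layers, bw))
```

With these hypothetical rates, 700 kbit/s of bandwidth carries only 512 kbit/s of video because the next layer would overshoot, and 200 kbit/s carries nothing. A transcoder that re-quantizes the stream does not face this step structure.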
This paper provides a comprehensive survey of video transcoding techniques. We discuss various research issues arising in transcoding and illustrate them using an architectural approach. An architecture, which can be implemented in hardware or software, shows the various algorithmic modules and their operations. We present several transcoding architectures with varying levels of efficiency and different functional modules. We categorize these architectures and present examples within each category. We discuss various outstanding issues and provide future directions. The organization of this paper is as follows. Section II provides the basic requirements and functionalities of transcoding. Section III classifies various transcoding architectures and discusses the basic problems. Sections IV and V describe techniques of homogeneous video transcoding (within the same standard) and heterogeneous video transcoding (between different standards), respectively. Section VI reviews some research issues. Section VII concludes the paper with final remarks.
II. REQUIREMENTS AND FUNCTIONALITIES
The first and most important challenge in the context of video conferencing is to provide transcoding on the fly, at real-time speed and without any interruption of the video flow [17], [49]. There are three basic requirements in transcoding [2], [42]: 1) the information in the original bitstream should be exploited as much as possible; 2) the resulting video quality of the new bitstream should be as high as possible, or as close as possible to that of a bitstream created by coding the original source video at the reduced rate; 3) in real-time applications, the transcoding delay and memory requirement should be minimized to meet real-time constraints.
A video transcoder can provide several functions, including adjustment of bit rate and format conversion. We illustrate these functionalities and their classification in Fig. 2.
Homogeneous transcoding performs conversion between video bitstreams of the same standard. A simple technique to transcode a video to a lower bit rate is to increase the quantization step at the encoder part of the transcoder [35], [43]. Spatial resolution reduction can be done in a number of ways (see Fig. 3) [24]. One possibility is to transcode from normal video to a video containing only the region of interest: Fig. 4 illustrates that a transcoder can down-sample a scene to the object of interest, determined through meta information. When filtering and subsampling or pixel averaging is used to reduce spatial resolution [24], [30], problems arise if the motion vectors are passed directly from the decoder to the encoder. Thus, motion vectors need to be refined [32], [37]. Frame-rate conversion is needed when the end-system supports only a lower frame rate. When frames are dropped, the incoming motion information becomes invalid, because the motion vectors point to frames that do not exist in the transcoded bitstream.
¹ MPEG-4 FGS allows more flexible control.
Fig. 2. Various transcoding operations and their classification.
Fig. 3. Various ways of spatial transcoding.
Fig. 4. Transcoding with normal down-sampling and with interest-based object.
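The invalid-vector problem caused by frame dropping can be handled in several ways; the sketch below shows one simple idea, often called telescopic vector composition, under stated assumptions: one integer-pel MV per 16 × 16 block, each MV pointing from a block to its reference area in the immediately preceding frame, and the dropped frame's vector taken from the block that contains the displaced block centre. It is not necessarily the refinement method of [32], [37], and all names are hypothetical; in practice the composed vector would usually be refined with a small search.

```python
from typing import List, Tuple

# Illustrative sketch: names, block size, and MV convention are assumptions.
MV = Tuple[int, int]
BLOCK = 16  # assumed macroblock size in pixels


def block_at(x: int, y: int, rows: int, cols: int) -> Tuple[int, int]:
    """(row, col) of the block containing pixel (x, y), clamped to the frame."""
    return min(max(y // BLOCK, 0), rows - 1), min(max(x // BLOCK, 0), cols - 1)


def compose_mv(fields: List[List[List[MV]]], row: int, col: int) -> MV:
    """Telescopic MV composition across dropped frames.

    fields[0] is the MV field of the kept frame; fields[1:] are the fields of
    the dropped frames it skips over, newest first. Each vector (dx, dy) points
    from a block to its reference area in the previous frame.
    """
    rows, cols = len(fields[0]), len(fields[0][0])
    cx = col * BLOCK + BLOCK // 2  # centre of the current block
    cy = row * BLOCK + BLOCK // 2
    dx, dy = fields[0][row][col]
    for field in fields[1:]:
        # Follow the accumulated vector into the dropped frame and add the vector found there.
        r, c = block_at(cx + dx, cy + dy, rows, cols)
        ddx, ddy = field[r][c]
        dx, dy = dx + ddx, dy + ddy
    return dx, dy


if __name__ == "__main__":
    kept = [[(16, 0), (0, 0)], [(0, 0), (0, 0)]]      # hypothetical 2x2-block MV field
    dropped = [[(0, 0), (4, -2)], [(0, 0), (0, 0)]]   # field of the dropped frame
    print(compose_mv([kept, dropped], 0, 0))  # -> (20, -2)
```

The block at (0, 0) points 16 pixels to the right, into block (0, 1) of the dropped frame, whose own vector (4, -2) is added to give a vector that reaches the frame actually kept in the transcoded bitstream.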
A heterogeneous video transcoder provides conversions between existing and future video coding standards. It provides syntax conversion between these standards. Further, a heterogeneous video transcoder may also provide the functionalities of homogeneous transcoding. Transcoding may include additional functions such as error resilience and logo or watermark insertion. These functions are described later in the paper.
III. VIDEO TRANSCODING ARCHITECTURES
A. Open-Loop Transcoder and Closed-Loop Transcoder
The most straightforward transcoding architecture is to cascade the decoder and encoder directly, as shown in Fig. 5(a). In this architecture, the incoming source video stream is fully decoded, and the decoded video is then re-encoded into the target video stream with the desired bit rate or format, with no degradation in the visual quality due to transcoding. A more detailed view of the cascaded transcoder is shown in Fig. 5(b).
Fig. 5. Cascaded decoder and encoder transcoder: (a) function and (b) details.
In predictive coding, a coded video frame is predicted from other frames and only the prediction error (residue error) is coded. For the decoder to operate properly, the video frames reconstructed and stored in the decoder predictor must be exactly the same as those in the encoder predictor. Decoding a transcoded video results in errors if the predictors of the decoder differ from those of the original encoder; these errors accumulate over time through the whole group of pictures (GOP). The error accumulation resulting from encoder/decoder predictor mismatch is called drift error [7].
To understand how the drift error arises, consider the architectures of the cascaded decoder and encoder transcoder in Fig. 5(b) and of an open-loop transcoder with a re-quantization scheme in Fig. 6.
From Fig. 5(b), we can get
(1)
(2)
(3)
where
In Fig. 6, the open-loop transcoder starts with the de-quantization of the DCT coefficients using the original quantizer levels. These coefficients are then re-quantized with a different quantizer to reduce the output bit rate. From Fig. 6, we get
(4)
Fig. 6. Open-loop transcoder with re-quantization scheme.
Comparing (4) with (3), the drift error of frame can
be expressed as
We can see that this drift term represents an error in the reference picture that is used for motion compensation (MC). This error may be caused by re-quantization, elimination of some nonzero DCT coefficients, or integer truncation [47]. In video compression, intra-coded frames (I frames) are encoded without a reference frame, so MC is not needed to encode them and the transcoding of I frames is not subject to drift. Bidirectionally predictive coded frames (B frames) are not used to predict future frames [7]; therefore, the transcoding of B frames does not contribute to the propagation and accumulation of drift. The drift error is caused only by the transcoding of INTER-coded frames that serve as references (P frames), and it accumulates through a GOP: the quality deterioration gradually increases until the next I frame refreshes the video scene [1], [3], [47].
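The accumulation described above can be made explicit with a short derivation. This is a sketch under standard assumptions, in notation introduced here rather than the paper's equations (1)-(4): frames are indexed along the I/P reference chain with n = 0 the I frame, the MVs are reused unchanged, and motion compensation is treated as linear (integer-pel, no clipping), so that MC_n(a) - MC_n(b) = MC_n(a - b).

```latex
% Notation introduced here (not the paper's): \hat{x}_n and \bar{x}_n are frame n
% as reconstructed from the original and from the open-loop transcoded stream,
% \tilde{E}_n = Q_1^{-1}[Q_1(E_n)] is the de-quantized DCT residual of frame n,
% and MC_n is motion compensation with the reused MVs of frame n.
\begin{align}
  \hat{x}_n &= \mathrm{MC}_n(\hat{x}_{n-1}) + \mathrm{IDCT}(\tilde{E}_n) \\
  \bar{x}_n &= \mathrm{MC}_n(\bar{x}_{n-1})
               + \mathrm{IDCT}\bigl(Q_2^{-1}[Q_2(\tilde{E}_n)]\bigr) \\
  d_n \equiv \hat{x}_n - \bar{x}_n
            &= \mathrm{MC}_n(d_{n-1}) + r_n,
  \qquad r_n \equiv \mathrm{IDCT}\bigl(\tilde{E}_n - Q_2^{-1}[Q_2(\tilde{E}_n)]\bigr) \\
  d_n &= r_n + \sum_{k=0}^{n-1}
        \bigl(\mathrm{MC}_n \circ \cdots \circ \mathrm{MC}_{k+1}\bigr)(r_k),
  \qquad d_0 = r_0
\end{align}
```

Each term of the sum is a past re-quantization error carried forward by motion compensation, so the error grows through the GOP and restarts at the next I frame; B frames never act as MC references, so their errors never enter the sum. A closed-loop transcoder instead re-quantizes the corrected residual R_n = IDCT(Ẽ_n) + MC_n(d_{n-1}), which leaves d_n equal to the re-quantization error of R_n alone.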
Open-loop transcoders contain no feedback loop in the transcoding architecture to compensate for the drift error. They aim for minimum transcoding complexity and thus only modify the encoded DCT coefficients to reduce the overall bit rate [1]. Open-loop transcoders include selective transmission and re-quantization. Selective transmission [8], [34] discards the high-frequency DCT coefficients of a block. Re-quantization architectures re-quantize the motion-compensated residue errors to meet the bit-rate requirement [34], as shown in Fig. 6. Both approaches operate in the frequency domain and are relatively simple to implement. However, both change the residue error and thus alter the content of the decoder predictor. Therefore, when a decoder decodes video processed by an open-loop algorithm, its predictors differ from those of the original encoder, leading to drift errors.
Fig. 7. SDTA with motion vector reused. (a) SDTA with STR. (b) Simplified SDTA without STR.
Closed-loop transcoders contain a feedback loop in the
transcoding architecture in order to correct the transcoding
distortion (see Figs. 7 and 8 as examples) by compensating
the drift in the transcoder [2], [17], [34]. We will focus on
the closed-loop architectures in the following subsections and
classify them in various categories.
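A toy simulation makes the difference between the two families concrete. The sketch below is an illustrative 1-D model, not any architecture from the paper: frames are 1-D sample vectors, motion compensation is a cyclic integer shift, the DCT is omitted (an orthonormal transform would not change the drift mechanism), and all names and parameters are hypothetical. The open-loop path only re-quantizes the incoming residuals; the closed-loop path re-codes each residual against its own reconstructed reference, which is what the feedback loop provides.

```python
import math
import random
from typing import List, Tuple

# Toy model (assumption): 1-D frames, cyclic-shift MC, no transform.
Frame = List[float]
Packet = Tuple[bool, int, List[int]]  # (is_intra, motion vector, quantized levels)


def shift(frame: Frame, mv: int) -> Frame:
    """Toy motion compensation: cyclic shift by an integer motion vector."""
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def quant(values: Frame, step: float) -> List[int]:
    """Uniform quantizer (Python round() breaks ties to even)."""
    return [int(round(v / step)) for v in values]


def dequant(levels: List[int], step: float) -> Frame:
    return [lv * step for lv in levels]


def add(a: Frame, b: Frame) -> Frame:
    return [x + y for x, y in zip(a, b)]


def encode(frames: List[Frame], mvs: List[int], gop: int, q1: float) -> List[Packet]:
    """Closed-loop predictive encoder: I frame every `gop` frames, P frames otherwise."""
    stream: List[Packet] = []
    recon: Frame = []
    for t, frame in enumerate(frames):
        if t % gop == 0:
            levels = quant(frame, q1)
            recon = dequant(levels, q1)
            stream.append((True, 0, levels))
        else:
            pred = shift(recon, mvs[t])
            levels = quant([f - p for f, p in zip(frame, pred)], q1)
            recon = add(pred, dequant(levels, q1))
            stream.append((False, mvs[t], levels))
    return stream


def transcode_and_compare(stream: List[Packet], q1: float, q2: float) -> Tuple[List[float], List[float]]:
    """Re-quantize every frame with step q2, open loop and closed loop.

    Returns, per frame, the max |difference| between the original decoded frame
    and the decoded transcoded frame: (open_loop_errors, closed_loop_errors).
    """
    open_err: List[float] = []
    closed_err: List[float] = []
    ref_orig: Frame = []    # what a decoder of the original stream holds
    ref_open: Frame = []    # what a decoder of the open-loop output holds
    ref_closed: Frame = []  # what a decoder of the closed-loop output holds
    for is_intra, mv, levels in stream:
        e1 = dequant(levels, q1)  # de-quantized residual (intra: samples)
        if is_intra:
            orig = e1
            out_open = out_closed = dequant(quant(e1, q2), q2)
        else:
            orig = add(shift(ref_orig, mv), e1)
            # Open loop: re-quantize the incoming residual and ignore the mismatch.
            out_open = add(shift(ref_open, mv), dequant(quant(e1, q2), q2))
            # Closed loop: re-code the residual against the transcoder's own output reference.
            pred = shift(ref_closed, mv)
            resid = [o - p for o, p in zip(orig, pred)]
            out_closed = add(pred, dequant(quant(resid, q2), q2))
        open_err.append(max(abs(a - b) for a, b in zip(orig, out_open)))
        closed_err.append(max(abs(a - b) for a, b in zip(orig, out_closed)))
        ref_orig, ref_open, ref_closed = orig, out_open, out_closed
    return open_err, closed_err


if __name__ == "__main__":
    rng = random.Random(0)
    n, num_frames = 64, 24
    texture = [128 + 40 * math.sin(0.4 * i) for i in range(n)]
    # Texture moving one sample per frame plus noise, so residuals are nonzero.
    frames = [[v + rng.uniform(-6, 6) for v in shift(texture, t)] for t in range(num_frames)]
    stream = encode(frames, [1] * num_frames, gop=12, q1=2.0)
    open_err, closed_err = transcode_and_compare(stream, q1=2.0, q2=7.0)
    for t, (o, c) in enumerate(zip(open_err, closed_err)):
        print(f"frame {t:2d}  open-loop {o:6.2f}  closed-loop {c:5.2f}")
```

A simple hand-checkable case is a brightness ramp that raises every sample by 2 per frame, encoded with step 2 and re-quantized with step 5: each P-frame residual of 2 is re-quantized to zero, so the open-loop error grows by 2 per frame until the next I frame, while the closed-loop error never exceeds half of the new step.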
B. Spatial-Domain Video Transcoding
Fig. 5(b) shows a spatial-domain transcoding architecture (SDTA) that can perform dynamic bit-rate adaptation via rate control at the encoder side. This architecture is flexible, since the decoder loop and the encoder loop can be totally independent of each other (e.g., they can operate at different bit rates, frame rates, picture resolutions, coding modes, and even different standards). This architecture is drift-free, but its computational complexity is high for real-time applications.
Fig. 8. Frequency domain transcoder architecture (FDTA).
Since a pre-encoded video stream arriving at a transcoder already carries useful information such as the picture type, motion vectors (MVs), quantization step size, and bit-allocation statistics, it is possible to construct transcoders with different complexity and performance in terms of coding efficiency and video quality. Intuitively, most of the motion information and mode decision information received in the video decoder can be reused in the video encoder without introducing significant degradation in visual quality. Thus, motion estimation, the most time-consuming operation in video encoding, which accounts for 60% to 70% of the encoder computation [30], can be avoided. This leads to an SDTA that reuses MVs [shown in Fig. 7(a)]. The pre-encoded source video is decoded in the spatial domain by performing variable-length decoding (VLD), inverse quantization, IDCT, and motion compensation. In the encoder, the motion-compensated residue errors are encoded into the frequency domain through DCT, re-quantization, and variable-length coding (VLC). The motion compensation operation at the encoding end is also performed in the spatial domain for the prediction operation. The MV reuse approach is useful for reducing the complexity of motion estimation in video transcoding [17].
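To give a sense of scale for the savings from MV reuse, the sketch below counts the absolute differences computed by exhaustive integer-pel block matching. It assumes full search with 16 × 16 blocks on a CIF (352 × 288) frame; real encoders typically use faster searches, so the counts are illustrative upper bounds, not the measurements behind the 60% to 70% figure cited from [30]. The ±1 case stands in for a small refinement around a reused vector, and `full_search_sad_ops` is a hypothetical name.

```python
# Illustrative sketch; the function name and parameters are hypothetical.


def full_search_sad_ops(width: int, height: int, search_range: int, block: int = 16) -> int:
    """Absolute-difference operations for exhaustive integer-pel block matching on one frame.

    Each block is compared against (2p + 1)^2 candidate positions for a search
    range of +/-p pixels, and each comparison costs block * block differences.
    """
    blocks = (width // block) * (height // block)
    candidates = (2 * search_range + 1) ** 2
    return blocks * candidates * block * block


if __name__ == "__main__":
    full = full_search_sad_ops(352, 288, search_range=16)   # CIF, +/-16 full search
    refine = full_search_sad_ops(352, 288, search_range=1)  # +/-1 around a reused MV
    print(full, refine, full // refine)  # -> 110398464 912384 121
```

At 30 frames/s the ±16 full search alone amounts to roughly 3.3 × 10^9 absolute differences per second, about 121 times the cost of a ±1 refinement around a reused vector.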
The architectures in Fig. 7(a) and in Figs. 8 and 9 include two optional functional blocks placed between the decoder and the encoder: a spatial/temporal resolution reduction (STR) module and an MV composition and refinement (MVCR) module. STR allows the source video to be transcoded into a target video with a different spatial/temporal resolution. MVCR is needed to adjust the MVs when STR is applied. When transcoding without spatial/temporal resolution reduction, the SDTA can be further simplified into the architecture of Fig. 7(b) [2], in which only one feedback loop is employed.
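One common way to obtain a starting vector for MVCR when the resolution is halved is to combine the four MVs of the 2 × 2 input macroblocks that map to one output macroblock and scale the result by 1/2. The sketch below assumes that setup; the component-wise median (with the mean as an alternative) is an illustrative choice, not a method prescribed by the text, and `downscale_mv` is a hypothetical name.

```python
from statistics import fmean, median
from typing import List, Tuple

# Illustrative sketch; the function name and the median/mean choice are assumptions.
MV = Tuple[float, float]


def downscale_mv(input_mvs: List[MV], factor: int = 2, use_median: bool = True) -> MV:
    """Compose one output MV from the co-located input MVs after factor:1 downscaling."""
    if len(input_mvs) != factor * factor:
        raise ValueError("expected factor*factor input vectors")
    xs = [mv[0] for mv in input_mvs]
    ys = [mv[1] for mv in input_mvs]
    pick = median if use_median else fmean
    # Combine component-wise, then scale to the reduced resolution.
    return pick(xs) / factor, pick(ys) / factor


if __name__ == "__main__":
    four = [(4, 2), (6, 2), (4, 0), (20, 2)]  # hypothetical vectors, one outlier
    print(downscale_mv(four))                    # -> (2.5, 1.0)
    print(downscale_mv(four, use_median=False))  # -> (4.25, 0.75)
```

Here the median ignores the outlier vector (20, 2), while the mean is pulled toward it. The result may still need rounding to the codec's MV precision and a small refinement search, which is the refinement part of MVCR.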
C. Frequency-Domain Transcoding
Exploiting the structural redundancy of the architecture in Fig. 7 and the linearity of the DCT/IDCT, a structurally simpler but functionally equivalent frequency-domain transcoding architecture is possible [8], which can be further simplified [2], [4], [23], as shown in Fig. 8. In this architecture, only VLD and inverse quantization are performed at the decoder end to obtain the DCT coefficients of each block. At the encoder end, the motion-compensated residue errors are encoded through re-quantization and VLC. The reference frame memory at the encoder end stores the DCT coefficients after inverse quantization, which are then fed to the frequency-domain MC module to reduce drift error. This is referred to as the frequency-domain transcoding architecture (FDTA).
Fig. 9. Hybrid-domain transcoding architecture (HDTA).
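The linearity that the FDTA relies on can be checked numerically. The sketch below uses an orthonormal 1-D DCT-II of length 8 written out directly (a 2-D block transform applies the same operation along rows and then columns) and assumes block-aligned integer-pel motion, so the motion-compensated prediction of a block is just the stored coefficient block of the reference. Non-aligned vectors need the more involved frequency-domain MC algorithms referenced in the text, which are not shown; the function names and sample values are hypothetical.

```python
import math
from typing import List

# Illustrative sketch; names and sample values are hypothetical.
Vec = List[float]


def _basis(n: int, k: int, i: int) -> float:
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x: Vec) -> Vec:
    n = len(x)
    return [sum(x[i] * _basis(n, k, i) for i in range(n)) for k in range(n)]


def idct(c: Vec) -> Vec:
    n = len(c)
    return [sum(c[k] * _basis(n, k, i) for k in range(n)) for i in range(n)]


def reconstruct_in_dct_domain(ref_coeffs: Vec, residual_coeffs: Vec) -> Vec:
    """Block-aligned, integer-pel MC in the DCT domain: prediction plus residual, coefficient-wise."""
    return [a + b for a, b in zip(ref_coeffs, residual_coeffs)]


if __name__ == "__main__":
    pred = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # hypothetical reference row
    resid = [3.0, -2.0, 0.0, 1.0, 0.0, 0.0, -1.0, 2.0]       # hypothetical residual
    freq = reconstruct_in_dct_domain(dct(pred), dct(resid))
    spatial = [p + r for p, r in zip(pred, resid)]
    # Both differences are tiny and come only from floating-point rounding.
    print(max(abs(a - b) for a, b in zip(freq, dct(spatial))))
    print(max(abs(a - b) for a, b in zip(idct(freq), spatial)))
```

Adding the residual in the DCT domain gives the same coefficients as transforming the spatial-domain sum, which is why the reference memory can hold DCT values instead of pixels.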
In this architecture, motion compensation is performed in the frequency domain using an MV reuse algorithm. Details of frequency-domain MC algorithms can be found in [2] and [31]. An FDTA may need less computation, but it suffers from drift caused by nonlinear operations, such as subpixel motion compensation and clipping of DCT coefficients during MC. FDTAs also lack flexibility and are mostly suited to bit-rate transcoding. Recently, researchers have studied frequency-domain motion estimation, which may eliminate some of these constraints [18].
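Two nonlinear steps of the kind mentioned above can be isolated in a few lines. The sketch assumes 8-bit samples that a spatial-domain decoder saturates to [0, 255], and half-sample interpolation with round-half-up integer arithmetic of the kind used by MPEG-style codecs (treated here as an assumption); the function names are hypothetical. A linear DCT-domain loop reproduces neither operation exactly, so small mismatches enter the reference each frame and accumulate through the GOP in the same way as the re-quantization drift discussed earlier.

```python
from typing import List

# Illustrative sketch; function names and the rounding rule are assumptions.


def clip(v: int, lo: int = 0, hi: int = 255) -> int:
    """Saturate a reconstructed sample to the 8-bit range, as a spatial-domain decoder does."""
    return max(lo, min(hi, v))


def half_pel(a: int, b: int) -> int:
    """Half-sample interpolation with round-half-up integer arithmetic."""
    return (a + b + 1) // 2


def clipping_mismatch(pred: List[int], resid: List[int]) -> List[int]:
    """Decoder's clipped reconstruction minus the unclipped sum a linear loop would keep."""
    return [clip(p + r) - (p + r) for p, r in zip(pred, resid)]


def half_pel_mismatch(a: int, b: int) -> float:
    """Decoder's rounded half-sample value minus the exact (linear) average."""
    return half_pel(a, b) - (a + b) / 2


if __name__ == "__main__":
    print(clipping_mismatch([250, 252, 254, 255], [3, 4, 2, -1]))  # -> [0, -1, -1, 0]
    print(half_pel_mismatch(1, 2), half_pel_mismatch(2, 2))        # -> 0.5 0.0
```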
D. A Hybrid-Domain Transcoding Architecture
Various transcoding algorithms provide a tradeoff between computational complexity and reconstructed video quality. To reduce the computational complexity while maintaining the reconstructed video quality, ME should be omitted and DCT/IDCT should be avoided where possible. For example, the architecture in [45] uses MC for P frames only. I frames are intra coded and need no ME or MC, so IDCT/DCT for I frames can in principle be omitted. However, since I frames are the anchors for subsequent P and B frames, the IDCT at the decoder stage and the inverse quantization and IDCT at the encoder stage are still needed for I frames to reconstruct the reference frames, while the DCT at the encoder stage can be omitted. Since P frames are also anchors for the following P and B frames, MC, DCT, and IDCT cannot be omitted for them. B frames are not reference frames for subsequent frames, so drift error generated in B frames is not propagated through the video sequence; MC of B frames can therefore be removed without introducing significant degradation in the visual quality of the reconstructed pictures. Thus, DCT/IDCT can be omitted for all B frames, and B frames can be transcoded directly in the DCT domain.
We can further reduce the transcoding delay in this architecture without degrading the video quality. P frames with frequent scene changes and rapid motion may contain a large number of INTRA blocks, and the IDCT/DCT and MC operations for these INTRA blocks can also be omitted. In other words, blocks of I and B pictures and INTRA blocks of P pictures are transcoded in the frequency domain, and spatial-domain motion compensation is performed only for INTER blocks in P frames. We call this the hybrid-domain transcoding architecture (HDTA), shown in Fig. 9. According to the simulation results in [45], compared with the SDTA with MV reuse, the HDTA has lower complexity, which speeds up the transcoding operation, at the expense of some degradation in picture quality. Compared with the frequency-domain transcoder, this transcoder performs DCT/IDCT and MC for INTER blocks in P frames, which may increase the transcoding delay but yields better visual quality.
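The per-block rule of the HDTA can be summarized as a small dispatcher. This is a sketch of the rule as stated in the text, with hypothetical names; it only chooses the processing domain and omits the reference-frame reconstruction (IDCT) that I and P frames still need.

```python
from enum import Enum
from typing import List, Tuple

# Illustrative sketch; names are hypothetical.


class FrameType(Enum):
    I = "I"
    P = "P"
    B = "B"


def hdta_path(frame_type: FrameType, block_is_intra: bool) -> str:
    """Return the processing domain for one block in the hybrid-domain transcoder."""
    if frame_type is FrameType.P and not block_is_intra:
        # INTER blocks of P frames feed later predictions: spatial-domain MC limits drift.
        return "spatial"
    # I-frame blocks, INTRA blocks of P frames, and all B-frame blocks stay in the DCT domain.
    return "frequency"


def spatial_fraction(blocks: List[Tuple[FrameType, bool]]) -> float:
    """Share of blocks that take the more expensive spatial-domain path."""
    paths = [hdta_path(ft, intra) for ft, intra in blocks]
    return paths.count("spatial") / len(paths)


if __name__ == "__main__":
    # Hypothetical GOP: 4 blocks per frame in I, B, B, P (one INTRA block in the P frame).
    gop = ([(FrameType.I, True)] * 4 + [(FrameType.B, False)] * 8
           + [(FrameType.P, False)] * 3 + [(FrameType.P, True)])
    print(spatial_fraction(gop))  # -> 0.1875
```

The share of spatial-domain work grows with the number of P frames and shrinks as those P frames contain more INTRA blocks, which matches the complexity argument above.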
IV. HOMOGENEOUS VIDEO TRANSCODING
Homogeneous transcoding performs conversion between video bitstreams of the same standard. A high-quality source video may be transcoded to a target video bitstream of lower quality, with different spatial/temporal resolutions and different bit rates. The following subsections describe some of the research issues in homogeneous transcoding.
A. Reducing Bits With Fixed Resolution
For fixed spatial and temporal resolution, we can reduce the bit rate using the following two techniques:
Re-Quantization: A simple technique to transcode a video to a lower bit rate is to increase the quantization step at the encoder part of the transcoder [26], [35], [43]. This decreases the number of nonzero quantized coefficients, thus decreasing the number of bits in the outgoing bitstream. Re-quantization is a good compromise between complexity and reconstructed image quality, and it can control the bit-rate reduction.
Selective Transmission: Since most of the energy of an image is concentrated in the lower frequency band, discarding (truncating) some of the higher-frequency AC coefficients [1], [30], [34] can largely preserve the picture quality, but it may introduce a blocking effect in the reconstructed target video.
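Both techniques can be illustrated on a single block of quantized DCT levels listed in zigzag order. The sketch assumes a plain uniform quantizer with rounding (real codecs such as MPEG-2 define their own quantization and dead-zone rules), and the level values, step sizes, and function names are hypothetical.

```python
from typing import List

# Illustrative sketch; names, levels, and step sizes are hypothetical.


def requantize(levels: List[int], q1: int, q2: int) -> List[int]:
    """Re-quantize levels coded with step q1 to a coarser step q2 (uniform rounding quantizer)."""
    return [int(round(lv * q1 / q2)) for lv in levels]


def truncate_high_freq(levels: List[int], keep: int) -> List[int]:
    """Selective transmission: keep the first `keep` coefficients in zigzag order, zero the rest."""
    return [lv if i < keep else 0 for i, lv in enumerate(levels)]


def nonzeros(levels: List[int]) -> int:
    """Number of nonzero levels, a rough proxy for the bits spent on the block."""
    return sum(1 for lv in levels if lv != 0)


if __name__ == "__main__":
    levels = [40, -12, 9, 5, -3, 2, 1, -1, 1, 0, 0, 1]  # hypothetical zigzag-ordered block
    coarse = requantize(levels, q1=4, q2=10)
    print(coarse, nonzeros(levels), nonzeros(coarse))
    # -> [16, -5, 4, 2, -1, 1, 0, 0, 0, 0, 0, 0] 10 6
    print(truncate_high_freq(levels, keep=4))
    # -> [40, -12, 9, 5, 0, 0, 0, 0, 0, 0, 0, 0]
```

Re-quantizing from step 4 to step 10 cuts the number of nonzero levels from 10 to 6 while keeping an approximation of every significant coefficient, whereas selective transmission keeps the low-frequency coefficients exact and drops the rest outright.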
B. Spatial Resolution Reduction
Reduction in spatial resolution can obviously lower the
bit rate. In this subsection, we describe some common video
transcoding techniques.
Filtering and Subsampling: Filtering and subsampling are common techniques to reduce spatial resolution [24], [30], [48]. Shanableh [30] proposed a filter that can be applied in both the horizontal and vertical directions, for luminance and chrominance; the image is then down-sampled by dropping every alternate pixel in both the horizontal and vertical directions.
Pixel Averaging: Pixel averaging [30] is another common technique, in which every m × m block of pixels is represented by a single pixel with their average value. Pixel averaging is the simplest method, but the reconstructed pictures may become blurred.
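Both spatial reduction methods are short to write down. The sketch below assumes grayscale images stored as lists of rows with dimensions divisible by the reduction factor; the separable [1, 2, 1]/4 low-pass filter is an illustrative stand-in, not the filter proposed in [30], and the function names are hypothetical.

```python
from typing import List

# Illustrative sketch; the [1, 2, 1]/4 filter and all names are assumptions.
Image = List[List[float]]


def pixel_average(img: Image, m: int) -> Image:
    """Represent every m x m block by its mean (dimensions assumed divisible by m)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + dy][x + dx] for dy in range(m) for dx in range(m)) / (m * m)
         for x in range(0, w, m)]
        for y in range(0, h, m)
    ]


def lowpass_121(img: Image) -> Image:
    """Separable [1, 2, 1]/4 low-pass filter with edge replication."""
    def filt_row(row: List[float]) -> List[float]:
        n = len(row)
        return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4 for i in range(n)]

    rows = [filt_row(r) for r in img]                 # horizontal pass
    cols = [filt_row(list(c)) for c in zip(*rows)]    # vertical pass on the columns
    return [list(r) for r in zip(*cols)]              # transpose back to rows


def filter_and_subsample(img: Image) -> Image:
    """Low-pass filter, then drop every alternate pixel horizontally and vertically."""
    f = lowpass_121(img)
    return [row[::2] for row in f[::2]]


if __name__ == "__main__":
    img = [[0, 0, 8, 8], [0, 0, 8, 8], [4, 4, 12, 12], [4, 4, 12, 12]]  # hypothetical 4x4 frame
    print(pixel_average(img, 2))  # -> [[0.0, 8.0], [4.0, 12.0]]
    print(filter_and_subsample(img))
```

Pixel averaging with m = 2 is equivalent to a 2 × 2 box filter followed by subsampling, so the two methods differ mainly in the low-pass filter applied before decimation.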

References
Overview of fine granularity scalability in MPEG-4 video standard
Adapting multimedia Internet content for universal access
Manipulation and compositing of MC-DCT compressed video
Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats