What are the future works in "Video transcoding: an overview of various techniques and research issues" ?

To obtain inter-compatibility between H.264 and other standards, H.264-related transcoding will become a more challenging issue in future video transcoding research.

(Open Access) Video transcoding: an overview of various techniques and research issues (2005) | Ishfaq Ahmad

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 5, OCTOBER 2005 793

Video Transcoding: An Overview of Various Techniques and Research Issues

Ishfaq Ahmad, Senior Member, IEEE, Xiaohui Wei, Student Member, IEEE, Yu Sun, Student Member, IEEE, and Ya-Qin Zhang, Fellow, IEEE

Abstract—One of the fundamental challenges in deploying multimedia systems, such as telemedicine, education, space endeavors, marketing, crisis management, transportation, and military, is to deliver smooth and uninterruptible flow of audio-visual information, anytime and anywhere. A multimedia system may consist of various devices (PCs, laptops, PDAs, smart phones, etc.) interconnected via heterogeneous wireline and wireless networks. In such systems, multimedia content originally authored and compressed with a certain format may need bit rate adjustment and format conversion in order to allow access by receiving devices with diverse capabilities (display, memory, processing, decoder). Thus, a transcoding mechanism is required to make the content adaptive to the capabilities of diverse networks and client devices. A video transcoder can perform several additional functions. For example, if the bandwidth required for a particular video is fluctuating due to congestion or other causes, a transcoder can provide fine and dynamic adjustments in the bit rate of the video bitstream in the compressed domain without imposing additional functional requirements in the decoder. In addition, a video transcoder can change the coding parameters of the compressed video, adjust spatial and temporal resolution, and modify the video content and/or the coding standard used. This paper provides an overview of several video transcoding techniques and some of the related research issues. We introduce some of the basic concepts of video transcoding, and then review and contrast various approaches while highlighting critical research issues. We propose solutions to some of these research issues, and identify possible research directions.

Index Terms—Frequency domain, heterogeneous video systems, H.26X, MPEG-X, motion vector refinement, spatial domain, video transcoding.

I. INTRODUCTION

Video transcoding performs one or more operations, such as bit rate and format conversions, to transform one compressed video stream to another. Transcoding can enable multimedia devices of diverse capabilities and formats to exchange video content on heterogeneous network platforms such as the Internet. One scenario is delivering a high-quality multimedia source (such as a DVD or HDTV) to various receivers (such as PDAs, Pocket PCs, and fast desktop PCs) on wireless and wireline networks. Here, a transcoder (placed at the transmitter, receiver or somewhere in the network) can generate appropriate bitstream threads directly from the original bitstream without having to decode and re-encode. To suit available network bandwidth, a video transcoder can perform dynamic adjustments in the bit rate of the video bitstream without additional functional requirements in the decoder. Another scenario is a video conferencing system on the Internet in which the participants may be using different terminals. Here, a video transcoder can offer dual functionality: provide video format conversion to enable content exchange, and perform dynamic bit rate adjustment to facilitate proper scheduling of network resources. Thus, video transcoding is one of the essential components for current and future multimedia systems that aim to provide universal access [13].

Manuscript received April 17, 2003; revised April 3, 2004. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Suh-Yin Lee.

I. Ahmad and X. Wei are with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019 USA (e-mail: iahmad@cse.uta.edu; xhwei@cse.uta.edu).

Y. Sun was with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019 USA. She is now with the Department of Computer Science, University of Central Arkansas, Conway, AR 72035 USA (e-mail: yusun@mail.uca.edu).

Y.-Q. Zhang is with the Mobile and Embedded Devices Division, Microsoft Corporation, Seattle, WA 98052 USA (e-mail: yzhang@microsoft.com).

Digital Object Identifier 10.1109/TMM.2005.854472

Fig. 1. Video transcoding operations.

Currently, several video compression standards exist for different multimedia applications. Each standard may be used in a range of applications but is optimized for a limited range. H.261, H.263, and H.263+, defined by the ITU (International Telecommunication Union), are aimed at low-bit-rate video applications such as videophone and videoconferencing. MPEG standards are defined by ISO (International Organization for Standardization). MPEG-2 is aimed at high-bit-rate, high-quality applications such as digital TV broadcasting and DVD, and MPEG-4 is aimed at multimedia applications including streaming video applications on mobile devices. As the number of applications increases and various networks such as wireline and wireless integrate with each other, inter-compatibility between different systems and different platforms is becoming highly desirable. Transcoding is needed both within and across different standards to allow the interoperation of multimedia streams. As shown in Fig. 1, adjustment of coding parameters of compressed video, spatial and temporal resolution conversions, insertion of new information such as digital watermarks or company logos, and enhanced error resilience can also be done through transcoding.

Scalable coding is another approach to enable bit-rate adjustment. Traditional scalability in video compression can be of three types: SNR scalability, spatial scalability, and temporal scalability. To achieve different levels of video quality, the video source is first encoded with low PSNR, low spatial resolution, or low frame rate to form a base layer. The residual information between the base layer and the original input is then encoded to form one or more enhancement layers. Additional enhancement layers enhance the quality by adding the residual information. However, if pre-encoded video is used, scalable coding is inflexible since the number of different predefined layers is limited and the bit rate of the target video cannot be reduced lower than the bit rate of the base layer. Thus, scalability alone does not solve the bit-rate adjustment problem.
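The limitation of layered coding can be illustrated with a minimal sketch. The function name `select_layers` and the layer bit rates are hypothetical and chosen only for illustration; the sketch simply adds layers in order until the target rate would be exceeded, and shows that no layer combination exists below the base-layer rate.

```python
def select_layers(layer_rates, target_rate):
    """Pick how many layers fit within target_rate.

    layer_rates[0] is the base layer; the remaining entries are
    enhancement layers that must be added in order.
    Returns (number_of_layers, total_rate), or None when even the
    base layer does not fit.
    """
    if target_rate < layer_rates[0]:
        return None  # scalable stream cannot go below the base layer
    total = 0
    chosen = 0
    for rate in layer_rates:
        if total + rate > target_rate:
            break
        total += rate
        chosen += 1
    return chosen, total


# Hypothetical stream: 256 kb/s base layer plus two 128 kb/s enhancement layers.
rates = [256, 128, 128]
print(select_layers(rates, 400))  # base + one enhancement layer
print(select_layers(rates, 200))  # below the base layer: no valid selection
```

The achievable rates are limited to a few discrete steps (256, 384, and 512 kb/s in this hypothetical case), which is the inflexibility that motivates transcoding.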

This paper provides a comprehensive survey of video transcoding techniques. We discuss various research issues arising in transcoding and illustrate them using an architectural approach. An architecture, which can be implemented in hardware or software, shows various algorithmic modules, as well as their operations. We present several transcoding architectures with varying levels of efficiency and functional modules. We categorize these architectures and present various examples within a category. We discuss various outstanding issues and provide future directions. The organization of this paper is as follows. Section II provides the basic requirements and functionalities of transcoding. Section III classifies various transcoding architectures and discusses the basic problems. Sections IV and V describe techniques of homogeneous transcoding (with similar standard) and heterogeneous video transcoding (between different standards), respectively. Section VI reviews some research issues. Section VII concludes the paper with final remarks.

II. REQUIREMENTS AND FUNCTIONALITIES

The first and most important challenge in the context of video conferencing is to provide transcoding on the fly, at real-time speed and without any interruption of the video flow [17], [49]. There are three basic requirements in transcoding [2], [42]: 1) the information in the original bitstream should be exploited as much as possible; 2) the resulting video quality of the new bitstream should be as high as possible, or as close as possible to the bitstream created by coding the original source video at the reduced rate; 3) in real-time applications, the transcoding delay and memory requirement should be minimized to meet real-time constraints.

A video transcoder can provide several functions, including adjustment of bit rate and format conversion. We illustrate these functionalities and their classification in Fig. 2.

MPEG-4 FGS allows more flexible control.

Fig. 2. Various transcoding operations and their classification.

Fig. 3. Various ways of spatial transcoding.

Fig. 4. Transcoding with normal down-sampling and with interest-based object.

Homogeneous transcoding performs conversion between video bitstreams of the same standard. A simple technique to transcode a video to a lower bit rate is to increase the quantization step at the encoder part of the transcoder [35], [43]. Spatial resolution reduction can be done in a number of ways (see Fig. 3) [24]. One possibility is to transcode from normal video to a video containing only the region of interest. Fig. 4 illustrates that a transcoder can down-sample a scene to the object of interest, which is determined through meta information. When subsampling, filtering, and pixel averaging are used to reduce spatial resolution [24], [30], problems arise when passing motion vectors directly from the decoder to the encoder. Thus, motion vectors need to be refined [32], [37]. Frame-rate conversion is needed when the end-system supports only a lower frame rate. With dropped frames, the incoming motion vectors are invalid because they point to frames that do not exist in the transcoded bitstream.
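The frame-dropping problem can be made concrete with a small sketch. It assumes a deliberately simplified model that is not taken from the paper: one motion vector per block, and the vector of the co-located block in the dropped frame is used for composition. The function name `compose_mv` is hypothetical.

```python
def compose_mv(mv_current, mv_dropped):
    """Approximate a motion vector that skips a dropped frame.

    mv_current: vector of a block in frame n, pointing into frame n-1.
    mv_dropped: vector of the co-located block in frame n-1, pointing
                into frame n-2 (simplifying assumption).
    When frame n-1 is dropped, the block in frame n must reference
    frame n-2, so the two displacements are added.
    """
    return (mv_current[0] + mv_dropped[0], mv_current[1] + mv_dropped[1])


mv_n = (3, -1)          # frame n -> frame n-1
mv_n_minus_1 = (2, 4)   # frame n-1 -> frame n-2
print(compose_mv(mv_n, mv_n_minus_1))  # vector from frame n to frame n-2
```

A composed vector of this kind is only an initial estimate, which is why a refinement step is still needed afterwards.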

A heterogeneous video transcoder provides conversions between existing and future video coding standards. It provides syntax conversion between these standards. Further, a heterogeneous video transcoder may also provide the functionalities of homogeneous transcoding. Transcoding may include additional functions such as error-resilience and logo or watermark insertion. These functions will be described in the paper subsequently.

III. VIDEO TRANSCODING ARCHITECTURES

A. Open-Loop Transcoder and Closed-Loop Transcoder

The most straightforward transcoding architecture is to cascade the decoder and encoder directly, as shown in Fig. 5(a). In this architecture, the incoming source video stream is fully decoded, and the decoded video is then re-encoded into the target video stream with the desired bit rate or format, with no degradation in the visual quality due to transcoding. A more detailed view of the cascaded transcoder is shown in Fig. 5(b).

Fig. 5. Cascaded decoder and encoder transcoder: (a) function and (b) details.

In predictive coding, a coded video frame is predicted from other frames and only the prediction error (residue error) is coded. For the decoder to operate properly, the video frames reconstructed and stored in the decoder predictor must be exactly the same as those in the encoder predictor. Decoding of a transcoded video would result in errors if the predictors of the decoder are different from those of the original encoder; these errors would accumulate with time through the whole group of pictures (GOP). The error accumulation resulting from encoder/decoder predictor mismatch is called "drift" error [7].

To understand how the drift error arises, let us consider the architectures of the cascaded decoder and encoder transcoder in Fig. 5(b) and an open-loop transcoder with a re-quantization scheme in Fig. 6.

From Fig. 5(b), we can get

(1)

(2)

(3)

where

In Fig. 6, the open-loop transcoder starts with the de-quantization of the DCT coefficients using the original quantizer levels. These coefficients are re-encoded with a different quantizer for output bit rate reduction. From Fig. 6, we get

(4)

Fig. 6. Open-loop transcoder with re-quantization scheme.

Comparing (4) with (3), the drift error of frame can

be expressed as

We can see that this drift term represents an error in the reference picture that is used for motion compensation (MC). This error may be caused by re-quantization, elimination of some nonzero DCT coefficients, or by integer truncation [47]. In video compression, intra-coded frames (I frames) are encoded without a reference frame, so MC is not needed in encoding I frames and the transcoding of I frames is not subject to drift. Bi-directionally predictive coded frames (B frames) are not used for predicting future frames [7]. Therefore, the transcoding of B frames does not contribute to the propagation and accumulation of the drift. The drift error is caused only by the transcoding of INTER coded frames and can accumulate through a GOP: the quality deterioration gradually increases until the next I frame refreshes the video scene [1], [3], [47].
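Drift accumulation can be illustrated numerically with a toy model. The sketch below is not the paper's derivation: it assumes each "frame" is a single scalar sample, prediction is simply the previous reconstruction (no DCT, no motion), and the quantizer step sizes 2 and 5 are arbitrary. All function names are hypothetical. It compares re-quantizing the residues directly (open loop) with fully decoding and re-encoding (cascaded, closed loop).

```python
import math


def quantize(value, step):
    """Uniform quantizer: round to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def encode(frames, step):
    """Predictive coding of scalar frames.

    Returns the quantized residues and the encoder-side reconstructions.
    """
    recon = [quantize(frames[0], step)]        # intra-coded first frame
    residues = []
    for x in frames[1:]:
        r = quantize(x - recon[-1], step)      # quantized prediction error
        residues.append(r)
        recon.append(recon[-1] + r)
    return residues, recon


def open_loop(first, residues, step):
    """Re-quantize each residue independently; no feedback loop."""
    out = [quantize(first, step)]
    for r in residues:
        out.append(out[-1] + quantize(r, step))
    return out


def closed_loop(recon, step):
    """Re-encode decoded frames against the transcoder's own prediction."""
    out = [quantize(recon[0], step)]
    for target in recon[1:]:
        out.append(out[-1] + quantize(target - out[-1], step))
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


frames = [100 + 3 * n for n in range(40)]      # slowly brightening scene
residues, recon = encode(frames, step=2)
drifting = open_loop(recon[0], residues, step=5)
stable = closed_loop(recon, step=5)
print("open-loop max error:  ", max_error(drifting, recon))
print("closed-loop max error:", max_error(stable, recon))
```

In this toy model the closed-loop error never exceeds half the new quantizer step, whereas the open-loop error grows with the number of predicted frames until an intra frame would reset it.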

Fig. 7. SDTA with motion vector reused. (a) SDTA with STR. (b) Simplified SDTA without STR.

Open-loop transcoders contain no feedback loop in the transcoding architecture for compensating the drift error. They aim for minimum transcoding complexity, and thus only modify the encoded DCT coefficients to reduce the overall bit rate [1]. Open-loop transcoders include selective transmission and re-quantization. Selective transmission [8], [34] discards high frequency DCT coefficients of a block. Re-quantization architectures re-quantize the motion compensated residue errors to adapt to the bit-rate requirement [34], as shown in Fig. 6. Both approaches operate in the frequency domain and are rather simple to implement. Both of them change the residue error and alter the content in the decoder predictor. Therefore, when the decoder decodes the video processed by an open-loop algorithm, the predictors would be different from those of the original encoder, leading to drift errors.
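Selective transmission can be sketched as follows. For brevity the example assumes a 1-D, 8-sample orthonormal DCT rather than the 2-D block transform used by actual codecs, and the sample values and the number of retained coefficients are arbitrary. The helper names are hypothetical.

```python
import math


def dct(block):
    """Orthonormal 1-D DCT-II."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out


def selective_transmission(coeffs, keep):
    """Keep the first `keep` (lowest-frequency) coefficients, zero the rest."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(samples)
truncated = selective_transmission(coeffs, keep=4)
approx = idct(truncated)

error_energy = sum((a - b) ** 2 for a, b in zip(samples, approx))
discarded_energy = sum(c * c for c in coeffs[4:])
print("reconstruction error energy:", round(error_energy, 6))
print("discarded coefficient energy:", round(discarded_energy, 6))
```

Because the transform is orthonormal, the reconstruction error energy equals the energy of the discarded coefficients, so the distortion introduced in each block is exactly what gets fed into the decoder's predictor as mismatch.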

Closed-loop transcoders contain a feedback loop in the transcoding architecture in order to correct the transcoding distortion (see Figs. 7 and 8 as examples) by compensating the drift in the transcoder [2], [17], [34]. We will focus on the closed-loop architectures in the following subsections and classify them in various categories.

B. Spatial-Domain Video Transcoding

Fig. 5(b) shows a spatial-domain transcoding architecture (SDTA) that can perform dynamic bit-rate adaptation via the rate control at the encoder side. This architecture is flexible since the decoder loop and the encoder loop can be totally independent of each other (e.g., they can operate at different bit rates, frame rates, picture resolutions, coding modes, and even different standards). This architecture is drift-free, but its computational complexity is high for real-time applications.

Fig. 8. Frequency domain transcoder architecture (FDTA).

Since a pre-encoded video stream arriving at a transcoder already carries useful information such as the picture type, motion vectors (MVs), quantization step size, bit-allocation statistics, etc., it is possible to construct transcoders with different complexity and performance in terms of coding efficiency and video quality. Intuitively, most of the motion information and the mode decision information received in the video decoder can be reused in the video encoder without introducing significant degradation in visual quality. Thus, motion estimation, the most time-consuming operation in video encoding, which accounts for 60%–70% of the encoder computation [30], is avoided. This leads to an SDTA that can reuse MVs [shown in Fig. 7(a)]. The pre-encoded source video is decoded in the spatial domain by performing variable-length decoding (VLD), inverse quantization, IDCT, and motion compensation. In the encoder, the motion compensated residue errors are encoded into the frequency domain through DCT, re-quantization, and variable-length coding (VLC). The motion compensation operation at the encoding end is also performed in the spatial domain for the prediction operation. The MV reuse approach is useful in complexity reduction for motion estimation in video transcoding [17].

The architectures in Figs. 7(a), 8, and 9 include two optional functional blocks placed between the decoder and encoder: a spatial/temporal resolution reduction (STR) module and an MV composition and refinement (MVCR) module. STR allows the source video to be transcoded to a target video with a different spatial/temporal resolution. MVCR is needed to adjust the MVs when STR is applied. When transcoding without spatial/temporal resolution reduction, the SDTA architecture can be further simplified into Fig. 7(b) [2], in which only one feedback loop is employed.

C. Frequency-Domain Transcoding

Exploiting the structural redundancy of the architecture in Fig. 7 and the linearity of the DCT/IDCT, a structurally simpler but functionally equivalent frequency-domain transcoding architecture is possible [8], which can be further simplified [2], [4], [23], as shown in Fig. 8. In this architecture, only VLD and inverse quantization are performed to get the DCT values of each block at the decoder end. At the encoder end, the motion compensated residue errors are encoded through re-quantization and VLC. The reference frame memory at the encoder end stores the DCT values after inverse quantization, which are then fed to the frequency-domain MC module to reduce drift error. This is referred to as the frequency-domain transcoding architecture (FDTA).

Fig. 9. Hybrid-domain transcoding architecture (HDTA).

In this architecture, motion compensation is performed in the frequency domain using an MV reusing algorithm. Detailed frequency-domain MC algorithms can be found in [2] and [31]. An FDTA may need less computation but suffers from the drift problem due to nonlinear operations, which include subpixel motion compensation and DCT coefficient clipping during MC. FDTAs also lack flexibility and are mostly suited to bit-rate transcoding. Recently, researchers have studied frequency-domain motion estimation that may eliminate some of these constraints [18].

D. A Hybrid-Domain Transcoding Architecture

Various transcoding algorithms provide a tradeoff between the computational complexity and the reconstructed video quality. In order to reduce the computational complexity while maintaining the reconstructed video quality, ME should be omitted and DCT/IDCT should be avoided if possible. For example, the architecture in [45] uses MC for P frames only. I frames are intra coded, which need no ME and MC, and thus IDCT/DCT for I frames can be omitted in principle. But since I frames are the anchors for subsequent P and B frames, the IDCT at the decoder stage and the inverse quantization and IDCT at the encoder stage for I frames are still needed to reconstruct the reference frames, while DCT at the encoder stage can be omitted. Since P frames are also the anchors for the following P and B frames, MC, DCT, and IDCT cannot be omitted. For B frames, which are not reference frames for subsequent frames, drift error generated in B frames would not propagate through the video sequence, so MC of B frames can be removed without introducing significant degradation in the visual quality of reconstructed pictures. Thus, DCT/IDCT in all B frames can be omitted, and the transcoding of B frames can be done directly in the DCT domain.

We can further reduce the transcoding delay without degrading the video quality in this architecture. P frames with frequent scene changes and rapid motion may contain a large number of INTRA blocks. One can further omit the IDCT/DCT and MC operations of these INTRA blocks in P frames. In other words, blocks of I and B pictures and INTRA blocks of P pictures are transcoded in the frequency domain, and spatial-domain motion compensation is done only when the block is an INTER block in a P frame. We call this transcoding architecture the hybrid-domain transcoding architecture (HDTA), as shown in Fig. 9. From the simulation results in [45], compared to SDTA with MV reuse, the HDTA has lower complexity, which speeds up the transcoding operation, but at the expense of some degradation in picture quality. Compared with the frequency-domain transcoder, this transcoder performs DCT/IDCT and MC when the block is an INTER block in P frames, which may increase the transcoding delay but gives better visual quality.
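The per-block decision rule of the HDTA described above can be written as a short sketch. It captures only the dispatch logic stated in the text; the function name and the string labels are hypothetical, and the additional reconstruction of I frames needed for reference frames is not modeled.

```python
def transcoding_domain(frame_type, block_is_intra):
    """Return the domain in which a block is transcoded in an HDTA.

    frame_type: "I", "P", or "B".
    block_is_intra: True for INTRA-coded blocks.
    Only INTER blocks of P frames take the spatial-domain path
    (IDCT, motion compensation, DCT); everything else stays in the
    frequency (DCT) domain.
    """
    if frame_type not in ("I", "P", "B"):
        raise ValueError("unknown frame type: " + repr(frame_type))
    if frame_type == "P" and not block_is_intra:
        return "spatial"
    return "frequency"


for frame_type, intra in [("I", True), ("P", True), ("P", False), ("B", False)]:
    print(frame_type, "intra" if intra else "inter",
          "->", transcoding_domain(frame_type, intra))
```

The more INTRA blocks a P frame contains, the more blocks avoid the spatial-domain path, which is where the reduction in transcoding delay comes from.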

IV. HOMOGENEOUS VIDEO TRANSCODING

Homogeneous transcoding performs conversion between video bitstreams of the same standard. A high quality source video may be transcoded to a target video bitstream of lower quality, with different spatial/temporal resolutions, and different bit rates. The following subsections describe some of the research issues in homogeneous transcoding.

A. Reducing Bits With Fixed Resolution

For fixed spatial and temporal resolution, we can reduce the bit rate using the following two techniques:

Re-Quantization: A simple technique to transcode a video to a lower bit rate is to increase the quantization step at the encoder part of the transcoder [26], [35], [43]. This decreases the number of nonzero quantized coefficients, thus decreasing the number of bits in the outgoing bitstream. Re-quantization is a good compromise between complexity and reconstructed image quality, and can control the bit-rate reduction.

Selective Transmission: Since most of the energy is concentrated in the lower frequency band of an image, discarding (truncating) some of the higher-frequency AC coefficients [1], [30], [34] can preserve the picture quality, but may introduce a blocking effect in the reconstructed target video.
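The effect of re-quantization on the number of nonzero coefficients can be shown with a minimal sketch. It assumes a plain uniform quantizer with no dead zone or weighting matrix, and the coefficient levels and step sizes are made up for illustration; the function name `requantize` is hypothetical.

```python
import math


def requantize(levels, q_in, q_out):
    """Dequantize levels with step q_in, then quantize with step q_out."""
    out = []
    for level in levels:
        coeff = level * q_in                                # inverse quantization
        magnitude = math.floor(abs(coeff) / q_out + 0.5)    # coarser quantizer
        out.append(int(math.copysign(magnitude, coeff)))
    return out


def count_nonzero(levels):
    return sum(1 for level in levels if level != 0)


levels_in = [12, -7, 5, 3, -2, 1, 1, 0]     # hypothetical quantized DCT levels
levels_out = requantize(levels_in, q_in=4, q_out=10)
print(levels_out)
print(count_nonzero(levels_in), "->", count_nonzero(levels_out), "nonzero levels")
```

Both the smaller level magnitudes and the longer runs of zeros reduce the number of bits the variable-length coder has to spend.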

B. Spatial Resolution Reduction

Reduction in spatial resolution can obviously lower the bit rate. In this subsection, we describe some common video transcoding techniques.

Filtering and Subsampling: Filtering and subsampling are common techniques to reduce spatial resolution [24], [30], [48]. Shanableh [30] proposed a filter that can be used in both the horizontal and vertical directions for luminance and chrominance; the image is then down-sampled by dropping every alternate pixel in both the horizontal and vertical directions.

Pixel Averaging: Pixel averaging [30] is another common technique in which every m × m pixels are represented by a single pixel of their average value. Pixel averaging is the simplest method but the reconstructed pictures may become blurred.
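The two down-sampling operations can be sketched on a small image stored as a list of rows. The anti-aliasing filter that precedes subsampling is omitted because its coefficients are not given here, and the function names and pixel values are hypothetical.

```python
def subsample(image):
    """Drop every alternate pixel horizontally and vertically."""
    return [row[::2] for row in image[::2]]


def pixel_average(image, m):
    """Represent every m x m block of pixels by its average value."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - height % m, m):
        row = []
        for left in range(0, width - width % m, m):
            block = [image[y][x]
                     for y in range(top, top + m)
                     for x in range(left, left + m)]
            row.append(sum(block) / (m * m))
        out.append(row)
    return out


image = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
print(subsample(image))
print(pixel_average(image, 2))
```

Subsampling keeps original pixel values and so preserves sharpness at the risk of aliasing, while averaging smooths each block, which is consistent with the blurring noted above.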
