
(Open Access) Coding-gain-based complexity control for H.264 video encoder (2008) | Ming-Chen Chien


CODING-GAIN-BASED COMPLEXITY CONTROL FOR H.264 VIDEO ENCODER

Ming-Chen Chien$^{1,2}$, Zong-Yi Chen, and Pao-Chi Chang

$^{1}$ Department of Communication Engineering, National Central University, Taiwan

$^{2}$ Department of Electrical Engineering, Chin Min Institute of Technology, Taiwan

ABSTRACT

The allowable computational complexity of video encoding is limited in a power-constrained system. Different video frames are associated with different motions and contexts, and so are associated with different computational complexities if no complexity control is utilized. Variation in computational complexity leads to encoding delay jitter. Typically, motion estimation (ME) consumes much more computational complexity than other encoding tools. This work proposes a practical complexity control method based on the complexity analysis of an H.264 video encoder to determine the coding gain of each encoding tool in the video encoder. Experiments performed on a programming-optimized source code show that the computational complexity associated with each frame is well controlled below a given limit, with very little R-D performance degradation under a reasonable constraint compared with the unconstrained case.

Index Terms: Complexity control, complexity allocation, video encoder, H.264

1. INTRODUCTION

Real-time video encoding is an important element for many applications over various wireless networks. To avoid encoding delay jitter, the available encoding time of each video frame, $T_{FC}$, is limited in a real-time video encoding system and can be defined as

$$T_{FC} = \frac{1}{fr} \tag{1}$$

where $fr$ represents the frame rate. The limited encoding time of each frame limits the available computational complexity of each frame, $C_{FC}$, which can be defined as

$$C_{FC} = T_{FC} \times C_{PRC} = \frac{C_{PRC}}{fr} \tag{2}$$

where $C_{PRC}$ represents the clock rate of the processor. However, the $C_{PRC}$ of the processor embedded in wireless handsets is limited and hence $C_{FC}$ is also limited.

Optimal complexity control aims to control the encoding complexity of each frame under a given limit while achieving optimal R-D performance as follows:

$$\min J = \min (D + \lambda R) \quad \text{subject to} \quad C_F \le C_{FC} \tag{3}$$

where $D$ denotes distortion; $R$ denotes bit rate; $\lambda$ denotes the Lagrange multiplier; $J$ denotes the R-D cost, and $C_F$ denotes the complexity used for a frame.

Traditionally, the complexity constraint is computed in the frame layer as described above. For typical MPEG-like video encoders, a frame is partitioned into a number of MBs, and an MB is the basic encoding unit. Different MBs have various motions and contexts and hence are associated with different complexities. Therefore, the allocation of $C_{FC}$ among MBs is a critical problem. Typically, MPEG-like video encoders use many encoding tools, such as ME, DCT, Q, entropy coding and others. Different encoding tools may exhibit substantially different coding efficiency. Accordingly, allocating complexity among encoding tools is another key problem.

A metric of coding gain, which represents the coding efficiency, has been proposed [4] as follows:

$$CG = \frac{\Delta J}{\Delta C} \tag{4}$$

$$\Delta J = \Delta D + \lambda \Delta R \tag{5}$$

where $\Delta C$ represents the increase in complexity when an encoding tool is adopted; $\Delta D$ represents the decrease in distortion; $\Delta R$ represents the decrease in rate, and $\lambda$ is the Lagrange multiplier. However, a proper $\lambda$ is not easily determined. When the rate control is turned on for a target rate, $\Delta R$ becomes nearly zero, and the coding gain reduces to

$$CG = \frac{\Delta D}{\Delta C} \tag{6}$$

A few works on complexity control have been conducted [2],[3],[4],[5]. The optimization formula of the first C-R-D model [2] is too complicated to be solved in closed form. That study also proposed an MHM-based method for allocating complexity for ME among MBs, which was not optimal. A statistically optimal operation mode for a sequence in a complexity-constrained video encoding system has also been proposed [3]. However, an operation mode could be optimal for one frame but inadequate for another. A complexity allocation method for ME based on the cost-complexity curve has been proposed [4]. A C-R-D optimization for H.264 ME has also been proposed [5]. It used two Lagrange multipliers to terminate the complexity-inefficient ME rounds and thus increase coding efficiency. Typically, ME consumes most of the complexity, with a large variation between MBs. In general, optimal complexity control algorithms are difficult to apply to practical real-time video encoders because of their large computational overhead. To the best of our knowledge, no practical complexity control that is efficient enough and operates in real time exists for an H.264 video encoder.

Based on complexity analysis of a programming-optimized H.264 code, X264 [10], this work proposes a simple and practical complexity control method which can control the encoding complexity of each frame under a given limit while achieving very good R-D performance.

This paper is organized as follows. Section 2 proposes a practical complexity control method based on the results of complexity analysis. Section 3 presents experimental results, and Section 4 draws conclusions.

2. PROPOSED COMPLEXITY CONTROL

For a typical MPEG-like video encoder, Figure 1 displays the encoding block diagram of an MB. DCT, Q, $Q^{-1}$, and IDCT have been collectively denoted by PRECODING [2]. This paper follows this notation and divides the encoder into three major encoding tools: ME, PRECODING, and entropy coding.

Fig. 1 Basic block diagram of a video encoder

Highly efficient complexity control should be performed by allocating complexity to the encoding tools with higher coding gain. This work conducts experiments with the options presented in Table I to analyze the coding gains of various encoding tools in the modern H.264 encoder. The metric of coding gain is given by (6), where $\Delta D$ is represented by $\Delta PSNR$, the increase in PSNR, and $\Delta C$ is measured by the number of CPU clocks spent on a piece of code. Table II presents the results, which will be discussed in the following subsection.

Table I. Options for complexity analysis

| Option | Setting |
|---|---|
| Video source | Foreman QCIF, Carphone QCIF |
| Fast ME | Diamond |
| Target rate | 103 kbps |
| Frame rate | 20 |
| Number of reference frames | 1 |
| GOP type | IPPPP |
| CPU | Intel Pentium 4, 2.66 GHz |
| RAM | 512 MB |
| MMX tech. | On for SAD computation |
| Source code of H.264 | X264 |

Table II. Coding gain of each encoding tool

| Encoding tool | Coding gain (dB/kclks) |
|---|---|
| CABAC (compared to CAVLC) | 9.17e-4 |
| Half-pixel ME | 2.88e-3 |
| Deblocking filter | 8.54e-4 |
| Quarter-pixel ME | 4.45e-4 |
| 8x8 partition mode | 1.42e-4 |
| 16x8 & 8x16 partition mode | 4.63e-5 |
| Sub8x8 partition mode | 4.7e-5 |
| 4x4 Intra | 5.22e-6 |
| 5 reference frames | 4.06e-5 |
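As a rough illustration of how the coding-gain metric can drive tool selection, the sketch below computes a gain as PSNR increase per kilo-clock and sorts the Table II entries in decreasing order. The function names and the dictionary layout are hypothetical and are not taken from the X264 source.

```python
# Coding gains copied from Table II, in dB per kilo-clock (dB/kclks).
CODING_GAIN = {
    "CABAC (compared to CAVLC)": 9.17e-4,
    "half-pixel ME": 2.88e-3,
    "deblocking filter": 8.54e-4,
    "quarter-pixel ME": 4.45e-4,
    "8x8 partition mode": 1.42e-4,
    "16x8 & 8x16 partition mode": 4.63e-5,
    "sub8x8 partition mode": 4.7e-5,
    "4x4 Intra": 5.22e-6,
    "5 reference frames": 4.06e-5,
}


def coding_gain(delta_psnr_db, delta_kclks):
    """Coding gain as PSNR increase divided by the extra complexity (kclks)."""
    if delta_kclks <= 0:
        raise ValueError("the complexity increase must be positive")
    return delta_psnr_db / delta_kclks


def rank_tools(gains):
    """Return tool names sorted from highest to lowest coding gain."""
    return sorted(gains, key=gains.get, reverse=True)


if __name__ == "__main__":
    for tool in rank_tools(CODING_GAIN):
        print(f"{tool:30s} {CODING_GAIN[tool]:.2e}")
```

Sorting this way puts half-pixel ME first and 4x4 Intra last, which matches the observation that 4x4 Intra contributes little for inter frames.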

2.1. Complexity Allocation

The complexity allocation allocates complexity from the frame layer to the MB layer. It should be performed before the first MB in a frame is encoded. When the video encoder starts to encode a frame, it performs some initialization before encoding slices. Complexity control records the complexity consumed by the initialization, which is denoted by $C_{Finit}$. The complexity budget for encoding all slices in a frame is $C_{SLs}$. After the slices are encoded, deblocking filtering can be performed; it is followed by updating references and other necessary tasks. The complexity of these tasks after the encoding of slices, $C_{Fother}$, should be reserved. Adopting the deblocking filter is suggested because it has high coding gain, as shown in Table II and proposed elsewhere [6]. $C_{Fother}$ is smaller than $C_{SLs}$, as displayed in Fig. 2, and it does not vary greatly. It can be regarded as a constant and can be estimated from the previous frame. Accordingly, before the slices are encoded, by measuring $C_{Finit}$ and reserving $C_{Fother}$, $C_{SLs}$ can be allocated by

$$C_{SLs} = C_{FC} - C_{Finit} - C_{Fother} \tag{7}$$
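The frame-layer bookkeeping can be sketched as follows. The per-frame budget is taken as the processor clock rate divided by the frame rate, as defined in the Introduction, and the slice budget is what remains after the measured initialization cost and the reserved post-slice cost. The function names are hypothetical, and clamping the result at zero is an added assumption, not something the paper specifies.

```python
def frame_budget(clock_rate_hz, frame_rate):
    """Clocks available per frame: C_FC = C_PRC / fr."""
    if frame_rate <= 0:
        raise ValueError("frame rate must be positive")
    return clock_rate_hz / frame_rate


def slice_budget(c_fc, c_finit, c_fother_prev):
    """Budget for all slices: C_SLs = C_FC - C_Finit - C_Fother.

    c_finit is measured for the current frame; c_fother_prev is the
    post-slice cost estimated from the previous frame.
    Clamping at zero is an assumption made for this sketch.
    """
    return max(c_fc - c_finit - c_fother_prev, 0.0)


if __name__ == "__main__":
    # Table I settings: 2.66 GHz CPU, frame rate 20.
    c_fc = frame_budget(2.66e9, 20)
    # The two costs below are made-up illustrative numbers.
    print(c_fc, slice_budget(c_fc, c_finit=1.0e6, c_fother_prev=3.0e6))
```

With the Table I settings, the uncapped budget works out to 133 million clocks per frame; a tighter bound would normally be imposed for a power-constrained target.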

Fig. 2 Complexity consumption in the frame layer (panels: frame init, slices encode, others; x-axis: frame number; y-axis: CPU clocks (k))

The operation of the slice layer is very simple. Only a short slice header is added. The complexity of encoding all slice headers in a frame is small and can be treated as a constant. It is denoted by $C_{SLhs}$. Therefore, the complexity of encoding all MBs in a frame, $C_{MBs}$, can be allocated according to

$$C_{MBs} = C_{SLs} - C_{SLhs} \tag{8}$$


Each MB can adopt ME, PRECODING and entropy coding. Typically, ME consumes most of the complexity, as shown in Fig. 3. It is the main object on which complexity control will be performed. The modern entropy coding tool CABAC has a high coding gain, as shown in Table II and elsewhere [6]. Its adoption is recommended. The modern video encoding standard H.264 significantly simplifies the DCT operation [6]. Hence, PRECODING has high coding gain and should always be adopted. Some early termination algorithms for PRECODING have been proposed to skip the PRECODING for MBs with small residual signals [11]. All such algorithms with high efficiency can be utilized. As described above, the complexity for PRECODING and entropy coding should be reserved. The complexity budget $C_{MEs}$ can be allocated using

$$C_{MEs} = C_{MBs} - M \times C_{MBother} \tag{9}$$

where $C_{MBother}$ denotes the complexity reserved for PRECODING and entropy coding of an MB and $M$ is the number of MBs in a frame. Figure 3 shows that $C_{MBother}$ is relatively small and its variation is much smaller than that of $C_{ME}$, the complexity for ME of an MB. Therefore, $C_{MBother}$ can be treated as a constant and can be estimated statistically by running test video sequences in advance. The complexity compensation described below will eliminate the estimation error.
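The MB-layer step can be sketched in a few lines: subtract the slice-header cost from the slice budget, then reserve a fixed amount per MB for PRECODING and entropy coding, leaving the rest for ME. The function name and the numbers in the usage lines are hypothetical; only the 99-MB count follows from the QCIF frame size (176x144 pixels, 16x16 MBs).

```python
def me_budget(c_sls, c_slhs, num_mbs, c_mbother):
    """Return (C_MBs, C_MEs) for one frame.

    C_MBs = C_SLs - C_SLhs             (budget for all MBs)
    C_MEs = C_MBs - M * C_MBother      (budget left for ME in all MBs)
    """
    c_mbs = c_sls - c_slhs
    c_mes = c_mbs - num_mbs * c_mbother
    return c_mbs, c_mes


if __name__ == "__main__":
    # A QCIF frame holds 11 x 9 = 99 macroblocks.
    c_mbs, c_mes = me_budget(c_sls=129e6, c_slhs=0.1e6,
                             num_mbs=99, c_mbother=0.3e6)
    print(c_mbs, c_mes)
```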

Fig. 3 Complexity consumption in the MB layer (panels include MB init, precoding, and entropy coding; x-axis: frame number; y-axis: CPU clocks (k/8))

The complexity allocation for ME among MBs is suggested to be weighted by $COST_0$ as

$$C_{ME}(i) = C_{MEs} \times \frac{COST_0(i)}{\sum_{j=1}^{M} COST_0(j)}, \quad i = 1, 2, \ldots, M \tag{10}$$

where $COST_0$ represents the cost of ME with zero MV in 16x16 partition mode. This equation is simple but meaningful because $COST_0$ contains information about context and motion. Since an MB with larger motion or more complex context has a larger $COST_0$, it deserves a larger complexity budget. Otherwise, a larger bit rate and larger distortion will be generated.
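A minimal sketch of the COST0-weighted split described above, assuming the weights are normalized so that the per-MB budgets add up to the frame's ME budget. The function name is hypothetical, and the uniform fallback for an all-zero cost list is an added assumption.

```python
def allocate_me_budget(c_mes, cost0):
    """Split the frame's ME budget among MBs in proportion to COST0.

    c_mes : total ME budget for the frame (clocks)
    cost0 : list of zero-MV 16x16 ME costs, one entry per MB
    """
    if not cost0:
        return []
    total = sum(cost0)
    if total == 0:
        # Assumption: with no cost information, share the budget evenly.
        return [c_mes / len(cost0)] * len(cost0)
    return [c_mes * c / total for c in cost0]


if __name__ == "__main__":
    # An MB with three times the zero-MV cost receives three times the budget.
    print(allocate_me_budget(1000.0, [1, 3]))
```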

2.2. ME Flow in Decreasing order of CG

According to the coding gain in Table II, the ME flow in Fig. 4 is suggested. The resulting operation order is similar to that suggested elsewhere [5], but the adoption of 4x4 Intra prediction is different. Table II reveals that the coding gain of 4x4 Intra for inter frames is very low, because most MBs in the inter frame choose inter mode as the best mode. However, 4x4 Intra prediction is beneficial to MBs that choose the Intra mode. The tendency to Intra mode is examined by comparing 16x16 ME and 16x16 Intra prediction. If the 16x16 Intra prediction yields a better performance, 4x4 Intra prediction can be utilized to reduce the residual signal. Otherwise, 4x4 Intra prediction is not used.

Fig. 4 ME flow in decreasing CG of encoding tools
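The exact flow of Fig. 4 is not reproduced here. The sketch below only illustrates the two ideas stated in the text: run tools in decreasing order of coding gain, and enable 4x4 Intra prediction only when 16x16 Intra prediction beats 16x16 ME. The function name, the cost arguments, and the choice to append 4x4 Intra at the end are assumptions.

```python
def plan_me_stages(gains, cost_inter16x16, cost_intra16x16):
    """Return tool names in decreasing coding gain, gating 4x4 Intra.

    gains           : dict of tool name -> coding gain (dB/kclks)
    cost_inter16x16 : R-D cost of 16x16 ME for this MB
    cost_intra16x16 : R-D cost of 16x16 Intra prediction for this MB
    """
    stages = sorted((t for t in gains if t != "4x4 Intra"),
                    key=gains.get, reverse=True)
    # 4x4 Intra is tried only if the MB already leans toward Intra mode.
    if cost_intra16x16 < cost_inter16x16:
        stages.append("4x4 Intra")
    return stages


if __name__ == "__main__":
    gains = {"half-pixel ME": 2.88e-3, "quarter-pixel ME": 4.45e-4,
             "8x8 partition mode": 1.42e-4, "4x4 Intra": 5.22e-6}
    print(plan_me_stages(gains, cost_inter16x16=100, cost_intra16x16=80))
```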

2.3. Complexity Check and Compensation

After each computation of SAD and the R-D cost, the used complexity $C_{MEused}$ is examined. If $C_{MEused}$ exceeds $C_{ME}$, the ME process terminates. Otherwise, the ME process continues.

Any efficient early termination algorithm for PRECODING can be employed. Complexity compensation described below will distribute the saved complexity.

After the whole process of the MB encoding is complete, the balance $C_{MBbalance}$ between the used complexity $C_{MBused}$ and the budget $C_{MB}$ is given by

$$C_{MBbalance} = C_{MB} - C_{MBused} \tag{11}$$

where $C_{MB}$ is obtained by

$$C_{MB} = C_{ME} + C_{MBother} \tag{12}$$

Then $C_{MBbalance}$ is distributed uniformly to the remaining MBs in that frame.
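The check-and-compensate loop can be sketched as below. The ME search is modelled as a sequence of steps, each with a clock cost and an R-D cost, and the search stops once the used clocks exceed the MB's ME budget. After the MB is finished, the difference between its budget and its actual use is spread evenly over the ME budgets of the remaining MBs. All names are hypothetical, and adding the balance to the ME budgets (rather than to some other per-MB quantity) is an assumption.

```python
def run_me(steps, c_me):
    """Run ME steps until the used complexity exceeds the budget c_me.

    steps : iterable of (clocks, rd_cost), one per SAD / R-D computation
    Returns (best_rd_cost, clocks_used).
    """
    used = 0.0
    best = None
    for clocks, rd_cost in steps:
        used += clocks
        if best is None or rd_cost < best:
            best = rd_cost
        if used > c_me:  # checked after each computation
            break
    return best, used


def compensate(me_budgets, i, c_mbother, c_mbused):
    """Distribute the balance of MB i uniformly to the remaining MBs.

    C_MB        = C_ME(i) + C_MBother
    C_MBbalance = C_MB - C_MBused   (negative if the MB overspent)
    """
    balance = me_budgets[i] + c_mbother - c_mbused
    remaining = len(me_budgets) - (i + 1)
    if remaining > 0:
        share = balance / remaining
        for j in range(i + 1, len(me_budgets)):
            me_budgets[j] += share
    return balance


if __name__ == "__main__":
    budgets = [100.0, 100.0, 100.0]
    best, used = run_me([(40, 10.0), (40, 8.0), (40, 9.0)], budgets[0])
    compensate(budgets, 0, c_mbother=20.0, c_mbused=used + 20.0)
    print(best, used, budgets)
```

Because the balance can be negative, an MB that overspends reduces the budgets of the MBs that follow it, which is how estimation errors in the reserved per-MB cost are absorbed within the frame.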

3. EXPERIMENTAL RESULTS

The options of the experiments for the proposed practical complexity control are shown in Table III. The complexity metric is the number of CPU clocks used by an encoding tool, as measured by the ‘rdtsc’ instruction of an Intel CPU [7].

Figure 5 indicates that the complexity is well controlled under the given limit. The complexity of each frame rarely exceeds the bound. Figures 6 and 7 show that the rate and PSNR under complexity control are both very close to those in the unconstrained case. Figure 8 plots the R-D performance with the Foreman video sequence under various complexity constraints, where Cfm denotes the maximum complexity of a frame without complexity constraint. When $C_{FC}$ is down to 72% of Cfm, the PSNR obtained by this algorithm degrades by less than 0.5 dB at the same rate. When $C_{FC}$ is down to 58% of Cfm, the PSNR obtained by this algorithm degrades by no more than 1 dB at the same rate. Experiments with another video source, ‘Carphone’, yield similar results.

Fig. 5 Comparisons of computational complexity with and without complexity control (plot title: complexity control for a fixed frame rate; x-axis: frame number; y-axis: CPU clocks (k); curves: adopt complexity control, without complexity control, complexity bound)

Table III. Options for complexity control

| $C_{Fmax}$ (clk) | Source | QP | Rate control | Fast ME | Complexity metric |
|---|---|---|---|---|---|
| 23 M | Foreman | 29 | off | Diamond | CPU clock |

Fig. 6 Comparisons of rate with and without complexity control (x-axis: frame number; y-axis: rate (frame size); curves: adopt complexity control, without complexity control)

Fig. 7 Comparisons of YPSNR with and without complexity control (x-axis: frame number; y-axis: YPSNR (dB); curves: adopt complexity control, without complexity control)

Fig. 8 R-D performance under various complexity constraints (x-axis: Rate (kbits); y-axis: YPSNR (dB); curves: no constraint, Cfc = 72% Cfm, Cfc = 66% Cfm, Cfc = 58% Cfm, Cfc = 48% Cfm)

4. CONCLUSION AND FUTURE WORK

This work proposes an efficient complexity control method with very little degradation of R-D performance. The proposed method, which has very low overhead, is also very practical.

5. REFERENCES

[1] “Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T rec. H.264/ISO/IEC 14496-10 AVC),” in JVT of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, 2003.

[2] Z. He and Y. F. Liang, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 5, pp. 645-658, May 2005.

[3] D. N. Kwon and P. F. Driessen, “Performance and computational optimization in configurable hybrid video system,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 31-42, Jan. 2006.

[4] C. Kim and J. Xin, “Hierarchical complexity control of motion estimation for H.264/AVC,” Mitsubishi Electric Research Laboratories, TR2006-004, Dec. 2006. Available: http://www.merl.com

[5] Y. Hu, Q. Li, S. Ma, and C. C. J. Kuo, “Joint rate-distortion-complexity optimization for H.264 motion search,” in Proc. ICME 2006, pp. 1949-1952.

[6] I. E. G. Richardson, H.264 and MPEG-4 Video Compression. John Wiley & Sons, 2003.

[7] http://www.intel.com

[8] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7-28, Apr. 2004.

[9] Joint Model reference software version 10. Available: http://iphome.hhi.de/suehring/tml/index.htm

[10] x264. Available: http://developers.videolan.org/x264.html

[11] Z. Chen, P. Zhou, and Y. He, “Fast integer pel and fractional pel motion estimation for JVT,” JVT-F017r1.doc, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, 6th meeting, Awaji Island, JP, 5-13 Dec. 2002.
