Out-of-the-loop information hiding for HEVC video

doi:10.1109/ICIP.2015.7351477


This item is the archived peer-reviewed author-version of:

Out-of-The-Loop Information Hiding for HEVC Video

Luong Pham Van, Johan De Praeter, Glenn Van Wallendael, Jan De Cock, and Rik Van de Walle

In: 2015 International Conference on Image Processing (ICIP), 3610 - 3614, 2015.

To refer to or to cite this work, please use the citation to the published version:

Pham Van, L., De Praeter, J., Van Wallendael, G., De Cock, J., and Van de Walle, R. (2015). Out-of-The-Loop Information Hiding for HEVC Video. 2015 International Conference on Image Processing (ICIP), 3610-3614.

OUT-OF-THE-LOOP INFORMATION HIDING FOR HEVC VIDEO

Luong Pham Van, Johan De Praeter, Glenn Van Wallendael, Jan De Cock, and Rik Van de Walle

Ghent University - iMinds - Multimedia Lab, Ghent, Belgium

ABSTRACT

Communication over the Internet and digital media is increasingly popular, so the security and privacy of data transmission are in high demand. One effective technique for meeting this requirement is information hiding, which conceals secret information in a video, an audio file, or a picture. In this paper, we propose a low-complexity out-of-the-loop information hiding algorithm for video pre-encoded with the High Efficiency Video Coding (HEVC) standard. Only selected components of the video, such as the motion vector differences and the transform coefficients, are extracted and modified, bypassing the need to fully decode and re-encode the video. To reduce the error propagation caused by hiding information, the dependencies between video frames are taken into account when distributing the information over the frames. Several embedding strategies are investigated. The experimental results show that the information should be hidden in smaller blocks to reduce quality loss. A smart distribution of information across the frames keeps the quality loss under 1 dB PSNR for an information payload of 15 kbps. When such a strategy is used, embedding information in the transform coefficients only slightly outperforms modifying the motion vector differences.

Index Terms— Data Hiding, High Efficiency Video Coding, Motion Vector Difference, DCT coefficients.

1. INTRODUCTION

With current technology, people can easily and flexibly communicate through the Internet and digital media. Therefore, the transmission of digital data needs to be more secure, especially for banking and military information. One effective solution is information hiding, which conceals a secret message in a video. In general, the information is embedded into the video without changing its perceptual quality, so that only the sender and the receiver are aware that the video contains hidden information.

During the last decade, many information hiding algorithms have been proposed for existing video coding standards (e.g. MPEG, H.264/AVC). These techniques usually map the input information to a component of the video such as the discrete cosine transform (DCT) coefficients [1, 2], the motion vectors [3], or the intra prediction mode [4, 5]. However, to the best of our knowledge, there has been little focus on watermarking for the recently finalized High Efficiency Video Coding (HEVC) standard [6]. In [7], the information is hidden in the coding block size during the encoding loop of the HEVC encoder. Although this technique prevents propagation errors, its complexity is high because a pre-encoded video bit stream must be decoded and re-encoded to embed the information. Additionally, a specialized encoder is needed for the embedding. If unique information needs to be inserted into multiple copies of the video, such an approach requires the computationally expensive encoding step to be executed once per copy, so it does not scale well. Because embedding the information inside the encoding loop is impractical in this scenario, the information should instead be inserted directly into the bit stream, outside the encoding loop.

The activities described in this paper were funded by Ghent University, iMinds, the Agency for Innovation by Science & Technology (IWT), the Fund for Scientific Research (FWO Flanders), and the European Union, and were carried out using the Stevin Supercomputer Infrastructure at Ghent University.

In this paper, we propose a low-complexity out-of-the-loop technique for information hiding that inserts the data into a pre-encoded HEVC video stream without fully decoding and re-encoding it. Instead, only low-complexity entropy decoding and encoding are required to access and modify selected bit stream components (DCT coefficients and motion vector differences). Additionally, the error propagation caused by inserting information is analyzed. To reduce this error, the information is distributed over the frames based on the inter-prediction dependencies between them.

The rest of the paper is organized as follows. Section 2 briefly reviews the coding structure of HEVC. Section 3 describes the proposed information hiding technique for pre-encoded video streams. The experimental results and analysis are presented in Section 4. Finally, conclusions are drawn in Section 5.

2. HEVC CODING STRUCTURE

The HEVC standard supports large coding block sizes with a flexible partitioning scheme. The largest block (typically 64x64 pixels), known as a coding tree unit (CTU), is recursively split into smaller coding units (CUs) [8]. This CU partitioning is performed from depth 0 (the CTU) to depth 3 (an 8x8-pixel CU). Each CU is further split into prediction units (PUs) for inter- and intra-prediction, and into transform units (TUs) for residual coding. For each of the eight possible PU partitioning modes, block matching is performed to find the best matching block for the current PU in the reference frames. This process yields a prediction error (residual) and a motion vector for the PU. Rather than the motion vector itself, the difference between this motion vector and the motion vector of a neighbouring encoded PU (the motion vector difference, MVD) is encoded. The prediction error of the CU is transformed to the DCT domain using a square residual quadtree (RQT), which is evaluated from depth 0 (32x32 pixels) to depth 3 (4x4 pixels). The resulting transform coefficients are then quantized and entropy encoded.

[Fig. 1: reference diagram of inter-coded frames; legend: directly referring frame, indirectly referring frame.]
Fig. 1. A part of the reference map of inter-coded frames using the random access configuration.
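The depth numbering used throughout the paper can be made concrete with a small helper. The sketch below assumes the sizes stated in this section (a 64x64 CTU at CU depth 0 and a 32x32 transform at RQT depth 0, with each depth halving the block side) and takes the motion vector predictor as a given input; the names `cu_size`, `tu_size`, and `mvd` are hypothetical, and the derivation of the predictor from neighbouring PUs is not modelled.

```python
CTU_SIZE = 64     # CU depth 0 (typical CTU size)
MAX_TU_SIZE = 32  # RQT depth 0
MAX_DEPTH = 3


def cu_size(depth: int) -> int:
    """Side length in pixels of a CU at the given depth (0 = CTU, 3 = 8x8)."""
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError("CU depth must be between 0 and 3")
    return CTU_SIZE >> depth  # each split halves the block side


def tu_size(depth: int) -> int:
    """Side length in pixels of a TU at the given RQT depth (0 = 32x32, 3 = 4x4)."""
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError("RQT depth must be between 0 and 3")
    return MAX_TU_SIZE >> depth


def mvd(mv: tuple[int, int], mvp: tuple[int, int]) -> tuple[int, int]:
    """Motion vector difference: the coded value is mv minus its predictor mvp."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


if __name__ == "__main__":
    print([cu_size(d) for d in range(4)])  # [64, 32, 16, 8]
    print([tu_size(d) for d in range(4)])  # [32, 16, 8, 4]
    print(mvd((5, -3), (4, -1)))           # (1, -2)
```

With this numbering, the labels d3, d2, and d1 used in Section 4 correspond to 8x8, 16x16, and 32x32 CUs, or to 4x4, 8x8, and 16x16 TUs.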

3. PROPOSED TECHNIQUES

In the proposed techniques, the information is hidden in the compressed domain, i.e., without a full decoder-encoder loop. To achieve this, syntax elements of the video stream are modified to carry the input information. Modifying the syntax elements requires only low-complexity entropy decoding and encoding rather than full decoding and re-encoding. The main challenge is thus to determine the best type and amount of bit stream elements to modify. Since a bit stream contains many motion vectors and DCT coefficients, this paper investigates the performance of hiding information in these two types of syntax elements.

Since modifying the syntax elements of the bit stream causes a mismatch in coding information between the encoder and the decoder, errors propagate throughout the video. Therefore, Section 3.1 proposes a strategy for distributing the information across the blocks and frames of the bit stream. Thereafter, Section 3.2 describes how the motion vectors and DCT coefficients are modified.

3.1. Selection of blocks to hide information

The selection of the blocks that carry the hidden information depends on the amount of information to be hidden. The selection operates at two levels: the frame level and the intra-period level.

At the frame level, the visual quality loss of each individual frame should be minimized when adding information. Due to the characteristics of the human visual system, blocking artefacts resulting from hiding the input information are more visible in smooth areas and hard to detect in complex areas. Smooth areas are often encoded using large blocks, whereas complex areas are coded using small blocks. Therefore, to minimize the visual quality loss, information should be hidden in small blocks within a frame, and larger block sizes are only considered when the amount of added information is high.
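The frame-level rule (smallest blocks first, larger blocks only when the payload demands it) amounts to a greedy selection ordered by depth. The following is a minimal sketch under assumptions not stated in the paper: each candidate block is described by a hypothetical `Block` record holding its depth and a `capacity` (the number of bits it can carry, e.g. its number of usable syntax elements), and blocks of equal depth are taken in bit stream order.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    depth: int     # 3 = smallest blocks, 0 = largest
    capacity: int  # hypothetical: number of bits this block can carry


def select_blocks(blocks: list[Block], payload_bits: int) -> tuple[list[int], int]:
    """Greedily pick the deepest (smallest) blocks until the payload fits.

    Returns the chosen block ids and the number of bits that could not be placed.
    """
    chosen: list[int] = []
    remaining = payload_bits
    # sorted() is stable, so blocks of equal depth keep their original order
    for block in sorted(blocks, key=lambda b: -b.depth):
        if remaining <= 0:
            break
        if block.capacity > 0:
            chosen.append(block.block_id)
            remaining -= block.capacity
    return chosen, max(remaining, 0)


if __name__ == "__main__":
    frame = [Block(0, 1, 4), Block(1, 3, 2), Block(2, 2, 3), Block(3, 3, 1)]
    print(select_blocks(frame, 4))  # ([1, 3, 2], 0)
```

Sorting by depth rather than by block area keeps the sketch aligned with the depth-based strategies evaluated in Section 4.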

At the intra-period level, the error propagation between frames should be minimized. This error propagation is caused by adding information to frames that other frames use as reference pictures for inter-prediction. For example, errors introduced in frame A by adding information may propagate to frame B when frame B relies on frame A for inter-prediction. In this case, frame B is said to refer to frame A, and frame A is referred to by frame B.

The influence of a frame on error propagation is measured by the number of other frames that directly or indirectly refer to it. These dependencies are determined by the coding structure of the video. For instance, HEVC supports hierarchical prediction, in which frames can be classified into different levels of dependency. In this prediction structure, intra frames are encoded independently and are referred to by inter frames. An inter frame between two successive intra frames can be referred to by one or more other frames, so any errors in this frame can propagate to the referring frames. Finally, some frames are not referred to by any other frame; consequently, errors in these frames do not affect other frames.
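Counting the frames that directly or indirectly refer to a frame is a reachability question on the reference graph. The sketch below assumes the dependencies are available as a hypothetical mapping from each frame to the frames it references, and counts every frame that can reach a given frame through a chain of references. The toy graph in the example is illustrative only and is not the structure of Fig. 1; the POC-distance cut-off applied in the experiments of Section 4.1 is not modelled.

```python
from collections import defaultdict


def influence(refs: dict[int, list[int]]) -> dict[int, int]:
    """Number of frames that directly or indirectly refer to each frame.

    refs maps a frame to the frames it uses as references for inter-prediction.
    """
    referrers: dict[int, set[int]] = defaultdict(set)
    frames = set(refs)
    for frame, targets in refs.items():
        for target in targets:
            referrers[target].add(frame)
            frames.add(target)

    counts: dict[int, int] = {}
    for frame in sorted(frames):
        seen: set[int] = set()
        stack = list(referrers[frame])  # direct referrers
        while stack:                    # depth-first walk over indirect referrers
            f = stack.pop()
            if f not in seen:
                seen.add(f)
                stack.extend(referrers[f])
        counts[frame] = len(seen)
    return counts


if __name__ == "__main__":
    toy = {0: [], 8: [0], 4: [0, 8], 2: [0, 4], 6: [4, 8], 1: [0, 2], 3: [2, 4]}
    print(influence(toy))  # {0: 6, 1: 0, 2: 2, 3: 0, 4: 4, 6: 0, 8: 5}
```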

Fig. 1 shows a part of the reference map of the frames between two successive intra frames encoded using the random access configuration [9], with an intra period of 32 and a group of pictures (GOP) size of 8. As seen in this figure, frame 6 has five directly referring frames. It also has several indirectly referring frames (via frames 10, 12, and 16). On the other hand, the odd-numbered frames have no referring frames.

Before adding information to a video stream, the error propagation influence of each frame is evaluated. Based on this influence, each frame is classified into a high, medium, or low influence layer. The hidden information is allocated differently for each layer: more information is added to frames in low-influence layers, and less information is added to frames in high-influence layers. Within the same layer, the embedded information is distributed equally over the frames.
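A possible realization of the layer classification and of the equal split within a layer is sketched below. The thresholds `high_min` and `medium_min` are hypothetical: the paper does not state them, and the defaults were chosen only so that the counts listed in Table 1 fall into the same layers as in that table. The per-layer bit budgets are inputs; how much information each layer receives is the design choice explored in Section 4.

```python
from collections import defaultdict


def classify(influence: dict[int, int], high_min: int = 12,
             medium_min: int = 9) -> dict[int, str]:
    """Assign each frame to L0 (high), L1 (medium) or L2 (low influence)."""
    layers = {}
    for frame, n in influence.items():
        if n >= high_min:
            layers[frame] = "L0"
        elif n >= medium_min:
            layers[frame] = "L1"
        else:
            layers[frame] = "L2"
    return layers


def distribute(layers: dict[int, str], budget: dict[str, int]) -> dict[int, int]:
    """Split each layer's bit budget equally over its frames.

    Leftover bits from the integer division go to the lowest-numbered frames.
    """
    members: dict[str, list[int]] = defaultdict(list)
    for frame in sorted(layers):
        members[layers[frame]].append(frame)
    alloc = {}
    for layer, frames in members.items():
        share, extra = divmod(budget.get(layer, 0), len(frames))
        for i, frame in enumerate(frames):
            alloc[frame] = share + (1 if i < extra else 0)
    return alloc


if __name__ == "__main__":
    counts = {8: 16, 16: 16, 24: 14, 4: 11, 6: 9, 12: 11, 20: 11, 28: 9, 1: 0, 3: 0}
    layers = classify(counts)
    print(distribute(layers, {"L0": 0, "L1": 10, "L2": 7}))
    # {1: 4, 3: 3, 4: 2, 6: 2, 12: 2, 20: 2, 28: 2, 8: 0, 16: 0, 24: 0}
```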

3.2. Information embedding

The information embedding process uses the odd-even criterion [10]: the modified value of the syntax element in which the information is hidden is odd if the input bit is 1 and even otherwise. If x is the original value of the syntax element (e.g. motion vector differences or non-zero DCT coefficients) and w is the input bit, the modified value x' is obtained as:

$$x' = \operatorname{sgn}(x) \cdot \left\lfloor \frac{|x|}{2} \right\rfloor \cdot 2 + w, \qquad x \neq 0$$

Additionally, if x' equals 0, no information is hidden in that element, which ensures that all information can be detected in the decoder.

Table 1. Frame classification in dependency layers.

| Layer | Frames (number of referring frames)      |
|-------|------------------------------------------|
| L0    | 8 (16), 16 (16), 24 (14)                 |
| L1    | 4 (11), 6 (9), 12 (11), 20 (11), 28 (9)  |
| L2    | All other frames                         |

[Fig. 2 plot: PSNR reduction (dB, 0 to 6) versus payload (kbits, 0 to 1000) for MVD_d1, MVD_d2, MVD_d3, and MVD_p, with a line marking the optimal strategy to increase payload.]
Fig. 2. Visual quality loss and information payload when modifying motion vector differences of videos.
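The odd-even rule of Section 3.2 and a matching extraction step can be sketched as follows. The per-element formula is the one given in this section; two details are assumptions not stated in the paper: when an element becomes 0, the same bit is retried on the next non-zero element, and the extractor is told how many bits to read. The candidate values are a flat list of integers (for example MVD components or non-zero coefficients in parsing order), which abstracts away the entropy decoding and re-encoding.

```python
def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def embed_value(x: int, w: int) -> int:
    """x' = sgn(x) * floor(|x| / 2) * 2 + w, defined for x != 0 and w in {0, 1}."""
    if x == 0:
        raise ValueError("the odd-even rule is only applied to non-zero values")
    return sgn(x) * (abs(x) // 2) * 2 + w


def embed(values: list[int], bits: list[int]) -> tuple[list[int], int]:
    """Hide bits in the non-zero values; returns the new values and the bits hidden."""
    out = list(values)
    k = 0  # index of the next bit to hide
    for i, x in enumerate(out):
        if k == len(bits):
            break
        if x == 0:
            continue  # zero values never carry information
        out[i] = embed_value(x, bits[k])
        if out[i] != 0:
            k += 1  # a non-zero result is readable by the extractor
        # assumption: a result of 0 carries no bit, so bits[k] is retried
    return out, k


def extract(values: list[int], n_bits: int) -> list[int]:
    """Read the parity of the first n_bits non-zero values."""
    bits = []
    for x in values:
        if len(bits) == n_bits:
            break
        if x != 0:
            bits.append(x % 2)  # Python's % yields 0 or 1 for negative x as well
    return bits


if __name__ == "__main__":
    marked, hidden = embed([3, -1, 0, 4, -5, 2], [0, 0, 1, 1])
    print(marked, hidden)           # [2, 0, 0, 4, -3, 3] 4
    print(extract(marked, hidden))  # [0, 0, 1, 1]
```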

4. EXPERIMENTAL RESULTS

The experiments evaluate the performance of the proposed information embedding techniques in terms of information capacity and visual quality loss. Moreover, hiding information in transform coefficients is compared with hiding it in motion vector differences.

A total of 23 sequences with a playtime of 10 seconds have been used to test the information hiding algorithms [9]. Of these, 21 sequences have resolutions ranging from 416x240 to 1920x1080 pixels, while the two sequences Traffic and PeopleOnStreet have a resolution of 2560x1600 pixels. These sequences are first encoded using the HEVC reference software HM 16 [11]. Evaluation is based on the random access (RA) configuration. The intra period is set to 32 to allow stream switching and error recovery. The quantization parameter (QP) is chosen from the set {22, 27, 32, 37}. Thereafter, the proposed solution embeds random information in the motion vector differences or the transform coefficients. The modified bit stream is then decoded using a standard HEVC decoder. The PSNR of the reconstructed video is computed using the original video as the reference. The PSNR reduction is obtained by subtracting the PSNR of the modified stream, which includes the embedded information, from the PSNR of the unmodified stream.

[Fig. 3 plot: PSNR reduction (dB, 0 to 2.5) versus payload (kbits, 0 to 200) for DCT_d1, DCT_d2, DCT_d3, and DCT_p, with a line marking the optimal strategy to increase payload.]
Fig. 3. Visual quality loss and information payload when modifying DCT coefficients of videos.
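The PSNR reduction used as the quality metric in Section 4 can be computed as in the following sketch. It assumes 8-bit samples (peak value 255) and, for brevity, treats a video as one flat list of samples; the paper does not state whether PSNR is computed on luma only or averaged per frame, so that choice is left open here.

```python
import math


def psnr(reference: list[int], test: list[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def psnr_reduction(original, decoded_unmodified, decoded_modified, peak=255):
    """PSNR of the unmodified stream minus PSNR of the stream with hidden data."""
    return psnr(original, decoded_unmodified, peak) - psnr(original, decoded_modified, peak)


if __name__ == "__main__":
    orig = [100, 100, 100, 100]
    plain = [101, 99, 100, 100]   # MSE 0.5
    marked = [102, 98, 100, 100]  # MSE 2.0
    print(round(psnr_reduction(orig, plain, marked), 2))  # 6.02
```

Because both PSNR values share the same peak, the reduction equals 10·log10 of the ratio of the two mean squared errors, here 10·log10(2.0/0.5).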

4.1. Payload and PSNR reduction analysis

This experiment measures the PSNR reduction as the amount of input information increases. Three uniform embedding strategies are evaluated: modifying the motion vector differences in CUs at depth 3 only (MVD_d3), in CUs at depths 3 and 2 (MVD_d2), or in CUs at depths 3, 2, and 1 (MVD_d1). Similarly, DCT coefficients are modified only in TUs of size 4x4 (DCT_d3), in TUs of sizes 4x4 and 8x8 (DCT_d2), or in TUs of sizes 4x4, 8x8, and 16x16 (DCT_d1). In every inter-coded frame, the amount of added information is increased from 10% to 100% of all available motion vectors or DCT coefficients in steps of 10%.

In addition, the intra-period-level distribution strategy explained in Section 3 is evaluated (MVD_p and DCT_p). Using this scheme, the inter-coded frames are classified into three layers based on the number of referring frames, as shown in Table 1. When the distance in picture order count (POC) between two frames is larger than 10, the influence between them is considered to be 0, since blocks are much more likely to refer to closer frames. No information is hidden in layer L0. In layer L1, only blocks at depths 3 and 2 are used for embedding information, whereas blocks at depths 3, 2, and 1 are used in layer L2. Thirty-six combinations are tested by varying the amount of added information in both layers L1 and L2 from 0% to 100% of all available motion vectors or DCT coefficients in steps of 20%. The experimental results are shown in Fig. 2 and Fig. 3, from which three important conclusions can be drawn.
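The two families of test configurations described in Section 4.1 can be enumerated compactly. This is only an illustrative sketch of the parameter grid with hypothetical names: it lists the eligible depths per strategy and per layer together with the payload percentages, and performs no embedding.

```python
from itertools import product

# Uniform strategies: eligible depths (CU depth for MVD, RQT depth for DCT)
UNIFORM_DEPTHS = {"d3": {3}, "d2": {2, 3}, "d1": {1, 2, 3}}
UNIFORM_PERCENTAGES = list(range(10, 101, 10))  # 10%, 20%, ..., 100% per inter frame

# Layered strategy (MVD_p / DCT_p): eligible depths per dependency layer
LAYER_DEPTHS = {"L0": set(), "L1": {2, 3}, "L2": {1, 2, 3}}  # nothing is hidden in L0
LAYER_PERCENTAGES = list(range(0, 101, 20))  # 0%, 20%, ..., 100%


def layered_configurations() -> list[dict[str, int]]:
    """All (L1, L2) payload percentage pairs tested for the layered strategy."""
    return [{"L1": p1, "L2": p2}
            for p1, p2 in product(LAYER_PERCENTAGES, LAYER_PERCENTAGES)]


if __name__ == "__main__":
    configs = layered_configurations()
    print(len(UNIFORM_PERCENTAGES), len(configs))  # 10 36
```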

Firstly, adding information to small blocks performs better than adding it to larger blocks. For the same amount of embedded information, adding information to blocks at depth 3 results in a smaller PSNR reduction than adding it to blocks at depths 2 and 1. For instance, when motion vector differences are modified (Fig. 2) and only CUs at depth 3 are considered, a PSNR reduction of 3.5 dB is obtained. However, if CUs at depths 1 and 2 are modified to embed the same amount of information, the PSNR reduction is higher (4.5 dB).

[Fig. 4 plot: PSNR reduction (dB, 0 to 12) versus frame index (0 to 64).]
Fig. 4. Errors propagate strongly if dependencies between frames are not taken into account (MVD_d3 and DCT_d3).

Secondly, when frame dependencies are taken into account, layer L2 should be filled first, since errors from this layer propagate less to other frames. When the amount of added information becomes too large to fit in L2 alone, L1 also starts filling up, which results in a steeper PSNR reduction. The lines marked as the optimal strategy in Fig. 2 and Fig. 3 show that this approach achieves a high payload with the smallest quality drop.
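The fill order in this second conclusion (L2 first, L1 only once L2 is full, L0 left untouched) can be expressed as a small greedy allocator. In this sketch the per-layer capacities are hypothetical inputs (for instance the number of usable syntax elements at the eligible depths), and any payload that does not fit in L2 and L1 is reported as unplaced rather than spilled into L0.

```python
def fill_layers(payload: int, capacity: dict[str, int],
                order: tuple[str, ...] = ("L2", "L1")) -> tuple[dict[str, int], int]:
    """Assign payload bits to layers in the given order; returns allocation and leftover."""
    alloc = {layer: 0 for layer in capacity}
    remaining = payload
    for layer in order:
        take = min(remaining, capacity.get(layer, 0))
        alloc[layer] = alloc.get(layer, 0) + take
        remaining -= take
    return alloc, remaining


if __name__ == "__main__":
    caps = {"L0": 500, "L1": 300, "L2": 200}
    print(fill_layers(350, caps))  # ({'L0': 0, 'L1': 150, 'L2': 200}, 0)
    print(fill_layers(600, caps))  # ({'L0': 0, 'L1': 300, 'L2': 200}, 100)
```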

Finally, a smart frame distribution strategy that minimizes error propagation performs better than adding information equally to each frame. Using this strategy, the PSNR reduction can be kept below 3 dB and 1 dB for MVD_p and DCT_p, respectively.

4.2. Frame-by-frame analysis

To evaluate the propagation of quality loss and to compare embedding data in motion information and in DCT coefficients, the PSNR reduction of the first 64 frames of PartyScene is analysed after embedding 20 kbits in total (2 kbps) using the different methods. The video is encoded with QP 27. The result is depicted in Fig. 4.

Fig. 4 shows that embedding the same amount of information in motion information (MVD_d3) results in higher quality losses than modifying DCT coefficients (DCT_d3). When the motion vector difference of a PU is modified, the predicted block changes, so all pixels in this PU are affected, resulting in a large visual quality loss. In contrast, when the last non-zero DCT coefficient is modified, only the frequency corresponding to this coefficient is affected. In addition, the last non-zero coefficient is usually at a high frequency, so the quality impact is minimal.

[Fig. 5 panels: (a) MVD_d3 (23.10 dB), (b) MVD_p (33.04 dB), (c) DCT_d1 (31.28 dB), (d) DCT_p (32.23 dB).]
Fig. 5. The visual quality of frame 106 of PartyScene (QP 27) after inserting information.

By exploiting the dependencies between frames to distribute the information across several layers, MVD_p and DCT_p result in very low PSNR losses. The variance of the PSNR losses is also small, which results in a more consistent visual quality over time. Although DCT_p performs slightly better than MVD_p, the capacity of MVD_p is higher. Therefore, when a large payload must be embedded, MVD_p can be used.

Fig. 5 shows the visual quality of frame 106 of PartyScene (QP 27) after embedding 20 kbits. When MVD_d3 is used, artefacts are clearly visible (e.g. on the wall in the upper right corner of the picture). In contrast, the other techniques yield similar and better visual quality.

5. CONCLUSIONS

In this paper, we proposed a low-complexity technique to embed information into encoded HEVC video streams without fully decoding and re-encoding the video. The experimental results show that quality loss can be minimized by adding information first to the smallest blocks of each frame and by taking frame dependencies into account when distributing information across frames. When a smart distribution of information across frames is applied, modifying DCT coefficients only slightly outperforms adding information to motion vector differences.

6. REFERENCES

[1] A. Mansouri, A.M. Aznaveh, F. Torkamani-Azar, and F. Kurugollu, “A Low Complexity Video Watermarking
