What future work is mentioned in the paper "A Survey of H.264 AVC/SVC Encryption"?

Thus further research should focus on the development of objective metrics for the assessment of the security of video encryption schemes for the different security and application scenarios. For the assessment of video encryption schemes in the application scenarios of transparent encryption and sufficient encryption, state-of-the-art objective quality metrics may be suited (however, this has to be backed up by empirical evidence, i.e., subjective quality evaluation tests [58]). For content confidentiality, however, novel intelligibility metrics as well as an evaluation framework for these metrics are needed (again, subjective tests will have to be an integral part). Further efforts in the area of H.264 encryption should also consider the standardization of security tools within H.264.

(Open Access) A Survey of H.264 AVC/SVC Encryption (2012) | Thomas Stütz

A Survey of H.264 AVC/SVC Encryption

Thomas Stütz, Andreas Uhl

Technical Report 2010-10 December 2010

Department of Computer Sciences

Jakob-Haringer-Straße 2

5020 Salzburg

Austria

www.cosy.sbg.ac.at

Technical Report Series


Abstract: Video encryption has been heavily researched in recent years. This survey summarizes the latest research results on video encryption with a special focus on applicability and on the most widely-deployed video format H.264 including its scalable extension SVC. The survey intends to give researchers and practitioners an analytic and critical overview of the state-of-the-art of video encryption narrowed down to its joint application with the H.264 standard suite and associated protocols (packaging / streaming) and processes (transcoding / watermarking).

I. INTRODUCTION

H.264 is the most widely-deployed video compression system and has gained a dominance comparable only to JPEG for image compression. The H.264 standard has also been extended to allow scalable video coding (as specified in Annex G [27], referred to as SVC within this work) with a backwards compatible non-scalable base layer (non-scalable H.264 bitstreams are referred to as AVC in this work). This extension enables the implementation of advanced application scenarios with H.264, such as scalable streaming and universal multimedia access [69]. Given the dominant application of H.264 as video compression system, the necessity of practical security tools for H.264 is unquestionable. In this survey we present an overview, classification and evaluation of the state-of-the-art of H.264 encryption, a topic on which numerous proposals have been made. The survey focuses solely on H.264 AVC/SVC encryption and intends to give researchers a brief, yet comprehensive survey and to aid practitioners in the selection of H.264 encryption algorithms for their specific application context. Furthermore, the survey identifies the most relevant research questions in the area of video encryption that still need to be answered in order to leverage the deployment of H.264 encryption.

A secure approach to encrypting H.264, also referred to as the “naive” encryption approach, is to encrypt the entire compressed H.264 bitstream with a secure cipher, e.g., AES [49], in a secure mode, e.g., CBC (cipher block chaining mode). There are well-founded reasons not to stick to this approach, but to apply specifically designed encryption routines:

• The implementation of advanced application scenarios, such as secure adaptation, transparent / perceptual encryption and privacy preserving encryption.

• The preservation of properties and functionalities of the bitstream, such as format-compliance, scalability, streaming / packetization, fast forward, extraction of subsequences, transcodability, watermarking, and error resilience.

• The reduction of computational complexity (especially in the context of mobile computing).

Secure adaptation requires a scalable bitstream and specific encryption routines that preserve the scalability in the encrypted domain (see figure 1(a)). Secure adaptation is the basis for secure scalable streaming [70], where secure adaptation is employed in a multimedia streaming scenario. A secure stream for a mobile phone (low bandwidth, low resolution display, low computing power) and a personal computer (high bandwidth, high resolution display, high computing power) can be generated from the same secure source stream (by secure adaptation) without the necessity of the secret key, thus enabling creator-to-consumer security. Transparent encryption denotes encryption schemes where a low quality version can be decoded from the ciphertext; this functionality can be implemented with scalable bitstreams (see figure 1(b)) by encryption of the enhancement layers. Privacy preserving encryption should conceal the identity of persons; an exemplary implementation is shown in figure 2.

Fig. 1. Secure adaptation and transparent encryption. (a) Secure adaptation: scalable coding, scalable encryption and secure adaptation of the NALU sequence SPS, PPS, AVC, SVC, SVC. (b) Transparent encryption: scalable coding, transparent encryption and conventional decoding of the NALU sequence SPS, PPS, AVC, SVC, SVC.
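The transparent encryption idea described above (encrypt only the enhancement layers, leave the base layer decodable) can be sketched as follows. This is a minimal illustration under stated assumptions, not a scheme from the surveyed literature: the function names are hypothetical, enhancement-layer NALUs are assumed to be those with NAL unit type 20 carrying a 4-byte plaintext header, and the SHA-256-based keystream is only a stand-in for a secure stream cipher (Python's standard library has no AES). Format-compliance issues such as start code emulation are ignored.

```python
import hashlib
from typing import List

SVC_SLICE_NUT = 20   # NAL unit type of SVC enhancement slices (see table I)
SVC_HEADER_LEN = 4   # assumed: 1-byte NAL header + 3-byte SVC extension header


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Placeholder keystream (hash in counter mode); stands in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_enhancement_layers(nalus: List[bytes], key: bytes) -> List[bytes]:
    """XOR the payload of enhancement-layer NALUs; all other NALUs stay plaintext.

    Because XOR with the same keystream is its own inverse, calling the
    function again with the same key and NALU order decrypts.
    """
    result = []
    for index, nalu in enumerate(nalus):
        if nalu[0] & 0x1F != SVC_SLICE_NUT:
            result.append(nalu)  # SPS, PPS, base layer: left untouched
            continue
        header, payload = nalu[:SVC_HEADER_LEN], nalu[SVC_HEADER_LEN:]
        # Simplification: the NALU position serves as nonce.
        ks = keystream(key, index.to_bytes(8, "big"), len(payload))
        result.append(header + bytes(p ^ k for p, k in zip(payload, ks)))
    return result


base_layer = bytes([0x65]) + b"base-layer-data"
enhancement = bytes([0x74, 0x80, 0x10, 0x07]) + bytes(16)
protected = encrypt_enhancement_layers([base_layer, enhancement], b"demo-key")
```

Keeping the headers in plaintext is what leaves the stream adaptable without the key. Deriving the nonce from the NALU position is a shortcut that would break once NALUs are dropped during adaptation; a practical design would need a position-independent nonce.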

The remainder of the paper is structured in the following way: H.264 is briefly summarized in section II. In section III application scenarios of video encryption are discussed and their corresponding different notions of security are motivated and defined. The application context also requires that the video encryption scheme preserves functionality of the video bitstream; details are discussed in this section as well, which ends with the presentation of our evaluation criteria for a video encryption scheme. In section IV H.264 compression and encryption are discussed jointly in detail. This approach of presentation was chosen to keep the level of redundancy low. After the discussion of implementation and technical issues of H.264 video encryption schemes, section V proposes solutions and discusses the proposed schemes with respect to the security and application scenarios. Further research directions are discussed in section VI and finally we conclude in section VII.

Fig. 2. Privacy preserving encryption: DCT coefficients permutation, scrambling for “Hall Monitor” (figure taken from [13], figure 2 (b), p.1171)

II. OVERVIEW OF H.264 AVC / SVC

The H.264 standard specifies the syntax and semantics of the bitstream together with a normative decoding process [27]. However, it is often, and especially in the context of H.264 encryption, more convenient to consider the encoding process. The raw video data is input to the encoder; the output is the bitstream in the NAL (network abstraction layer) format, i.e., a series of NAL units (see figure 3). The NAL units have a plaintext header indicating the type of data in AVC, as shown in figure 4, in which the entire H.264 NAL header is outlined. The NAL header consists of the forbidden zero bit (F), a 2-bit field signalling the importance of the NALU (NRI), and the 5-bit NAL unit type (NUT). The most common NUTs are summarized in table I; NALUs with an unspecified NUT have to be discarded by the decoder.
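The one-byte NAL header described above can be split into its three fields with a few bit operations. The following is a small sketch; the names `NalHeader` and `parse_nal_header` are illustrative and not taken from the standard or from any library.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # F, 1 bit, expected to be 0
    nal_ref_idc: int         # NRI, 2 bits, importance of the NALU
    nal_unit_type: int       # NUT, 5 bits, type of the contained data


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into F, NRI and NUT."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


# 0x65 = 0b0_11_00101: F = 0, NRI = 3, NUT = 5 (IDR slice in table I)
header = parse_nal_header(0x65)
```

Since the header stays in plaintext, an encryption scheme that leaves it untouched still allows tools to classify NALUs by type without access to the key.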

These NAL units are commonly packaged in a container format for transmission and storage. A typical H.264 encoder has the structure outlined in figure 6. Important parts are motion estimation (ME in figure 6) and motion compensation (MC in figure 6). Novelties in H.264 compared to previous video coding standards are intra prediction (Intra pred in figure 6) and in-loop deblocking filtering, i.e., reference pictures are filtered to reduce blocking artifacts prior to motion estimation and compensation. A 4x4 DCT-based transform is applied (T in figure 6), followed by quantization (Q in figure 6). There are two types of entropy encoding in H.264, namely CAVLC (context adaptive variable length coding) and CABAC (context adaptive binary arithmetic coding).

NUT          Description      AVC class    SVC class
0            Unspecified      Non-VCL      Non-VCL
1            Non-IDR slice    VCL          VCL
5            IDR slice        VCL          VCL
6            SEI              Non-VCL      Non-VCL
12           Filler data      Non-VCL      Non-VCL
14           Prefix NAL       Non-VCL      Variable
16 ... 18    Reserved         Non-VCL      Non-VCL
20           SVC slice        Non-VCL      VCL
21 ... 23    Reserved         Non-VCL      Non-VCL
24 ... 31    Unspecified      Non-VCL      Non-VCL

TABLE I. SELECTED NAL UNIT TYPES.

Fig. 3. A mapping of video data to H.264 SVC NALUs (NALU sequence SPS, PPS, AVC, SVC, SVC, AVC, SVC, SVC, ...; each NALU consists of a header and an RBSP).

Fig. 4. NAL unit header structure (fields shown: NRI, NUT).

Fig. 5. NAL unit header SVC extension structure (fields shown: ..., PID, DID, QID, TID, ...).

Fig. 6. H.264 compression overview

The scalable extension of H.264, referred to as SVC, employs most of the tools defined in the non-scalable H.264, referred to as AVC. An SVC bitstream consists of a base layer and enhancement layers; each enhancement layer improves the video in one of three “scalability dimensions”, namely resolution, quality, and time. Therefore each scalable NALU belongs to a certain dependency layer (most commonly for resolution-scalability), a certain quality layer (to enable SNR-scalability), and a certain temporal layer (to enable different frame rates). The scalability information for SVC is contained in an SVC extension header succeeding the AVC NALU header, as shown in figure 5. Most important are the fields DID (dependency id), which indicates that the NALU belongs to a certain resolution, QID (quality id), which indicates that the NALU belongs to a certain quality, and TID (temporal id), which indicates that the NALU belongs to a certain temporal layer, i.e., commonly a certain frame rate. The value of the PID (priority id) is not standardized and can be used to enable very simple adaptation by only taking this field into account. An exemplary mapping between raw video data and NAL units is illustrated in figure 3. A resolution and quality scalable bitstream is shown; the higher resolution is coded with two quality layers. The NALU header of the base layer has a DID of 0, a QID of 0 and a TID of 0, while the NALU headers of the two enhancement layers have a DID of 1, a QID of 0 or 1 and a TID of 0.
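To make the role of these header fields concrete, the sketch below reads PID, DID, QID and TID from a NALU and uses them to drop enhancement layers. It rests on assumptions that should be checked against Annex G of the standard [27]: the bit positions reflect our reading of the three-byte SVC extension header, only NUTs 14 and 20 are taken to carry it, and `adapt` is a deliberately simplified stand-in for the normative sub-bitstream extraction process. All names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class SvcIds(NamedTuple):
    priority_id: int    # PID, 6 bits
    dependency_id: int  # DID, 3 bits
    quality_id: int     # QID, 4 bits
    temporal_id: int    # TID, 3 bits


def parse_svc_ids(nalu: bytes) -> Optional[SvcIds]:
    """Return the scalability ids of a NALU, or None if it has no SVC extension header."""
    nal_unit_type = nalu[0] & 0x1F
    if nal_unit_type not in (14, 20) or len(nalu) < 4:
        return None
    b1, b2, b3 = nalu[1], nalu[2], nalu[3]
    return SvcIds(
        priority_id=b1 & 0x3F,           # after svc_extension_flag and idr_flag
        dependency_id=(b2 >> 4) & 0x07,  # after no_inter_layer_pred_flag
        quality_id=b2 & 0x0F,
        temporal_id=(b3 >> 5) & 0x07,
    )


def adapt(nalus: List[bytes], max_did: int, max_qid: int, max_tid: int) -> List[bytes]:
    """Keep NALUs without SVC header and those within the requested layer limits."""
    kept = []
    for nalu in nalus:
        ids = parse_svc_ids(nalu)
        if ids is None or (
            ids.dependency_id <= max_did
            and ids.quality_id <= max_qid
            and ids.temporal_id <= max_tid
        ):
            kept.append(nalu)
    return kept


# The example from the text: a base layer plus two enhancement layers (DID 1, QID 0 and 1).
base = bytes([0x65]) + b"base"
enh_q0 = bytes([0x74, 0x80, 0x10, 0x07]) + b"enh-q0"
enh_q1 = bytes([0x74, 0x80, 0x11, 0x07]) + b"enh-q1"
low_quality_stream = adapt([base, enh_q0, enh_q1], max_did=1, max_qid=0, max_tid=0)
```

The filter only inspects header bytes, so it would work unchanged on a stream whose NALU payloads are encrypted, provided the encryption scheme leaves the headers in plaintext.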

The following references give further details on H.264 AVC [73], [52] and SVC [54]. Of course the standard itself should be considered the ultimate reference [27] for both formats.

III. MULTIMEDIA ENCRYPTION

The potential application scenarios for multimedia encryption are diverse and often require specific functionality of the video stream, e.g., scalability, to be preserved by the encryption scheme, such that associated processes, e.g., rate adaptation, can be conducted in the encrypted domain. The classic application scenario of video encryption is DRM (digital rights management), more precisely copyright protection, in which content providers aim to secure their business value, i.e., they want to prevent uncompensated redistribution of their content, very frequently videos.

It is also common practice that content providers, e.g., as frequently applied in pay-TV, offer free public access to parts of their content to attract potential customers. In the application scenario of transparent encryption (also referred to as perceptual encryption in literature [34], [41]) the availability of a public low quality version is a requirement and the threat is that an attacker is able to compute a reconstruction of the original content with higher quality than the available public version (see figure 1(b)).

Privacy preservation is also a concern in the context of video encryption, e.g., a commonly cited application is privacy preserving video surveillance [13]; here the privacy of the people and objects in the video should be preserved. Google is currently facing analogous problems for still images with its StreetView application. The security threat in privacy preserving surveillance is the identification of a person or object, e.g., a license plate, in the video, which thus has to be prevented (see figure 2).

Video conferences are another prominent application scenario in which video data is encrypted [24].

The secure adaptation of compressed video streams to network conditions, often referred to as secure scalable streaming (SSS), is a frequently discussed application scenario in the context of video encryption [4], [3], [77], [5], [19].

Though not a distinct application scenario, mobile computing is often referred to in the context of multimedia encryption [24], as the lower performance of mobile devices imposes strict constraints on the computational complexity, which is an argument for low-complexity encryption approaches.

A. Security / Quality / Intelligibility

The security notions for video encryption are application-context dependent. E.g., in a commercial content-distribution scenario the security notion for video cryptosystems is different from the conventional cryptographic security notion for cryptosystems. While conventional cryptographic security notions are built upon the notion of message privacy (referred to as MP-security), i.e., nothing of the plaintext message can be learnt / computed from the ciphertext, the privacy of the message (video) is of limited concern for the content providers, but a redistribution of a sufficient quality version poses a threat to their business model. The security of the video cryptosystem has to be defined with respect to this threat, i.e., the reconstruction of a sufficient quality version on the basis of the ciphertext, which leads to a specific security notion for multimedia encryption [53], [45], [60], which we refer to as MQ-security (message quality security) [60] in this paper. The security requirements of many application scenarios in the context of video encryption can be pinned down to this definition: A video is encrypted and an adversary must not be capable of computing a reconstruction of the plaintext with higher quality than allowed in the application scenario. It is sufficient for DRM scenarios that the quality is severely reduced, such that a redistribution is prevented. In the context of privacy-preservation the quality / intelligibility of a video is measured in terms of recognizability of faces and persons [14]. The MQ-security notion has often been implicitly employed in literature and similar concepts have been put forward explicitly [53], [45].

It has to be highlighted that application scenarios require different quality levels to be protected: the leakage of an edge image might not be considered a security threat in a commercial content distribution scenario, but it renders the application in a video conferencing scenario or a privacy-protection scenario impossible.

Although the privacy of the content is not an objective of the content provider in the commercial content-distribution scenario, the customers may have privacy concerns if the content, e.g., the movie being watched in a VoD scenario (video on demand), can be identified, especially if the content is subject to social taboos. In the conventional cryptographic security notion, MP-security, no information about the plaintext must be (efficiently) computable from the ciphertext. If one considers the raw video data as plaintexts, the preservation of any information, even the preservation of the length of the compressed video stream or the length of units that comprise the compressed video stream, constitutes a security violation, as even the compressibility of the raw video data leaks information on the raw video data. If encryption is conducted after compression, the compressibility information is contained in the length of the compressed video data. This security notion appropriately models the security requirements of highly confidential video communications, e.g., video conferences in politics and economy. Thus in this very strict interpretation of MP-security for video cryptosystems (the raw video data is the plaintext space), the length of the video data (or packets) must not depend on the raw video data itself. This has major implications for the compression settings and video packetization, e.g., in a video conferencing or VoD scenario. In order to ensure that absolutely no information about the visual content is computable from the transmitted encrypted and compressed video data, even the length of the packets must not depend on the raw video data [19]. This implies that for this interpretation of MP-security the video has to be transmitted at a source-independent rate (e.g., constant) and in a source-independent fashion (e.g., constant packet lengths). The issue that video streams can be identified is present even in “secure” state-of-the-art technology, e.g., in SRTP [19] and IPsec, as the secure encryption of packets does not conceal the packet lengths. This strict security notion for video cryptosystems, which we refer to as MPV-security, conflicts with rate-distortion optimal packetization and an optimal secure adaptation, also referred to as secure scalable streaming (SSS) in the context of network adaptation [4], [3]. MPV-security does not model the security requirements and threats of many application scenarios at all, e.g., of perceptual / transparent encryption and privacy preserving video surveillance.
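The requirement of source-independent packet lengths can be illustrated with a small sketch that maps compressed data of arbitrary length onto packets of one fixed size. The packet layout (a 2-byte length field followed by zero padding) and the names `packetize` and `depacketize` are assumptions made for this example only; the packets would be encrypted after padding, and a constant sending rate, e.g., by inserting dummy packets, would be needed in addition.

```python
from typing import List

LEN_FIELD = 2  # assumed: bytes used to store the number of payload bytes per packet


def packetize(data: bytes, packet_len: int) -> List[bytes]:
    """Split data into packets that all have exactly packet_len bytes."""
    capacity = packet_len - LEN_FIELD
    if capacity <= 0 or capacity >= 1 << (8 * LEN_FIELD):
        raise ValueError("packet_len does not fit the assumed packet layout")
    packets = []
    for offset in range(0, len(data), capacity):
        chunk = data[offset:offset + capacity]
        padding = bytes(capacity - len(chunk))  # zero padding hides the true length
        packets.append(len(chunk).to_bytes(LEN_FIELD, "big") + chunk + padding)
    return packets


def depacketize(packets: List[bytes]) -> bytes:
    """Strip length fields and padding and concatenate the payloads."""
    return b"".join(
        p[LEN_FIELD:LEN_FIELD + int.from_bytes(p[:LEN_FIELD], "big")] for p in packets
    )


packets = packetize(b"x" * 250, packet_len=100)
```

The padding is exactly the overhead that conflicts with rate-distortion optimal packetization: every packet costs `packet_len` bytes regardless of how well the underlying video data compresses.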

The preservation of format-compliance could be assumed to compromise security. Recent contributions from the cryptographic community [6] discuss the topic in depth, define a concise formal framework, re-formulate the MP-security notion for format-preserving encryption (MP-security is defined for equal length format-compliant data), and also analyze format-preserving encryption algorithms, which are proved to be secure. However, there still is a gap between the theoretic availability and the practical applicability of format-preserving and secure encryption algorithms [60], and this security notion is also not applicable for many application scenarios in the context of multimedia encryption.

Lightweight / Soft / Partial / Selective Encryption: Some contributions to multimedia encryption propose the application of less secure but more efficient encryption algorithms (soft encryption), i.e., the computational complexity to break the employed cryptosystem with respect to MP-security is limited. E.g., in [16] it is proposed to employ a weaker cipher (an AES derivative with fewer rounds) for the less important parts of the bitstream. Obviously insecure algorithms, which are often employed (e.g., adding constants to the coefficients [31]), also fall into that category.

Another approach to reduce the computational complexity of encryption is selective / partial encryption of the bitstream with a secure cipher [46].

In this paper, we will discuss the schemes in a cipher-independent fashion, i.e., all encryption proposals will be considered to employ the same secure cipher as the single source of pseudo-randomness.

B. Preserved Functionality

H.264 bitstreams are associated with functionalities, i.e., there are specified protocols and processes how to store, transmit, and process H.264 bitstreams (e.g., extract parts, adapt the rate, ...). Non-scalable and scalable H.264 bitstreams are accessible via a network abstraction layer (NAL), i.e., the coded video data is a sequence of separate NAL units (see figure 3 for a possible mapping of raw video data to NAL units).

H.264 bitstreams are embedded into container formats for transmission and storage [72], [25] and, depending on the encoding settings, bitstreams allow certain operations, such as extraction of IDR pictures (comparable to I-frames in previous standards), extraction of a subset of the frames, cropping and, in the case of SVC, the extraction of sub-streams with different spatial resolution, temporal resolution and SNR-quality. A wide range of watermarking algorithms specifically tailored to H.264 have been proposed, e.g., [38], [50], [24], [80], [79], [47], and a joint application of encryption and watermarking, especially watermarking encrypted content, is often desirable [38], [10], [80], [79].

1) Format-compliance: A bitstream is denoted format-compliant or H.264-compliant if it satisfies the syntactic and semantic requirements of the H.264 standard [27]. A format-compliant H.264 bitstream has to be accepted by every H.264-compliant decoder without any undefined decoder behaviour. It is necessary to distinguish whether a functionality is preserved format-compliantly, i.e., standard processing can be applied and no modified software is necessary. E.g., functionality is not preserved if the encrypted parts of the video are signalled as supplementary data, which has no semantics in the H.264 decoding process, comparable to comments in programming languages. The encrypted H.264 bitstreams where the encrypted data is signalled as supplementary data (e.g., using SEI messages, see table I) are still format-compliant, but the encrypted video data is treated completely differently compared to plaintext video data. Thus the application of conventional tools to process the video data may lead to unexpected and undesired behaviour, e.g., rate adaptation algorithms may not perform optimally on the encrypted bitstreams. Thus we say that a functionality is preserved in a compliant fashion if exactly the same processes as for an H.264 bitstream are applicable.

2) Packetization: NAL syntax / structure: The preservation of the NAL structure and syntax requirements enables the transparent application of standard container formats and tools for H.264.

3) Fast Forward / Extraction of Subsequences / Scalability: NAL semantics: The additional preservation of the NAL semantics, i.e., the NAL unit type (NUT, see figure 4), enables more sophisticated processing of the encrypted bitstream, such
