
A Survey of H.264 AVC/SVC Encryption

Thomas Stütz, Andreas Uhl
01 Mar 2012, Vol. 22, Iss. 3, pp. 325-339
Abstract
Video encryption has been heavily researched in the recent years. This survey summarizes the latest research results on video encryption with a special focus on applicability and on the most widely-deployed video format H.264 including its scalable extension SVC. The survey intends to give researchers and practitioners an analytic and critical overview of the state-of-the-art of video encryption narrowed down to its joint application with the H.264 standard suite and associated protocols (packaging/streaming) and processes (transcoding/watermarking).


Thomas Stütz, Andreas Uhl
Technical Report 2010-10, December 2010
Department of Computer Sciences
Jakob-Haringer-Straße 2
5020 Salzburg
Austria
www.cosy.sbg.ac.at
Technical Report Series

I. INTRODUCTION
H.264 is the most widely deployed video compression system and has gained a dominance comparable only to that of JPEG in image compression. The H.264 standard has also been extended to allow scalable video coding (as specified in Annex G [27], referred to as SVC within this work) with a backward-compatible non-scalable base layer (non-scalable H.264 bitstreams are referred to as AVC in this work). This extension enables the implementation of advanced application scenarios with H.264, such as scalable streaming and universal multimedia access [69]. Given the dominance of H.264 as a video compression system, the necessity of practical security tools for H.264 is unquestionable. In this survey we present an overview, classification and evaluation of the state-of-the-art of H.264 encryption, a topic on which numerous proposals have been made. The survey focuses solely on H.264 AVC/SVC encryption and intends to give researchers a brief, yet comprehensive survey and to aid practitioners in the selection of H.264 encryption algorithms for their specific application context. Furthermore, the survey identifies the most relevant research questions in the area of video encryption that still need to be answered in order to foster the deployment of H.264 encryption.
A secure approach to encrypting H.264, also referred to as the “naive” encryption approach, is to encrypt the entire compressed H.264 bitstream with a secure cipher, e.g., AES [49], in a secure mode, e.g., CBC (cipher block chaining mode). There are well-founded reasons not to stick to this approach, but to apply specifically designed encryption routines instead:
• The implementation of advanced application scenarios, such as secure adaptation, transparent / perceptual encryption and privacy preserving encryption.
• The preservation of properties and functionalities of the bitstream, such as format-compliance, scalability, streaming / packetization, fast forward, extraction of subsequences, transcodability, watermarking, and error resilience.
• The reduction of computational complexity (especially in the context of mobile computing).
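To make the contrast concrete, the following minimal sketch applies the naive approach to a toy Annex B byte string. It rests on a loudly stated assumption: instead of AES-CBC it uses a hypothetical SHA-256-based keystream (`toy_keystream`, XORed onto the data) so that it runs with the Python standard library alone; it is not a recommended cipher. Because every byte is transformed, including start codes and NAL headers, a device without the key can neither locate NAL units nor read their types, which is why the functionalities listed above are lost.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hypothetical stand-in for a secure cipher: SHA-256 in counter mode.

    Used only so the sketch runs without third-party packages; the text's
    example of a secure cipher is AES in CBC mode.
    """
    n_blocks = (length + 31) // 32  # SHA-256 yields 32 bytes per block
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(n_blocks)
    )
    return stream[:length]

def naive_encrypt(key: bytes, nonce: bytes, bitstream: bytes) -> bytes:
    """Encrypt the *entire* bitstream, start codes and NAL headers included.

    XOR with a keystream is its own inverse, so the same call decrypts.
    """
    ks = toy_keystream(key, nonce, len(bitstream))
    return bytes(p ^ k for p, k in zip(bitstream, ks))

# Toy Annex B stream: start code + header 0x67 (SPS), start code + header 0x65
# (IDR slice); the remaining payload bytes are arbitrary.
plain = b"\x00\x00\x00\x01\x67\x42\x00\x1e" + b"\x00\x00\x00\x01\x65\x88\x84\x00"
cipher = naive_encrypt(b"secret key", b"nonce-01", plain)
recovered = naive_encrypt(b"secret key", b"nonce-01", cipher)  # equals plain
```

The ciphertext has the same length as the plaintext, but nothing in it can be interpreted as a NAL unit without decrypting first.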
Secure adaptation requires a scalable bitstream and specific encryption routines that preserve the scalability in the encrypted domain (see figure 1(a)). Secure adaptation is the basis for secure scalable streaming [70], where secure adaptation is employed in a multimedia streaming scenario. A secure stream for a mobile phone (low bandwidth, low-resolution display, low computing power) and one for a personal computer (high bandwidth, high-resolution display, high computing power) can be generated from the same secure source stream (by secure adaptation) without access to the secret key, thus enabling creator-to-consumer security. Transparent encryption denotes encryption schemes where a low-quality version can be decoded from the ciphertext; this functionality can be implemented with scalable bitstreams (see figure 1(b)) by encrypting the enhancement layers. Privacy preserving encryption should conceal the identity of persons; an exemplary implementation is shown in figure 2.

[Figure: (a) an SVC bitstream (SPS, PPS, AVC, SVC, SVC NALUs) passes through scalable coding, scalable encryption and secure adaptation; (b) the same bitstream passes through scalable coding, transparent encryption and conventional decoding.]
Fig. 1. Secure adaptation and transparent encryption
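Both functionalities rely on the layer identifiers staying readable while payloads are encrypted. A minimal sketch, assuming a simplified, hypothetical `SvcNalu` record (plaintext DID/QID/TID plus a payload) and a caller-supplied `encrypt` function standing in for a secure cipher. It further assumes that enhancement layers are exactly the NALUs with DID > 0 or QID > 0 and reduces extraction to a per-field threshold; real SVC sub-stream extraction follows additional rules of the standard that are not modelled here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class SvcNalu:
    """Hypothetical simplified NALU: plaintext layer ids and a payload."""
    did: int  # dependency id (spatial resolution)
    qid: int  # quality id (SNR layer)
    tid: int  # temporal id (frame rate)
    payload: bytes

def transparent_encrypt(nalus: Iterable[SvcNalu],
                        encrypt: Callable[[bytes], bytes]) -> List[SvcNalu]:
    """Encrypt enhancement-layer payloads only; the base layer stays decodable.

    Each payload is encrypted on its own, so dropping one NALU later does not
    prevent decryption of the others.
    """
    return [
        n if (n.did == 0 and n.qid == 0) else replace(n, payload=encrypt(n.payload))
        for n in nalus
    ]

def secure_adapt(nalus: Iterable[SvcNalu], max_did: int, max_qid: int,
                 max_tid: int) -> List[SvcNalu]:
    """Drop NALUs above the target operation point.

    Only the plaintext layer ids are inspected, so no key is required.
    """
    return [n for n in nalus
            if n.did <= max_did and n.qid <= max_qid and n.tid <= max_tid]
```

For the layout of figure 3 (base layer with DID 0 and QID 0, enhancement layers with DID 1 and QID 0 or 1), `secure_adapt(stream, 0, 0, 0)` would keep only the base layer, e.g., for the mobile phone, without ever decrypting anything.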
The remainder of the paper is structured as follows: H.264 is briefly summarized in section II. In section III, application scenarios of video encryption are discussed, and their corresponding notions of security are motivated and defined. The application context also requires that the video encryption scheme preserves functionality of the video bitstream; details are discussed in this section as well, which ends with the presentation of our evaluation criteria for a video encryption scheme. In section IV, H.264 compression and encryption are discussed jointly in detail; this form of presentation was chosen to keep redundancy low. Having discussed implementation and technical issues of H.264 video encryption schemes, section V proposes solutions and discusses the proposed schemes with respect to security and application scenarios. Further research directions are discussed in section VI, and we conclude in section VII.

[Figure: scrambled frame of the Hall Monitor sequence]
Fig. 2. Privacy preserving encryption: DCT coefficients permutation (figure taken from [13], figure 2 (b), p.1171)
II. OVERVIEW OF H.264 AVC / SVC
The H.264 standard specifies the syntax and semantics of the bitstream together with a normative decoding process [27]. However, it is often more convenient, especially in the context of H.264 encryption, to consider the encoding process. The raw video data is input to the encoder; the output is the bitstream in the NAL (network abstraction layer) format, i.e., a series of NAL units (see figure 3). Each NAL unit has a plaintext header indicating the type of data it carries; figure 4 outlines the entire AVC NAL header. The NAL header consists of the forbidden zero bit (F), a 2-bit field signalling the importance of the NALU (NRI), and the 5-bit NAL unit type (NUT). The most common NUTs are summarized in table I; NALUs with an unspecified NUT have to be discarded by the decoder.
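The one-byte header can be split with a few bit operations. A minimal sketch, assuming the header byte has already been located (e.g., right after an Annex B start code); the field widths follow the H.264 syntax elements forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits), while the names `NalHeader` and `parse_nal_header` are hypothetical.

```python
from typing import NamedTuple

class NalHeader(NamedTuple):
    f: int    # forbidden_zero_bit (1 bit), 0 in a compliant stream
    nri: int  # nal_ref_idc (2 bits), importance of the NALU
    nut: int  # nal_unit_type (5 bits), see table I

def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte H.264 NAL unit header into F, NRI and NUT."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("a NAL unit header is exactly one byte")
    return NalHeader(
        f=first_byte >> 7,
        nri=(first_byte >> 5) & 0b11,
        nut=first_byte & 0b11111,
    )

print(parse_nal_header(0x65))  # NalHeader(f=0, nri=3, nut=5), an IDR slice
```

Since this byte stays in plaintext in header-preserving schemes, any network node can run such a parser on an encrypted stream.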
These NAL units are commonly packaged in a container format for transmission and storage. A typical H.264 encoder has the structure outlined in figure 6. Important parts are motion estimation (ME in figure 6) and motion compensation (MC in figure 6). Novelties in H.264 compared to previous video coding standards are intra prediction (Intra pred in figure 6) and in-loop deblocking filtering, i.e., reference pictures are filtered to reduce blocking artifacts prior to motion estimation and compensation. A 4x4 DCT-based transform is applied (T in figure 6), followed by quantization (Q in figure 6). There are two types of entropy coding in H.264, namely CAVLC (context adaptive variable length coding) and CABAC (context adaptive binary arithmetic coding).
TABLE I
SELECTED NAL UNIT TYPES.
NUT      Description      AVC class   SVC class
0        Unspecified      Non-VCL     Non-VCL
1        Non-IDR slice    VCL         VCL
5        IDR slice        VCL         VCL
6        SEI              Non-VCL     Non-VCL
12       Filler data      Non-VCL     Non-VCL
14       Prefix NAL       Non-VCL     Variable
16..18   Reserved         Non-VCL     Non-VCL
20       SVC slice        Non-VCL     VCL
21..23   Reserved         Non-VCL     Non-VCL
24..31   Unspecified      Non-VCL     Non-VCL
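Table I can be used directly as a lookup by a stream processor that only reads NAL headers. The sketch below transcribes the table as given (it covers only the selected types, so other NUTs return `None`); the names `TABLE_I` and `nal_unit_class` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Table I transcribed as NUT -> (description, AVC class, SVC class).
TABLE_I: Dict[int, Tuple[str, str, str]] = {}

def _add(nuts: Iterable[int], description: str, avc_class: str, svc_class: str) -> None:
    for nut in nuts:
        TABLE_I[nut] = (description, avc_class, svc_class)

_add([0], "Unspecified", "Non-VCL", "Non-VCL")
_add([1], "Non-IDR slice", "VCL", "VCL")
_add([5], "IDR slice", "VCL", "VCL")
_add([6], "SEI", "Non-VCL", "Non-VCL")
_add([12], "Filler data", "Non-VCL", "Non-VCL")
_add([14], "Prefix NAL", "Non-VCL", "Variable")
_add(range(16, 19), "Reserved", "Non-VCL", "Non-VCL")
_add([20], "SVC slice", "Non-VCL", "VCL")
_add(range(21, 24), "Reserved", "Non-VCL", "Non-VCL")
_add(range(24, 32), "Unspecified", "Non-VCL", "Non-VCL")

def nal_unit_class(nut: int, svc: bool = False) -> Optional[str]:
    """Return the class from table I for an AVC or SVC reader, or None if unlisted."""
    entry = TABLE_I.get(nut)
    if entry is None:
        return None
    return entry[2] if svc else entry[1]
```

For example, `nal_unit_class(20)` is "Non-VCL" while `nal_unit_class(20, svc=True)` is "VCL": an AVC-only reader does not treat SVC slices as video data, which is consistent with the backward-compatible base layer described in section I.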
[Figure: a sequence of NALUs (SPS, PPS, AVC, SVC, SVC, AVC, SVC, SVC, ...), each consisting of a NAL header followed by an RBSP payload]
Fig. 3. A mapping of video data to H.264 SVC NALUs
[Figure: header fields F | NRI | NUT]
Fig. 4. NAL unit header structure.
[Figure: extension header fields ... | PID | ... | DID | QID | TID | ...]
Fig. 5. NAL unit header SVC extension structure.
Fig. 6. H.264 compression overview

The scalable extension of H.264, referred to as SVC, employs most of the tools defined in the non-scalable H.264, referred to as AVC. An SVC bitstream consists of a base layer and enhancement layers; each enhancement layer improves the video in one of three “scalability dimensions”, namely resolution, quality, and time. Therefore each scalable NALU belongs to a certain dependency layer (most commonly for resolution scalability), a certain quality layer (to enable SNR scalability), and a certain temporal layer (to enable different frame rates). The scalability information for SVC is contained in an SVC extension header following the AVC NALU header, as shown in figure 5. The most important fields are DID (dependency id), which indicates the resolution the NALU belongs to, QID (quality id), which indicates its quality layer, and TID (temporal id), which indicates its temporal layer, i.e., commonly a certain frame rate. The usage of the PID (priority id) is not standardized; it can be used to enable very simple adaptation that takes only this field into account. An exemplary mapping between raw video data and NAL units is illustrated in figure 3: a resolution- and quality-scalable bitstream is shown, in which the higher resolution is coded with two quality layers. The NALU header of the base layer has a DID of 0, a QID of 0 and a TID of 0, while the NALU headers of the two enhancement layers have a DID of 1, a QID of 0 or 1, and a TID of 0.
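The three-byte SVC extension that follows the one-byte header (for NUT 14 and 20) can be parsed in the same way. The bit layout below is an assumption based on the nal_unit_header_svc_extension() syntax of Annex G and should be verified against [27]; the class and function names are hypothetical, and the sketch only handles the case svc_extension_flag = 1.

```python
from typing import NamedTuple

class SvcExtension(NamedTuple):
    idr_flag: int
    priority_id: int              # PID (6 bits)
    no_inter_layer_pred_flag: int
    dependency_id: int            # DID (3 bits)
    quality_id: int               # QID (4 bits)
    temporal_id: int              # TID (3 bits)
    use_ref_base_pic_flag: int
    discardable_flag: int
    output_flag: int

def parse_svc_extension(ext: bytes) -> SvcExtension:
    """Parse the 3 bytes that follow the 1-byte NAL header of NUT 14 or 20."""
    if len(ext) < 3:
        raise ValueError("the SVC extension header needs 3 bytes")
    b0, b1, b2 = ext[0], ext[1], ext[2]
    if b0 >> 7 != 1:  # svc_extension_flag
        raise ValueError("svc_extension_flag is not set; not handled here")
    return SvcExtension(
        idr_flag=(b0 >> 6) & 0x1,
        priority_id=b0 & 0x3F,
        no_inter_layer_pred_flag=b1 >> 7,
        dependency_id=(b1 >> 4) & 0x7,
        quality_id=b1 & 0xF,
        temporal_id=b2 >> 5,
        use_ref_base_pic_flag=(b2 >> 4) & 0x1,
        discardable_flag=(b2 >> 3) & 0x1,
        output_flag=(b2 >> 2) & 0x1,
    )

# Enhancement layer of figure 3 with DID 1, QID 1, TID 0:
hdr = parse_svc_extension(bytes([0x80, 0x11, 0x07]))
# hdr.dependency_id == 1, hdr.quality_id == 1, hdr.temporal_id == 0
```

Because these fields sit outside the slice data, they remain available to adaptation nodes as long as an encryption scheme leaves the NAL and extension headers untouched.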
The following references give further details on H.264 AVC [73], [52] and SVC [54]. Of course, the standard itself [27] should be considered the ultimate reference for both formats.
III. MULTIMEDIA ENCRYPTION
The potential application scenarios for multimedia encryption are diverse and often require specific functionality of the video stream, e.g., scalability, to be preserved by the encryption scheme, such that associated processes, e.g., rate adaptation, can be conducted in the encrypted domain. The classic application scenario of video encryption is DRM (digital rights management), more precisely copyright protection, in which content providers aim to secure their business value, i.e., they want to prevent uncompensated redistribution of their content, very frequently videos.
It is also common practice for content providers, e.g., in pay-TV, to offer free public access to parts of their content to attract potential customers. In the application scenario of transparent encryption (also referred to as perceptual encryption in the literature [34], [41]), the availability of a public low-quality version is a requirement, and the threat is that an attacker is able to compute a reconstruction of the original content with higher quality than the available public version (see figure 1(b)).
Privacy preservation is also a concern in the context of video encryption; a commonly cited application is privacy preserving video surveillance [13], where the privacy of the people and objects in the video should be preserved. Google currently faces analogous problems for still images with its StreetView application. The security threat in privacy preserving surveillance is the identification of a person or object, e.g., a license plate, in the video, which therefore has to be prevented (see figure 2).
Video conferences are another prominent application scenario in which video data is encrypted [24].
The secure adaptation of compressed video streams to network conditions, often referred to as secure scalable streaming (SSS), is a frequently discussed application scenario in the context of video encryption [4], [3], [77], [5], [19].
Though not a distinct application scenario, mobile computing is often referred to in the context of multimedia encryption [24], as the lower performance of mobile devices imposes strict constraints on the computational complexity, which is an argument for low-complexity encryption approaches.
A. Security / Quality / Intelligibility
The security notions for video encryption depend on the application context. For example, in a commercial content-distribution scenario the security notion for video cryptosystems differs from the conventional cryptographic security notion for cryptosystems. While conventional cryptographic security notions are built upon the notion of message privacy (referred to as MP-security), i.e., nothing about the plaintext message can be learnt / computed from the ciphertext, the privacy of the message (video) is of limited concern for content providers; rather, the redistribution of a version of sufficient quality poses the threat to their business model. The security of the video cryptosystem has to be defined with respect to this threat, i.e., the reconstruction of a version of sufficient quality on the basis of the ciphertext, which leads to a specific security notion for multimedia encryption [53], [45], [60], which we refer to as MQ-security (message quality security) [60] in this paper. The security requirements of many application scenarios in the context of video encryption can be pinned down to this definition: a video is encrypted, and an adversary must not be capable of computing a reconstruction of the plaintext with higher quality than allowed in the application scenario. For DRM scenarios it is sufficient that the quality is severely reduced, such that redistribution is prevented. In the context of privacy preservation, the quality / intelligibility of a video is measured in terms of the recognizability of faces and persons [14]. The MQ-security notion has often been employed implicitly in the literature, and similar concepts have been put forward explicitly [53], [45].
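MQ-security is stated in terms of reconstruction quality rather than plaintext bits. As a minimal sketch, the check below uses PSNR as a stand-in quality metric and a hypothetical, application-specific threshold `max_allowed_db`; both are assumptions for illustration. PSNR is only one possible metric and does not capture the recognizability of faces required in the privacy context.

```python
import math
from typing import Sequence

def psnr(original: Sequence[int], reconstruction: Sequence[int],
         peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB for two equally long 8-bit sample sequences."""
    if len(original) != len(reconstruction) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def violates_mq_security(original: Sequence[int],
                         attack_reconstruction: Sequence[int],
                         max_allowed_db: float) -> bool:
    """True if the attacker's reconstruction is better than the scenario allows.

    max_allowed_db is a hypothetical, application-specific threshold.
    """
    return psnr(original, attack_reconstruction) > max_allowed_db
```

A DRM scenario would set a low threshold, so that any attack reconstruction above it counts as a break, whereas transparent encryption would set it at the quality of the public preview version.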
It has to be highlighted that different application scenarios require different quality levels to be protected: the leakage of an edge image might not be considered a security threat in a commercial content distribution scenario, but renders the application in a video conferencing or privacy-protection scenario impossible.
Although the privacy of the content is not an objective of the content provider in the commercial content-distribution scenario, customers may have privacy concerns if the content, e.g., the movie being watched in a VoD (video on demand) scenario, can be identified, especially if the content is associated with social taboos. In the conventional cryptographic security notion, MP-security, no information about the plaintext may be (efficiently) computable from the ciphertext. If one considers the raw video data as plaintext, the preservation of any information, even the preservation of the length of the compressed video stream or of the units that comprise it, constitutes a security violation, as even the compressibility of the raw video data leaks information about the raw video data. If encryption is conducted after compression, the compressibility information is contained in the length of the compressed video data. This security notion appropriately models the security requirements of highly confidential video communications, e.g., video conferences in politics and economy. Thus, in this very strict interpretation of MP-security for video cryptosystems (the raw video data is the plaintext space), the length of the video data (or packets) must not depend on the raw video data itself. This has major implications for the compression settings and video packetization, e.g., in a video conferencing or VoD scenario. In order to ensure that absolutely no information about the visual content is computable from the transmitted encrypted and compressed video data, even the length of the packets must not depend on the raw video data [19]. This implies that for this interpretation of MP-security the video has to be transmitted at a source-independent rate (e.g., constant) and in a source-independent fashion (e.g., constant packet lengths). The issue that video streams can be identified is present even in “secure” state-of-the-art technology, e.g., in SRTP [19] and IPsec, as the secure encryption of packets does not conceal the packet lengths. This strict security notion for video cryptosystems, which we refer to as MPV-security, conflicts with rate-distortion optimal packetization and with optimal secure adaptation, also referred to as secure scalable streaming (SSS) in the context of network adaptation [4], [3]. MPV-security does not model the security requirements and threats of many application scenarios at all, e.g., those of perceptual / transparent encryption and privacy preserving video surveillance.
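The constant-packet-length requirement can be sketched as follows. This is a minimal illustration under stated assumptions: the framing (a 2-byte big-endian length field followed by the payload and zero padding) is hypothetical, and in an MPV-secure setting the length field and padding would be encrypted together with the payload. The second half of the requirement, a source-independent packet rate (e.g., sending dummy packets when no video data is pending), is not shown.

```python
import struct
from typing import List

# Hypothetical framing: 2-byte big-endian payload length + payload + zero padding.
LENGTH_FIELD = struct.Struct(">H")

def packetize_constant(data: bytes, packet_size: int) -> List[bytes]:
    """Split data into packets that all have exactly packet_size bytes."""
    capacity = packet_size - LENGTH_FIELD.size
    if not 0 < capacity <= 0xFFFF:
        raise ValueError("packet_size must leave room for 1..65535 payload bytes")
    packets = []
    for start in range(0, len(data), capacity):
        chunk = data[start:start + capacity]
        padding = b"\x00" * (capacity - len(chunk))
        packets.append(LENGTH_FIELD.pack(len(chunk)) + chunk + padding)
    return packets

def depacketize_constant(packets: List[bytes]) -> bytes:
    """Reassemble the data by reading each length field and dropping the padding."""
    out = bytearray()
    for packet in packets:
        (length,) = LENGTH_FIELD.unpack_from(packet)
        out += packet[LENGTH_FIELD.size:LENGTH_FIELD.size + length]
    return bytes(out)
```

The padding bytes are pure overhead, which illustrates why this notion conflicts with rate-distortion optimal packetization.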
One might assume that the preservation of format-compliance compromises security, but recent contributions from the cryptographic community [6] discuss the topic in depth, define a concise formal framework, re-formulate the MP-security notion for format-preserving encryption (MP-security is defined for equal-length format-compliant data), and also analyze format-preserving encryption algorithms, which are proved to be secure. However, there is still a gap between the theoretical availability and the practical applicability of format-preserving and secure encryption algorithms [60], and this security notion is also not applicable to many application scenarios in the context of multimedia encryption.
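Format-preserving encryption maps every element of a domain onto an element of the same domain, so ciphertext values stay within the permitted range. As a hedged illustration (not necessarily a construction analysed in [6]), the sketch below encrypts integers in [0, n), e.g., a hypothetical value range of a single syntax element, with the well-known cycle-walking technique over a small Feistel network. The SHA-256 round function and the 8 rounds are assumptions made to keep the example short; no security claim is made, and preserving the compliance of a whole H.264 bitstream involves far more than one value range.

```python
import hashlib

def _round_function(key: bytes, round_no: int, value: int, bits: int) -> int:
    """Keyed round function; SHA-256 is a stand-in PRF chosen for the sketch."""
    data = key + bytes([round_no]) + value.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") & ((1 << bits) - 1)

def _feistel(key: bytes, x: int, half_bits: int, inverse: bool = False,
             rounds: int = 8) -> int:
    """Balanced Feistel network: a keyed permutation of [0, 2**(2*half_bits))."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    if not inverse:
        for r in range(rounds):
            left, right = right, left ^ _round_function(key, r, right, half_bits)
    else:
        for r in reversed(range(rounds)):
            left, right = right ^ _round_function(key, r, left, half_bits), left
    return (left << half_bits) | right

def _half_bits(n: int) -> int:
    # Smallest half width (at least 1) whose Feistel domain covers [0, n).
    return max(1, ((n - 1).bit_length() + 1) // 2)

def fpe_encrypt(key: bytes, x: int, n: int) -> int:
    """Map x in [0, n) to [0, n) by cycle-walking the Feistel permutation."""
    if not 0 <= x < n:
        raise ValueError("x must lie in [0, n)")
    half = _half_bits(n)
    y = _feistel(key, x, half)
    while y >= n:  # walk along the cycle until we land inside the domain
        y = _feistel(key, y, half)
    return y

def fpe_decrypt(key: bytes, y: int, n: int) -> int:
    """Inverse of fpe_encrypt: cycle-walk the inverse permutation."""
    if not 0 <= y < n:
        raise ValueError("y must lie in [0, n)")
    half = _half_bits(n)
    x = _feistel(key, y, half, inverse=True)
    while x >= n:
        x = _feistel(key, x, half, inverse=True)
    return x
```

Cycle walking terminates because the permutation's cycle through x eventually returns to x itself, which lies in the domain; on average only a few extra steps are needed since the Feistel domain is at most about four times larger than [0, n).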
Lightweight / Soft / Partial / Selective Encryption: Some contributions to multimedia encryption propose the application of less secure but more efficient encryption algorithms (soft encryption), i.e., the computational complexity to break the employed cryptosystem with respect to MP-security is limited. E.g., in [16] it is proposed to employ a weaker cipher (an AES derivative with fewer rounds) for the less important parts of the bitstream. Obviously insecure algorithms are often employed as well (e.g., adding constants to the coefficients [31]); these also fall into this category.
Another approach to reduce the computational complexity
of encryption is selective / partial encryption of the bitstream
with a secure cipher [46].
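A minimal sketch of selective encryption at the NAL level, assuming NAL units are given without start codes and that a caller-supplied `encrypt` function stands in for the secure cipher. Selecting IDR slices (NUT 5) by default is a hypothetical choice for illustration, not a recommendation, and a real implementation would also have to restore emulation prevention after encryption, which this sketch ignores.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

def selective_encrypt(nalus: Iterable[bytes],
                      encrypt: Callable[[bytes], bytes],
                      selected_types: FrozenSet[int] = frozenset({5}),
                      ) -> Tuple[List[bytes], int]:
    """Encrypt the payload of NAL units whose type is selected.

    The 1-byte NAL header is kept in plaintext. Returns the new NAL units and
    the number of bytes that went through the cipher.
    """
    out: List[bytes] = []
    encrypted_bytes = 0
    for nalu in nalus:
        nut = nalu[0] & 0x1F  # nal_unit_type: low 5 bits of the header byte
        if nut in selected_types:
            out.append(nalu[:1] + encrypt(nalu[1:]))
            encrypted_bytes += len(nalu) - 1
        else:
            out.append(nalu)
    return out, encrypted_bytes
```

The returned byte count makes the complexity argument measurable: only the selected payloads are passed through the cipher, while the rest of the stream is left as-is.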
In this paper, we discuss the schemes in a cipher-independent fashion, i.e., all encryption proposals are considered to employ the same secure cipher as their single source of pseudo-randomness.
B. Preserved Functionality
H.264 bitstreams are associated with functionalities, i.e., there are specified protocols and processes for storing, transmitting, and processing H.264 bitstreams (e.g., extracting parts, adapting the rate, ...). Non-scalable and scalable H.264 bitstreams are accessible via a network abstraction layer (NAL), i.e., the coded video data is a sequence of separate NAL units (see figure 3 for a possible mapping of raw video data to NAL units).
H.264 bitstreams are embedded into container formats for transmission and storage [72], [25], and, depending on the encoding settings, bitstreams allow certain operations, such as extraction of IDR pictures (comparable to I-frames in previous standards), extraction of a subset of the frames, cropping, and, in the case of SVC, extraction of sub-streams with different spatial resolution, temporal resolution and SNR quality. A wide range of watermarking algorithms specifically tailored to H.264 have been proposed, e.g., [38], [50], [24], [80], [79], [47], and a joint application of encryption and watermarking, especially watermarking encrypted content, is often desirable [38], [10], [80], [79].
1) Format-compliance: A bitstream is denoted format-compliant or H.264-compliant if it satisfies the syntactic and semantic requirements of the H.264 standard [27]. A format-compliant H.264 bitstream has to be accepted by every H.264-compliant decoder without any undefined decoder behaviour. It is necessary to distinguish whether a functionality is preserved format-compliantly, i.e., whether standard processing can be applied and no modified software is necessary. E.g., functionality is not preserved if the encrypted parts of the video are signalled as supplementary data, which has no semantics in the H.264 decoding process (comparable to comments in programming languages). Encrypted H.264 bitstreams in which the encrypted data is signalled as supplementary data (e.g., using SEI messages, see table I) are still format-compliant, but the encrypted video data is treated completely differently from plaintext video data. Thus the application of conventional tools to process the video data may lead to unexpected and undesired behaviour, e.g., rate adaptation algorithms may not perform optimally on the encrypted bitstreams. Hence we say that a functionality is preserved in a compliant fashion if exactly the same processes as for an H.264 bitstream are applicable.
2) Packetization: NAL syntax / structure: The preservation
of the NAL structure and syntax requirements enables the
transparent application of standard container formats and tools
for H.264.
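Preserving the NAL syntax has a detail that is easy to miss: encrypting slice data in place can produce byte patterns inside a NAL unit that emulate a start code (0x000001). H.264 prevents such emulation by inserting an emulation_prevention_three_byte (0x03) after two zero bytes whenever the next byte is in the range 0x00 to 0x03, so a structure-preserving scheme would have to strip these bytes before encryption and re-insert them afterwards. A minimal sketch with hypothetical function names, assuming the payload ends with ordinary RBSP trailing bits (the special case of a trailing cabac_zero_word is not handled):

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes whenever the next byte is <= 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just emitted
    for b in rbsp:
        if zeros == 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros == 2 and b == 0x03:
            zeros = 0  # skip the inserted byte and restart the zero count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

An encryptor would then apply `remove_emulation_prevention`, encrypt the resulting RBSP bytes, and apply `add_emulation_prevention` to the ciphertext before writing the NAL unit, so that no start code can appear inside it.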
3) Fast Forward / Extraction of Subsequences / Scalability:
NAL semantics: The additional preservation of the NAL
semantics, i.e., the NAL unit type (NUT, see figure 4) enables
more sophisticated processing of the encrypted bitstream, such

References
Overview of the H.264/AVC video coding standard
Overview of the Scalable Video Coding Extension of the H.264/AVC Standard
Rate-constrained coder control and comparison of video coding standards