
Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes

Jessica Fridrich
Dept. of Electrical Engineering, SUNY Binghamton, Binghamton, NY 13902-6000, USA
fridrich@binghamton.edu
http://www.ws.binghamton.edu/fridrich
Abstract. In this paper, we introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms. The detection method is a linear classifier trained on feature vectors corresponding to cover and stego images. In contrast to previous blind approaches, the features are calculated as an L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed stego image. The functionals are built from marginal and joint statistics of DCT coefficients. Because the features are calculated directly from DCT coefficients, conclusions can be drawn about the impact of embedding modifications on detectability. Three different steganographic paradigms are tested and compared. Experimental results reveal new facts about current steganographic methods for JPEGs and new design principles for more secure JPEG steganography.
1 Introduction
Steganography is the art of invisible communication. Its purpose is to hide the very presence of communication by embedding messages into innocuous-looking cover objects. Each steganographic communication system consists of an embedding algorithm and an extraction algorithm. To accommodate a secret message in a digital image, the original cover image is slightly modified by the embedding algorithm. As a result, the stego image is obtained.

Steganalysis is the art of discovering hidden data in cover objects. As in cryptanalysis, it is assumed that the steganographic method is publicly known with the exception of a secret key. Steganography is considered secure if the stego images do not contain any detectable artifacts due to message embedding. In other words, the set of stego images should have the same statistical properties as the set of cover images. If there exists an algorithm that can guess whether or not a given image contains a secret message with a success rate better than random guessing, the steganographic system is considered broken. For a more exact treatment of the concept of steganographic security, the reader is referred to [1,2].
1.1 Steganalytic Methods
Several trends have recently appeared in steganalysis. One of the first general steganalytic methods was the "chi-square attack" by Westfeld [3]. The original version of this attack could detect sequentially embedded messages and was later generalized to randomly scattered messages [4,5]. Because this approach is based solely on first order statistics and is applicable only to idempotent embedding operations, such as LSB (Least Significant Bit) flipping, its applicability to modern steganographic schemes that are aware of the Cachin criterion [2] is rather limited.

Another major stream in steganalysis is based on the concept of a distinguishing statistic [6]. In this approach, the steganalyst first carefully inspects the embedding algorithm and then identifies a quantity (the distinguishing statistic) that changes predictably with the length of the embedded message, yet one that can be calibrated for cover images. For JPEG images, this calibration is done by decompressing the stego image, cropping it by a few pixels in each direction, and recompressing it with the same quantization table. The distinguishing statistic calculated from this image is used as an estimate of the same quantity for the cover image. Using this calibration, highly accurate and reliable estimators of the embedded message length can be constructed for many schemes [6]. The detection philosophy is not limited to any specific type of embedding operation and works for randomly scattered messages as well. One disadvantage of this approach is that the detection needs to be customized to each embedding paradigm, and the design of a proper distinguishing statistic cannot be easily automated.
The third direction in steganalysis is formed by blind classifiers. Pioneered by Memon and Farid [7,15], a blind detector learns what a typical, unmodified image looks like in a multi-dimensional feature space. A classifier is then trained to learn the differences between cover and stego image features. The 72 features proposed by Farid are calculated from the wavelet decomposition of the stego image as the first four moments of the coefficients and of the log error between the coefficients and their globally optimal linear prediction from neighboring wavelet modes. This methodology, combined with a powerful Support Vector Machine classifier, gives very impressive results for most current steganographic schemes. Farid demonstrated very reliable detection for J-Steg, both versions of OutGuess, and F5 (color images only). The biggest advantage of blind detectors is their potential ability to detect any embedding scheme and even to classify embedding techniques by their position in the feature space. Among the disadvantages is that the methodology will likely always be less accurate than targeted approaches, and it may not be possible to accurately estimate the secret message length, which is an important piece of information for the steganalyst.
The introduction of blind detectors prompted further research in steganography. Based on the previous work of Eggers [8], Tzschoppe [9] constructed a JPEG steganographic scheme (HPDM) that is undetectable by Farid's scheme. However, the same scheme is easily detectable [10] using a single scalar feature, the calibrated spatial blockiness [6]. This suggests that it should be possible to construct a very powerful feature-based detector (blind on the class of JPEG images) if we used calibrated features computed directly in the DCT domain rather than from a somewhat arbitrary wavelet decomposition. This is the approach taken in this paper.
1.2 Proposed Research
We combine the concept of calibration with feature-based classification to devise a blind detector specific to JPEG images. By calculating the features directly in the JPEG domain rather than in the wavelet domain, it appears that the detection can be made sensitive to a wider range of embedding algorithms, because the calibration process (for details, see Sec. 2) increases the features' sensitivity to embedding modifications while suppressing image-to-image variations. Another advantage of calculating the features in the DCT domain is that it enables a more straightforward interpretation of the influence of individual features on detection, as well as an easier formulation of design principles leading to more secure steganography.
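
The sketch below illustrates this detection pipeline: a linear classifier trained on calibrated feature vectors. It is a minimal sketch, not the paper's code; it assumes scikit-learn and a hypothetical helper `calibrated_feature_vector()` that computes the features of Sec. 2 for one JPEG file. (The paper uses a Fisher Linear Discriminant, of which scikit-learn's LDA is one standard implementation.)

```python
# Minimal detection-pipeline sketch: fit a Fisher-style linear classifier
# on calibrated feature vectors of cover and stego images.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_detector(cover_paths, stego_paths, calibrated_feature_vector):
    # Stack one feature vector per image; label covers 0 and stegos 1.
    X = np.array([calibrated_feature_vector(p)
                  for p in list(cover_paths) + list(stego_paths)])
    y = np.array([0] * len(cover_paths) + [1] * len(stego_paths))
    clf = LinearDiscriminantAnalysis()  # Fisher Linear Discriminant
    clf.fit(X, y)
    return clf  # clf.predict(features_of_unseen_image) classifies new images
```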
The proposed detection can also be viewed as a new approach to the definition of steganographic security. According to Cachin, a steganographic scheme is considered secure if the Kullback-Leibler distance between the distributions of stego and cover images is zero (or small, for ε-security). Farid's blind detection is essentially a reflection of this principle. Farid first determines the statistical model for natural images in the feature space and then calculates the distance between a specific image and the statistical model. This "distance" is then used to determine whether the image is a stego image. In our approach, we change the security model and use the stego image as side-information to recover some statistics of the cover image. Instead of measuring the distance between the image and a statistical model, we measure the distance between certain parameters of the stego image and the same parameters of the original image that we succeeded in capturing by calibration.

The paper is organized as follows. In the next section, we explain how the features are calculated and why. In Section 3, we give the details of the detection scheme and discuss the experimental results for OutGuess [11], F5 [13], and Model-Based Steganography [12,14]. Implications for the future design of steganographic schemes are discussed in Section 4. The paper is summarized in Section 5.
2 Calibrated Features
Two types of features will be used in our analysis: first order features and second order features. Some features will be constructed in the DCT domain, while others in the spatial domain. Throughout the paper, scalar quantities will be represented with a non-bold italic font, while vectors and matrices will always be in bold italics. The L1 norm of a vector (or matrix) is defined as the sum of the absolute values of all its elements.

All features are constructed in the following manner. A vector functional F is applied to the stego JPEG image J_1. This functional could be the global DCT coefficient histogram, a co-occurrence matrix, spatial blockiness, etc. The stego image J_1 is decompressed to the spatial domain, cropped by 4 pixels in each direction, and recompressed with the same quantization table as J_1 to obtain J_2. The same vector functional F is then applied to J_2. The final feature f is obtained as the L1 norm of the difference

f = \lVert F(J_1) - F(J_2) \rVert_{L_1}.   (1)
[Figure: the calibration procedure. The stego image J_1 is decompressed, cropped by 4 pixels in each direction, and recompressed with the same quantization table to obtain J_2; the functional F is applied to both images and the feature is ||F(J_1) − F(J_2)||.]
The logic behind this choice of features is the following. The cropping and recompression should produce a "calibrated" image with most macroscopic features similar to those of the original cover image. This is because the cropped stego image is perceptually similar to the cover image, and thus its DCT coefficients should have approximately the same statistical properties as those of the cover image. The cropping by 4 pixels is important because the 8×8 grid of the recompression "does not see" the previous JPEG compression, and thus the obtained DCT coefficients are not influenced by the previous quantization (and embedding) in the DCT domain. One can think of the cropped/recompressed image as an approximation to the cover image, or as side-information. The use of the calibrated image as side-information has proven very useful for the design of highly accurate targeted steganalytic methods in the past [6].
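
A minimal sketch of this calibration step follows, assuming Pillow and numpy (this is not the paper's code). The argument `feature_fn` stands in for an arbitrary functional F applied to the decoded image; note that the paper's functionals operate on quantized DCT coefficients, which Pillow does not expose, so a DCT-domain F would additionally require a library that reads JPEG coefficient arrays.

```python
# Calibration sketch: decompress, crop 4 pixels in each direction, recompress
# with the same quantization tables, then take the L1 norm of the feature
# difference (eq. 1).
import io
import numpy as np
from PIL import Image

def calibrated_feature(jpeg_path, feature_fn):
    """Return ||F(J1) - F(J2)||_1 for one functional F of the decoded image."""
    j1 = Image.open(jpeg_path)
    qtables = j1.quantization  # quantization tables of the stego JPEG
    # Cropping by 4 pixels desynchronizes the 8x8 grid of the recompression
    # from the grid of the original JPEG compression.
    cropped = j1.crop((4, 4, j1.width - 4, j1.height - 4))
    buf = io.BytesIO()
    cropped.save(buf, "JPEG", qtables=qtables)  # recompress -> calibrated J2
    j2 = Image.open(buf)
    return np.abs(np.asarray(feature_fn(j1), dtype=np.float64)
                  - np.asarray(feature_fn(j2), dtype=np.float64)).sum()
```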
2.1 First Order Features
The simplest first order statistic of DCT coefficients is their histogram. Suppose the stego JPEG file is represented with a DCT coefficient array d_k(i, j) and the quantization matrix Q(i, j), i, j = 1, …, 8, k = 1, …, B. The symbol d_k(i, j) denotes the (i, j)-th quantized DCT coefficient in the k-th block (there are a total of B blocks). The global histogram of all 64B DCT coefficients will be denoted H_r, where r = L, …, R, with L = min_{k,i,j} d_k(i, j) and R = max_{k,i,j} d_k(i, j).
There are steganographic programs that preserve H [8,10,11]. However, the schemes in [8,9,11] only preserve the global histogram and not necessarily the histograms of individual DCT modes. Thus, we add individual histograms for low frequency DCT modes to our set of functionals. For a fixed DCT mode (i, j), let h_r^{ij}, r = L, …, R, denote the individual histogram of values d_k(i, j), k = 1, …, B. We only use histograms of low frequency DCT coefficients because histograms of coefficients from medium and higher frequencies are usually statistically unimportant due to the small number of non-zero coefficients.
To provide additional first order macroscopic statistics for our set of functionals, we have decided to include "dual histograms". For a fixed coefficient value d, the dual histogram is the 8×8 matrix g_{ij}^d,

g_{ij}^{d} = \sum_{k=1}^{B} \delta(d, d_k(i, j)),   (2)

where δ(u, v) = 1 if u = v and 0 otherwise. In words, g_{ij}^d is the number of times the value d occurs as the (i, j)-th DCT coefficient over all B blocks in the JPEG image. The dual histogram captures how a given coefficient value d is distributed among different DCT modes. Obviously, if a steganographic method preserves all individual histograms, it also preserves all dual histograms, and vice versa.
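
The sketch below computes these first order functionals with numpy. It is illustrative only: it assumes the quantized DCT coefficients have already been read into an array `dct` of shape (B, 8, 8), and the particular set of low frequency modes shown is an assumption, not the paper's exact list.

```python
# First order functionals: global histogram H_r, individual histograms h^{ij}_r
# for a few low frequency modes, and dual histograms g^d_{ij} (eq. 2).
import numpy as np

def first_order_functionals(dct, dual_values=range(-5, 6)):
    L, R = int(dct.min()), int(dct.max())
    edges = np.arange(L, R + 2)  # one histogram bin per integer value L..R
    # Global histogram over all 64*B coefficients.
    H = np.histogram(dct, bins=edges)[0]
    # Individual histograms for low frequency modes (illustrative choice).
    low_modes = [(0, 1), (1, 0), (1, 1), (2, 0), (0, 2)]
    h = {(i, j): np.histogram(dct[:, i, j], bins=edges)[0] for (i, j) in low_modes}
    # Dual histograms: for each fixed value d, an 8x8 matrix counting how
    # often d occurs in each DCT mode over all blocks.
    g = {d: (dct == d).sum(axis=0) for d in dual_values}
    return H, h, g
```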
2.2 Second Order Features
If the corresponding DCT coefficients from different blocks were independent, then any embedding scheme that preserves the first order statistics (the histograms) would be undetectable by Cachin's definition of steganographic security [2]. However, because natural images can exhibit higher-order correlations over distances larger than 8 pixels, individual DCT modes from neighboring blocks are not independent. Thus, it makes sense to use features that capture inter-block dependencies, because these dependencies will likely be violated by most steganographic algorithms.
Let I_r and I_c denote the vectors of block indices obtained when scanning the image "by rows" and "by columns", respectively. The first functional capturing inter-block dependency is the "variation" V, defined as

V = \frac{\sum_{i,j=1}^{8} \sum_{k=1}^{|I_r|-1} \left| d_{I_r(k)}(i,j) - d_{I_r(k+1)}(i,j) \right| + \sum_{i,j=1}^{8} \sum_{k=1}^{|I_c|-1} \left| d_{I_c(k)}(i,j) - d_{I_c(k+1)}(i,j) \right|}{|I_r| + |I_c|}.   (3)
Most steganographic techniques in some sense add entropy to the array of quantized DCT coefficients and thus are more likely to increase the variation V than to decrease it.

Embedding changes are also likely to increase discontinuities along the 8×8 block boundaries. In fact, this property has proved very useful in steganalysis in the past [6,10,12]. Thus, we add two blockiness measures B_α, α = 1, 2, to our set of functionals. The blockiness is calculated from the decompressed JPEG image and thus represents an "integral measure" of inter-block dependency over all DCT modes over the whole image:

B_\alpha = \frac{\sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} \left| x_{8i,j} - x_{8i+1,j} \right|^{\alpha} + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} \left| x_{i,8j} - x_{i,8j+1} \right|^{\alpha}}{N \lfloor (M-1)/8 \rfloor + M \lfloor (N-1)/8 \rfloor},   (4)

where M and N are the image dimensions and x_{i,j} are the pixel values of the decompressed JPEG image.
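
A numpy sketch of these two second order functionals follows, under stated assumptions: the block array layout (rows, cols, 8, 8) is hypothetical, and the variation below ignores the wrap-around block pairs at the ends of the row and column scans for simplicity.

```python
# Second order functionals: inter-block variation V (eq. 3) and spatial
# blockiness B_alpha (eq. 4). `dct_blocks` has shape (rows, cols, 8, 8);
# `pixels` is the 2-D luminance array of the decompressed JPEG.
import numpy as np

def variation(dct_blocks):
    # Sum |d_k - d_{k+1}| over horizontally and vertically adjacent blocks,
    # omitting the wrap-around pairs of the scans for simplicity.
    by_rows = np.abs(np.diff(dct_blocks, axis=1)).sum()
    by_cols = np.abs(np.diff(dct_blocks, axis=0)).sum()
    n_blocks = dct_blocks.shape[0] * dct_blocks.shape[1]
    return (by_rows + by_cols) / (2 * n_blocks)  # |I_r| + |I_c| = 2B

def blockiness(pixels, alpha=1):
    x = pixels.astype(np.float64)
    M, N = x.shape
    # 1-indexed rows 8i just above each horizontal block boundary, i = 1..floor((M-1)/8).
    hb = np.arange(1, (M - 1) // 8 + 1) * 8
    vb = np.arange(1, (N - 1) // 8 + 1) * 8
    horiz = np.abs(x[hb - 1, :] - x[hb, :]) ** alpha  # |x_{8i,j} - x_{8i+1,j}|^alpha
    vert = np.abs(x[:, vb - 1] - x[:, vb]) ** alpha   # |x_{i,8j} - x_{i,8j+1}|^alpha
    return (horiz.sum() + vert.sum()) / (N * len(hb) + M * len(vb))
```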

References

[1] R. J. Anderson and F. A. P. Petitcolas: On the Limits of Steganography.
[2] C. Cachin: An Information-Theoretic Model for Steganography.
[3] A. Westfeld and A. Pfitzmann: Attacks on Steganographic Systems.
[11] N. Provos: Defending Against Statistical Steganalysis.
[13] A. Westfeld: F5 - A Steganographic Algorithm.
Frequently Asked Questions

Q1. What are the contributions mentioned in the paper "Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes"?

In this paper, the authors introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms.

Further investigation of this issue will be part of their future research. The authors also plan to replace the Fisher Linear Discriminant with more sophisticated classifiers, such as Support Vector Machines, to further improve the detection reliability of the proposed steganalytic algorithm, and to develop a multiple-class classifier capable of recognizing stego images produced by different embedding algorithms (steganographic program identification).

For all tested schemes, one of the most influential features of the proposed detection was the co-occurrence matrix of DCT coefficients (5), which is the probability distribution of coefficient pairs from neighboring blocks.

The MB2 method is currently the only JPEG steganographic method that takes into account inter-block dependencies between DCT coefficients by preserving the blockiness, which is an "integral" measure of these dependencies.

This is likely because MB1 does not avoid any coefficients other than 0, and its embedding mechanism is guaranteed to embed the maximal number of bits given that the marginal statistics of all coefficients must be preserved.

The detection reliability is relatively high even for embedding rates as small as 0.05 bpc, and the method becomes highly detectable for messages above 0.1 bpc.

One of the most surprising facts revealed by the experiments is that even features based on functionals that are preserved by the embedding may have substantial influence.

In fact, it has been argued by its authors [13] that the stego image looks as if the cover image was originally compressed with a lower JPEG quality factor.