
Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes

Jessica Fridrich
Dept. of Electrical Engineering, SUNY Binghamton, Binghamton, NY 13902-6000, USA
fridrich@binghamton.edu
http://www.ws.binghamton.edu/fridrich
Abstract. In this paper, we introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms. The detection method is a linear classifier trained on feature vectors corresponding to cover and stego images. In contrast to previous blind approaches, the features are calculated as an L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed stego image. The functionals are built from marginal and joint statistics of DCT coefficients. Because the features are calculated directly from DCT coefficients, conclusions can be drawn about the impact of embedding modifications on detectability. Three different steganographic paradigms are tested and compared. Experimental results reveal new facts about current steganographic methods for JPEGs and new design principles for more secure JPEG steganography.
1 Introduction
Steganography is the art of invisible communication. Its purpose is to hide the very presence of communication by embedding messages into innocuous-looking cover objects. Each steganographic communication system consists of an embedding algorithm and an extraction algorithm. To accommodate a secret message in a digital image, the original cover image is slightly modified by the embedding algorithm. As a result, the stego image is obtained.

Steganalysis is the art of discovering hidden data in cover objects. As in cryptanalysis, it is assumed that the steganographic method is publicly known with the exception of a secret key. Steganography is considered secure if the stego images do not contain any detectable artifacts due to message embedding. In other words, the set of stego images should have the same statistical properties as the set of cover images. If there exists an algorithm that can guess whether or not a given image contains a secret message with a success rate better than random guessing, the steganographic system is considered broken. For a more exact treatment of the concept of steganographic security, the reader is referred to [1,2].
1.1 Steganalytic Methods
Several trends have recently appeared in steganalysis. One of the first general steganalytic methods was the "chi-square attack" by Westfeld [3]. The original version of this attack could detect sequentially embedded messages and was later generalized to randomly scattered messages [4,5]. Because this approach is based solely on first order statistics and is applicable only to idempotent embedding operations, such as LSB (Least Significant Bit) flipping, its applicability to modern steganographic schemes that are aware of the Cachin criterion [2] is rather limited.

Another major stream in steganalysis is based on the concept of a distinguishing statistic [6]. In this approach, the steganalyst first carefully inspects the embedding algorithm and then identifies a quantity (the distinguishing statistic) that changes predictably with the length of the embedded message, yet one that can be calibrated for cover images. For JPEG images, this calibration is done by decompressing the stego image, cropping it by a few pixels in each direction, and recompressing it with the same quantization table. The distinguishing statistic calculated from this image is used as an estimate of the same quantity for the cover image. Using this calibration, highly accurate and reliable estimators of the embedded message length can be constructed for many schemes [6]. The detection philosophy is not limited to any specific type of embedding operation and works for randomly scattered messages as well. One disadvantage of this approach is that the detection needs to be customized to each embedding paradigm, and the design of a proper distinguishing statistic cannot be easily automated.
The third direction in steganalysis is formed by blind classifiers. Pioneered by Memon and Farid [7,15], a blind detector learns what a typical, unmodified image looks like in a multi-dimensional feature space. A classifier is then trained to learn the differences between cover and stego image features. The 72 features proposed by Farid are calculated from the wavelet decomposition of the stego image as the first four moments of the coefficients and of the log error between the coefficients and their globally optimal linear prediction from neighboring wavelet modes. This methodology, combined with a powerful Support Vector Machine classifier, gives very impressive results for most current steganographic schemes. Farid demonstrated very reliable detection for J-Steg, both versions of OutGuess, and F5 (color images only). The biggest advantage of blind detectors is their potential ability to detect any embedding scheme and even to classify embedding techniques by their position in the feature space. Among the disadvantages is that the methodology will likely always be less accurate than targeted approaches, and it may not be possible to accurately estimate the secret message length, which is an important piece of information for the steganalyst.
The introduction of blind detectors prompted further research in steganography. Based on the previous work of Eggers [8], Tzschoppe [9] constructed a JPEG steganographic scheme (HPDM) that is undetectable by Farid's scheme. However, the same scheme is easily detectable [10] using a single scalar feature, the calibrated spatial blockiness [6]. This suggests that it should be possible to construct a very powerful feature-based detector (blind on the class of JPEG images) if we used calibrated features computed directly in the DCT domain rather than from a somewhat arbitrary wavelet decomposition. This is the approach taken in this paper.
1.2 Proposed Research
We combine the concept of calibration with feature-based classification to devise a blind detector specific to JPEG images. By calculating the features directly in the JPEG domain rather than in the wavelet domain, it appears that the detection can be made sensitive to a wider range of embedding algorithms, because the calibration process (for details, see Sec. 2) increases the features' sensitivity to embedding modifications while suppressing image-to-image variations. Another advantage of calculating the features in the DCT domain is that it enables a more straightforward interpretation of the influence of individual features on detection, as well as an easier formulation of design principles leading to more secure steganography.
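
The sketch below illustrates this detection pipeline: a linear classifier trained on calibrated feature vectors. It is a minimal sketch, not the paper's code; it assumes scikit-learn and a hypothetical helper `calibrated_feature_vector()` that computes the features of Sec. 2 for one JPEG file. (The paper uses a Fisher Linear Discriminant, of which scikit-learn's LDA is one standard implementation.)

```python
# Minimal detection-pipeline sketch: fit a Fisher-style linear classifier
# on calibrated feature vectors of cover and stego images.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_detector(cover_paths, stego_paths, calibrated_feature_vector):
    # Stack one feature vector per image; label covers 0 and stegos 1.
    X = np.array([calibrated_feature_vector(p)
                  for p in list(cover_paths) + list(stego_paths)])
    y = np.array([0] * len(cover_paths) + [1] * len(stego_paths))
    clf = LinearDiscriminantAnalysis()  # Fisher Linear Discriminant
    clf.fit(X, y)
    return clf  # clf.predict(features_of_unseen_image) classifies new images
```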
The proposed detection can also be viewed as a new approach to the definition of steganographic security. According to Cachin, a steganographic scheme is considered secure if the Kullback-Leibler distance between the distributions of stego and cover images is zero (or small, for ε-security). Farid's blind detection is essentially a reflection of this principle. Farid first determines the statistical model for natural images in the feature space and then calculates the distance between a specific image and the statistical model. This "distance" is then used to determine whether the image is a stego image. In our approach, we change the security model and use the stego image as side-information to recover some statistics of the cover image. Instead of measuring the distance between the image and a statistical model, we measure the distance between certain parameters of the stego image and the same parameters of the original image that we succeeded in capturing by calibration.

The paper is organized as follows. In the next section, we explain how the features are calculated and why. In Section 3, we give the details of the detection scheme and discuss the experimental results for OutGuess [11], F5 [13], and Model-Based Steganography [12,14]. Implications for the future design of steganographic schemes are discussed in Section 4. The paper is summarized in Section 5.
2 Calibrated Features
Two types of features will be used in our analysis: first order features and second order features. Some features will be constructed in the DCT domain, while others in the spatial domain. Throughout the paper, scalar quantities will be represented with a non-bold italic font, while vectors and matrices will always be in bold italics. The L1 norm of a vector (or matrix) is defined as the sum of the absolute values of all its elements.

All features are constructed in the following manner. A vector functional F is applied to the stego JPEG image J_1. This functional could be the global DCT coefficient histogram, a co-occurrence matrix, spatial blockiness, etc. The stego image J_1 is decompressed to the spatial domain, cropped by 4 pixels in each direction, and recompressed with the same quantization table as J_1 to obtain J_2. The same vector functional F is then applied to J_2. The final feature f is obtained as the L1 norm of the difference

f = \lVert F(J_1) - F(J_2) \rVert_{L_1}.   (1)
[Figure: the calibration procedure. The stego image J_1 is decompressed, cropped by 4 pixels in each direction, and recompressed with the same quantization table to obtain J_2; the functional F is applied to both images and the feature is ||F(J_1) − F(J_2)||.]
The logic behind this choice of features is the following. The cropping and recompression should produce a "calibrated" image with most macroscopic features similar to those of the original cover image. This is because the cropped stego image is perceptually similar to the cover image, and thus its DCT coefficients should have approximately the same statistical properties as those of the cover image. The cropping by 4 pixels is important because the 8×8 grid of the recompression "does not see" the previous JPEG compression, and thus the obtained DCT coefficients are not influenced by the previous quantization (and embedding) in the DCT domain. One can think of the cropped/recompressed image as an approximation to the cover image, or as side-information. The use of the calibrated image as side-information has proven very useful for the design of highly accurate targeted steganalytic methods in the past [6].
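
A minimal sketch of this calibration step follows, assuming Pillow and numpy (this is not the paper's code). The argument `feature_fn` stands in for an arbitrary functional F applied to the decoded image; note that the paper's functionals operate on quantized DCT coefficients, which Pillow does not expose, so a DCT-domain F would additionally require a library that reads JPEG coefficient arrays.

```python
# Calibration sketch: decompress, crop 4 pixels in each direction, recompress
# with the same quantization tables, then take the L1 norm of the feature
# difference (eq. 1).
import io
import numpy as np
from PIL import Image

def calibrated_feature(jpeg_path, feature_fn):
    """Return ||F(J1) - F(J2)||_1 for one functional F of the decoded image."""
    j1 = Image.open(jpeg_path)
    qtables = j1.quantization  # quantization tables of the stego JPEG
    # Cropping by 4 pixels desynchronizes the 8x8 grid of the recompression
    # from the grid of the original JPEG compression.
    cropped = j1.crop((4, 4, j1.width - 4, j1.height - 4))
    buf = io.BytesIO()
    cropped.save(buf, "JPEG", qtables=qtables)  # recompress -> calibrated J2
    j2 = Image.open(buf)
    return np.abs(np.asarray(feature_fn(j1), dtype=np.float64)
                  - np.asarray(feature_fn(j2), dtype=np.float64)).sum()
```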
2.1 First Order Features
The simplest first order statistic of DCT coefficients is their histogram. Suppose the stego JPEG file is represented with a DCT coefficient array d_k(i, j) and the quantization matrix Q(i, j), i, j = 1, …, 8, k = 1, …, B. The symbol d_k(i, j) denotes the (i, j)-th quantized DCT coefficient in the k-th block (there are a total of B blocks). The global histogram of all 64B DCT coefficients will be denoted H_r, where r = L, …, R, with L = min_{k,i,j} d_k(i, j) and R = max_{k,i,j} d_k(i, j).
There are steganographic programs that preserve H [8,10,11]. However, the schemes in [8,9,11] only preserve the global histogram and not necessarily the histograms of individual DCT modes. Thus, we add individual histograms for low frequency DCT modes to our set of functionals. For a fixed DCT mode (i, j), let h_r^{ij}, r = L, …, R, denote the individual histogram of values d_k(i, j), k = 1, …, B. We only use histograms of low frequency DCT coefficients because histograms of coefficients from medium and higher frequencies are usually statistically unimportant due to the small number of non-zero coefficients.
To provide additional first order macroscopic statistics for our set of functionals, we have decided to include "dual histograms". For a fixed coefficient value d, the dual histogram is the 8×8 matrix g_{ij}^d,

g_{ij}^{d} = \sum_{k=1}^{B} \delta(d, d_k(i, j)),   (2)

where δ(u, v) = 1 if u = v and 0 otherwise. In words, g_{ij}^d is the number of times the value d occurs as the (i, j)-th DCT coefficient over all B blocks in the JPEG image. The dual histogram captures how a given coefficient value d is distributed among different DCT modes. Obviously, if a steganographic method preserves all individual histograms, it also preserves all dual histograms, and vice versa.
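
The sketch below computes these first order functionals with numpy. It is illustrative only: it assumes the quantized DCT coefficients have already been read into an array `dct` of shape (B, 8, 8), and the particular set of low frequency modes shown is an assumption, not the paper's exact list.

```python
# First order functionals: global histogram H_r, individual histograms h^{ij}_r
# for a few low frequency modes, and dual histograms g^d_{ij} (eq. 2).
import numpy as np

def first_order_functionals(dct, dual_values=range(-5, 6)):
    L, R = int(dct.min()), int(dct.max())
    edges = np.arange(L, R + 2)  # one histogram bin per integer value L..R
    # Global histogram over all 64*B coefficients.
    H = np.histogram(dct, bins=edges)[0]
    # Individual histograms for low frequency modes (illustrative choice).
    low_modes = [(0, 1), (1, 0), (1, 1), (2, 0), (0, 2)]
    h = {(i, j): np.histogram(dct[:, i, j], bins=edges)[0] for (i, j) in low_modes}
    # Dual histograms: for each fixed value d, an 8x8 matrix counting how
    # often d occurs in each DCT mode over all blocks.
    g = {d: (dct == d).sum(axis=0) for d in dual_values}
    return H, h, g
```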
2.2 Second Order Features
If the corresponding DCT coefficients from different blocks were independent, then any embedding scheme that preserves the first order statistics (the histograms) would be undetectable by Cachin's definition of steganographic security [2]. However, because natural images can exhibit higher-order correlations over distances larger than 8 pixels, individual DCT modes from neighboring blocks are not independent. Thus, it makes sense to use features that capture inter-block dependencies, because these dependencies will likely be violated by most steganographic algorithms.
Let I_r and I_c denote the vectors of block indices obtained when scanning the image "by rows" and "by columns", respectively. The first functional capturing inter-block dependency is the "variation" V, defined as

V = \frac{\sum_{i,j=1}^{8} \sum_{k=1}^{|I_r|-1} \left| d_{I_r(k)}(i,j) - d_{I_r(k+1)}(i,j) \right| + \sum_{i,j=1}^{8} \sum_{k=1}^{|I_c|-1} \left| d_{I_c(k)}(i,j) - d_{I_c(k+1)}(i,j) \right|}{|I_r| + |I_c|}.   (3)
Most steganographic techniques in some sense add entropy to the array of quantized DCT coefficients and thus are more likely to increase the variation V than to decrease it.

Embedding changes are also likely to increase discontinuities along the 8×8 block boundaries. In fact, this property has proved very useful in steganalysis in the past [6,10,12]. Thus, we add two blockiness measures B_α, α = 1, 2, to our set of functionals. The blockiness is calculated from the decompressed JPEG image and thus represents an "integral measure" of inter-block dependency over all DCT modes over the whole image:

B_\alpha = \frac{\sum_{i=1}^{\lfloor (M-1)/8 \rfloor} \sum_{j=1}^{N} \left| x_{8i,j} - x_{8i+1,j} \right|^{\alpha} + \sum_{j=1}^{\lfloor (N-1)/8 \rfloor} \sum_{i=1}^{M} \left| x_{i,8j} - x_{i,8j+1} \right|^{\alpha}}{N \lfloor (M-1)/8 \rfloor + M \lfloor (N-1)/8 \rfloor},   (4)

where M and N are the image dimensions and x_{i,j} are the pixel values of the decompressed JPEG image.
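
A numpy sketch of these two second order functionals follows, under stated assumptions: the block array layout (rows, cols, 8, 8) is hypothetical, and the variation below ignores the wrap-around block pairs at the ends of the row and column scans for simplicity.

```python
# Second order functionals: inter-block variation V (eq. 3) and spatial
# blockiness B_alpha (eq. 4). `dct_blocks` has shape (rows, cols, 8, 8);
# `pixels` is the 2-D luminance array of the decompressed JPEG.
import numpy as np

def variation(dct_blocks):
    # Sum |d_k - d_{k+1}| over horizontally and vertically adjacent blocks,
    # omitting the wrap-around pairs of the scans for simplicity.
    by_rows = np.abs(np.diff(dct_blocks, axis=1)).sum()
    by_cols = np.abs(np.diff(dct_blocks, axis=0)).sum()
    n_blocks = dct_blocks.shape[0] * dct_blocks.shape[1]
    return (by_rows + by_cols) / (2 * n_blocks)  # |I_r| + |I_c| = 2B

def blockiness(pixels, alpha=1):
    x = pixels.astype(np.float64)
    M, N = x.shape
    # 1-indexed rows 8i just above each horizontal block boundary, i = 1..floor((M-1)/8).
    hb = np.arange(1, (M - 1) // 8 + 1) * 8
    vb = np.arange(1, (N - 1) // 8 + 1) * 8
    horiz = np.abs(x[hb - 1, :] - x[hb, :]) ** alpha  # |x_{8i,j} - x_{8i+1,j}|^alpha
    vert = np.abs(x[:, vb - 1] - x[:, vb]) ** alpha   # |x_{i,8j} - x_{i,8j+1}|^alpha
    return (horiz.sum() + vert.sum()) / (N * len(hb) + M * len(vb))
```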

References

[1] R. J. Anderson and F. A. P. Petitcolas: On the Limits of Steganography.
[2] C. Cachin: An Information-Theoretic Model for Steganography.
[3] A. Westfeld and A. Pfitzmann: Attacks on Steganographic Systems.
[11] N. Provos: Defending Against Statistical Steganalysis.
[13] A. Westfeld: F5 - A Steganographic Algorithm.
Frequently Asked Questions

Q1. What are the contributions mentioned in the paper "Feature-Based Steganalysis for JPEG Images and its Implications for Future Design of Steganographic Schemes"?

In this paper, the authors introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms.

Further investigation of this issue will be part of their future research. The authors also plan to replace the Fisher Linear Discriminant with more sophisticated classifiers, such as Support Vector Machines, to further improve the detection reliability of the proposed steganalytic algorithm, and to develop a multiple-class classifier capable of recognizing stego images produced by different embedding algorithms (steganographic program identification).

For all tested schemes, one of the most influential features of the proposed detection was the co-occurrence matrix of DCT coefficients (5), which is the probability distribution of coefficient pairs from neighboring blocks.

The MB2 method is currently the only JPEG steganographic method that takes into account inter-block dependencies between DCT coefficients by preserving the blockiness, which is an "integral" measure of these dependencies.

This is likely because MB1 does not avoid any coefficients other than 0, and its embedding mechanism is guaranteed to embed the maximal number of bits given that the marginal statistics of all coefficients must be preserved.

The detection reliability is relatively high even for embedding rates as small as 0.05 bpc, and the method becomes highly detectable for messages above 0.1 bpc.

One of the most surprising facts revealed by the experiments is that even features based on functionals that are preserved by the embedding may have substantial influence.

In fact, it has been argued by its authors [13] that the stego image looks as if the cover image was originally compressed with a lower JPEG quality factor.