
Steganalysis by Subtractive Pixel Adjacency Matrix

Abstract
This paper presents a method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is least significant bit (LSB) matching. First, arguments are provided for modeling the differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The major part of experiments, performed on four diverse image databases, focuses on evaluation of detection of LSB matching. The comparison to prior art reveals that the presented feature set offers superior accuracy in detecting LSB matching. Even though the feature set was developed specifically for spatial domain steganalysis, by constructing steganalyzers for ten algorithms for JPEG images, it is demonstrated that the features detect steganography in the transform domain as well.



Steganalysis by Subtractive Pixel Adjacency Matrix
Tomáš Pevný
INPG - Gipsa-Lab, 46 avenue Félix Viallet, Grenoble cedex 38031, France
pevnak@gmail.com

Patrick Bas
INPG - Gipsa-Lab, 46 avenue Félix Viallet, Grenoble cedex 38031, France
patrick.bas@gipsa-lab.inpg.fr

Jessica Fridrich
Binghamton University, Department of ECE, Binghamton, NY 13902-6000
001 607 777 6177
fridrich@binghamton.edu
ABSTRACT
This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. First, arguments are provided for modeling differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The accuracy of the presented steganalyzer is evaluated on LSB matching and four different databases. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on the second-order Markov chain is high-dimensional, we address the curse of dimensionality using a feature selection algorithm and show that the curse did not occur in our experiments.
Categories and Subject Descriptors
D.2.11 [Software Engineering]: Software Architectures—information hiding
General Terms
Security, Algorithms
Keywords
Steganalysis, LSB matching, ±1 embedding
1. INTRODUCTION
A large number of practical steganographic algorithms perform embedding by applying a mutually independent embedding operation to all or selected elements of the cover [7]. The effect of embedding is equivalent to adding to the cover an independent noise-like signal called stego noise. The weakest method that falls under this paradigm is the Least
Significant Bit (LSB) embedding, in which LSBs of individual cover elements are replaced with message bits. In this case, the stego noise depends on cover elements and the embedding operation is LSB flipping, which is asymmetrical. It is exactly this asymmetry that makes LSB embedding easily detectable [14, 16, 17]. A trivial modification of LSB embedding is LSB matching (also called ±1 embedding), which randomly increases or decreases pixel values by one to match the LSBs with the communicated message bits. Although both steganographic schemes are very similar in that the cover elements are changed by at most one and the message is read from LSBs, LSB matching is much harder to detect. Moreover, while the accuracy of LSB steganalyzers is only moderately sensitive to the cover source, most current detectors of LSB matching exhibit performance that can significantly vary over different cover sources [18, 4].
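To make the ±1 embedding operation concrete, here is a minimal sketch of LSB matching, assuming numpy; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def lsb_match(cover, bits, seed=0):
    """Embed message bits into randomly selected pixels by LSB matching:
    a pixel whose LSB already equals the message bit is left untouched;
    otherwise it is randomly incremented or decremented by one."""
    rng = np.random.default_rng(seed)
    stego = cover.astype(np.int64).ravel().copy()
    positions = rng.choice(stego.size, size=len(bits), replace=False)
    for pos, bit in zip(positions, bits):
        if stego[pos] % 2 != bit:
            step = rng.choice((-1, 1))
            # stay inside the 8-bit range: at 0 and 255 only one direction is valid
            if stego[pos] == 0:
                step = 1
            elif stego[pos] == 255:
                step = -1
            stego[pos] += step
    return stego.reshape(cover.shape).astype(np.uint8)
```

The receiver reads the message back from the LSBs of the selected pixels, so the embedding changes are mutually independent ±1 modifications, exactly the noise-adding paradigm discussed above.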
One of the first detectors for embedding by noise adding used the center of gravity of the histogram characteristic function [10, 15, 19]. A quantitative steganalyzer of LSB matching based on maximum likelihood estimation of the change rate was described in [23]. Alternative methods employing machine learning classifiers used features extracted as moments of noise residuals in the wavelet domain [11, 8] and from statistics of Amplitudes of Local Extrema in the graylevel histogram [5] (further called the ALE detector). A recently published experimental comparison of these detectors [18, 4] shows that the Wavelet Absolute Moments (WAM) steganalyzer [8] is the most accurate and versatile and offers good overall performance on diverse images.
The heuristic behind embedding by noise adding is based on the fact that during image acquisition many noise sources are superimposed on the acquired image, such as shot noise, readout noise, amplifier noise, etc. In the literature on digital imaging sensors, these combined noise sources are usually modeled as an iid signal largely independent of the content. While this is true for the raw sensor output, subsequent in-camera processing, such as color interpolation, denoising, color correction, and filtering, creates complex dependences in the noise component of neighboring pixels. These dependences are violated by steganographic embedding because the stego noise is an iid sequence independent of the cover image. This opens the door to possible attacks. Indeed, most steganalysis methods in one way or another try to use these dependences to detect the presence of the stego noise.
The steganalysis method described in this paper exploits the fact that embedding by noise adding alters the dependences between pixels. By modeling the differences between adjacent pixels in natural images, we identify deviations from this model and postulate that such deviations are due to steganographic embedding. The steganalyzer is constructed as follows. A filter suppressing the image content and exposing the stego noise is applied. Dependences between neighboring pixels of the filtered image (noise residuals) are modeled as a higher-order Markov chain. The sample transition probability matrix is then used as a vector feature for a feature-based steganalyzer implemented using machine learning algorithms. Based on experiments, the steganalyzer is significantly more accurate than prior art.
The idea to model dependences between neighboring pixels by a Markov chain appeared for the first time in [24]. It was then further improved to model pixel differences instead of pixel values in [26]. In this paper, we show that there is a great performance benefit in using higher-order models without running into the curse of dimensionality.
This paper is organized as follows. Section 2 explains the filter used to suppress the image content and expose the stego noise. Then, the features used for steganalysis are introduced as the sample transition probability matrix of a higher-order Markov model of the filtered image. The subsequent Section 3 experimentally compares several steganalyzers differing by the order of the Markov model, its parameters, and the implementation of the support vector machine (SVM) classifier. This section also compares the results with prior art. In Section 4, we use a simple feature selection method to show that our results were not affected by the curse of dimensionality. The paper is concluded in Section 5.
2. SUBTRACTIVE PIXEL ADJACENCY MATRIX
2.1 Rationale
In principle, higher-order dependences between pixels in natural images can be modeled by histograms of pairs, triples, or larger groups of neighboring pixels. However, these histograms possess several unfavorable aspects that make them difficult to use directly as features for steganalysis:
1. The number of bins in the histograms grows exponentially with the number of pixels. The curse of dimensionality may be encountered even for the histogram of pixel pairs in an 8-bit grayscale image (256² = 65536 bins).
2. The estimates of some bins may be noisy because they have a very low probability of occurrence, such as completely black and completely white pixels next to each other.
3. It is rather difficult to find a statistical model for pixel groups because their statistics are influenced by the image content. By working with the noise component of images, which contains most of the energy of the stego noise signal, we increase the SNR and, at the same time, obtain a tighter model.
The second point indicates that a good model should capture those characteristics of images that can be robustly estimated. The third point indicates that some pre-processing or calibration should be applied to increase the SNR, such as working with a noise residual as in WAM [8].
[Figure 1: Distribution of two horizontally adjacent pixels (I_{i,j}, I_{i,j+1}) in 8-bit grayscale images, estimated from 10000 images from the BOWS2 database (see Section 3 for more details about the database). The degree of gray at (x, y) is the probability P(I_{i,j} = x, I_{i,j+1} = y).]
Representing a grayscale m × n image with a matrix

{I_{i,j} | I_{i,j} ∈ N, i ∈ {1, . . . , m}, j ∈ {1, . . . , n}}, N = {0, 1, 2, . . .},

Figure 1 shows the distribution of two horizontally adjacent pixels (I_{i,j}, I_{i,j+1}) estimated from 10000 8-bit grayscale images from the BOWS2 database. The histogram can be accurately estimated only along the “ridge” that follows the minor diagonal. A closer inspection of Figure 1 reveals that the shape of this ridge (along the horizontal or vertical axis) is approximately constant across the grayscale values. This indicates that pixel-to-pixel dependences in natural images can be modeled by the shape of this ridge, which is, in turn, determined by the distribution of differences I_{i,j+1} − I_{i,j} between neighboring pixels.
By modeling local dependences in natural images using the differences I_{i,j+1} − I_{i,j}, our model assumes that the differences I_{i,j+1} − I_{i,j} are independent of I_{i,j}. In other words, for r = k − l,

P(I_{i,j+1} = k, I_{i,j} = l) ≈ P(I_{i,j+1} − I_{i,j} = r) · P(I_{i,j} = l).

This “difference” model can be seen as a simplified version of the model of two neighboring pixels, since the co-occurrence matrix of two adjacent pixels has 65536 bins, while the histogram of differences has only 511 bins. The differences suppress the image content because the difference array is essentially a high-pass-filtered version of the image (see below). By replacing the full neighborhood model with the simplified difference model, the information loss is likely to be small because the mutual information between the difference I_{i,j+1} − I_{i,j} and I_{i,j} estimated from 10800 grayscale images

[Figure 2: Histogram of differences of two adjacent pixels, I_{i,j+1} − I_{i,j}, in the range [−20, 20], calculated over 10800 grayscale images from the BOWS2 database.]
in the BOWS2 database is 7.615 · 10⁻²,¹ which means that the differences are almost independent of the pixel values.
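The shape of the distribution in Figure 2 is easy to reproduce; the following small sketch, assuming numpy and an 8-bit grayscale image given as a 2-D array, computes the empirical histogram of horizontal differences (the names are ours, not the paper's).

```python
import numpy as np

def difference_histogram(img, T=20):
    """Empirical distribution of the differences I_{i,j+1} - I_{i,j}
    over the range [-T, T]."""
    img = np.asarray(img, dtype=np.int64)
    d = img[:, 1:] - img[:, :-1]                  # horizontal differences
    values = np.arange(-T, T + 1)
    probs = np.array([(d == v).mean() for v in values])
    return values, probs
```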
Recently, the histogram characteristic function derived from the difference model was used to improve steganalysis of LSB matching [19]. Based on our experiments, however, the first-order model is not complex enough to clearly distinguish between dependent and independent noise, which forced us to move to higher-order models. Instead, we model the differences between adjacent pixels as a Markov chain. Of course, it is impossible to use the full Markov model, because even the first-order Markov model would have 511² elements. By examining the histogram of differences (Figure 2), we can see that the differences are concentrated around zero and quickly fall off. Consequently, it makes sense to accept as a model (and as features) only the differences in a small fixed range [−T, T].
2.2 The SPAM features
We now explain the Subtractive Pixel Adjacency Model of covers (SPAM) that will be used to compute features for steganalysis. First, the transition probabilities along eight directions are computed.² The differences and the transition probability are always computed along the same direction. We explain further calculations only for the horizontal direction, as the other directions are obtained in a similar manner. All direction-specific quantities will be denoted by a superscript {←, →, ↑, ↓, ↖, ↗, ↘, ↙} showing the direction of the calculation.
The calculation of features starts by computing the difference array D^{·}. For the horizontal direction left-to-right,

D^{→}_{i,j} = I_{i,j} − I_{i,j+1},   i ∈ {1, . . . , m}, j ∈ {1, . . . , n − 1}.
¹ Huang et al. [13] estimated the mutual information between I_{i,j} − I_{i,j+1} and I_{i,j} + I_{i,j+1} to 0.0255.
² There are four axes: horizontal, vertical, major and minor diagonal, and two directions along each axis, which leads to eight directions in total.
Order  T  Dimension
1st    4  162
2nd    3  686

Table 1: Dimension of the models used in our experiments. Column “Order” shows the order of the Markov chain and T is the range of differences.
As introduced in Section 2.1, the first-order SPAM features, F_{1st}, model the difference arrays D by a first-order Markov process. For the horizontal direction, this leads to

M^{→}_{u,v} = P(D^{→}_{i,j+1} = u | D^{→}_{i,j} = v),

where u, v ∈ {−T, . . . , T}.
The second-order SPAM features, F_{2nd}, model the difference arrays D by a second-order Markov process. Again, for the horizontal direction,

M^{→}_{u,v,w} = P(D^{→}_{i,j+2} = u | D^{→}_{i,j+1} = v, D^{→}_{i,j} = w),

where u, v, w ∈ {−T, . . . , T}.
To decrease the feature dimensionality, we make a plausible assumption that the statistics in natural images are symmetric with respect to mirroring and flipping (the effect of portrait/landscape orientation is negligible). Thus, we separately average the horizontal and vertical matrices and then the diagonal matrices to form the final feature sets, F_{1st} and F_{2nd}. With a slight abuse of notation, this can be formally written as

F^{·}_{1,...,k} = (1/4) [M^{→}_{·} + M^{←}_{·} + M^{↓}_{·} + M^{↑}_{·}],
F^{·}_{k+1,...,2k} = (1/4) [M^{↘}_{·} + M^{↖}_{·} + M^{↙}_{·} + M^{↗}_{·}],     (1)

where k = (2T + 1)² for the first-order features and k = (2T + 1)³ for the second-order features. In the experiments described in Section 3, we used T = 4 for the first-order features, thus obtaining 2k = 162 features, and T = 3 for the second-order features, leading to 2k = 686 features (cf. Table 1).
To summarize, the SPAM features are formed by the averaged sample Markov transition probability matrices (1) in the range [−T, T]. The dimensionality of the model is determined by the order of the Markov model and the range of differences T.
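The following sketch, assuming numpy, shows one way the second-order SPAM features could be computed; the helper names and the handling of empty histogram bins are our own choices, not taken from the paper.

```python
import numpy as np

def _transitions(w, v, u, T):
    """Sample P(u | v, w) from three aligned difference arrays,
    restricted to differences in [-T, T]."""
    mask = (np.abs(u) <= T) & (np.abs(v) <= T) & (np.abs(w) <= T)
    counts = np.zeros((2 * T + 1,) * 3)
    np.add.at(counts, (u[mask] + T, v[mask] + T, w[mask] + T), 1)
    denom = counts.sum(axis=0, keepdims=True)     # total count per (v, w) pair
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)

def spam_2nd(img, T=3):
    """Second-order SPAM feature vector of a grayscale image,
    dimension 2*(2T+1)^3 (686 for T = 3); see Equation (1)."""
    I = np.asarray(img, dtype=np.int64)
    straight, diagonal = [], []
    for k in range(4):                 # four quarter-turns; by the mirroring
        R = np.rot90(I, k)             # symmetry they cover all eight directions
        d = R[:, :-1] - R[:, 1:]       # D_{i,j} = I_{i,j} - I_{i,j+1} in this frame
        straight.append(_transitions(d[:, :-2], d[:, 1:-1], d[:, 2:], T))
        e = R[:-1, :-1] - R[1:, 1:]    # diagonal difference array
        diagonal.append(_transitions(e[:-2, :-2], e[1:-1, 1:-1], e[2:, 2:], T))
    # average the horizontal/vertical matrices and the diagonal matrices separately
    return np.concatenate([np.mean(straight, axis=0).ravel(),
                           np.mean(diagonal, axis=0).ravel()])
```

The first-order features follow the same pattern with pairs instead of triples, giving a 2(2T + 1)² = 162-dimensional vector for T = 4.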
The order of the Markov chain, together with the parameter T, controls the complexity of the model. The concrete choice depends on the application, computational resources, and the number of images available for classifier training. Practical issues associated with these choices are discussed in Section 4.
The calculation of the difference array can be interpreted as high-pass filtering with the kernel [−1, +1], which is, in fact, the simplest edge detector. The filtering suppresses the image content and exposes the stego noise, which results in a higher SNR. The filtering can also be seen as a different form of calibration [6]. From this point of view, it would make sense to use more sophisticated filters with a better SNR. Interestingly, none of the filters we tested³ provided consistently better performance. We believe that the superior accuracy of the simple filter [−1, +1] is because it does not distort the stego noise as more complex filters do.

³ We experimented with the adaptive Wiener filter with a 3 × 3 neighborhood, the wavelet filter [21] used in WAM, and the discrete filters
[ 0  +1   0]
[+1  −4  +1]
[ 0  +1   0],
[+1, −2, +1], and [+1, +2, −6, +2, +1].
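For illustration, residuals for the kernels from footnote 3 can be obtained by plain 2-D convolution; this sketch assumes scipy and is not code from the paper (note that convolution flips the kernel, which for [−1, +1] only changes the sign of the residual).

```python
import numpy as np
from scipy.signal import convolve2d

KERNELS = {
    "difference": np.array([[-1, 1]]),   # the simple SPAM kernel [-1, +1]
    "laplacian": np.array([[0, 1, 0],
                           [1, -4, 1],
                           [0, 1, 0]]),
    "second_diff": np.array([[1, -2, 1]]),
}

def residual(img, kernel):
    """High-pass residual of a grayscale image for a given kernel."""
    return convolve2d(np.asarray(img, dtype=np.int64), kernel, mode="valid")
```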
3. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed steganalyzers, we subjected them to tests on a well-known archetype of embedding by noise adding: LSB matching. We constructed and compared steganalyzers that use the first-order Markov features with differences in the range [−4, +4] (further called first-order SPAM features) and second-order Markov features with differences in the range [−3, +3] (further called second-order SPAM features). Moreover, we compared the accuracy of linear and non-linear classifiers to observe whether the decision boundary between the cover and stego features is linear. Finally, we compared the SPAM steganalyzers with prior art, namely with detectors based on WAM [8] and ALE [5] features.
3.1 Experimental methodology
3.1.1 Image databases
It is a well-known fact that the accuracy of steganalysis may vary significantly across different cover sources. In particular, images with a large noise component, such as scans of photographs, are much more challenging for steganalysis than images with a low noise component or filtered images (JPEG compressed). In order to assess the SPAM models and compare them with prior art under different conditions, we measured their accuracy on four different databases and their union:
1. CAMERA contains 9200 images captured by 23 different digital cameras in the raw format and converted to grayscale.
2. BOWS2 contains 10800 grayscale images with the fixed size 512 × 512, coming from rescaled and cropped natural images of various sizes. This database was used during the BOWS2 contest [2].
3. NRCS consists of 1576 raw scans of film converted to grayscale [1].
4. JPEG85 contains 9200 images from CAMERA compressed by JPEG with quality factor 85.
5. JOINT contains the images from all four databases above, 30800 images in total.
All classifiers were trained and tested on the same database of images. Even though the estimated errors are intra-database errors, which can be considered artificial, we note here that the errors estimated on the JOINT database can actually be close to real-world performance.
Prior to all experiments, all databases were divided into training and testing subsets with approximately the same number of images. In each database, two sets of stego images were created with payloads 0.5 bits per pixel (bpp) and 0.25 bpp. According to the recent evaluation of steganalytic methods for LSB matching [4], these two embedding rates
are already difficult to detect reliably. These two embedding
rates were also used in [8].
The steganalyzers’ performance is evaluated using the minimal average decision error under equal probability of cover and stego images,

P_Err = min (1/2) (P_Fp + P_Fn),     (2)

where P_Fp and P_Fn stand for the probability of false alarm or false positive (detecting cover as stego) and the probability of missed detection (false negative).⁴
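Given real-valued classifier scores on cover and stego test images, the error (2) can be evaluated by sweeping the decision threshold; a minimal sketch assuming numpy and the convention that higher scores mean "stego":

```python
import numpy as np

def p_err(cover_scores, stego_scores):
    """Minimal average of false-alarm and missed-detection rates over thresholds."""
    thresholds = np.unique(np.concatenate([cover_scores, stego_scores]))
    best = 0.5                                  # a random detector achieves 0.5
    for t in thresholds:
        p_fp = np.mean(cover_scores >= t)       # cover classified as stego
        p_fn = np.mean(stego_scores < t)        # stego classified as cover
        best = min(best, 0.5 * (p_fp + p_fn))
    return best
```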
3.1.2 Classifiers
In the experiments presented in this section, we used exclusively soft-margin SVMs [25]. Soft-margin SVMs can balance the complexity and accuracy of classifiers through a hyper-parameter C penalizing the error on the training set. Higher values of C produce classifiers that are more accurate on the training set but also more complex, with possibly worse generalization.⁵ On the other hand, a smaller value of C leads to a simpler classifier with worse accuracy on the training set.
Depending on the choice of the kernel, SVMs can have additional kernel parameters. In this paper, we used SVMs with a linear kernel, which is free of any parameters, and SVMs with a Gaussian kernel, k(x, y) = exp(−γ ‖x − y‖²₂), with the width γ > 0 as the parameter. The parameter γ has a similar role as C. Higher values of γ make the classifier more pliable but likely prone to overfitting the data, while lower values of γ have the opposite effect.
Before training the SVM, the value of the penalization parameter C and the kernel parameters (in our case γ) need to be set. The values should be chosen so as to obtain a classifier with good generalization. The standard approach is to estimate the error on unknown samples by cross-validation on the training set over a fixed grid of values and then select the value corresponding to the lowest error (see [12] for details). In this paper, we used five-fold cross-validation with the multiplicative grid

C ∈ {0.001, 0.01, . . . , 10000},
γ ∈ {2^i | i ∈ {−log₂(d) − 3, . . . , −log₂(d) + 3}},

where d is the number of features in the subset.
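For illustration, the same model selection can be phrased with scikit-learn's grid search; this sketch assumes the SPAM features are already extracted into X_train, y_train and is not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

d = 686  # dimension of the second-order SPAM features
param_grid = {
    "C": [10.0 ** i for i in range(-3, 5)],                    # 0.001, ..., 10000
    "gamma": [2.0 ** (i - np.log2(d)) for i in range(-3, 4)],  # grid centered on 1/d
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)  # picks (C, gamma) by five-fold cross-validation
```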
3.2 Linear or non-linear?
This section compares the accuracy of steganalyzers based on first-order and second-order SPAM features, and of steganalyzers implemented by SVMs with Gaussian and linear kernels.
⁴ For SVMs, the minimization in (2) is carried out over a set containing just one tuple (P_Fp, P_Fn), rather than over a set of tuples obtained by varying the threshold, because the training algorithm of SVMs outputs one fixed classifier for each pair (P_Fp, P_Fn). In our implementation, the reported error is calculated as (1/l) Σ_{i=1}^{l} I(y_i, ŷ_i), where I(·, ·) is the indicator function attaining 1 iff y_i ≠ ŷ_i and 0 otherwise, y_i is the true label of the i-th sample, and ŷ_i is the label returned by the SVM classifier. In the case of an equal number of positive and negative samples, the error provided by our implementation equals the error calculated according to (2).
⁵ The ability of classifiers to generalize is described by the error on samples unknown during the training phase of the classifier.

Database  bpp   2nd SPAM  WAM    ALE
CAMERA    0.25  0.057     0.185  0.337
BOWS2     0.25  0.054     0.170  0.313
NRCS      0.25  0.167     0.293  0.319
JPEG85    0.25  0.008     0.018  0.257
JOINT     0.25  0.074     0.206  0.376
CAMERA    0.50  0.026     0.090  0.231
BOWS2     0.50  0.024     0.074  0.181
NRCS      0.50  0.068     0.157  0.259
JPEG85    0.50  0.002     0.003  0.155
JOINT     0.50  0.037     0.117  0.268

Table 3: Error (2) of steganalyzers for LSB matching with payloads 0.25 and 0.5 bpp. The steganalyzers were implemented as SVMs with a Gaussian kernel. The lowest error for a given database and message length is in boldface.
The steganalyzers were always trained to detect a particular payload. The reported error (2) was always measured on images from the testing set, which were not used in any form during training or development of the steganalyzer.
The results, summarized in Table 2, show that steganalyzers implemented as Gaussian SVMs are always better than their linear counterparts. This shows that the decision boundaries between cover and stego features are nonlinear, which is especially true for databases with images of different sizes (CAMERA, JPEG85). Moreover, the steganalyzers built from the second-order SPAM model with differences in the range [−3, +3] are also always better than steganalyzers based on the first-order SPAM model with differences in the range [−4, +4], which indicates that the degree of the model is more important than the range of the differences.
3.3 Comparison with prior art
Table 3 shows the classification error (2) of the steganalyzers using second-order SPAM (686 features), WAM [8] (81 features), and ALE [5] (10 features) on all four databases and for two relative payloads. We created a special steganalyzer for each combination of database, feature set, and payload (4 × 3 × 2 = 24 steganalyzers in total). The steganalyzers were implemented by SVMs with a Gaussian kernel as described in Section 3.1.2.
Table 3 also clearly demonstrates that the accuracy of steganalysis greatly depends on the cover source. For images with a low level of noise, such as JPEG-compressed images, the steganalysis is very accurate (P_Err = 0.8% on images with payload 0.25 bpp). On the other hand, on very noisy images, such as the scanned photographs from the NRCS database, the accuracy is considerably worse. Here, we have to be cautious with the interpretation of the results, because the NRCS database contains only about 1500 images, which makes the estimates of accuracy less reliable than on the other, larger image sets.
In all cases, the steganalyzers that used second-order SPAM features perform best; the WAM steganalyzers are second, with about three times higher error; and the ALE steganalyzers are the worst. Figure 3 compares the steganalyzers in selected cases using the receiver operating characteristic (ROC) curve, created by varying the threshold of the SVMs with the Gaussian kernel. The dominant performance of the SPAM steganalyzers is quite apparent.
4. CURSE OF DIMENSIONALITY
Denoting the number of training samples as l and the number of features as d, the curse of dimensionality refers to overfitting the training data because of an insufficient number of training samples relative to a large dimensionality d (e.g., the ratio l/d is too small). In theory, the required number of training samples depends exponentially on the dimension of the training set, but a practical rule of thumb states that the number of training samples should be at least ten times the dimension of the training set. For the 686-dimensional second-order SPAM features, this rule calls for at least 6860 training images.
One of the reasons for the popularity of SVMs is that they are considered resistant to the curse of dimensionality and to uninformative features. However, this is true only for SVMs with a linear kernel. SVMs with a Gaussian kernel (and other local kernels as well) can suffer from the curse of dimensionality, and their accuracy can be decreased by uninformative features [3]. Because the dimensionality of the second-order SPAM feature set is 686, the feature set may be susceptible to all of the above problems, especially in experiments on the NRCS database.
This section investigates whether the large dimensionality and uninformative features negatively influence the performance of the steganalyzers based on second-order SPAM features. We use a simple feature selection algorithm to select subsets of features of different sizes and observe the discrepancy between the errors on the training and testing sets. If the curse of dimensionality occurs, the difference between the two errors should grow with the dimension of the feature set.
4.1 Details of the experiment
The aim of feature selection is to select a subset of features such that the classifier's accuracy is better than or equal to that of the classifier implemented using the full feature set. In theory, finding the optimal subset of features is an NP-complete problem [9], and feature selection frequently suffers from overfitting. In order to alleviate these issues, we used a very simple feature selection scheme operating in a linear space. First, we calculated the correlation coefficient between the i-th feature x_i and the number of embedding changes in the stego image y according to⁶

corr(x_i, y) = (E[x_i y] − E[x_i] E[y]) / (√(E[x_i²] − E[x_i]²) · √(E[y²] − E[y]²)).     (3)

Second, a subset of features of cardinality k was formed by selecting the k features with the highest correlation coefficient.⁷
The advantages of this approach to feature selection are a good estimation of the ranking criteria, since the features are evaluated separately, and a low computational complexity. The drawback is that the dependences between multiple features are not evaluated, which means that the selected subsets of features are almost certainly not optimal, i.e., there may exist a different subset with the same or smaller number of features and a better classification accuracy. Despite this weakness, the proposed method seems to offer a good
⁶ In Equation (3), E[·] stands for the empirical mean over the variable within the brackets. For example, E[x_i y] = (1/n) Σ_{j=1}^{n} x_{i,j} y_j, where x_{i,j} denotes the i-th element of the j-th feature vector.
⁷ This approach is essentially equal to feature selection using the Hilbert-Schmidt independence criterion with linear kernels [22].
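A sketch of the ranking step in Equation (3), assuming numpy, with X holding one feature vector per row and y the number of embedding changes per image (the function name is ours):

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the highest correlation coefficient (3)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).mean(axis=0) / (X.std(axis=0) * y.std())
    return np.argsort(corr)[::-1][:k]   # highest correlation first
```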

References

The Nature of Statistical Learning Theory (book). TL;DR: Covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.

A Practical Guide to Support Vector Classification. TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.

Feature Extraction: Foundations and Applications (book). TL;DR: Discusses feature extraction for classification of proteomic mass spectra, sequence motifs as highly predictive features of protein function, and combining a filter method with SVMs.

F5-A Steganographic Algorithm (book chapter). TL;DR: The newly developed algorithm F5 withstands visual and statistical attacks, yet still offers a large steganographic capacity because it implements matrix encoding to improve the efficiency of embedding and reduces the number of necessary changes.

Low-Complexity Image Denoising Based on Statistical Modeling of Wavelet Coefficients (journal article). TL;DR: Introduces a simple spatially adaptive statistical model for wavelet image coefficients and applies it to image denoising; the model is inspired by a recent wavelet compression algorithm, the estimation-quantization coder.
Frequently Asked Questions
Q1. What have the authors contributed in "Steganalysis by subtractive pixel adjacency matrix" ?

This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on the second-order Markov chain is high-dimensional, the authors address the curse of dimensionality using a feature selection algorithm and show that the curse did not occur in their experiments.

In their future work, the authors would like to use the SPAM features to detect other steganographic algorithms for the spatial domain, namely LSB embedding, and to investigate the limits of steganography in the spatial domain to determine the maximal secure payload for current spatial-domain embedding methods. Another direction worth pursuing is to use the third-order Markov chain in combination with feature selection to further improve the accuracy of steganalysis.

At the same time, one must be aware that the feature selection is database-dependent, as only 114 of the 200 best features were shared among all four databases.

The local dependences between differences of neighboring pixels are modeled as a Markov chain, whose sample probability transition matrix is taken as a feature vector for steganalysis. 
