
Steganalysis by Subtractive Pixel Adjacency Matrix

Abstract
This paper presents a method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is least significant bit (LSB) matching. First, arguments are provided for modeling the differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The major part of experiments, performed on four diverse image databases, focuses on evaluation of detection of LSB matching. The comparison to prior art reveals that the presented feature set offers superior accuracy in detecting LSB matching. Even though the feature set was developed specifically for spatial domain steganalysis, by constructing steganalyzers for ten algorithms for JPEG images, it is demonstrated that the features detect steganography in the transform domain as well.



Steganalysis by Subtractive Pixel Adjacency Matrix
Tomáš Pevný
INPG - Gipsa-Lab, 46 avenue Félix Viallet, Grenoble cedex 38031, France
pevnak@gmail.com

Patrick Bas
INPG - Gipsa-Lab, 46 avenue Félix Viallet, Grenoble cedex 38031, France
patrick.bas@gipsa-lab.inpg.fr

Jessica Fridrich
Binghamton University, Department of ECE, Binghamton, NY 13902-6000
001 607 777 6177
fridrich@binghamton.edu
ABSTRACT
This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. First, arguments are provided for modeling differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The accuracy of the presented steganalyzer is evaluated on LSB matching and four different databases. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on the second-order Markov chain is high-dimensional, we address the curse of dimensionality using a feature selection algorithm and show that the curse did not occur in our experiments.
Categories and Subject Descriptors
D.2.11 [Software Engineering]: Software Architectures—information hiding
General Terms
Security, Algorithms
Keywords
Steganalysis, LSB matching, ±1 embedding
1. INTRODUCTION
A large number of practical steganographic algorithms perform embedding by applying a mutually independent embedding operation to all or selected elements of the cover [7]. The effect of embedding is equivalent to adding to the cover an independent noise-like signal called stego noise. The weakest method that falls under this paradigm is the Least
Significant Bit (LSB) embedding, in which LSBs of individual cover elements are replaced with message bits. In this case, the stego noise depends on cover elements and the embedding operation is LSB flipping, which is asymmetrical. It is exactly this asymmetry that makes LSB embedding easily detectable [14, 16, 17]. A trivial modification of LSB embedding is LSB matching (also called ±1 embedding), which randomly increases or decreases pixel values by one to match the LSBs with the communicated message bits. Although both steganographic schemes are very similar in that the cover elements are changed by at most one and the message is read from LSBs, LSB matching is much harder to detect. Moreover, while the accuracy of LSB steganalyzers is only moderately sensitive to the cover source, most current detectors of LSB matching exhibit performance that can significantly vary over different cover sources [18, 4].
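To make the ±1 embedding operation concrete, here is a minimal sketch of LSB matching, assuming numpy; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def lsb_match(cover, bits, seed=0):
    """Embed message bits into randomly selected pixels by LSB matching:
    a pixel whose LSB already equals the message bit is left untouched;
    otherwise it is randomly incremented or decremented by one."""
    rng = np.random.default_rng(seed)
    stego = cover.astype(np.int64).ravel().copy()
    positions = rng.choice(stego.size, size=len(bits), replace=False)
    for pos, bit in zip(positions, bits):
        if stego[pos] % 2 != bit:
            step = rng.choice((-1, 1))
            # stay inside the 8-bit range: at 0 and 255 only one direction is valid
            if stego[pos] == 0:
                step = 1
            elif stego[pos] == 255:
                step = -1
            stego[pos] += step
    return stego.reshape(cover.shape).astype(np.uint8)
```

The receiver reads the message back from the LSBs of the selected pixels, so the embedding changes are mutually independent ±1 modifications, exactly the noise-adding paradigm discussed above.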
One of the first detectors for embedding by noise adding used the center of gravity of the histogram characteristic function [10, 15, 19]. A quantitative steganalyzer of LSB matching based on maximum likelihood estimation of the change rate was described in [23]. Alternative methods employing machine learning classifiers used features extracted as moments of noise residuals in the wavelet domain [11, 8] and from statistics of Amplitudes of Local Extrema in the graylevel histogram [5] (further called the ALE detector). A recently published experimental comparison of these detectors [18, 4] shows that the Wavelet Absolute Moments (WAM) steganalyzer [8] is the most accurate and versatile and offers good overall performance on diverse images.
The heuristic behind embedding by noise adding is based on the fact that during image acquisition many noise sources are superimposed on the acquired image, such as shot noise, readout noise, amplifier noise, etc. In the literature on digital imaging sensors, these combined noise sources are usually modeled as an iid signal largely independent of the content. While this is true for the raw sensor output, subsequent in-camera processing, such as color interpolation, denoising, color correction, and filtering, creates complex dependences in the noise component of neighboring pixels. These dependences are violated by steganographic embedding because the stego noise is an iid sequence independent of the cover image. This opens the door to possible attacks. Indeed, most steganalysis methods in one way or another try to use these dependences to detect the presence of the stego noise.
The steganalysis method described in this paper exploits the fact that embedding by noise adding alters the dependences between pixels. By modeling the differences between adjacent pixels in natural images, we identify deviations from this model and postulate that such deviations are due to steganographic embedding. The steganalyzer is constructed as follows. A filter suppressing the image content and exposing the stego noise is applied. Dependences between neighboring pixels of the filtered image (noise residuals) are modeled as a higher-order Markov chain. The sample transition probability matrix is then used as a vector feature for a feature-based steganalyzer implemented using machine learning algorithms. Based on experiments, the steganalyzer is significantly more accurate than prior art.
The idea to model dependences between neighboring pixels by a Markov chain appeared for the first time in [24]. It was then further improved to model pixel differences instead of pixel values in [26]. In this paper, we show that there is a great performance benefit in using higher-order models without running into the curse of dimensionality.
This paper is organized as follows. Section 2 explains the filter used to suppress the image content and expose the stego noise. Then, the features used for steganalysis are introduced as the sample transition probability matrix of a higher-order Markov model of the filtered image. The subsequent Section 3 experimentally compares several steganalyzers differing by the order of the Markov model, its parameters, and the implementation of the support vector machine (SVM) classifier. This section also compares the results with prior art. In Section 4, we use a simple feature selection method to show that our results were not affected by the curse of dimensionality. The paper is concluded in Section 5.
2. SUBTRACTIVE PIXEL ADJACENCY MATRIX
2.1 Rationale
In principle, higher-order dependences between pixels in natural images can be modeled by histograms of pairs, triples, or larger groups of neighboring pixels. However, these histograms possess several unfavorable aspects that make them difficult to use directly as features for steganalysis:
1. The number of bins in the histograms grows exponentially with the number of pixels. The curse of dimensionality may be encountered even for the histogram of pixel pairs in an 8-bit grayscale image (256² = 65536 bins).
2. The estimates of some bins may be noisy because they have a very low probability of occurrence, such as completely black and completely white pixels next to each other.
3. It is rather difficult to find a statistical model for pixel groups because their statistics are influenced by the image content. By working with the noise component of images, which contains most of the energy of the stego noise signal, we increase the SNR and, at the same time, obtain a tighter model.
The second point indicates that a good model should capture those characteristics of images that can be robustly estimated. The third point indicates that some pre-processing or calibration should be applied to increase the SNR, such as working with a noise residual as in WAM [8].
[Figure 1: Distribution of two horizontally adjacent pixels (I_{i,j}, I_{i,j+1}) in 8-bit grayscale images, estimated from 10000 images from the BOWS2 database (see Section 3 for more details about the database). The degree of gray at (x, y) is the probability P(I_{i,j} = x, I_{i,j+1} = y).]
Representing a grayscale m × n image with a matrix

{I_{i,j} | I_{i,j} ∈ N, i ∈ {1, . . . , m}, j ∈ {1, . . . , n}}, N = {0, 1, 2, . . .},

Figure 1 shows the distribution of two horizontally adjacent pixels (I_{i,j}, I_{i,j+1}) estimated from 10000 8-bit grayscale images from the BOWS2 database. The histogram can be accurately estimated only along the “ridge” that follows the minor diagonal. A closer inspection of Figure 1 reveals that the shape of this ridge (along the horizontal or vertical axis) is approximately constant across the grayscale values. This indicates that pixel-to-pixel dependences in natural images can be modeled by the shape of this ridge, which is, in turn, determined by the distribution of differences I_{i,j+1} − I_{i,j} between neighboring pixels.
By modeling local dependences in natural images using the differences I_{i,j+1} − I_{i,j}, our model assumes that the differences I_{i,j+1} − I_{i,j} are independent of I_{i,j}. In other words, for r = k − l,

P(I_{i,j+1} = k, I_{i,j} = l) ≈ P(I_{i,j+1} − I_{i,j} = r) · P(I_{i,j} = l).

This “difference” model can be seen as a simplified version of the model of two neighboring pixels, since the co-occurrence matrix of two adjacent pixels has 65536 bins, while the histogram of differences has only 511 bins. The differences suppress the image content because the difference array is essentially a high-pass-filtered version of the image (see below). By replacing the full neighborhood model with the simplified difference model, the information loss is likely to be small because the mutual information between the difference I_{i,j+1} − I_{i,j} and I_{i,j} estimated from 10800 grayscale images

[Figure 2: Histogram of differences of two adjacent pixels, I_{i,j+1} − I_{i,j}, in the range [−20, 20], calculated over 10800 grayscale images from the BOWS2 database.]
in the BOWS2 database is 7.615 · 10⁻²,¹ which means that the differences are almost independent of the pixel values.
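The shape of the distribution in Figure 2 is easy to reproduce; the following small sketch, assuming numpy and an 8-bit grayscale image given as a 2-D array, computes the empirical histogram of horizontal differences (the names are ours, not the paper's).

```python
import numpy as np

def difference_histogram(img, T=20):
    """Empirical distribution of the differences I_{i,j+1} - I_{i,j}
    over the range [-T, T]."""
    img = np.asarray(img, dtype=np.int64)
    d = img[:, 1:] - img[:, :-1]                  # horizontal differences
    values = np.arange(-T, T + 1)
    probs = np.array([(d == v).mean() for v in values])
    return values, probs
```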
Recently, the histogram characteristic function derived from the difference model was used to improve steganalysis of LSB matching [19]. Based on our experiments, however, the first-order model is not complex enough to clearly distinguish between dependent and independent noise, which forced us to move to higher-order models. Instead, we model the differences between adjacent pixels as a Markov chain. Of course, it is impossible to use the full Markov model, because even the first-order Markov model would have 511² elements. By examining the histogram of differences (Figure 2), we can see that the differences are concentrated around zero and quickly fall off. Consequently, it makes sense to accept as a model (and as features) only the differences in a small fixed range [−T, T].
2.2 The SPAM features
We now explain the Subtractive Pixel Adjacency Model of covers (SPAM) that will be used to compute features for steganalysis. First, the transition probabilities along eight directions are computed.² The differences and the transition probability are always computed along the same direction. We explain further calculations only for the horizontal direction, as the other directions are obtained in a similar manner. All direction-specific quantities will be denoted by a superscript {←, →, ↑, ↓, ↖, ↗, ↘, ↙} showing the direction of the calculation.
The calculation of features starts by computing the difference array D^{·}. For the horizontal direction left-to-right,

D^{→}_{i,j} = I_{i,j} − I_{i,j+1},   i ∈ {1, . . . , m}, j ∈ {1, . . . , n − 1}.
¹ Huang et al. [13] estimated the mutual information between I_{i,j} − I_{i,j+1} and I_{i,j} + I_{i,j+1} to 0.0255.
² There are four axes: horizontal, vertical, major and minor diagonal, and two directions along each axis, which leads to eight directions in total.
Order  T  Dimension
1st    4  162
2nd    3  686

Table 1: Dimension of the models used in our experiments. Column “Order” shows the order of the Markov chain and T is the range of differences.
As introduced in Section 2.1, the first-order SPAM features, F_{1st}, model the difference arrays D by a first-order Markov process. For the horizontal direction, this leads to

M^{→}_{u,v} = P(D^{→}_{i,j+1} = u | D^{→}_{i,j} = v),

where u, v ∈ {−T, . . . , T}.
The second-order SPAM features, F_{2nd}, model the difference arrays D by a second-order Markov process. Again, for the horizontal direction,

M^{→}_{u,v,w} = P(D^{→}_{i,j+2} = u | D^{→}_{i,j+1} = v, D^{→}_{i,j} = w),

where u, v, w ∈ {−T, . . . , T}.
To decrease the feature dimensionality, we make a plausible assumption that the statistics in natural images are symmetric with respect to mirroring and flipping (the effect of portrait/landscape orientation is negligible). Thus, we separately average the horizontal and vertical matrices and then the diagonal matrices to form the final feature sets, F_{1st} and F_{2nd}. With a slight abuse of notation, this can be formally written as

F^{·}_{1,...,k} = (1/4) [M^{→}_{·} + M^{←}_{·} + M^{↓}_{·} + M^{↑}_{·}],
F^{·}_{k+1,...,2k} = (1/4) [M^{↘}_{·} + M^{↖}_{·} + M^{↙}_{·} + M^{↗}_{·}],     (1)

where k = (2T + 1)² for the first-order features and k = (2T + 1)³ for the second-order features. In the experiments described in Section 3, we used T = 4 for the first-order features, thus obtaining 2k = 162 features, and T = 3 for the second-order features, leading to 2k = 686 features (cf. Table 1).
To summarize, the SPAM features are formed by the averaged sample Markov transition probability matrices (1) in the range [−T, T]. The dimensionality of the model is determined by the order of the Markov model and the range of differences T.
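The following sketch, assuming numpy, shows one way the second-order SPAM features could be computed; the helper names and the handling of empty histogram bins are our own choices, not taken from the paper.

```python
import numpy as np

def _transitions(w, v, u, T):
    """Sample P(u | v, w) from three aligned difference arrays,
    restricted to differences in [-T, T]."""
    mask = (np.abs(u) <= T) & (np.abs(v) <= T) & (np.abs(w) <= T)
    counts = np.zeros((2 * T + 1,) * 3)
    np.add.at(counts, (u[mask] + T, v[mask] + T, w[mask] + T), 1)
    denom = counts.sum(axis=0, keepdims=True)     # total count per (v, w) pair
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)

def spam_2nd(img, T=3):
    """Second-order SPAM feature vector of a grayscale image,
    dimension 2*(2T+1)^3 (686 for T = 3); see Equation (1)."""
    I = np.asarray(img, dtype=np.int64)
    straight, diagonal = [], []
    for k in range(4):                 # four quarter-turns; by the mirroring
        R = np.rot90(I, k)             # symmetry they cover all eight directions
        d = R[:, :-1] - R[:, 1:]       # D_{i,j} = I_{i,j} - I_{i,j+1} in this frame
        straight.append(_transitions(d[:, :-2], d[:, 1:-1], d[:, 2:], T))
        e = R[:-1, :-1] - R[1:, 1:]    # diagonal difference array
        diagonal.append(_transitions(e[:-2, :-2], e[1:-1, 1:-1], e[2:, 2:], T))
    # average the horizontal/vertical matrices and the diagonal matrices separately
    return np.concatenate([np.mean(straight, axis=0).ravel(),
                           np.mean(diagonal, axis=0).ravel()])
```

The first-order features follow the same pattern with pairs instead of triples, giving a 2(2T + 1)² = 162-dimensional vector for T = 4.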
The order of the Markov chain, together with the parameter T, controls the complexity of the model. The concrete choice depends on the application, computational resources, and the number of images available for classifier training. Practical issues associated with these choices are discussed in Section 4.
The calculation of the difference array can be interpreted as high-pass filtering with the kernel [−1, +1], which is, in fact, the simplest edge detector. The filtering suppresses the image content and exposes the stego noise, which results in a higher SNR. The filtering can also be seen as a different form of calibration [6]. From this point of view, it would make sense to use more sophisticated filters with a better SNR. Interestingly, none of the filters we tested³ provided consistently better performance. We believe that the superior accuracy of the simple filter [−1, +1] is because it does not distort the stego noise as more complex filters do.

³ We experimented with the adaptive Wiener filter with a 3 × 3 neighborhood, the wavelet filter [21] used in WAM, and the discrete filters
[ 0  +1   0]
[+1  −4  +1]
[ 0  +1   0],
[+1, −2, +1], and [+1, +2, −6, +2, +1].
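For illustration, residuals for the kernels from footnote 3 can be obtained by plain 2-D convolution; this sketch assumes scipy and is not code from the paper (note that convolution flips the kernel, which for [−1, +1] only changes the sign of the residual).

```python
import numpy as np
from scipy.signal import convolve2d

KERNELS = {
    "difference": np.array([[-1, 1]]),   # the simple SPAM kernel [-1, +1]
    "laplacian": np.array([[0, 1, 0],
                           [1, -4, 1],
                           [0, 1, 0]]),
    "second_diff": np.array([[1, -2, 1]]),
}

def residual(img, kernel):
    """High-pass residual of a grayscale image for a given kernel."""
    return convolve2d(np.asarray(img, dtype=np.int64), kernel, mode="valid")
```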
3. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed steganalyzers, we subjected them to tests on a well-known archetype of embedding by noise adding: LSB matching. We constructed and compared steganalyzers that use the first-order Markov features with differences in the range [−4, +4] (further called first-order SPAM features) and second-order Markov features with differences in the range [−3, +3] (further called second-order SPAM features). Moreover, we compared the accuracy of linear and non-linear classifiers to observe whether the decision boundary between the cover and stego features is linear. Finally, we compared the SPAM steganalyzers with prior art, namely with detectors based on WAM [8] and ALE [5] features.
3.1 Experimental methodology
3.1.1 Image databases
It is a well-known fact that the accuracy of steganalysis may vary significantly across different cover sources. In particular, images with a large noise component, such as scans of photographs, are much more challenging for steganalysis than images with a low noise component or filtered images (JPEG compressed). In order to assess the SPAM models and compare them with prior art under different conditions, we measured their accuracy on four different databases and their union:
1. CAMERA contains 9200 images captured by 23 different digital cameras in the raw format and converted to grayscale.
2. BOWS2 contains 10800 grayscale images with the fixed size 512 × 512, coming from rescaled and cropped natural images of various sizes. This database was used during the BOWS2 contest [2].
3. NRCS consists of 1576 raw scans of film converted to grayscale [1].
4. JPEG85 contains 9200 images from CAMERA compressed by JPEG with quality factor 85.
5. JOINT contains the images from all four databases above, 30800 images in total.
All classifiers were trained and tested on the same database of images. Even though the estimated errors are intra-database errors, which can be considered artificial, we note here that the errors estimated on the JOINT database can actually be close to real-world performance.
Prior to all experiments, all databases were divided into training and testing subsets with approximately the same number of images. In each database, two sets of stego images were created with payloads 0.5 bits per pixel (bpp) and 0.25 bpp. According to the recent evaluation of steganalytic methods for LSB matching [4], these two embedding rates
are already difficult to detect reliably. These two embedding
rates were also used in [8].
The steganalyzers’ performance is evaluated using the minimal average decision error under equal probability of cover and stego images,

P_Err = min (1/2) (P_Fp + P_Fn),     (2)

where P_Fp and P_Fn stand for the probability of false alarm or false positive (detecting cover as stego) and the probability of missed detection (false negative).⁴
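Given real-valued classifier scores on cover and stego test images, the error (2) can be evaluated by sweeping the decision threshold; a minimal sketch assuming numpy and the convention that higher scores mean "stego":

```python
import numpy as np

def p_err(cover_scores, stego_scores):
    """Minimal average of false-alarm and missed-detection rates over thresholds."""
    thresholds = np.unique(np.concatenate([cover_scores, stego_scores]))
    best = 0.5                                  # a random detector achieves 0.5
    for t in thresholds:
        p_fp = np.mean(cover_scores >= t)       # cover classified as stego
        p_fn = np.mean(stego_scores < t)        # stego classified as cover
        best = min(best, 0.5 * (p_fp + p_fn))
    return best
```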
3.1.2 Classifiers
In the experiments presented in this section, we used exclusively soft-margin SVMs [25]. Soft-margin SVMs can balance the complexity and accuracy of classifiers through a hyper-parameter C penalizing the error on the training set. Higher values of C produce classifiers that are more accurate on the training set but also more complex, with possibly worse generalization.⁵ On the other hand, a smaller value of C leads to a simpler classifier with worse accuracy on the training set.
Depending on the choice of the kernel, SVMs can have additional kernel parameters. In this paper, we used SVMs with a linear kernel, which is free of any parameters, and SVMs with a Gaussian kernel, k(x, y) = exp(−γ ‖x − y‖²₂), with the width γ > 0 as the parameter. The parameter γ has a similar role as C. Higher values of γ make the classifier more pliable but likely prone to overfitting the data, while lower values of γ have the opposite effect.
Before training the SVM, the value of the penalization parameter C and the kernel parameters (in our case γ) need to be set. The values should be chosen so as to obtain a classifier with good generalization. The standard approach is to estimate the error on unknown samples by cross-validation on the training set over a fixed grid of values and then select the value corresponding to the lowest error (see [12] for details). In this paper, we used five-fold cross-validation with the multiplicative grid

C ∈ {0.001, 0.01, . . . , 10000},
γ ∈ {2^i | i ∈ {−log₂(d) − 3, . . . , −log₂(d) + 3}},

where d is the number of features in the subset.
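For illustration, the same model selection can be phrased with scikit-learn's grid search; this sketch assumes the SPAM features are already extracted into X_train, y_train and is not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

d = 686  # dimension of the second-order SPAM features
param_grid = {
    "C": [10.0 ** i for i in range(-3, 5)],                    # 0.001, ..., 10000
    "gamma": [2.0 ** (i - np.log2(d)) for i in range(-3, 4)],  # grid centered on 1/d
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)  # picks (C, gamma) by five-fold cross-validation
```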
3.2 Linear or non-linear?
This section compares the accuracy of steganalyzers based on first-order and second-order SPAM features, and of steganalyzers implemented by SVMs with Gaussian and linear kernels.
⁴ For SVMs, the minimization in (2) is carried out over a set containing just one tuple (P_Fp, P_Fn), rather than over a set of tuples obtained by varying the threshold, because the training algorithm of SVMs outputs one fixed classifier for each pair (P_Fp, P_Fn). In our implementation, the reported error is calculated as (1/l) Σ_{i=1}^{l} I(y_i, ŷ_i), where I(·, ·) is the indicator function attaining 1 iff y_i ≠ ŷ_i and 0 otherwise, y_i is the true label of the i-th sample, and ŷ_i is the label returned by the SVM classifier. In the case of an equal number of positive and negative samples, the error provided by our implementation equals the error calculated according to (2).
⁵ The ability of classifiers to generalize is described by the error on samples unknown during the training phase of the classifier.

Database  bpp   2nd SPAM  WAM    ALE
CAMERA    0.25  0.057     0.185  0.337
BOWS2     0.25  0.054     0.170  0.313
NRCS      0.25  0.167     0.293  0.319
JPEG85    0.25  0.008     0.018  0.257
JOINT     0.25  0.074     0.206  0.376
CAMERA    0.50  0.026     0.090  0.231
BOWS2     0.50  0.024     0.074  0.181
NRCS      0.50  0.068     0.157  0.259
JPEG85    0.50  0.002     0.003  0.155
JOINT     0.50  0.037     0.117  0.268

Table 3: Error (2) of steganalyzers for LSB matching with payloads 0.25 and 0.5 bpp. The steganalyzers were implemented as SVMs with a Gaussian kernel. The lowest error for a given database and message length is in boldface.
The steganalyzers were always trained to detect a particular payload. The reported error (2) was always measured on images from the testing set, which were not used in any form during training or development of the steganalyzer.
The results, summarized in Table 2, show that steganalyzers implemented as Gaussian SVMs are always better than their linear counterparts. This shows that the decision boundaries between cover and stego features are nonlinear, which is especially true for databases with images of different sizes (CAMERA, JPEG85). Moreover, the steganalyzers built from the second-order SPAM model with differences in the range [−3, +3] are also always better than steganalyzers based on the first-order SPAM model with differences in the range [−4, +4], which indicates that the degree of the model is more important than the range of the differences.
3.3 Comparison with prior art
Table 3 shows the classification error (2) of the steganalyzers using second-order SPAM (686 features), WAM [8] (81 features), and ALE [5] (10 features) on all four databases and for two relative payloads. We created a special steganalyzer for each combination of database, feature set, and payload (4 × 3 × 2 = 24 steganalyzers in total). The steganalyzers were implemented by SVMs with a Gaussian kernel as described in Section 3.1.2.
Table 3 also clearly demonstrates that the accuracy of steganalysis greatly depends on the cover source. For images with a low level of noise, such as JPEG-compressed images, the steganalysis is very accurate (P_Err = 0.8% on images with payload 0.25 bpp). On the other hand, on very noisy images, such as the scanned photographs from the NRCS database, the accuracy is considerably worse. Here, we have to be cautious with the interpretation of the results, because the NRCS database contains only about 1500 images, which makes the estimates of accuracy less reliable than on the other, larger image sets.
In all cases, the steganalyzers that used second-order SPAM features perform best; the WAM steganalyzers are second, with about three times higher error; and the ALE steganalyzers are the worst. Figure 3 compares the steganalyzers in selected cases using the receiver operating characteristic (ROC) curve, created by varying the threshold of the SVMs with the Gaussian kernel. The dominant performance of the SPAM steganalyzers is quite apparent.
4. CURSE OF DIMENSIONALITY
Denoting the number of training samples as l and the number of features as d, the curse of dimensionality refers to overfitting the training data because of an insufficient number of training samples relative to a large dimensionality d (e.g., the ratio l/d is too small). In theory, the required number of training samples depends exponentially on the dimension of the training set, but a practical rule of thumb states that the number of training samples should be at least ten times the dimension of the training set. For the 686-dimensional second-order SPAM features, this rule calls for at least 6860 training images.
One of the reasons for the popularity of SVMs is that they are considered resistant to the curse of dimensionality and to uninformative features. However, this is true only for SVMs with a linear kernel. SVMs with a Gaussian kernel (and other local kernels as well) can suffer from the curse of dimensionality, and their accuracy can be decreased by uninformative features [3]. Because the dimensionality of the second-order SPAM feature set is 686, the feature set may be susceptible to all of the above problems, especially in experiments on the NRCS database.
This section investigates whether the large dimensionality and uninformative features negatively influence the performance of the steganalyzers based on second-order SPAM features. We use a simple feature selection algorithm to select subsets of features of different sizes and observe the discrepancy between the errors on the training and testing sets. If the curse of dimensionality occurs, the difference between the two errors should grow with the dimension of the feature set.
4.1 Details of the experiment
The aim of feature selection is to select a subset of features such that the classifier's accuracy is better than or equal to that of the classifier implemented using the full feature set. In theory, finding the optimal subset of features is an NP-complete problem [9], and feature selection frequently suffers from overfitting. In order to alleviate these issues, we used a very simple feature selection scheme operating in a linear space. First, we calculated the correlation coefficient between the i-th feature x_i and the number of embedding changes in the stego image y according to⁶

corr(x_i, y) = (E[x_i y] − E[x_i] E[y]) / (√(E[x_i²] − E[x_i]²) · √(E[y²] − E[y]²)).     (3)

Second, a subset of features of cardinality k was formed by selecting the k features with the highest correlation coefficient.⁷
The advantages of this approach to feature selection are a good estimation of the ranking criteria, since the features are evaluated separately, and a low computational complexity. The drawback is that the dependences between multiple features are not evaluated, which means that the selected subsets of features are almost certainly not optimal, i.e., there may exist a different subset with the same or smaller number of features and a better classification accuracy. Despite this weakness, the proposed method seems to offer a good
⁶ In Equation (3), E[·] stands for the empirical mean over the variable within the brackets. For example, E[x_i y] = (1/n) Σ_{j=1}^{n} x_{i,j} y_j, where x_{i,j} denotes the i-th element of the j-th feature vector.
⁷ This approach is essentially equal to feature selection using the Hilbert-Schmidt independence criterion with linear kernels [22].
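A sketch of the ranking step in Equation (3), assuming numpy, with X holding one feature vector per row and y the number of embedding changes per image (the function name is ours):

```python
import numpy as np

def top_k_features(X, y, k):
    """Indices of the k features with the highest correlation coefficient (3)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).mean(axis=0) / (X.std(axis=0) * y.std())
    return np.argsort(corr)[::-1][:k]   # highest correlation first
```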

References

The Nature of Statistical Learning Theory (book). TL;DR: Covers the setting of the learning problem, consistency of learning processes, bounds on the rate of convergence of learning processes, controlling the generalization ability of learning processes, constructing learning algorithms, and what is important in learning theory.

A Practical Guide to Support Vector Classification. TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.

Feature Extraction: Foundations and Applications (book). TL;DR: Discusses feature extraction for classification of proteomic mass spectra, sequence motifs as highly predictive features of protein function, and combining a filter method with SVMs.

F5-A Steganographic Algorithm (book chapter). TL;DR: The newly developed algorithm F5 withstands visual and statistical attacks, yet still offers a large steganographic capacity because it implements matrix encoding to improve the efficiency of embedding and reduces the number of necessary changes.

Low-Complexity Image Denoising Based on Statistical Modeling of Wavelet Coefficients (journal article). TL;DR: Introduces a simple spatially adaptive statistical model for wavelet image coefficients and applies it to image denoising; the model is inspired by a recent wavelet compression algorithm, the estimation-quantization coder.
Frequently Asked Questions
Q1. What have the authors contributed in "Steganalysis by subtractive pixel adjacency matrix" ?

This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on the second-order Markov chain is high-dimensional, the authors address the curse of dimensionality using a feature selection algorithm and show that the curse did not occur in their experiments.

In their future work, the authors would like to use the SPAM features to detect other steganographic algorithms for the spatial domain, namely LSB embedding, and to investigate the limits of steganography in the spatial domain to determine the maximal secure payload for current spatial-domain embedding methods. Another direction worth pursuing is to use the third-order Markov chain in combination with feature selection to further improve the accuracy of steganalysis.

At the same time, one must be aware that the feature selection is database-dependent, as only 114 of the 200 best features were shared among all four databases.

The local dependences between differences of neighboring pixels are modeled as a Markov chain, whose sample probability transition matrix is taken as a feature vector for steganalysis. 
