Lossless data embedding with file size preservation

Jessica Fridrich, Miroslav Goljan, Qing Chen, and Vivek Pathak
Department of Electrical and Computer Engineering
SUNY Binghamton, Binghamton, NY 13902-6000, USA
ABSTRACT
In lossless watermarking, it is possible to completely remove the embedding distortion from the watermarked image
and recover an exact copy of the original unwatermarked image. Lossless watermarks have found applications in fragile
authentication, integrity protection, and metadata embedding. They are especially important for medical and military
images. Frequently, lossless embedding disproportionately increases the file size for image formats that use
lossless compression (RLE BMP, GIF, JPEG, PNG, etc.). This partially negates the advantage of embedding
information as opposed to appending it. In this paper, we introduce lossless watermarking techniques that preserve the
file size. The formats addressed are RLE encoded bitmaps and sequentially encoded JPEG images. The lossless
embedding for the RLE BMP format is designed to guarantee that the message extraction and
original image reconstruction are insensitive to different RLE encoders, image palette reshuffling, and the
removal or addition of duplicate palette colors. The performance of both methods is demonstrated on test images by
showing the capacity, distortion, and embedding rate. The proposed methods are the first examples of lossless
embedding methods that preserve the file size for image formats that use lossless compression.
Keywords: Embedding, lossless, erasable, invertible, removable, distortion, file size preservation, RLE, JPEG
1. INTRODUCTION
Lossless embedding is a term for a class of data hiding techniques that are capable of restoring the embedded image to
its original state without accessing any side information. One can say that the embedding distortion can be erased or
removed from the embedded image. This is why some researchers refer to this type of embedding as erasable,
removable, invertible, or distortion-free.
The idea of lossless embedding was first proposed by Honsinger¹ in 1999. This technique, originally
designed for lossless authentication, suffered from visible distortion (for some images) and limited capacity. Fridrich et
al.² introduced a general methodology for lossless embedding in digital images that is based on lossless compression of
image features. In this method, one first selects a subset X of image features that is losslessly compressible and that can
be randomized without causing visible degradation to the image. The lossless embedding proceeds by compressing X
to C(X) and replacing X with $C(X)\,\&\,\{m_j\}_{j=1}^{L}$, where $m_j$ are the message bits and ‘&’ denotes concatenation. This way,
one can losslessly embed up to |X|–|C(X)| bits. This embedding paradigm is very general and many schemes can be
designed by selecting different image features²,³.
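The compress-and-append paradigm described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the cited work: it operates on bytes rather than bits, uses zlib as a stand-in for the compressor C, and adds a hypothetical 8-byte header (two length fields) so that the extractor can split the payload. The function names `embed` and `extract` are illustrative.

```python
import zlib

HEADER = 8  # assumed framing: 4 bytes for len(C(X)), 4 bytes for the message length


def embed(x: bytes, message: bytes) -> bytes:
    """Replace the feature bytes x by header | C(x) | message | zero padding."""
    c = zlib.compress(x, 9)
    payload = len(c).to_bytes(4, "big") + len(message).to_bytes(4, "big") + c + message
    if len(payload) > len(x):
        # Capacity is |X| - |C(X)|, minus the framing overhead assumed here.
        raise ValueError("message exceeds the available capacity")
    return payload + bytes(len(x) - len(payload))  # same length as x


def extract(y: bytes):
    """Recover the original feature bytes and the message from y."""
    clen = int.from_bytes(y[:4], "big")
    mlen = int.from_bytes(y[4:8], "big")
    c = y[HEADER:HEADER + clen]
    message = y[HEADER + clen:HEADER + clen + mlen]
    return zlib.decompress(c), message


# A highly redundant feature set leaves room for a message.
x = bytes([0] * 900 + [1] * 100)
y = embed(x, b"hello")
restored, message = extract(y)
```

Because `y` has the same length as `x`, it can overwrite the selected features in place, and `extract` returns both the original features and the message.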
Alternative approaches to lossless embedding were later proposed by Macq⁴, Tian⁵, and Kalker⁶. Researchers have
focused on different aspects of lossless embedding schemes. Increasing the lossless embedding capacity has recently
been the principal motivation³,⁵,⁶. Celik³ described a lossless authentication method with localization. Kalker et al.⁷
proposed an approach that minimizes distortion per embedded bit. The first lossless embedding scheme for audio
signals was described by Kalker⁶.
fridrich@binghamton.edu; phone: 1 607 777-2577; fax: 1607 777-4464; http://www.ws.binghamton.edu/fridrich; SUNY
Binghamton; Watson School of Engineering, Dept. of Electrical and Computer Engineering, Binghamton, NY USA 13902-6000

So far, little attention has been paid to the increase of the file size introduced by lossless embedding. In lossless
embedding schemes designed for image formats that use some form of lossless compression, the increase in the file
size could be many times larger than the actual number of embedded bits L. This inefficiency partially outweighs the
advantage of embedding the data as opposed to appending it to the cover image. In fact, the sponsors of this research
have expressed the need for lossless embedding schemes that preserve the file size.
The act of lossless embedding of a random message stream increases the entropy E(I) of the cover image I to
E(Y)=E(I)+L, where Y is the embedded image. Fortunately, a specific lossless compression algorithm does not
compress Y to the ideal E(Y) bits but to |C(Y)| bits, where |C(Y)|>E(Y) and C(Y) is the compressed embedded image.
Consequently, one can theoretically embed at most |C(Y)|–E(Y) bits losslessly and still preserve the file size⁺. To design such a
scheme, however, one will likely have to tailor it to the specific compression scheme as well as the image format.
For example, if the cover image is an RGB encoded BMP file, the embedding does not increase its file size because the
RGB BMP format does not incorporate any compression. However, the run length encoded (RLE) BMP, GIF, and
JPEG formats contain lossless compression⁸ (run-length, LZW, and Huffman coding, respectively). Thus, the embedded file has a
different, usually larger, size than the original.
This paper is the first step to developing lossless embedding techniques that preserve the file size. We have chosen two
of the most common formats – the RLE encoded BMP image format and the ubiquitous JPEG format. In the next
section, we describe the RLE compression algorithm and then, in Section 3, the RS lossless embedding scheme with
file size preservation is introduced. Experimental results are presented in Section 4. In Section 5, we describe the
relevant details of the JPEG format and the lossless file-size preserving technique. The algorithm performance is
discussed in Section 6. Conclusions and future research are included in Section 7.
2. RLE COMPRESSION
Run length encoding (RLE) is a simple lossless compression that assigns short codes to long runs of identical symbols.
It is used in the BMP format for images with up to 256 colors. The RLE format decoding rules are simple:
n B                   decode as byte B repeated n times, n ≥ 1,
0 0                   EOL; end of row,
0 1                   EOB; end of bitmap,
0 2 x y               Delta; move x pixels to the right and y pixels down,
0 n A_1 … A_n (0)     n ≥ 3, decode as A_1 … A_n; a zero byte is padded when n is odd.
The last decoding rule is called the “absolute mode”. An important observation is that although the decoded image is
always unique, the encoding can be done in many different ways. For example, some RLE implementations never use
the code “nB” for n=1 but use the absolute mode instead. Therefore, different RLE encoders may generate files with
slightly different sizes.
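The decoding rules above can be illustrated with a small Python decoder. It is a sketch under simplifying assumptions: rows are returned in stream order (BMP files store rows bottom-up), no bounds checking is done, and the name `rle8_decode` and its `background` parameter are illustrative. The example at the end decodes two different encodings of the same row, which shows why file sizes can differ between encoders.

```python
def rle8_decode(data: bytes, width: int, height: int, background: int = 0):
    """Decode an 8-bit RLE stream into `height` rows of `width` palette indices."""
    rows = [[background] * width for _ in range(height)]
    x = y = 0
    i = 0
    while i + 1 < len(data):
        n, b = data[i], data[i + 1]
        i += 2
        if n > 0:                      # "n B": byte B repeated n times
            for _ in range(n):
                rows[y][x] = b
                x += 1
        elif b == 0:                   # "0 0": end of row
            x, y = 0, y + 1
        elif b == 1:                   # "0 1": end of bitmap
            break
        elif b == 2:                   # "0 2 x y": delta move
            x += data[i]
            y += data[i + 1]
            i += 2
        else:                          # "0 n A_1..A_n (0)": absolute mode, n >= 3
            for k in range(b):
                rows[y][x] = data[i + k]
                x += 1
            i += b + (b & 1)           # skip the zero pad byte when n is odd
    return rows


# Two valid encodings of the same 4-pixel row: run codes vs. absolute mode.
enc_runs = bytes([3, 7, 1, 9, 0, 0, 0, 1])
enc_absolute = bytes([0, 4, 7, 7, 7, 9, 0, 0, 0, 1])
```

Both streams decode to the row 7 7 7 9, yet `enc_runs` is 8 bytes long and `enc_absolute` is 10 bytes long.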
3. LOSSLESS EMBEDDING WITH FILE SIZE PRESERVATION FOR RLE BMPs
3.1 Problem statement
Because there exist many different RLE encoders, the embedding scheme must also guarantee that the message and the
original image can be extracted from the embedded and encoded image independently of the RLE encoder.
The Air Force Office of Scientific Research and the Air Force Research Laboratory in Rome, NY.
⁺ In practice, however, we are not likely to achieve this capacity because the embedded image must be perceptually equivalent to
the original image.

We have decided to use the RS lossless data embedding method² as our starting point for the design of a lossless
file-size preserving method. This method seemed to be the most amenable to the modifications needed for such a
construction.
Lossless embedding with file size preservation for RLE compressed images (LE4RLE) should satisfy the following
requirements:
(R1) The file size of the original and the embedded images must be equal after RLE compression using virtually
any RLE compressor.
(R2) The original image can be retrieved from the embedded image exactly.
(R3) Any image processing that does not modify image content (image renaming, palette reordering, removing or
introducing duplicate entries in the palette, image lossless compression and/or decompression of any kind)
must not lead to message extraction failure.
(R4) The message and the original image can be retrieved from both RLE compressed and decompressed images.
(R5) Embedded images should be perceptually equivalent to their originals, keeping the embedding distortion as
low as possible.
3.2 Defining concepts
In this section, we briefly introduce the concepts needed for the description of the RS embedding method² and its
LE4RLE modification that preserves file size (in Section 3.4).
First, all palette colors are divided into disjoint (unordered) pairs $\{c_i, c_j\}$ of perceptually similar colors (some colors
may be paired to themselves). The set of all color pairs is denoted as P. Furthermore, for each color $c_i$, we define its
flipped color as $\bar{c}_i = c_j$, where $\{c_i, c_j\}$ is a color pair from P.
Next, we extend the flipping operation to a group of k pixels with colors $(c_1, c_2, \ldots, c_k)$ and a binary mask $M \in \{0,1\}^k$:
$$ \bar{G}^{M} = (c'_1, c'_2, \ldots, c'_k), $$
where $G = (c_1, c_2, \ldots, c_k)$ and
$$ c'_i = \begin{cases} \bar{c}_i, & M_i = 1, \\ c_i, & M_i = 0, \end{cases} \qquad i = 1, \ldots, k. $$
The mask M can be the same for all groups (as is the case in the original RS embedding) or be individually defined
for each group (in this paper). We further define the discrimination function f(G)
$$ f(G) = f(c_1, c_2, \ldots, c_k) = \sum_{i=1}^{k-1} d(c_i, c_{i+1}), \qquad (1) $$
where d is the distance between two colors. The selection of color pairs and the distance d is detailed in Section 3.5.
Finally, we describe a function that assigns one bit b(G) to each group G:
$$ b(G) = \begin{cases} 0, & f(\bar{G}) - f(G) > T, \\ 1, & f(\bar{G}) - f(G) < -T, \\ \text{undefined}, & |f(\bar{G}) - f(G)| \le T. \end{cases} \qquad (2) $$
The threshold T can be used to achieve different capacity-distortion trade-offs (see Section 4). Note that for natural images
the flipped group $\bar{G}$ will be “noisier” than G and thus Prob{f($\bar{G}$) > f(G)} > 1/2. Consequently, b(G) will have more 0’s
than 1’s.
Because flipping a group twice restores it, $\bar{\bar{G}} = G$, we have $b(\bar{G}) = 1 - b(G)$ whenever b(G) is defined. Also, b(G) is defined if and only if
$b(\bar{G})$ is defined.
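The three concepts of this section (masked flipping, the discrimination function f, and the bit function b) can be sketched as follows. This is an illustrative reading, not the authors' code: colors are plain integers, `flip` is an assumed dictionary mapping each color to its partner, `dist` is any distance function, and b(G) is taken to be 0 when flipping raises f by more than T, 1 when it lowers f by more than T, and undefined (`None`) otherwise.

```python
def flip_group(group, mask, flip):
    """Flip the colors at positions where the mask is 1; keep the others."""
    return tuple(flip[c] if m else c for c, m in zip(group, mask))


def discrimination(group, dist):
    """f(G): sum of distances between consecutive colors of the group."""
    return sum(dist(a, b) for a, b in zip(group, group[1:]))


def group_bit(group, mask, flip, dist, threshold=0):
    """b(G): 0, 1, or None when the bit is undefined."""
    flipped = flip_group(group, mask, flip)
    diff = discrimination(flipped, dist) - discrimination(group, dist)
    if diff > threshold:
        return 0
    if diff < -threshold:
        return 1
    return None


# Toy palette with pairs {0, 1} and {2, 3}; distance is the index difference.
flip = {0: 1, 1: 0, 2: 3, 3: 2}
dist = lambda a, b: abs(a - b)
mask = (0, 1, 0, 1)
smooth = (0, 0, 0, 0)
noisy = flip_group(smooth, mask, flip)  # (0, 1, 0, 1)
```

In this toy example the smooth group carries the bit 0 and its flipped version carries the bit 1, so flipping a group toggles its bit.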

3.3 RS lossless embedding
Following the original method, the RS lossless embedding starts by dividing the original image X into disjoint groups
of the same size and shape (e.g., 2×2 blocks). Let $G_i$, i = 1, 2, …, N, be all the groups for which $b_i = b(G_i)$ is defined. The
RS algorithm flips some of the groups $G_i$ to $\bar{G}_i$ so that their associated bits $b_i = b(G_i)$ encode the message and the
(compressed) original bits, $C(\{b_i\}_{i=1}^{N})\,\&\,\{m_j\}_{j=1}^{L}$, where $C(\{b_i\})$ is the losslessly compressed
bit-stream $\{b_i\}$ needed for reconstruction of the original image. Note that because the bit-stream $\{b_i\}$ contains more 0’s than 1’s, it will be
losslessly compressible. As a result, this method can embed up to $N - |C(\{b_i\})|$ message bits $m_j$.
At the decoder, the compressed bit-stream $C(\{b_i\})$ and the message bits $\{m_j\}$ are extracted. Then, the groups $G_i$ are
flipped as needed to match their associated bits $b_i$ with the extracted and decompressed bit-stream $\{b_i\}$, thus obtaining
an exact copy of the original image.
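To get a feeling for the capacity bound N–|C({b_i})|, one can estimate the compressed length with the binary entropy of the bit-stream. The sketch below assumes an ideal memoryless coder, so |C({b_i})| is approximated by N·H(p), where p is the fraction of 1’s; a real compressor (such as the arithmetic coder used in the RS method) will deviate from this figure. The function names are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary source that emits 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity_estimate(bits) -> float:
    """Estimate N - |C({b_i})| assuming an ideal memoryless coder."""
    n = len(bits)
    p_one = sum(bits) / n
    return n * (1.0 - binary_entropy(p_one))


biased = [0] * 900 + [1] * 100    # 10% ones: roughly 531 bits of estimated capacity
balanced = [0, 1] * 500           # 50% ones: no capacity under this model
```

The more strongly the bits are biased toward 0, the larger the estimated capacity; a balanced stream leaves no room for a message.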
Next, we explain how this scheme can be modified to guarantee file size preservation for RLE encoded BMP images.
3.4 RS lossless embedding with file size preservation
In the RLE BMP format, the image data X is represented by indices $x_i$ to the image palette, which can have up to 256
entries. Let $c(x_i)$ denote the color of the pixel $x_i$. During embedding, each pixel $x_i$ can either stay unmodified or be
changed to $\bar{x}_i$, where $\bar{x}_i$ is the index to the color $\bar{c}(x_i)$.
We start with a simple observation that the size of the RLE compressed image will not be changed by embedding if the
length of all runs (along image rows) is not changed. This means that any sequence of pixels ‘yxxxz’ can be changed
to ‘ywwwz’ by replacing x with w, where w ≠ y and w ≠ z.
Given the image $X = \{x_i\}$, $i = 1, 2, \ldots, N_p$, represented as a row vector (pixels arranged by rows), we define the invariant
image $R = \{r_i\}$ as
$$ r_i = \min\{x_i, \bar{x}_i\}, \qquad i = 1, 2, \ldots, N_p. $$
Thus, the image R does not distinguish between the colors in the pair $\{c(x_i), \bar{c}(x_i)\} \in P$.
When scanning a row of pixels in R, transform this image using the RLE code “nB” only, whenever the index B is
repeated n times, and as “00” for End of Line. The sequence of numbers n determines the length of row segments that
the embedding algorithm must leave unmodified or modify simultaneously to $n\bar{B}$. Because each segment will carry the
same amount of hidden information regardless of its length, to keep the distortion low, it is better to limit the length of
each segment to a small number. In this paper, we use segments consisting of exactly one pixel. The set of all pixels
that belong to such segments of length 1 will be denoted as Q:
$$ x_i \in Q \iff (r_i \ne r_{i-1} \ \text{and} \ r_i \ne r_{i+1}). $$
If the embedding algorithm flips only the pixels in Q, the file size of the embedded image will be preserved under any
RLE encoder. Thus, the lossless method with file size preservation proceeds in the same way as the original RS
method with one difference – the mask M for each group G is determined by pixels from Q. This mask will reflect
which pixels can be modified and which cannot.
To obtain the individual masks, we define a binary matrix $E = (e_i)$, of the same size as the image, that captures which
pixels may (1) and must not (0) be modified:
$$ e_i = \begin{cases} 1, & x_i \in Q, \\ 0, & x_i \notin Q. \end{cases} $$
In the RS method², adaptive arithmetic coding was used to compress the bit-stream $b(G_i)$.

Each group of k pixels $(x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ with colors $G = (c(x_{i_1}), \ldots, c(x_{i_k}))$ will have its own embedding mask
$M = (e_{i_1}, e_{i_2}, \ldots, e_{i_k})$. Thus, $\bar{G}^{M} = (c'_1, c'_2, \ldots, c'_k)$, where
$$ c'_j = \begin{cases} \bar{c}(x_{i_j}), & x_{i_j} \in Q, \\ c(x_{i_j}), & x_{i_j} \notin Q. \end{cases} $$
Note that the embedding mask can be uniquely determined from both the original and embedded images.
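The construction of the invariant image R and the matrix E can be sketched as follows. The code is an illustration under stated assumptions: the image is a list of rows of palette indices, `flip` is an assumed dictionary mapping each index to its paired index, and a pixel at the start or end of a row is treated as having no neighbor on that side, since runs do not cross the end of a line. The helper `run_lengths` exists only to check the run-length claim.

```python
from itertools import groupby


def modifiable_mask(rows, flip):
    """Return E: 1 where the pixel forms a segment of length 1 in R, else 0."""
    mask = []
    for row in rows:
        r = [min(x, flip[x]) for x in row]          # invariant image R
        e = []
        for i, v in enumerate(r):
            left_differs = i == 0 or r[i - 1] != v
            right_differs = i == len(r) - 1 or r[i + 1] != v
            e.append(1 if left_differs and right_differs else 0)
        mask.append(e)
    return mask


def run_lengths(row):
    """Lengths of the maximal runs of identical indices in one row."""
    return [len(list(g)) for _, g in groupby(row)]


flip = {0: 1, 1: 0, 2: 3, 3: 2, 4: 4}               # index 4 is paired with itself
row = [0, 1, 2, 0, 0, 4, 3]
E = modifiable_mask([row], flip)                    # [[0, 0, 1, 0, 0, 1, 1]]
flipped_row = [flip[x] if e else x for x, e in zip(row, E[0])]
```

Flipping every pixel marked in E leaves the run lengths of the row unchanged, and recomputing E from `flipped_row` gives the same mask, which is what allows the decoder to rebuild it from the embedded image.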
Now, let us summarize all steps of lossless embedding with file size preservation.
Encoder
1. Determine pairs of close colors P (see Section 3.5).
2. Calculate the set Q of modifiable pixels.
3. Divide the image into disjoint groups G. Calculate $b_i = b(G_i)$ for all groups of pixels whenever they are
defined. Use a pseudo-random order for index i of $G_i$.
4. Start compressing the bit sequence $\{b_i\}$ to $C\{b_i\}$. Stop the compression at $b_k$ as soon as the inequality
$k \ge l + L + \mathrm{length}(C\{b_i\}_{i=1}^{k})$ is satisfied (l is the number of bits that encodes the message length).
5. Form the composite message Message_length & Message_bits & $C\{b_i\}$ spanning l, L, and V bits.
6. For each i, if $b_i$ ≠ the i-th bit of the composite message from step 5, then flip $G_i$ to $\bar{G}_i$.
Decoder
1.–2. The same as in Encoder.
3. Divide the image into disjoint groups G. Calculate $b_i = b(G_i)$ for all groups of pixels whenever they are
defined. Use the same pseudo-random order for index i of $G_i$ as during embedding.
4. Read $b_1 b_2 \ldots b_l$ and message bits $b_{l+1} b_{l+2} \ldots b_{l+L}$.
5. Set j=1. Decompress the segment $b_{l+L+1} \ldots b_{l+L+j}$ and denote the length of the decompressed segment V.
6. If V < l+L+j, then increase j by one and repeat the decompression from step 5, else Stop. The decompressed bits are $b_1 b_2 \ldots b_{l+L+V}$.
7. For i = 1, …, V, if ($b_i \ne b_{l+L+i}$), flip $G_i$ to $\bar{G}_i$.
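The stopping rule in step 4 of the encoder can be illustrated with a short search for the smallest k that satisfies k ≥ l + L + length(C{b_i}). The sketch is hedged in several ways: zlib over packed bytes stands in for the adaptive arithmetic coder, the compressor is restarted for every prefix instead of running incrementally, and the header length of 16 bits is an assumed value for l. All function names are illustrative.

```python
import zlib


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte << (8 - len(chunk)))        # zero-pad the last byte
    return bytes(out)


def compressed_length_bits(bits):
    """Length in bits of the zlib-compressed prefix (stand-in for C)."""
    return 8 * len(zlib.compress(pack_bits(bits), 9))


def find_stopping_index(bits, message_len, header_len=16):
    """Smallest k with k >= l + L + length(C(b_1..b_k)), or None if none exists."""
    for k in range(1, len(bits) + 1):
        if k >= header_len + message_len + compressed_length_bits(bits[:k]):
            return k
    return None


# A strongly biased bit-stream: one 1 in every 50 bits.
bits = [1 if i % 50 == 0 else 0 for i in range(2000)]
k = find_stopping_index(bits, message_len=64)
```

Only the first k groups need to be touched by the embedding, which keeps the distortion limited to the part of the image that is actually needed for the payload. If the function returns `None`, the message does not fit.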
Because both the encoder and decoder start from the image decompressed to the spatial domain, the method is
insensitive to differences between RLE encoders. Also, presorting the palette to a fixed order (e.g., alphabetically)
before determining color pairs P will make the system work after palette reshuffling. The problem of removing or
adding duplicate palette entries can be addressed by unifying duplicate entries in the palette to the lowest one from all
duplicate indices before embedding and restoring the occurrences of the duplicate colors after embedding. In
particular, let $c_1, c_2, \ldots, c_k$ be different palette entries corresponding to one RGB color, $c_1 < c_2 < \cdots < c_k$. Before
embedding, we modify all pixels with colors $c_2, \ldots, c_k$ to $c_1$. After embedding, the pixels with color $c_1$ are changed
back to $c_j$ if they were equal to $c_j$ in the original image, for all j = 2, …, k. The first step guarantees that it will not
matter whether or not the duplicate entries are removed and the last step guarantees that the file size will not decrease
during data insertion.
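The handling of duplicate palette entries described above can be sketched as two small helpers. This is an illustration, not the authors' code: the palette is assumed to be a list of RGB tuples, the image a flat list of palette indices, and the names `unify_duplicates` and `restore_duplicates` are hypothetical.

```python
def unify_duplicates(pixels, palette):
    """Map every index to the lowest index that holds the same RGB color."""
    lowest = {}
    canon = []
    for idx, rgb in enumerate(palette):
        lowest.setdefault(rgb, idx)                 # first (lowest) index wins
        canon.append(lowest[rgb])
    return [canon[p] for p in pixels], canon


def restore_duplicates(embedded, original, canon):
    """Change a pixel back to its original duplicate index if embedding left
    it at the unified (lowest) index; keep pixels that embedding flipped."""
    return [o if e == canon[o] else e for e, o in zip(embedded, original)]


palette = [(0, 0, 0), (255, 0, 0), (0, 0, 0), (255, 0, 0)]   # 2 and 3 are duplicates
original = [2, 1, 3, 0]
unified, canon = unify_duplicates(original, palette)          # [0, 1, 1, 0]

embedded = [1] + unified[1:]                                  # embedding flips pixel 0
result = restore_duplicates(embedded, original, canon)        # [1, 1, 3, 0]
```

In the example, the untouched pixel that originally used the duplicate index 3 gets that index back, while the pixel changed by the embedding keeps its new value.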
3.5 Color pairing and distance d
Let $D = \{d_{ij}\}$, i, j = 1, …, n, n ≤ 256, be the matrix of distances between colors i and j after presorting the colors that
appear in the image. These distances can be measured in any color space, such as RGB, YUV, or CIELAB. After
subjectively evaluating results of experiments with different spaces, we selected the square of the weighted Euclidean
RGB distance
$$ d = w_r^2 (r_i - r_j)^2 + w_g^2 (g_i - g_j)^2 + w_b^2 (b_i - b_j)^2, $$
where $w_r = 0.35$, $w_g = 0.4$, and $w_b = 0.25$, and $r_i$, $r_j$, $g_i$, $g_j$, $b_i$, and $b_j$ are integers in the range from 0 to 255.
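The distance matrix D can be computed directly from the palette. The sketch below assumes that the weights enter the formula squared, i.e., d = w_r²(r_i − r_j)² + w_g²(g_i − g_j)² + w_b²(b_i − b_j)², and that the palette is a list of RGB tuples; the function names are illustrative.

```python
W_R, W_G, W_B = 0.35, 0.40, 0.25   # channel weights given in the text


def color_distance(c1, c2):
    """Square of the weighted Euclidean distance between two RGB colors."""
    (r1, g1, b1), (r2, g2, b2) = c1, c2
    return ((W_R * (r1 - r2)) ** 2
            + (W_G * (g1 - g2)) ** 2
            + (W_B * (b1 - b2)) ** 2)


def distance_matrix(palette):
    """Matrix D = {d_ij} of pairwise distances between palette colors."""
    return [[color_distance(a, b) for b in palette] for a in palette]


palette = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
D = distance_matrix(palette)
```

With these weights, a difference of 10 levels in green counts more than the same difference in red, and blue counts least.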

References
Book: Introduction to data compression
Patent: Lossless recovery of an original image containing embedded data
Proceedings article: Lossless data embedding for all image formats
Proceedings article: Capacity bounds and constructions for reversible data-hiding
Proceedings article: Circular interpretation of histogram for reversible watermarking