Research on estimation length of hidden message.
Racuciu, Ciprian Iulian ; Mihailescu, Marius Iulian ; Garban, Valentin 等
1. INTRODUCTION
Steganography represent the art of covered, or hidden, writing.
Steganography hides the covert message but not the fact that two
parties are communicating with each other. Using a transport in which
the stego process involves placing a hidden message, named carrier. The
workpaper shows that the secret message is embedded with the carrier to
form the stego environment.
Using a stego key it can be used for encryption of the hidden
message and/or for randomization in the stego system. Thus, we can
resume:
stego_environment = hidden_message + carrier + stego_key
Steganalysis, the detection of this hidden information, is an
inherently difficult problem and requires a thorough investigation.
Steganalysis is a relatively new branch of research. While
steganography (which is somewhat different from watermarking) deals with
techniques for hiding information, the goal of steganalysis is to detect
and/or estimate potentially hidden information from observed data with
little or no knowledge about the steganography algorithm and/or its
parameters. It is fair to say that steganalysis is both an art and a
science. The art of steganalysis plays a major role in the selection of
features or characteristics a typical stego message might exhibit while
the science helps in reliably testing the selected features for the
presence of hidden information.
While it is possible to design a reasonably good steganalysis
technique for a specific steganography algorithm, the long term goal
must be to develop a steganalysis framework that can work effectively at
least for a class of steganography methods if not for all. Clearly, this
poses a number of mathematical challenges and questions.
2. DATA HIDING
As long as people have been able to communicate with one another,
there has been a desire to do so secretly. Two general approaches to
covert exchanges of information have been: communicate in a way
understandable by the intended parties, but unintelligible to
eavesdroppers; or communicate innocuously, so no extra party bothers to
eavesdrop. Naturally both of these methods can be used concurrently to
enhance privacy. The formal studies of these methods, cryptography and
steganography, have evolved and become increasingly more sophisticated
over the centuries to the modern digital age.
Methods for hiding data into cover or host media, such as audio,
images, and video, were developed about a decade ago.
Steganography generally is subjected to less vicious attacks,
however as much data as possible is to be inserted. Additionally,
whereas in some cases it may actually serve a watermarker to advertise
the existence of hidden data, it is of paramount importance for a
steganographer's data to remain hidden.
3. LSB INFORMATION
An early method used to detect LSB hiding is the [x.sup.2]
(chisquared) technique, later successfully used by Provos'
stegdetect for detection of LSB hiding in JPEG coefficients. We first
note that generally the binary message data is assumed to be independent
and indentically distributed with the probability of 0 equality to the
probability of 1. If the hider's intended message does not have
these properties, a wise steganographer would use an entropy coder to
reduce the size of the message; the compressed version of the message
should fulfill the assumptions. Because 0 and 1 are equally likely,
after overwriting the LSB, it is expected that the number of pixels in a
pair of values which share all but the LSB are equalized.
Although, we would expect these numbers to be close before hiding,
we do not expect them to be equal in typical cover data.
Due to this effect, if a histogram of the stego data is taken over
all pixel values (e.g. 0 to 255 for 8-bit data), a clear
"steplike" trend can be seen. We know then exactly what the
histogram is expected to look like after LSB hiding in every pixel (or
DCT coefficient). The [x.sup.2] test is a goodness-of-fit measure which
analyzes how close the histogram of the image under scrutiny is to the
expected histogram of that image with embedded data. If it is
"close", we decide it has hidden data, otherwise not. In other
words, [x.sup.2] is a measure of the likelihood that the unknown image
is stego.
4. IMPROVED DIFFERENCE IMAGE HISTOGRAM STEGANALYSIS
For detection and estimation for length of hidden message,
difference image histogram algorithm was primarily based on the
statistical hypothesis that for natural images (1)
[a.sub.i] [approximately equal to] [yl.sub.i]
and for a stego-images with the LSB plane fully embedded
[a.sub.i] [approximately equal to] 1(2)
Obviously, the hypotheses given in Equations (1) and (2) will
affect the precision of the Difference image histogram method. Once in
these hypotheses there exists some initial bias, the estimate value via
the Equation (1) will not be reliable. When the embedding ratio is low,
the bias of these hypotheses will lead the incorrect decision, and if
there are no embedding messages in images, the false alarm rate is high.
Table 1 will show the mean and variance of the [y.sub.i] to al value.
With the increase in i the variance increases and the mean begins
to deviate from 1. In some cases the detection lead to an incorrect
decision of estimating more than 1% embedding, for the normal images.
5. IMPROVED DIFFERENCE IMAGE HISTOGRAM ALGORITHM
The Improved Difference Image Histogram algorithm consists in
several steps, as follows: Input is represented by a set of BMP images
for detecting; Output the embedded ratio estimate [p.sub.modified] for
each image; Step 1, from the set of the images, we select one single
image; Step 2, we obtain difference image histogram of the image before
(hi) and after flipping the LSB planes to ,,zero" (g); Step 3, for
steps 4 to 8 do for each value of i = 0,1,2; Step 4, we calculate the
statistical values for the image i.e. [a.sub.i] =
[a.sub.2i+2,2i+1/[a.sub.2i,2i+1], [[beta].sub.i] and
[a.sub.2i+2,2i+3]/[a.sub.2i,2i]-1 and [[gamma].sub.i] =,
[g.sub.2i]/[g.sub.2i+2] where the co-efficient which represent the
transition can be estimated using the following equation:
[a.sub.0,1] = [a.sub.0,-1] = [g.sub.0]-[h.sub.0]/[2.sub.g0], (3)
[a.sub.2i,2i] = [h.sub.2i]/[g.sub.2i] (4)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)
[a.sub.2i,2i+1] = 1 - [a.sub.2i,2i] - [a.sub.2i,2i] -1 (6)
Step 5, we obtain de value ,,p" from the root of the below
equation qhose absolute value is smaller.
[2d.sub.1] [p.sup.2] + ([dl.sub.3] - [4d.sub.1] - [d.sub.2]) P +
[2d.sub.2] = 0 (7)
in which [d.sub.1] = 1 - [[gamma].sub.i], [d.sub.2] = [a.sub.i] -
[[gamma].sub.i], [d.sub.3] = 1 - [[beta].sub.i]; Step 6, calculate the
value [a.sub.1](0) which represents the estimation of for zero embedded
message length using the following equation
[a.sub.i] = [2e.sup.2.sub.1] - [2.sub.e1][e.sub.2] - ([e.sub.2] +
[e.sub.3](1 - [e.sub.1]/[2e.sup.2.sub.1] (8)
in which [e.sub.1] = 1 - p,[e.sub.3] = 1 = [a.sub.1] and [e.sub.3]
= 1 - [[beta].sub.i]
Step 7, we calculate the initial bias,, e" with the formula:
[epsilon] = [[gamma].sub.i] - [a.sub.i](0) (9)
Step 8, now we will subtract the error ,, e" from the p to
obtain the modified estimation ration .
(10) [P.sub.modified] (i) = p - e
Step 9, the average of [p.sub.modified(i) for i = 0,1,2 will give
final embedded ratio [p.sub.modifield]
6. EXPERIMENTAL RESULTS
For experimental tests, we have selected more standard 512x512 test
images (such as Lena, Peppers and so on). We have apply sequential and
random LSB replacement to embed the images with the ratio of p= 0, 10%,
20%, ... ,100% respectively with 10% increments we created two
databases.
Then we have use the RS method, DIHmethod and GEFR method to
estimate the embedding ratio of secret information respectively. The
mask used in the RS method is [1,0; 0,1]. The results from testing the
test images are obtained by DIH method and the proposed method (IDIH)
which is shown in Table 1.
The leftmost column in Table 1 is the real embedding ratio, and
column "IDIH", "DIH" represent the estimate
embedding ratio got by Improve Difference Image Histogram method
(proposed method) and Difference Image Histogram Method (DIH)
respectively. The estimate precision of IDIH is higher than DIH
obviously; this aspect is shown in the Table 1.
In the case of sequential embedding, the accuracy is much higher
than the case of random embedding for the embedded ratios of greater
than 40%. It is having a higher performance to all the other
steganalytic techniques for entire range of possible embedding lengths.
7. CONCLUSIONS
This paper proposes a new detection algorithm, which represent an
improved algorithm of the difference image histogram algorithm and
performed tests on a group of raw lossless images. Results based on
experimental tests, show that the improved difference image histogram
steganalysis method is more accurate and guarantee the assurance than
the conventional difference image histogram method. The algorithm
described in this paper, reduces the mean error with 50% for embedding
ratios greater than 40% when compared to the DIH algorithm.
Finally we have generally focused on grayscale still images.
However the methods we presented here can be applied to the study of
data hiding in color images, video, and audio.
8. REFERENCES
Anantharam V. A large deviations approach to error exponents in
source coding and hypothesis testing. IEEE Trans. on Information Theory,
36(4):938-943, 2008
Neil F. Johnson, Sushil Jajodia: Steganalysis of Images Created
Using Current Steganography Software, in David Aucsmith (Ed.):
Information Hiding, LNCS 1525, Springer-Verlag Berlin Heidelberg 2007.
pp. 32-47
Robert Tinsley, Steganography and JPEG Compression, Final Year
Project Report, University of Warwick, 1999
Rowland, C.H. "Covert Channels in the [T.SUB.C]P/IP Protocol
Suite." First Monday, 2006.URL:
http://www.firstmonday.dk/issues/issue25/rowland/.Last accessed:
2010-12-21. URL:
http://www.guides.sk/psionic/covert/covert.tcp.txt. Last accessed:
2004-01-10
K. Bennett,Linguistic steganography: Survey, analysis and
robustness cocerns for hiding information in text, Tech. Rep. TR
2004-13, Purdue CERIAS, May 2009
Tab. 1. Comparison between IDIH and DIH
Random Sequential
Embedding
ratio (%) IDIH DIH IDIH DIH
0% 0.3052 1.6855 0.3052 1.6855
10% 14.7804 15.3881 15.6703 16.0368
20% 20.38 20.80 27.98 28.11
30% 20.3764 20.8017 27.9818 28.1124
40% 40.1524 42.9062 44.3258 44.922
50% 48.6793 52.2864 49.7154 48.5228
60% 62.245 63.8 60.5394 56.5979
70% 72.7311 66.67118 69.7919 68.726
80% 84.6388 73.4632 80.8796 72.2582
90% 90.9915 85.8664 84.8516 81.955
100% 98.6088 95.5193 98.6088 92.5193