Journal title: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Publication year: 2009
Volume: 4
Issue: 2
Publisher: SERSC
Abstract: Natural language watermarking (NLW) is a digital rights management (DRM) technique designed specifically for natural language documents. Watermarking algorithms based on synonym substitution are the most popular kind; they embed a watermark into a document in ways that preserve its linguistic meaning. Much work has been done on embedding, but comparatively little on steganalysis, such as detecting, destroying, or extracting the watermark. In this paper, we try to distinguish watermarked articles from unwatermarked ones using context information. We evaluate how well each word suits its context, and the resulting sequence of suitability values is passed to an SVM (support vector machine) classifier, which makes the final judgment. IDF (inverse document frequency) is used to weight each word's suitability in order to balance common words against rare ones. The scheme is evaluated on the Internet, with the help of Google, rather than on a specific corpus. Experimental results show a classification accuracy of 90.0%. An analysis of several factors that affect detection performance is also presented.
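The IDF weighting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how a word's suitability for its context is measured (only that Google is used), so per-word suitability scores are taken as given inputs. IDF uses the common definition log(N / df(w)) over a reference collection. The function names, the `unseen_idf` fallback, and the mean/min/max summary that turns a variable-length sequence into a fixed-length SVM input are all hypothetical choices. The SVM training itself is omitted; the resulting feature vectors could be fed to any SVM library.

```python
import math
from typing import Dict, List, Sequence


def idf_weights(documents: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Compute IDF(w) = log(N / df(w)) over a reference collection of tokenized documents."""
    n_docs = len(documents)
    doc_freq: Dict[str, int] = {}
    for doc in documents:
        for word in set(doc):  # count each word at most once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted_suitability(
    words: Sequence[str],
    suitability: Sequence[float],
    idf: Dict[str, float],
    unseen_idf: float,
) -> List[float]:
    """Scale each word's context-suitability score by its IDF weight.

    Rare words (high IDF) contribute more than common words (low IDF).
    Hypothetical: `unseen_idf` is used for words absent from the reference collection.
    """
    if len(words) != len(suitability):
        raise ValueError("words and suitability must have the same length")
    return [s * idf.get(w, unseen_idf) for w, s in zip(words, suitability)]


def summary_features(weighted: Sequence[float]) -> List[float]:
    """Hypothetical fixed-length summary (mean, min, max) usable as SVM input."""
    if not weighted:
        return [0.0, 0.0, 0.0]
    return [sum(weighted) / len(weighted), min(weighted), max(weighted)]


if __name__ == "__main__":
    reference = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
    idf = idf_weights(reference)
    words = ["the", "cat", "sat"]
    scores = [0.9, 0.4, 0.7]  # hypothetical per-word suitability scores
    weighted = weighted_suitability(words, scores, idf, unseen_idf=math.log(len(reference)))
    print(summary_features(weighted))
```

Because "the" appears in most reference documents, its IDF is small and its suitability score barely moves the features, whereas the rarer "sat" is weighted more heavily. This mirrors the stated goal of balancing common words against rare ones.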