Journal: International Journal of Security and Its Applications
Print ISSN: 1738-9976
Publication year: 2016
Volume: 10
Issue: 7
Pages: 363-372
DOI: 10.14257/ijsia.2016.10.7.32
Publisher: SERSC
Abstract: To obtain a higher ranking, spam pages use deceptive techniques to mislead search engines, which makes it harder for users to find useful information through search. Web spam is designed for search engines rather than for users, so it is important to distinguish normal web pages from spam pages. Normal web pages receive links from a wide variety of sources and their content features are distributed regularly, whereas spam pages receive links from a narrow set of sources and their content features are distributed irregularly. Based on an analysis of link diversity and content feature distribution, this paper proposes a new web page ranking algorithm. In this method, the ranking score of a page is computed by TrustRank combined with the page's link diversity and content features. Experimental results show that the method effectively reduces the ranking scores of spam pages.
Keywords: web spam; normal web pages; link diversity; content features; TrustRank; PageRank
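
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: a standard TrustRank (PageRank with teleportation restricted to trusted seed pages) whose output is scaled by a link diversity factor. The diversity measure (distinct source hosts divided by number of in-links), the multiplicative combination, and all function names and example URLs are assumptions made for illustration, not the authors' method. Content features are omitted.

```python
from urllib.parse import urlparse


def trustrank(out_links, seeds, damping=0.85, iterations=50):
    """Biased PageRank: teleportation goes only to trusted seed pages."""
    pages = set(out_links) | {t for ts in out_links.values() for t in ts}
    seed_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_mass)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_mass[p] for p in pages}
        for src in pages:
            targets = out_links.get(src, [])
            if not targets:
                continue  # dangling page: its trust is not propagated
            share = damping * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        trust = nxt
    return trust


def link_diversity(page, out_links):
    """Hypothetical measure: distinct source hosts / number of in-links."""
    sources = [s for s, ts in out_links.items() if page in ts]
    if not sources:
        return 1.0  # assumption: no in-links means no evidence of narrow sourcing
    hosts = {urlparse(s).netloc for s in sources}
    return len(hosts) / len(sources)


def combined_score(out_links, seeds):
    """Assumed combination: trust score scaled by link diversity."""
    trust = trustrank(out_links, seeds)
    return {p: trust[p] * link_diversity(p, out_links) for p in trust}


# Toy graph: page b is linked from two different hosts, while the spam
# page is linked from three pages that all live on one host.
graph = {
    "http://a.example/": ["http://b.example/", "http://c.example/x"],
    "http://c.example/x": ["http://b.example/"],
    "http://farm.example/1": ["http://spam.example/"],
    "http://farm.example/2": ["http://spam.example/"],
    "http://farm.example/3": ["http://spam.example/"],
}
scores = combined_score(graph, seeds={"http://a.example/"})
```

In this toy graph the spam page is penalized twice: it receives no trust because no seed reaches it, and its in-links all come from a single host, so its diversity factor is low. A real system would need a more careful diversity measure and a way to fold in content features.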