Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Year of publication: 2014
Volume: 5
Issue: 4
Pages: 4936-4942
Publisher: TechScience Publications
Abstract: Web spam degrades the quality of Web documents, a consequence of the high commercial value of top-ranked search-engine results. Web spamming refers to manipulating ranking algorithms so that some web pages receive a higher rank than they actually deserve. To counter this, we first present new approaches, namely Language Model Disagreement and Qualified Link Analysis, and also introduce new classification methods. A variety of classification algorithms are available, but the decision tree is the simplest because of its uncomplicated hierarchical structure. Here we use C5.0, a modified version of the C4.5 decision tree classification algorithm. In this paper we compare the accuracy of various classification algorithms and find that C5.0 gives the highest accuracy. For this we use the publicly available WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets.
Keywords: Classification; Data mining; Web spam detection; Language model; Feature selection; Decision tree.