
Article Information

  • Title: DETECTING ARABIC SPAM WEB PAGES USING CONTENT ANALYSIS
  • Authors: RADWAN JARAMH ; TALAL SALEH ; SHERIF KHATTAB
  • Journal: International Journal of Reviews in Computing
  • Print ISSN: 2076-3328
  • Electronic ISSN: 2076-3336
  • Publication year: 2011
  • Volume: 6
  • Publisher: Little Lion Scientific Research and Developement
  • Abstract: In this paper, we propose a set of new features to enhance the classification of Arabic Web pages into spam and non-spam under different classification algorithms, namely Decision Tree, Naïve Bayes, and LogitBoost. We compare our features, which we call Arabic Content Analysis (ACA) features, to state-of-the-art Content Analysis (CA) features for spam detection in the English Web. We show that augmenting the CA features with our ACA features achieves an increase in detection accuracy of Arabic spam pages compared to CA features alone. When combined, ACA and CA features correctly identified 5,536 pages of the 5,645 Arabic spam pages that we used for testing with a false positive rate of 1.9% using the Decision Tree classifier. We also identified the top-ranked features using the Gain Ratio method.
  • Keywords: Web Spam; Web Pages; Arabic Web Spam; Detecting Arabic Spam; Arabic Corpus; Arabic Keywords; Spamdexing