首页    期刊浏览 2024年10月05日 星期六
登录注册

文章基本信息

  • 标题:Cost-Sensitive Spam Detection Using Parameters Optimization and Feature Selection
  • 作者:Sang Min Lee ; Dong Seong Kim ; Jong Sou Park
  • 期刊名称:Journal of Universal Computer Science
  • 印刷版ISSN:0948-6968
  • 出版年度:2011
  • 卷号:17
  • 期号:6
  • 页码:944-960
  • 出版社:Graz University of Technology and Know-Center
  • 摘要:E-mail spam is no more garbage but risk since it recently includes virus attachments and spyware agents which make the recipients' system ruined, therefore, there is an emerging need for spam detection. Many spam detection techniques based on machine learning techniques have been proposed. As the amount of spam has been increased tremendously using bulk mailing tools, spam detection techniques should counteract with it. To cope with this, parameters optimization and feature selection have been used to reduce processing overheads while guaranteeing high detection rates. However, previous approaches have not taken into account feature variable importance and optimal number of features. Moreover, to the best of our knowledge, there is no approach which uses both parameters optimization and feature selection together for spam detection. In this paper, we propose a spam detection model enabling both parameters optimization and optimal feature selection; we optimize two parameters of detection models using Random Forests (RF) so as to maximize the detection rates. We provide the variable importance of each feature so that it is easy to eliminate the irrelevant features. Furthermore, we decide an optimal number of selected features using two methods; (i) only one parameters optimization during overall feature selection and (ii) parameters optimization in every feature elimination phase. Finally, we evaluate our spam detection model with cost-sensitive measures to avoid misclassification of legitimate messages, since the cost of classifying a legitimate message as a spam far outweighs the cost of classifying a spam as a legitimate message. We perform experiments on Spambase dataset and show the feasibility of our approaches.
Loading...
联系我们|关于我们|网站声明
国家哲学社会科学文献中心版权所有