Article Information

  • Title: Active Multi-Field Learning for Spam Filtering
  • Authors: Liu, Wuying ; Wang, Lin ; Yi, Mianzhu
  • Journal: COMPUTING AND INFORMATICS
  • Print ISSN: 1335-9150
  • Year: 2014
  • Volume: 33
  • Issue: 6
  • Pages: 1400-1427
  • Language: English
  • Publisher: COMPUTING AND INFORMATICS
  • Abstract: Ubiquitous spam messages waste considerable time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to combating various kinds of spam. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests active learning; and 2) different messages often share a similar multi-field text structure, which suggests multi-field learning. The multi-field learning framework combines the results predicted by the field classifiers using a novel compound weight, and each field classifier computes the arithmetic average of the conditional probabilities predicted from feature strings, which it looks up in a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner estimates classification confidence and treats the more uncertain messages as the more informative samples for which to request labels. Experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short text spam filtering. Measured by the standard (1-ROCA)% metric, the performance of our active multi-field learning even exceeds the full-feedback performance of some advanced individual classification algorithms.
  • Keywords: Spam filtering; active multi-field learning; email spam; short message service spam; TREC spam track; 68T50; 68Q32; 62H30; 68T30
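
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions because the abstract does not specify them: the class and method names are hypothetical, messages are taken to be dictionaries of already-extracted feature strings per field, the per-string probability uses add-one smoothing, the paper's "novel compound weight" is replaced by a plain equal-weight average, and a label is requested whenever the current variance across fields exceeds the mean of the variances seen so far.

```python
from collections import defaultdict
from statistics import mean, pvariance


class FieldClassifier:
    """Scores one message field from a string-frequency index."""

    def __init__(self):
        # feature string -> [ham count, spam count]
        self.index = defaultdict(lambda: [0, 0])

    def train(self, strings, is_spam):
        for s in set(strings):
            self.index[s][1 if is_spam else 0] += 1

    def predict(self, strings):
        # Arithmetic average of P(spam | string) over known strings.
        # Add-one smoothing is an assumption, not taken from the paper.
        probs = []
        for s in set(strings):
            if s in self.index:
                ham, spam = self.index[s]
                probs.append((spam + 1) / (ham + spam + 2))
        return mean(probs) if probs else 0.5  # neutral score when nothing is known


class ActiveMultiFieldFilter:
    """Hypothetical wrapper: one classifier per field plus a label-request rule."""

    def __init__(self, fields):
        self.classifiers = {f: FieldClassifier() for f in fields}
        self.variance_history = []

    def score(self, message):
        # Equal-weight average stands in for the paper's compound weight.
        results = [c.predict(message.get(f, [])) for f, c in self.classifiers.items()]
        return mean(results), pvariance(results)

    def wants_label(self, variance):
        # Assumed rule: ask for a label when fields disagree more than usual.
        request = (not self.variance_history
                   or variance > mean(self.variance_history))
        self.variance_history.append(variance)
        return request

    def train(self, message, is_spam):
        for f, c in self.classifiers.items():
            c.train(message.get(f, []), is_spam)
```

In use, each incoming message would be scored first, and `train` would be called only when `wants_label` returns true and a label is actually obtained, which is where the reduction in labeling cost would come from under these assumptions.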