
Article information

  • Title: On Supervised and Unsupervised Discretisation
  • Authors: G. Agre; S. Peev
  • Journal: Cybernetics and Information Technologies
  • Print ISSN: 1311-9702
  • Electronic ISSN: 1314-4081
  • Year of publication: 2002
  • Volume: 2
  • Issue: 2
  • Publisher: Bulgarian Academy of Science
  • Abstract: The paper discusses the problem of supervised and unsupervised discretization of continuous attributes, an important pre-processing step for many machine learning (ML) and data mining (DM) algorithms. Two ML algorithms that rely essentially on attribute discretization, the Simple Bayesian Classifier (SBC) and the Symbolic Nearest Mean Classifier (SNMC), were selected for an empirical comparison of supervised entropy-based discretization against unsupervised equal-width and equal-frequency binning. The results of this evaluation on 13 benchmark datasets do not confirm the widespread opinion (at least for SBC) that the entropy-based MDL heuristic outperforms the unsupervised methods. Based on an analysis of these results, a modification of the entropy-based method and a new supervised discretization method are proposed. The empirical evaluation shows that both methods significantly improve the classification accuracy of both classifiers.
  • Keywords: supervised and unsupervised discretization; machine learning; data mining.
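
For readers unfamiliar with the two unsupervised baselines named in the abstract, the following is a minimal sketch of equal-width and equal-frequency binning. It is not the authors' implementation: the function names (`equal_width_edges`, `equal_frequency_edges`, `assign_bin`), the tie-handling rule, and the sample data are assumptions made purely for illustration.

```python
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], k: int) -> List[float]:
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values: Sequence[float], k: int) -> List[float]:
    """Return k-1 cut points so that each bin holds roughly len(values)/k of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def assign_bin(x: float, edges: Sequence[float]) -> int:
    """Map a value to its bin index; a value equal to a cut point goes to the upper bin (assumed rule)."""
    return sum(1 for e in edges if x >= e)


if __name__ == "__main__":
    # Hypothetical attribute values with one outlier, to show how the two schemes differ.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    for name, edges in (
        ("equal width", equal_width_edges(sample, 3)),
        ("equal frequency", equal_frequency_edges(sample, 3)),
    ):
        print(name, edges, [assign_bin(v, edges) for v in sample])
```

With a skewed attribute such as the sample above, equal-width binning places almost every value in the first interval, while equal-frequency binning spreads the values evenly across intervals. Neither scheme consults class labels, which is what separates them from the supervised entropy-based method the abstract compares them against.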