
Article Information

  • Title: Term-Centric Active Learning for Naïve Bayes Document Classification
  • Authors: Sunghwan Sohn; Donald C. Comeau; Won Kim
  • Journal: The Open Information Systems Journal
  • Electronic ISSN: 1874-1339
  • Year of publication: 2009
  • Volume: 3
  • Pages: 54-67
  • DOI: 10.2174/1874133900903010054
  • Publisher: Bentham Open
  • Abstract:
    In real-world document classification, a subset of documents often needs to be chosen for labeling as a training set for a machine learner. Random sampling is generally not the most effective approach for choosing documents to be labeled. Active learning selects useful examples for labeling to improve the efficiency of learning. We consider two factors in order to measure the usefulness of a document for labeling. Such a document should be 1) largely unknown to the current learner and 2) influential by being close to many other documents. These factors are stated from a document-centric viewpoint. A similar analysis can be made from a term-centric viewpoint. It is the purpose of this paper to present this term-centric approach to active learning using a naïve Bayes classifier. We study both document-centric and our new term-centric active learning methods. We find good performance of the term-centric methods on numerous data sets with different characteristics. In addition, a genetic algorithm is employed to compare our results with estimated optimal performance at fixed training set size, and our results are between 84% and 99% of the estimated optimum.
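
The two document-centric factors named in the abstract (a document should be largely unknown to the current learner and close to many other documents) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not give its scoring formulas, so the entropy-based uncertainty, the Jaccard-based density, their product as a combined score, and all function names below are assumptions chosen for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing (log-space parameters)."""
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values()) + len(vocab)
        log_like[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return log_prior, log_like

def posterior(doc, model):
    """Class posterior for a tokenized document; out-of-vocabulary terms are skipped."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
              for c in log_prior}
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def uncertainty(doc, model):
    """Assumed measure of 'unknown to the learner': entropy of the posterior."""
    return -sum(p * math.log(p) for p in posterior(doc, model).values() if p > 0)

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def density(i, pool):
    """Assumed measure of 'influential': mean Jaccard similarity to the rest of the pool."""
    sims = [jaccard(pool[i], d) for j, d in enumerate(pool) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_next(pool, model):
    """Index of the unlabeled document with the highest uncertainty * density score."""
    scores = [uncertainty(d, model) * density(i, pool) for i, d in enumerate(pool)]
    return max(range(len(pool)), key=scores.__getitem__)

# Toy data, invented for this example only.
labeled = [["gene", "protein"], ["stock", "market"]]
labels = ["bio", "fin"]
pool = [
    ["gene", "protein", "cell"],
    ["gene", "market"],
    ["stock", "market", "price"],
    ["protein", "cell"],
]
vocab = {t for d in labeled + pool for t in d}
model = train_nb(labeled, labels, vocab)
chosen = select_next(pool, model)
print(chosen, pool[chosen])
```

On this toy pool, the document with the most evenly split posterior (`["gene", "market"]`) is not the one selected, because it has few close neighbours; the combined score favours a document that is both somewhat uncertain and similar to other pool documents. A term-centric variant, as the abstract describes, would score terms rather than documents, and is not attempted here.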

