
Article Information

  • Title: Text Document Classification: An Approach Based on Indexing
  • Authors: B S Harish; S Manjunath; D S Guru
  • Journal: International Journal of Data Mining & Knowledge Management Process
  • Print ISSN: 2231-007X
  • Online ISSN: 2230-9608
  • Year: 2012
  • Volume: 2
  • Issue: 1
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is preserved with the help of a novel data structure called ‘Status Matrix’. Further, a corresponding classification technique is proposed for efficient classification of text documents. In addition, to avoid sequential matching during classification, we propose to index the terms in a B-tree, an efficient indexing scheme. Each term in the B-tree is associated with a list of class labels of those documents which contain the term. To corroborate the efficacy of the proposed representation and Status Matrix based classification, we have conducted extensive experiments on various datasets.
  • Keywords: Text documents; Representation; Term sequence; Status Matrix; B-Tree; Classification
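
The indexing idea in the abstract (each term mapped to the class labels of the documents that contain it, so that classification avoids sequential matching) can be illustrated with a minimal sketch. This is not the authors' implementation: a sorted list searched with `bisect` stands in for the B-tree, the names `TermIndex` and `candidate_classes` are hypothetical, and the simple vote count is an assumed placeholder for the Status Matrix based classification, which the abstract does not specify.

```python
from bisect import bisect_left
from collections import Counter


class TermIndex:
    """Ordered term index; a sorted list stands in for a B-tree (assumption)."""

    def __init__(self):
        self._terms = []    # distinct terms, kept in sorted order
        self._labels = []   # _labels[i] is the set of class labels for _terms[i]

    def add_document(self, terms, label):
        """Register every term of a training document under its class label."""
        for term in terms:
            i = bisect_left(self._terms, term)
            if i == len(self._terms) or self._terms[i] != term:
                # New term: insert it at its sorted position.
                self._terms.insert(i, term)
                self._labels.insert(i, set())
            self._labels[i].add(label)

    def lookup(self, term):
        """Binary-search the term; return its class labels (empty set if absent)."""
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return self._labels[i]
        return set()


def candidate_classes(index, query_terms):
    """Count, per class, how many query terms are indexed under that class.

    Assumed placeholder for the paper's classification step.
    """
    votes = Counter()
    for term in query_terms:
        votes.update(index.lookup(term))
    return votes
```

In this sketch a query document is never compared against training documents one by one; each of its terms costs one logarithmic-time lookup, which is the property the abstract attributes to the B-tree index.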