
Article information

  • Title: Feature Extraction for Document Classification
  • Authors: S. Vidhya; D. Asir Antony Gnana Singh; E. Jebamalar Leavline
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Electronic ISSN: 2319-8753
  • Year of publication: 2015
  • Issue: MULTICON
  • Page: 50
  • Publisher: S&S Publications
  • Abstract: Document classification is a significant and well-studied area of pattern recognition, with a variety of modern applications. The purpose of document classification is to assign a text or document to one or more categories. It is employed in document organization and management, information retrieval, and certain machine learning algorithms. Feature extraction selects an important subset of features from a dataset in order to improve the document classification task. Correctly identifying the relevant features in a text is of vital importance for document classification. The document categorization problem becomes more challenging when the data are high-dimensional. In text mining, feature extraction and document classification are important techniques. The main aim of feature extraction is to reduce the dimensionality and eliminate irrelevant features so that the efficiency and performance of the classification algorithms are improved. In this paper, a term frequency (TF) with stemmer-based feature extraction algorithm is proposed. The performance of the algorithm is tested using various classifiers, and it is observed that the proposed method outperforms other methods.
  • Keywords: Document classification; Text mining; Feature extraction; High-dimensionality
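
The abstract names term frequency (TF) combined with stemming as the feature extraction approach but gives no algorithmic detail. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes a crude suffix-stripping stemmer (the `SUFFIXES` list and the helper names `crude_stem`, `term_frequencies`, `select_features`, and `vectorize` are all hypothetical), counts stemmed terms, and keeps the `k` most frequent stems across a corpus as the reduced feature set.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative suffix list. A real system would more
# likely use an established stemmer such as the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(text):
    """Count stemmed terms in one document (lowercased, letters only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens)


def select_features(documents, k):
    """Return the k stems with the highest total frequency in the corpus."""
    totals = Counter()
    for doc in documents:
        totals.update(term_frequencies(doc))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(text, features):
    """Represent a document as TF counts over the selected features only."""
    tf = term_frequencies(text)
    return [tf[f] for f in features]


if __name__ == "__main__":
    corpus = ["mining texts", "mined text", "classifier"]
    features = select_features(corpus, k=2)
    print(features)
    print(vectorize("text mining of texts", features))
```

Stemming merges inflected forms such as "mining" and "mined" into one feature, and truncating to the top `k` stems is one simple way to reduce dimensionality. The resulting vectors could then be passed to any classifier. The abstract does not say which selection criterion or classifiers the paper uses.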