Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Electronic ISSN: 2319-8753
Year of publication: 2015
Issue: MULTICON
Pages: 50
Publisher: S&S Publications
Abstract: Document classification is a significant and well-studied area of pattern recognition, with a variety of modern applications. The purpose of document classification is to assign a text or document to one or more categories. It is employed in document association and management, information retrieval, and certain machine learning algorithms. Feature extraction selects an important subset of features from a dataset in order to improve the document classification task. Correctly identifying the relevant features in a text is of vital importance for document classification. The document categorization problem is more challenging when the data are high-dimensional. In text mining, feature extraction and document classification are important techniques. The main aim of feature extraction is to reduce the dimensionality and eliminate irrelevant features so that the efficiency and performance of the classification algorithms are improved. In this paper, a term frequency (TF) with stemmer-based feature extraction algorithm is proposed, and its performance is tested using various classifiers. It is observed that the proposed method outperforms other methods.
Keywords: Document classification; Text mining; Feature extraction; High-dimensionality
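
The abstract describes term frequency (TF) features computed after stemming but does not give the algorithm's details. The following is a minimal Python sketch of that general idea only, not the authors' implementation. The suffix list, the helper names (`naive_stem`, `tf_features`, `top_k_vocabulary`), and the top-k cutoff are all assumptions made for illustration; a real system would more likely use an established stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical suffix list: the abstract does not name the stemmer used.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tf_features(text: str) -> Counter:
    """Count how often each stem occurs in one document (raw term frequency)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(t) for t in tokens)


def top_k_vocabulary(docs: list[str], k: int) -> list[str]:
    """Keep the k most frequent stems across the corpus (assumed cutoff rule)."""
    totals = Counter()
    for doc in docs:
        totals.update(tf_features(doc))
    return [term for term, _ in totals.most_common(k)]
```

In this sketch, stemming is what reduces dimensionality: the five surface forms in "Mining mined mines text texts" collapse into two features, and `top_k_vocabulary` then discards low-frequency stems. The resulting count vectors could be passed to any classifier.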