Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Publication year: 2012
Volume: 3
Issue: 1-1
Publisher: Seventh Sense Research Group
Abstract: Managing the vast amount of documents in digital form is now very important in text mining applications. Text categorization is the task of automatically sorting a set of documents into categories from a predefined set. A major difficulty of text categorization is the high dimensionality of the feature space. Reducing dimensionality by selecting a subset of the original attributes is known as feature selection. This paper discusses feature selection methods that reduce the dimensionality of the dataset by removing features considered irrelevant for classification. It also discusses several approaches to text categorization and the applications of text categorization.
Keywords: Text mining; text classification; feature selection