Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Publication year: 2015
Volume: 3
Issue: 5
DOI: 10.15680/ijircce.2015.0305174
Publisher: S&S Publications
Abstract: Text classification is gaining importance because of the availability of large numbers of electronic documents from a variety of sources. Text classification (also called text categorization) is the task of assigning predefined categories to documents. Text mining is the process of finding interesting regularities in large textual data, where interesting means non-trivial, hidden, previously unknown, and potentially useful. The goal of text mining is to enable users to extract information from textual resources; it brings together operations such as retrieval, classification, clustering, data mining, natural language preprocessing, and machine learning techniques to classify different patterns. A major difficulty of text categorization is the high dimensionality of the feature space. Reducing dimensionality by selecting a subset of the original attributes is known as feature selection. This paper discusses feature selection methods that reduce the dimensionality of a dataset by removing features considered irrelevant for classification. It surveys text classification, several approaches to text classification, feature selection methods, and applications of text classification.
Keywords: Information Retrieval; Text Classification; Text Mining; Feature Selection