Article Information

Title: Survey on Feature Selection in Document Clustering
Authors: Ms. K. Mugunthadevi; Mrs. S.C. Punitha; Dr. M. Punithavalli; et al.
Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Electronic ISSN: 0975-3397
Year: 2011
Volume: 3
Issue: 3
Pages: 1240-1244
Publisher: Engg Journals Publications
Abstract: Text mining is the research of technologies for discovering useful knowledge in enormous collections of documents, and the development of systems that provide this knowledge and support decision making. A cluster is essentially a group of similar data items, and document clustering means partitioning the data into different groups of similar items. Clustering is a fundamental data analysis technique used in applications such as biology, psychology, control and signal processing, information theory, and mining technologies. Text mining is not a stand-alone task that human analysts typically engage in. Its goal is to transform text written in everyday language into a structured, database-style format, so that heterogeneous documents are summarized and presented in a uniform manner. Among the challenging problems of text clustering are large data volume, high dimensionality, and complex semantics.
Keywords: text mining; feature selection; information retrieval; ontology; document clustering