Basic Article Information

Title: Text Document Clustering Using Semantic Neighbors
Authors: Malihe Danesh ; Hossein Shirgahi
Journal: Journal of Software Engineering
Print ISSN: 1819-4311
Electronic ISSN: 2152-0941
Publication year: 2011
Volume: 5
Issue: 4
Pages: 136-144
DOI: 10.3923/jse.2011.136.144
Publisher: Academic Journals Inc., USA
Abstract: Data clustering is a powerful technique for discovering knowledge from textual documents. In this field, K-means family algorithms are widely used because of their simplicity and high speed when clustering large-scale data. In these algorithms, the cosine similarity criterion measures only the pairwise similarity of documents, so it performs poorly when the clusters are not well separated. In contrast, the concepts of Neighbors and Link perform better because they take global information into account, in addition to the pairwise similarity, when calculating the closeness of two documents. In this model, however, semantic relations between words are ignored, and only documents sharing the same terms are clustered together. This study uses the WordNet ontology to build a new document representation model in which semantic relations between words are used to reweight word frequencies in the vector space model of the documents; the Neighbors and Link concepts are then applied to this model. Results of applying the proposed method (Semantic Neighbors) to real-world text data show that it performs better than previous methods and is more efficient for text document clustering.
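The abstract does not define Neighbors and Link. As a rough illustration only, the sketch below assumes the usual definitions from the link-based clustering literature: two documents are neighbors when their cosine similarity is at least a threshold, and the link between two documents is the number of neighbors they share. The threshold `theta`, the function names, and the toy term-frequency vectors are all hypothetical, and the WordNet-based reweighting step is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighbor_matrix(docs, theta):
    """M[i][j] = 1 if documents i and j are neighbors (assumed: cosine >= theta)."""
    n = len(docs)
    return [[1 if cosine(docs[i], docs[j]) >= theta else 0 for j in range(n)]
            for i in range(n)]

def link_matrix(m):
    """link[i][j] = number of common neighbors of i and j (assumed definition)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical toy corpus: four documents over a four-term vocabulary.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]

neighbors = neighbor_matrix(docs, theta=0.5)  # theta chosen for illustration
links = link_matrix(neighbors)
```

Under these assumptions, documents 0 and 1 share two neighbors (each other and themselves), while documents 0 and 2 share none, so the link value separates the two groups even though it is built only from thresholded pairwise similarities.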