Basic Article Information

  • Title: A Comparative Study to Find a Suitable Method for Text Document Clustering
  • Authors: S. C. Punitha; M. Punithavalli
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Publication year: 2012
  • Volume: 12
  • Issue: 10
  • Pages: 115-122
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: Text mining is used in various text-related tasks such as information extraction, concept/entity extraction, document summarization, entity relation modeling (i.e., learning relations between named entities), categorization/classification, and clustering. This paper focuses on document clustering, a field of text mining that groups a set of documents into meaningful categories. Its main aim is to present a performance analysis of the techniques available for document clustering. The results of this comparative study can be used to improve existing text data mining frameworks and the process of knowledge discovery. The paper considers seven clustering algorithms, selected for their popularity in the text mining field and arranged in three groups: Group 1 - K-means and its variants (traditional K-means and K* Means algorithms); Group 2 - Expectation Maximization and its variants (traditional EM, Spherical Gaussian EM (sGEM), and Linear Partitioning and Reallocation clustering (LPR) using EM); Group 3 - semantic-based techniques (Hybrid method and Feature-based algorithms). Several experiments were conducted to analyze the performance of the algorithms and to select the best performer in terms of cluster purity, clustering accuracy, and clustering speed.
  • Keywords: Text mining; Traditional K-Means; Traditional EM Algorithm; sGEM; HSTC model; TCFS method
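
The abstract names cluster purity as one of its evaluation criteria but does not define it. The sketch below shows the commonly used definition (each cluster is credited with its most frequent true label, and the credited documents are divided by the total). Whether the paper uses exactly this formula is an assumption; the function name `cluster_purity` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Purity under the common textbook definition (assumed, not taken from the paper).

    cluster_ids: cluster assignment for each document
    true_labels: reference category for each document
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Credit every cluster with the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, two categories.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "politics", "politics", "politics", "sports"]
purity = cluster_purity(clusters, labels)  # 4 of 6 documents match their cluster's majority label
```

Purity rises toward 1.0 as clusters become label-homogeneous, but it also rises trivially as the number of clusters grows, which is one reason a study like this would report it alongside accuracy and running time.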