
Article Information

  • Title: Improved Text Clustering with Neighbors
  • Authors: Sri Lalitha Y; Govardhan A
  • Journal: International Journal of Data Mining & Knowledge Management Process
  • Print ISSN: 2231-007X
  • Electronic ISSN: 2230-9608
  • Publication year: 2015
  • Volume: 5
  • Issue: 2
  • Page: 23
  • DOI: 10.5121/ijdkp.2015.5203
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: With the ever-increasing number of documents on the web and in other repositories, organizing and categorizing these documents manually to meet the diverse needs of users is a complicated job; hence a machine learning technique named clustering is very useful. Text documents are clustered by pairwise similarity of documents, using similarity measures such as Cosine, Jaccard, or Pearson. The best clustering results are seen when the overlap of terms between documents is low, that is, when clusters are distinguishable. Hence, for this problem, to find document similarity we apply the link and neighbor concepts introduced in ROCK. A link specifies the number of shared neighbors of a pair of documents. Significantly similar documents are called neighbors. This work applies links and neighbors to Bisecting K-means clustering in identifying seed documents in the dataset, as a heuristic measure in choosing a cluster to be partitioned, and as a means to find the number of partitions possible in the dataset. Our experiments on real-time datasets showed a significant improvement in terms of accuracy with minimum time.
  • Keywords: Similarity Measures; Coherent Clustering; Bisecting K-means; Neighbors.
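
The neighbor and link notions mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy term-frequency vectors, and the threshold `theta` are all assumptions made for illustration, as is the convention that a document counts as its own neighbor. Two documents are treated as neighbors when their cosine similarity is at least `theta`, and the link between two documents is the number of neighbors they share.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_u * norm_v)

def neighbor_matrix(docs, theta):
    """nbr[i][j] is 1 when documents i and j are neighbors, else 0.

    Assumption: neighbors are pairs with cosine similarity >= theta,
    so every non-empty document is also its own neighbor.
    """
    n = len(docs)
    return [[1 if cosine(docs[i], docs[j]) >= theta else 0
             for j in range(n)] for i in range(n)]

def link_matrix(nbr):
    """links[i][j] is the number of neighbors shared by documents i and j."""
    n = len(nbr)
    return [[sum(nbr[i][k] * nbr[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# Hypothetical toy corpus: documents 0 and 1 share terms, as do 2 and 3.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
nbr = neighbor_matrix(docs, theta=0.5)
links = link_matrix(nbr)
```

With this toy data, documents 0 and 1 are neighbors and share two neighbors (each other and themselves under the convention above), while documents 0 and 2 share none. A link count of this kind is one way a seed document or a cluster to split could be ranked, though the exact criteria used in the paper are not stated in the abstract.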