
Article Information

  • Title: An Customized Vector Space Model Implementation in Document Clustering to Enhance the Performance
  • Authors: M. Praveen; Dora Babu Sudarsa
  • Journal: International Journal of Advanced Research In Computer Science and Software Engineering
  • Print ISSN: 2277-6451
  • Electronic ISSN: 2277-128X
  • Publication year: 2013
  • Volume: 3
  • Issue: 5
  • Publisher: S.S. Mishra
  • Abstract: Document clustering is the task of grouping a set of documents into clusters so that documents in the same cluster are more similar to each other than to those in other clusters. One application of document clustering is in web search engine retrieval systems, where it helps users find relevant information more quickly and focus their search in the appropriate direction. K-means is a commonly used algorithm for document clustering, but it has some disadvantages. Its main limitations are: 1) the number of clusters K has to be given as input; 2) depending on the initialization, it converges to different local minima; and 3) it is slow and cannot be used for large amounts of data. The paper proposes a novel algorithm to eliminate these basic drawbacks of K-means.
  • Keywords: Document clustering; K-means; Cosine similarity; Threshold
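
The keywords pair cosine similarity with a threshold. This record does not describe the paper's algorithm, so the following is only a minimal illustrative sketch of how those two ideas can group documents without fixing the number of clusters K in advance. The function names (`tf_vector`, `cosine_similarity`, `threshold_cluster`), the whitespace tokenizer, the single-pass "compare with the cluster's first document" rule, and the threshold value of 0.5 are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (hypothetical whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_cluster(docs, threshold=0.5):
    """Single-pass grouping (an assumption, not the paper's algorithm):
    a document joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster."""
    vectors = [tf_vector(d) for d in docs]
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "k-means document clustering",
    "document clustering with k-means",
    "web search engine retrieval",
    "search engine retrieval system",
]
clusters = threshold_cluster(docs, threshold=0.5)
print(clusters)
```

In this sketch the number of clusters falls out of the threshold rather than being supplied as input, which is the kind of trade-off the first K-means limitation in the abstract points at. The result still depends on document order and on the chosen threshold.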