
Basic Article Information

  • Title: Comparison of Keyword Based Clustering of Web Documents by Using Openstack 4j and by Traditional Method
  • Authors: Shiza Anand; Prof. Pradeep Pant; Dr. Mukesh Rawat
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Electronic ISSN: 2319-8753
  • Publication year: 2016
  • Volume: 5
  • Issue: 11
  • Pages: 20105
  • DOI: 10.15680/IJIRSET.2016.0511113
  • Publisher: S&S Publications
  • Abstract: The number of hypertext documents on the World Wide Web is increasing continuously, day by day. Clustering methods are therefore required to group documents into clusters (repositories) according to the similarity between the documents. Various clustering methods exist, such as hierarchical, K-means, fuzzy-logic-based, and centroid-based methods. These keyword-based clustering methods take much more time to create containers and place documents in their respective containers. These traditional methods use the file-handling techniques of different programming languages to create repositories and transfer web documents into these containers. In contrast, the openstack4j SDK is a new technique for creating containers and moving web documents into them according to similarity, in much less time than the traditional methods. Another benefit of this technique is that the SDK understands and reads all types of files, such as jpg, html, pdf, and doc. This paper compares the time required to cluster documents using openstack4j and using traditional methods, and suggests that various search engines adopt this technique for clustering so that they return results for user queries in less time.
  • Keywords: clustering; openstack4j; K-Means; centroid based; document-matching.