Article information

  • Title: A Framework for Hierarchical Clustering Based Indexing in Search Engines
  • Authors: Parul Gupta; A.K. Sharma
  • Journal: BVICAM's International Journal of Information Technology
  • Print ISSN: 0973-5658
  • Year: 2011
  • Volume: 3
  • Issue: 2
  • Publisher: Bharati Vidyapeeth's Institute of Computer Applications and Management
  • Abstract: Providing fast, efficient access to the index is a key issue for the performance of Web Search Engines (WSEs). To improve memory utilization and support fast query resolution, WSEs use Inverted File (IF) indexes, which consist of an array of posting lists: each posting list is associated with a term and contains the term together with the identifiers of the documents that contain it. Because the document identifiers in a posting list are stored in sorted order, each one can be stored as the difference from its predecessor, which reduces the size of the index. This paper describes a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned close document identifiers. This minimizes the average difference between successive identifiers and hence saves storage space. The paper further extends this clustering algorithm to hierarchical clustering, in which similar clusters are grouped into a mega cluster and similar mega clusters are in turn combined into a super cluster. These levels of clustering optimize the search process by directing the search along a specific path from the higher levels to the lower ones, i.e. from super clusters to mega clusters, then to clusters, and finally to individual documents, so that the user gets the best possible matching results in the minimum possible time.
  • Keywords: Inverted files; Index compression; Document Identifiers Assignment; Hierarchical Clustering
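The abstract does not spell out the clustering algorithm, so the following is only a minimal Python sketch, not the authors' method, of why giving similar documents close identifiers shrinks a gap-encoded inverted index. Assumptions: documents are plain term sets with 1-based IDs; the clustering is a simple leader-style grouping by Jaccard similarity with a hypothetical `threshold` of 0.5, standing in for the paper's algorithm; and index size is measured as the total Elias gamma code length of the gaps. All function names are illustrative.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted file: map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):            # visiting IDs in order keeps each list sorted
        for term in docs[doc_id]:
            index[term].append(doc_id)
    return dict(index)

def d_gaps(postings):
    """First ID as-is, then the difference between each ID and its predecessor."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * n.bit_length() - 1

def index_bits(docs):
    """Total size of the gap-encoded index under Elias gamma coding."""
    return sum(gamma_bits(g)
               for plist in build_index(docs).values()
               for g in d_gaps(plist))

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def leader_clusters(docs, threshold=0.5):
    """Hypothetical stand-in clustering: each document joins the first cluster
    whose leader is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_terms, member_ids)
    for doc_id in sorted(docs):
        for leader, members in clusters:
            if jaccard(docs[doc_id], leader) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((docs[doc_id], [doc_id]))
    return [members for _, members in clusters]

def reassign_ids(docs, clusters):
    """Number documents 1, 2, 3, ... cluster by cluster, so similar documents get close IDs."""
    new_id = {}
    for members in clusters:
        for old in members:
            new_id[old] = len(new_id) + 1
    return {new_id[old]: terms for old, terms in docs.items()}

# Toy collection: two topics interleaved in the original numbering.
docs = {
    1: {"cat", "pet", "fur"},
    2: {"stock", "market", "trade"},
    3: {"cat", "pet", "vet"},
    4: {"stock", "market", "price"},
    5: {"cat", "fur", "vet"},
    6: {"stock", "trade", "price"},
}

clusters = leader_clusters(docs)           # [[1, 3, 5], [2, 4, 6]]
reordered = reassign_ids(docs, clusters)
print(index_bits(docs), "->", index_bits(reordered))  # 54 -> 40
```

On this toy collection most gaps of 2 become gaps of 1 after renumbering, and gamma coding stores a gap of 1 in one bit instead of three, so the index shrinks from 54 to 40 bits.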
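The abstract describes directing the search from super clusters down to individual documents but does not give the routing rule. The sketch below is a minimal illustration under stated assumptions, not the authors' design: each node is summarized by a hypothetical profile (the union of all terms beneath it), the search greedily picks the child with the largest query-term overlap at each level, and only the documents of the chosen cluster are ranked. The tree, the scoring function, and names such as `hierarchical_search` and `make_doc` are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A super cluster, mega cluster or cluster (has children) or a document (leaf)."""
    name: str
    children: list = field(default_factory=list)
    terms: frozenset = frozenset()   # set only on documents

    def profile(self):
        """Hypothetical node summary: the union of all terms in the documents below it."""
        if not self.children:
            return self.terms
        return frozenset().union(*(child.profile() for child in self.children))

def score(query, node):
    """Hypothetical similarity: number of query terms found in the node's profile."""
    return len(query & node.profile())

def hierarchical_search(root, query):
    """Follow one path (super, then mega, then cluster) and rank only that cluster's
    documents. Assumes every branch of the tree has the same depth."""
    node, path = root, []
    while node.children and node.children[0].children:   # children are still groups
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node.name)
    ranked = sorted(node.children, key=lambda d: score(query, d), reverse=True)
    return path, [d.name for d in ranked if score(query, d) > 0]

def make_doc(name, *terms):
    return Node(name, terms=frozenset(terms))

root = Node("root", [
    Node("S-animals", [
        Node("M-pets", [
            Node("C-cats", [make_doc("d1", "cat", "pet", "fur"),
                            make_doc("d2", "cat", "pet", "vet")]),
            Node("C-dogs", [make_doc("d3", "dog", "pet", "walk"),
                            make_doc("d4", "dog", "vet", "bark")]),
        ]),
        Node("M-wildlife", [
            Node("C-birds", [make_doc("d5", "bird", "nest", "wing")]),
        ]),
    ]),
    Node("S-finance", [
        Node("M-markets", [
            Node("C-stocks", [make_doc("d6", "stock", "market", "trade"),
                              make_doc("d7", "stock", "price")]),
        ]),
    ]),
])

print(hierarchical_search(root, {"dog", "vet"}))
# (['S-animals', 'M-pets', 'C-dogs'], ['d4', 'd3'])
```

Only one branch is scored per level, which is where the time saving comes from; the trade-off of a greedy descent is that a relevant document sitting in a sibling cluster is never examined.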