
Article Information

  • Title: Heading-Aware Proximity Measure and Its Application to Web Search
  • Authors: Tomohiro Manabe; Keishi Tajima
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2016
  • Volume: 11
  • Pages: 154-159
  • DOI: 10.11185/imt.11.154
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: Proximity of query keyword occurrences is an important piece of evidence for effective query-biased document scoring. If a query keyword occurs close to another in a document, this suggests that the document is highly relevant to the query. The simplest way to measure proximity between keyword occurrences is to use the distance between them, i.e., the difference between their positions. However, most web pages have a hierarchical structure composed of nested logical blocks with headings, and this structure affects logical proximity. For example, if a keyword occurs in a block and another occurs in the heading of that block, we should not measure their proximity simply by their distance. This is because a heading describes the topic of its entire block, so term occurrences in a heading are strongly connected with any term occurrences in the associated block, largely regardless of the distance between them. Based on these observations, we developed a heading-aware proximity measure and applied it to three existing proximity-aware document scoring methods: MinDist, P6, and Span. We evaluated the existing methods and our modified methods on the data sets from the TREC web tracks. The results indicate that our heading-aware proximity measure is better than the simple distance in all cases, and that the method combining it with the Span method achieved the best performance.
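
The contrast the abstract draws between simple distance and heading-aware proximity can be illustrated with a small sketch. This is not the authors' definition, which the abstract does not give. It assumes a document is a sequence of token positions, that each logical block is described by a hypothetical `Block` record (heading span plus block end), and that a heading term and a term inside the same block are treated as being at distance 0. The names `Block`, `simple_distance`, and `heading_aware_distance` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical logical block: token positions are inclusive."""
    heading_start: int  # first token of the heading
    heading_end: int    # last token of the heading
    block_end: int      # last token of the whole block

    def in_heading(self, pos: int) -> bool:
        return self.heading_start <= pos <= self.heading_end

    def in_block(self, pos: int) -> bool:
        # The block spans its heading and everything up to block_end.
        return self.heading_start <= pos <= self.block_end


def simple_distance(a: int, b: int) -> int:
    """Simple distance: difference of the two occurrence positions."""
    return abs(a - b)


def heading_aware_distance(a: int, b: int, blocks: list[Block]) -> int:
    """Toy heading-aware distance (an assumption, not the paper's formula).

    If one occurrence lies in a block's heading and the other lies anywhere
    in that block, treat them as adjacent (distance 0). Otherwise fall back
    to the simple distance.
    """
    for blk in blocks:
        if (blk.in_heading(a) and blk.in_block(b)) or (
            blk.in_heading(b) and blk.in_block(a)
        ):
            return 0
    return simple_distance(a, b)


# One block whose heading covers tokens 0..2 and whose body ends at token 50.
blocks = [Block(heading_start=0, heading_end=2, block_end=50)]
heading_vs_body = heading_aware_distance(1, 40, blocks)   # heading term vs. body term
body_vs_body = heading_aware_distance(10, 40, blocks)     # two body terms
heading_vs_outside = heading_aware_distance(1, 60, blocks)  # term outside the block
```

Under these assumptions, a heading term and a distant body term in the same block are scored as close, while two body terms, or a heading term and a term outside the block, keep their simple distance. Nested blocks can be represented by listing each block separately in `blocks`.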