Article Information

  • Title: Structure and Semantics of Data-Intensive Web Pages: An Experimental Study on their Relationships
  • Authors: Lorenzo Blanco; Valter Crescenzi; Paolo Merialdo
  • Journal: Journal of Universal Computer Science
  • Print ISSN: 0948-6968
  • Year: 2008
  • Volume: 14
  • Issue: 11
  • Pages: 1877-1892
  • Publisher: Graz University of Technology and Know-Center
  • Abstract: In data-intensive web sites, pages are generated by scripts that embed data from a backend database into HTML templates. There is usually a relationship between the semantics of the data in a page and its corresponding template. For example, in a web site about sports events, it is likely that pages with data about athletes are associated with a template that differs from the template used to generate pages about coaches or referees. This article presents a method to classify web pages according to the associated template. Given a web page, the goal of our method is to accurately find the pages that are about the same topic. Our method leverages a simple yet effective model to abstract some structural features of a web page. We present the results of an extensive experimental analysis that show the performance of our method in terms of both recall and precision on a large number of real-world web pages.