Article Information

  • Title: Segmenting Webpage with Gomory-Hu Tree Based Clustering
  • Authors: Liu, Xinyue; Lin, Hongfei; Tian, Ye
  • Journal: Journal of Software
  • Print ISSN: 1796-217X
  • Year: 2011
  • Volume: 6
  • Issue: 12
  • Pages: 2421-2425
  • DOI: 10.4304/jsw.6.12.2421-2425
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: We propose a novel web page segmentation algorithm based on finding the Gomory-Hu tree in a planar graph. The algorithm first distills visual and structural information from a web page to construct a weighted undirected graph, whose vertices are the leaf nodes of the DOM tree and whose edges represent the visible positional relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Experimental results show that, compared with VIPS and Chakrabarti et al.'s graph-theoretic algorithm, our algorithm achieves much higher precision and recall than both, and its running time is far lower than that of Chakrabarti et al.'s graph-theoretic algorithm.
  • Keywords: Webpage segmentation; DOM tree; Gomory-Hu tree; Planar graph
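The abstract does not spell out the clustering step, so the following is only a minimal sketch under stated assumptions. It assumes a cut-clustering scheme in the style of Flake, Tarjan and Tsioutsiouliklis: join an artificial sink to every vertex with weight `alpha`, build a Gomory-Hu tree of that expanded graph, remove the sink, and treat each remaining tree component as one page segment. Whether the paper uses exactly this variant is an assumption. The function names `min_cut`, `gomory_hu_tree` and `cut_clustering`, the parameter `alpha`, and the toy graph are all hypothetical. Its integer edge weights merely stand in for whatever visual-position scores the authors derive from DOM leaf nodes. The tree is built with Gusfield's algorithm (n - 1 max-flow calls, no vertex contraction), and max flow uses Edmonds-Karp on an adjacency matrix, which is adequate for small graphs.

```python
from collections import deque


def min_cut(n, edges, s, t):
    """Edmonds-Karp max flow between s and t on an undirected weighted graph.

    Returns (cut value, s side); the s side is the set of vertices still
    reachable from s in the final residual graph (the smallest min-cut side).
    """
    cap = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        # An undirected edge of weight w gives residual capacity w both ways.
        cap[u][v] += w
        cap[v][u] += w
    flow = 0
    while True:
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and cap[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            # No augmenting path left: the vertices this BFS reached form the s side.
            return flow, {v for v in range(n) if prev[v] != -1}
        # Bottleneck capacity along the shortest augmenting path.
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[prev[v]][v])
            v = prev[v]
        # Push the bottleneck and update residual capacities.
        v = t
        while v != s:
            u = prev[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def gomory_hu_tree(n, edges):
    """Gusfield's algorithm; returns tree edges (child, parent, weight), root 0."""
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        value, side = min_cut(n, edges, s, t)
        weight[s] = value
        # Vertices on s's side of this cut that hung off t now hang off s.
        for i in range(n):
            if i != s and i in side and parent[i] == t:
                parent[i] = s
        if parent[t] in side:
            # Swap s into t's position so every tree edge still matches a min cut.
            parent[s] = parent[t]
            parent[t] = s
            weight[s], weight[t] = weight[t], value
    return [(s, parent[s], weight[s]) for s in range(1, n)]


def cut_clustering(n, edges, alpha):
    """Hypothetical helper: sink-augmented Gomory-Hu clustering of vertices 0..n-1."""
    sink = n
    augmented = list(edges) + [(v, sink, alpha) for v in range(n)]
    tree = gomory_hu_tree(n + 1, augmented)
    root = list(range(n))  # union-find over the original vertices

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    # Dropping the sink splits the tree; join vertices along the remaining edges.
    for u, v, _ in tree:
        if u != sink and v != sink:
            root[find(u)] = find(v)
    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), []).append(v)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy page: six DOM leaf blocks forming two visually tight
    # groups that touch through one weak adjacency edge (2-3).
    toy = [(0, 1, 5), (0, 2, 5), (1, 2, 5), (3, 4, 5), (3, 5, 5), (4, 5, 5), (2, 3, 1)]
    print(cut_clustering(6, toy, alpha=1))  # [[0, 1, 2], [3, 4, 5]]
```

In this scheme `alpha` controls granularity: a larger value makes separating a vertex from the sink more expensive relative to its internal edges, which generally produces more and smaller segments. Computing the full tree with exact cuts (rather than only pairwise flow values) matters here, because the clusters are read directly from the tree's components.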