Article Information

  • Title: Web Page Classification Based on Surrounding Page Model Representing Connection Type and Directory Hierarchy
  • Authors: Yuxin Wang; Keizo Oyama
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2009
  • Volume: 4
  • Issue: 4
  • Pages: 922-936
  • DOI: 10.11185/imt.4.922
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: We propose a web page classification method suited to building web page collections and demonstrate its effectiveness experimentally. First, we describe a model that represents the structure of the group of pages surrounding a target page, taking both link relations and directory hierarchy relations into account, together with a method for extracting features based on that model. The method is tested in classification experiments on two data sets, using a support vector machine (SVM) as the classification algorithm, and its effectiveness is confirmed by comparison with a baseline and with the results of previous studies. The contribution of each part of the surrounding pages is also analyzed. Next, we test the method's performance over the entire recall-precision range and find that it is superior in the high-recall range. Finally, we estimate the performance of a three-grade classifier built with the method and the amount of manual assessment required to build a web page collection.
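
The abstract says that surrounding pages are grouped by link relation and directory hierarchy relation before features are extracted. The following is a minimal sketch of one way such grouping could look, not the authors' implementation. The function names, the group labels ("in"/"out", "upper"/"same"/"lower"/"other"), and the rule of comparing URL directory paths on the same host are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit


def directory_relation(target_url, other_url):
    """Classify other_url's directory relative to target_url's directory.

    Hypothetical labels: 'upper', 'same', 'lower', or 'other'.
    Pages on a different host are always labelled 'other'.
    """
    t, o = urlsplit(target_url), urlsplit(other_url)
    if t.netloc != o.netloc:
        return "other"
    # Normalize each path to its directory, always ending in "/".
    t_dir = dirname(t.path).rstrip("/") + "/"
    o_dir = dirname(o.path).rstrip("/") + "/"
    if o_dir == t_dir:
        return "same"
    if t_dir.startswith(o_dir):
        return "upper"   # other page sits in an ancestor directory
    if o_dir.startswith(t_dir):
        return "lower"   # other page sits in a descendant directory
    return "other"


def group_surrounding_pages(target_url, in_links, out_links):
    """Group surrounding pages by (connection type, directory relation).

    in_links:  URLs of pages that link to the target (assumed input).
    out_links: URLs of pages the target links to (assumed input).
    """
    groups = defaultdict(list)
    for connection, urls in (("in", in_links), ("out", out_links)):
        for url in urls:
            groups[(connection, directory_relation(target_url, url))].append(url)
    return dict(groups)
```

In a pipeline of this kind, the text of each group would presumably be turned into a separate feature block and the concatenated vector passed to an SVM, so that the classifier can weight, say, in-linking pages in an upper directory differently from out-linked pages in a lower one.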