
Article Information

  • Title: Web Page Classification using Anchor-related Text Extracted by a DOM-based Method
  • Authors: Masanori Otsubo ; Bui Quang Hung ; Yoshinori Hijikata
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Year: 2010
  • Volume: 25
  • Issue: 1
  • Pages: 37-49
  • DOI: 10.1527/tjsai.25.37
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract: Directory services are popular among people who search the Web for information of interest. These services provide hierarchical categories that help users find relevant pages, and pages are assigned to the categories by hand. Many existing studies classify a web page using the text in the page itself. Recently, some studies have used text not only from the target page to be categorized but also from the original pages that link to it. The text taken from the original pages must be narrowed down, because these pages contain many passages unrelated to the target page. However, these studies apply a single extraction method to all pages, even though web pages differ greatly in format. We have previously developed a method for extracting anchor-related text, and here we use the text it extracts to classify web pages. Experimental results show that our extraction method improves classification accuracy.
  • Keywords: Web page classification ; anchor-related text ; DOM ; SVM