Article Information

  • Title: Joint Web-Feature (JFEAT): A Novel Web Page Classification Framework
  • Authors: Lim Wern Han; Saadat M. Alhashmi
  • Journal: Communications of the IBIMA
  • Electronic ISSN: 1943-7765
  • Year of publication: 2010
  • Volume: 2010
  • DOI: 10.5171/2010.734081
  • Publisher: IBIMA Publishing
  • Abstract: With the growing number of web pages on the internet, obtaining information accurately, at a reasonable cost and with decent performance, has become a major concern. A potential solution is to classify web pages into meaningful categories. Effective web page classification benefits applications such as web mining and search engines. Unlike plain text documents, web pages have characteristics that limit the performance of traditional pure-text classification methods that are otherwise successful. Noise exists in the form of HTML tags, multimedia content, dynamic content and the network structure of web pages, which calls for a closer look at effective feature selection for web pages. Often, these features are filtered out, and classification relies only on the displayed text of the web page. This paper proposes a framework in which web page features are taken into account during classification, because each feature may hold valuable information. To this end, the paper explores the potential of the Uniform Resource Locator (URL), the web page title and the metadata as sources of information for classification into categories defined by the users. The framework then explores suitable machine learning algorithms for classifying each web feature individually. The individual results are then combined through weighted voting to obtain the classification of the web page. This approach showed improvements over pure-text as well as virtual-webpage classification approaches.
  • Keywords: web page classification; feature selection; machine learning
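
The weighted-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the feature names, the category labels and the weights are all hypothetical and chosen only for illustration. It assumes each per-feature classifier (URL, title, metadata) has already produced one predicted category, and that each feature carries a fixed vote weight.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-feature predictions into one category.

    predictions: feature name -> predicted category (hypothetical labels)
    weights:     feature name -> vote weight (hypothetical values)
    """
    scores = defaultdict(float)
    for feature, category in predictions.items():
        # A feature with no configured weight contributes nothing.
        scores[category] += weights.get(feature, 0.0)
    # Highest total weight wins; ties are broken alphabetically
    # so the result is deterministic.
    return max(sorted(scores), key=lambda c: scores[c])


# Illustrative inputs only; not data from the paper.
predictions = {"url": "sports", "title": "news", "metadata": "sports"}
weights = {"url": 0.3, "title": 0.5, "metadata": 0.3}
print(weighted_vote(predictions, weights))
```

Here the title classifier has the largest single weight, but the URL and metadata classifiers agree and together outweigh it. How the weights are actually chosen is not stated in the abstract.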