Article Information

  • Title: AN IMPLEMENTATION OF WEB CONTENT EXTRACTION USING MINING TECHNIQUES
  • Authors: BADR HSSINA; ABDELKARIM MERBOUHA; HANANE EZZIKOURI
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2013
  • Volume: 58
  • Issue: 3
  • Publisher: Journal of Theoretical and Applied
  • Abstract: The Web has grown continuously since its inception in the volume of its information, the complexity of its topology, and the diversity of its content and services. Despite its young age, this growth has turned the Web into an opaque medium from which useful information is hard to obtain. Today there are billions of HTML documents, images, and other media files on the Internet. Given this variety, extracting interesting content has become a necessity. Web mining emerged as a remedy for this problem. Web content mining is a subdivision of web mining, defined as "the process of extracting useful information from the text, images and other forms of content that make up the pages" by eliminating noisy information. This extraction process can employ automatic techniques and hand-crafted rules. In this paper, we propose a method for web data extraction that uses hand-crafted rules developed in Java.
  • Keywords: Web Mining; Content Extraction; Web Cleaning
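
The abstract mentions hand-crafted rules written in Java for removing noisy information from web pages, but this record does not give the rules themselves. The following is only a minimal sketch of what rule-based cleaning might look like, assuming that "noise" means comments, scripts, styles, navigation bars, and footers. The class name RuleBasedCleaner, the method extract, and the rule list are hypothetical and are not taken from the paper.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not the authors' implementation.
final class RuleBasedCleaner {

    // Assumed hand-crafted rules: each pattern matches a region treated as noise.
    // (?i) = case-insensitive, (?s) = '.' also matches line breaks.
    private static final List<Pattern> NOISE_RULES = List.of(
        Pattern.compile("(?is)<!--.*?-->"),
        Pattern.compile("(?is)<script\\b.*?</script>"),
        Pattern.compile("(?is)<style\\b.*?</style>"),
        Pattern.compile("(?is)<nav\\b.*?</nav>"),
        Pattern.compile("(?is)<footer\\b.*?</footer>")
    );

    private static final Pattern ANY_TAG = Pattern.compile("(?s)<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Applies every noise rule, strips the remaining tags, and
    // collapses runs of whitespace into single spaces.
    static String extract(String html) {
        String text = html;
        for (Pattern rule : NOISE_RULES) {
            text = rule.matcher(text).replaceAll(" ");
        }
        text = ANY_TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><head><style>p{color:red}</style></head><body>"
            + "<nav><a href=\"/\">Home</a></nav>"
            + "<p>Web content mining extracts useful information.</p>"
            + "<footer>Copyright</footer></body></html>";
        System.out.println(extract(page));
    }
}
```

Regular expressions keep the sketch dependency-free, but they are fragile on malformed or deeply nested HTML; a real extractor would more likely apply its rules to a parsed DOM tree.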