Basic Article Information

Title: AN IMPLEMENTATION OF WEB CONTENT EXTRACTION USING MINING TECHNIQUES
Authors: BADR HSSINA ; ABDELKARIM MERBOUHA ; HANANE EZZIKOURI et al.
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2013
Volume: 58
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: The Web has grown continuously since its inception in the volume of its information, in the complexity of its topology, and in the diversity of its content and services. Despite its young age, this growth has turned the Web into an opaque medium from which useful information is hard to obtain. Today, there are billions of HTML documents, images and other media files on the Internet. Given the wide variety of the Web, the extraction of interesting content has become a necessity. Web mining emerged as a remedy for this problem. Web content mining is a subdivision of web mining, defined as "the process of extracting useful information from the text, images and other forms of content that make up the pages" by eliminating noisy information. This extraction process can employ automatic techniques and hand-crafted rules. In this paper, we propose a method for web data extraction that uses hand-crafted rules developed in Java.
Keywords: Web Mining; Content Extraction; Web Cleaning