Basic Article Information

Title: Semantic Extraction from List Web Pages
Authors: Ismail Jellouli; Mohammed El Mohajir
Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Online ISSN: 1694-0814
Year of publication: 2012
Volume: 9
Issue: 3
Publisher: IJCSI Press
Abstract: Extracting structured information from web pages is a problem that has many applications and that has gained increased interest in recent years. We propose an approach that can achieve extraction and semantic description of data contained in a list web page. Our approach is fully automatic and is based on a "seed" ontology that contains minimal information about the domain. It uses an instance-based classifier to characterize the attributes of the ontology. In contrast to existing methods, our approach does not make any assumption about the design of web pages; it is totally layout independent. Experimental results obtained from different web pages of different web sites from different domains show that our approach is effective.
Keywords: Web Information Extraction; list web pages; probabilistic model; ontology