
Article Information

  • Title: Semantic Extraction from List Web Pages
  • Authors: Ismail Jellouli; Mohammed El Mohajir
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Publication year: 2012
  • Volume: 9
  • Issue: 3
  • Publisher: IJCSI Press
  • Abstract: Extracting structured information from web pages is a problem with many applications that has attracted growing interest in recent years. We propose an approach that extracts and semantically describes the data contained in a list web page. The approach is fully automatic and relies on a "seed" ontology that contains minimal information about the domain. It uses an instance-based classifier to characterize the attributes of the ontology. Unlike existing methods, our approach makes no assumptions about the design of web pages; it is entirely layout-independent. Experimental results on web pages from different web sites in different domains show that the approach is effective.
  • Keywords: Web Information Extraction; list web pages; probabilistic model; ontology