
Article Information

  • Title: Web Entities Extraction Based on Semi-Structured Semantic Database
  • Authors: Dong, Fang ; Liu, Mengchi ; Ma, Kun
  • Journal: Journal of Networks
  • Print ISSN: 1796-2056
  • Year of publication: 2013
  • Volume: 8
  • Issue: 7
  • Pages: 1640-1646
  • DOI: 10.4304/jnw.8.7.1640-1646
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: The Web is the largest source of information and contains many entities and the relationships between them. Extracting these data from massive numbers of Web pages and integrating them into semi-structured data with rich semantics makes the data easier to manage and use. On this premise, a comprehensive method is proposed for extracting entities and relationships from Web pages. The method consists of two steps: 1) the target Web pages that contain these entities are found using a combination of visual information and keyword content, while the relationships between parent and child target Web pages are recorded; 2) the entities are extracted by analyzing the DOM tree structure of the obtained Web pages and applying a set of defined extraction rules. Finally, the extracted data is organized into semi-structured data with special relationships. Experiments on a large number of HTML pages show that this method achieves a high correct rate and coverage.
  • Keywords: Web Entities; Data Extraction; Semi-Structured Semantic Database
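
The second step described in the abstract (rule-based extraction over the page structure, followed by grouping under parent and child pages) can be illustrated with a minimal sketch. This is not the authors' implementation: the rule format (a tag name plus a class name), the names `RuleExtractor`, `extract_entities`, and `build_record`, and the example URLs are all assumptions made for illustration, and Python's standard `html.parser` stands in for a full DOM tree analysis.

```python
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Collect the text of every element matching a (tag, class) rule.

    The (tag, class) rule format is a hypothetical simplification of the
    extraction rules mentioned in the abstract.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching element
        self.buffer = []     # text fragments of the current match
        self.entities = []   # extracted entity strings, in page order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements with the same tag so the end tag pairs up.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Normalize whitespace across text from nested child nodes.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.entities.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_entities(html, tag, css_class):
    """Apply one extraction rule to one page and return the matched texts."""
    parser = RuleExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.entities


def build_record(parent_url, child_pages, tag, css_class):
    """Group entities from child pages under their parent page.

    child_pages maps a child URL to its HTML source (assumed already fetched).
    """
    return {
        "parent": parent_url,
        "children": [
            {"url": url, "entities": extract_entities(html, tag, css_class)}
            for url, html in child_pages.items()
        ],
    }
```

The nested dictionary returned by `build_record` is one simple way to keep the parent and child page relationship alongside the extracted entities; the semi-structured semantic database named in the title would presumably use a richer model.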