Article Information

Title: Web Entities Extraction Based on Semi-Structured Semantic Database
Authors: Dong, Fang; Liu, Mengchi; Ma, Kun; et al.
Journal: Journal of Networks
Print ISSN: 1796-2056
Year: 2013
Volume: 8
Issue: 7
Pages: 1640-1646
DOI: 10.4304/jnw.8.7.1640-1646
Language: English
Publisher: Academy Publisher
Abstract: The Web is the largest source of information and contains many entities and relationships between them. Extracting these data from massive numbers of Web pages and integrating them into semi-structured data with rich semantics is more conducive to managing and using them. On this premise, a comprehensive method is proposed for extracting entities and their relationships from Web pages. The method consists of two steps: 1) target Web pages that contain these entities are located by combining visual information with keyword content, while the parent-child relationships between target pages are recorded; 2) entities are extracted by analyzing the DOM tree structure of the obtained pages and applying a set of defined extraction rules. Finally, the extracted data are organized into semi-structured data with special relationships. Experiments on a large number of HTML pages show that the method achieves a high accuracy rate and high coverage.
Keywords: Web Entities; Data Extraction; Semi-Structured Semantic Database
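The two-step pipeline in the abstract (find target pages, then extract entities from their DOM using rules, keeping parent-child page links) can be illustrated with a minimal sketch. This is not the authors' implementation: the rule format (`RULES`, mapping a field name to an exact `(tag, class)` pair) and the helpers `RuleExtractor`, `is_target_page`, `extract_entity`, and `build_tree` are hypothetical. Target-page detection here uses keyword matching only; the visual information the paper combines with keywords is not modeled. Only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical extraction rules: field name -> (tag, exact class attribute).
# An illustrative assumption, not the rule language defined in the paper.
RULES = {"name": ("h1", "title"), "email": ("span", "email")}

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class RuleExtractor(HTMLParser):
    """Track open elements (an approximate DOM path) and capture text of rule matches."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.stack = []   # open (tag, class) pairs, innermost last
        self.fields = {}  # extracted field -> first matching text

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append((tag, cls))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        current = self.stack[-1]
        for field, target in self.rules.items():
            if current == target:
                self.fields.setdefault(field, text)


def is_target_page(html, keywords, min_hits=1):
    """Step 1 stand-in: keyword-only check (visual cues are not modeled)."""
    text = html.lower()
    return sum(kw.lower() in text for kw in keywords) >= min_hits


def extract_entity(url, html, parent=None):
    """Step 2 stand-in: apply RULES to one page and wrap the result as a record."""
    parser = RuleExtractor(RULES)
    parser.feed(html)
    parser.close()
    return {"url": url, "parent": parent, "children": [], "attributes": parser.fields}


def build_tree(pages, keywords):
    """pages: iterable of (url, parent_url, html). Returns url -> semi-structured record."""
    records = {}
    for url, parent, html in pages:
        if is_target_page(html, keywords):
            records[url] = extract_entity(url, html, parent)
    # Record parent -> children links between target pages.
    for url, rec in records.items():
        if rec["parent"] in records:
            records[rec["parent"]]["children"].append(url)
    return records
```

For example, given a department page that links to a faculty member's page, `build_tree` keeps both (they mention the keyword), links the member page as a child of the department page, and stores the extracted `name` and `email` fields as nested attributes. A real system would need richer rules (e.g. DOM paths rather than a single tag/class pair) and the visual features the abstract mentions.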