
Article Information

  • Title: Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support
  • Authors: Hyuntae Kim; Jongyun Choi; Soyoung Park
  • Journal: Sustainability
  • Print ISSN: 2071-1050
  • Publication year: 2022
  • Volume: 14
  • Issue: 5
  • Pages: 2802
  • DOI: 10.3390/su14052802
  • Language: English
  • Publisher: MDPI, Open Access Journal
  • Abstract: New scientific and technological (S&T) knowledge is being introduced rapidly, and hence the effort required to understand and analyze newly published S&T documents is increasing daily. Automated text mining and vision recognition techniques alleviate the burden somewhat, but the variety of document layout formats and knowledge content granularities across the S&T field makes this challenging. Therefore, this paper proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout formats. We adopt Layout-aware Metadata Extraction (LAME), which can accurately extract metadata from various layout formats, and implement a transformer-based instance segmentation (i.e., Vision-based Semantic Elements Extraction (Vi-SEE)) to maximize vision-based semantic element recognition. Moreover, to construct a scientific knowledge graph spanning multiple S&T documents, we define an extensible Semantic Elements Knowledge Graph (SEKG) structure. So far, we have extracted about 6 million semantic elements from 49,649 PDFs. In addition, to illustrate the potential of our SEKG, we present two promising application scenarios: a scientific knowledge guide across multiple S&T documents and question answering over scientific tables.