
Article Information

  • Title: Metadata Extraction Approach of PDF Documents Based on Measurement Fusion
  • Authors: Zhao, Junmin; Liu, Huazhong
  • Journal: Journal of Multimedia
  • Print ISSN: 1796-2048
  • Publication year: 2013
  • Volume: 8
  • Issue: 6
  • Pages: 732-738
  • DOI: 10.4304/jmm.8.6.732-738
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: To address the low precision and weak adaptability of existing metadata extraction methods, this paper proposes a novel metadata extraction approach based on a measurement fusion rule. First, features of the document header are extracted, and three statistical learning methods (HMM, SVM, and CRF) are each trained on the labeled data set to build corresponding metadata extraction models. Then, the results of the three extraction models are fused by the sum rule to achieve accurate metadata extraction. Finally, the three extraction models are dynamically updated using a method based on time-period statistics, so that the ensemble remains effective. Experiments are conducted on different datasets and the comparative results of these extraction methods are presented. The experimental results show that the proposed approach not only improves the precision of metadata extraction, but also enhances the adaptability
  • Keywords: Metadata Extraction; Statistical Learning; Measurement Fusion; Posterior Probability; Sum Rule
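
The sum-rule fusion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the three models returns, for every header line, a posterior probability per label; the label set, the function name `sum_rule_fuse`, and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical label set for header lines; the paper's actual labels may differ.
LABELS = ["title", "author", "affiliation"]


def sum_rule_fuse(posteriors: Sequence[Sequence[Dict[str, float]]]) -> List[str]:
    """Fuse per-line posteriors from several models with the sum rule.

    posteriors[m][i][label] is model m's posterior for `label` on line i.
    For each line, the label with the largest summed posterior is chosen.
    """
    n_lines = len(posteriors[0])
    fused = []
    for i in range(n_lines):
        totals = {
            label: sum(model[i].get(label, 0.0) for model in posteriors)
            for label in LABELS
        }
        fused.append(max(LABELS, key=lambda label: totals[label]))
    return fused


# Toy posteriors for two header lines (made-up numbers, for illustration only).
hmm = [{"title": 0.6, "author": 0.3, "affiliation": 0.1},
       {"title": 0.2, "author": 0.5, "affiliation": 0.3}]
svm = [{"title": 0.5, "author": 0.4, "affiliation": 0.1},
       {"title": 0.1, "author": 0.3, "affiliation": 0.6}]
crf = [{"title": 0.7, "author": 0.2, "affiliation": 0.1},
       {"title": 0.1, "author": 0.6, "affiliation": 0.3}]

print(sum_rule_fuse([hmm, svm, crf]))  # → ['title', 'author']
```

On the second line the toy SVM alone would pick "affiliation", but the summed posteriors favor "author", which is the kind of disagreement the fusion is meant to resolve.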