Basic article information

  • Title: Mathematical Expression Extraction in Text Fields of Documents Based on HMM
  • Authors: Xuedong Tian; Ruihan Bai; Fang Yang
  • Journal: Journal of Computer and Communications
  • Print ISSN: 2327-5219
  • Online ISSN: 2327-5227
  • Year of publication: 2017
  • Volume: 05
  • Issue: 14
  • Pages: 1-13
  • DOI: 10.4236/jcc.2017.514001
  • Language: English
  • Publisher: Scientific Research Publishing
  • Abstract: Mathematical expressions in unstructured text fields of documents are difficult to extract automatically, quickly, and effectively. To address this problem, a method based on the Hidden Markov Model (HMM) is proposed. First, the HMM is trained using the symbol combination features of mathematical expressions. Then, preprocessing steps such as removing labels and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence that serves as input to the HMM, which determines which parts are mathematical expressions and extracts them. The experimental results show that the proposed method can effectively extract mathematical expressions from the text fields of documents and achieves relatively high accuracy and recall.
  • Keywords: Mathematical Expression Extraction; Hidden Markov Model; Text Fields; Documents; Symbol Combination Features
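
The pipeline outlined in the abstract (convert text to an observation sequence, then let an HMM decide which tokens belong to a mathematical expression) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two hidden states `TEXT` and `MATH`, the four observation classes produced by the hypothetical `classify` helper, and all probabilities, which are set by hand rather than trained on symbol combination features. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

STATES = ("TEXT", "MATH")
OPERATORS = set("+-*/=^<>()")

# Hand-picked illustrative parameters (assumption, not values from the paper).
START = {"TEXT": 0.8, "MATH": 0.2}
TRANS = {
    "TEXT": {"TEXT": 0.9, "MATH": 0.1},
    "MATH": {"TEXT": 0.2, "MATH": 0.8},
}
EMIT = {
    "TEXT": {"word": 0.85, "symbol": 0.05, "number": 0.05, "operator": 0.05},
    "MATH": {"word": 0.05, "symbol": 0.40, "number": 0.25, "operator": 0.30},
}


def classify(token):
    """Map a token to a coarse observation class (hypothetical feature set)."""
    if token in OPERATORS:
        return "operator"
    if token.replace(".", "", 1).isdigit():
        return "number"
    if len(token) == 1 and token.isalpha():
        return "symbol"  # a single letter is treated as a variable name
    return "word"


def viterbi(observations):
    """Return the most likely hidden-state path for the observation sequence."""
    if not observations:
        return []
    first = observations[0]
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][first]) for s in STATES}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def extract_math(tokens):
    """Group consecutive tokens decoded as MATH into expression strings."""
    path = viterbi([classify(t) for t in tokens])
    spans, current = [], []
    for token, state in zip(tokens, path):
        if state == "MATH":
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


print(extract_math("the area is S = x * y for all rectangles".split()))
# ['S = x * y']
```

In a setting closer to the one the abstract describes, the start, transition, and emission probabilities would be estimated from labeled training text instead of being fixed by hand, and the observation classes would be replaced by whatever symbol combination features the authors actually use.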