Article information

  • Title: Tibetan-Chinese Cross Language Text Similarity Calculation Based on LDA Topic Model
  • Authors: Sun Yuan; Zhao Qian
  • Journal: The Open Cybernetics & Systemics Journal
  • Electronic ISSN: 1874-110X
  • Year of publication: 2015
  • Volume: 9
  • Issue: 1
  • Pages: 2911-2919
  • DOI: 10.2174/1874110X01509012911
  • Publisher: Bentham Science Publishers Ltd
  • Abstract:

    Building a topic model is the foundation, and the most critical module, of cross-language topic detection and tracking. Topic models can also be applied to cross-language text similarity calculation, where reducing the dimensionality of the texts improves the speed and efficiency of the computation. In this paper, we use the LDA model in cross-language text similarity computation to obtain Tibetan-Chinese comparable corpora, in three steps: (1) extending a Tibetan-Chinese dictionary by extracting Tibetan-Chinese entities from Wikipedia; (2) using the topic model to map the texts into the feature space of topics; and (3) calculating the similarity of two texts in different languages according to the characteristics of news text. The LDA-based method reduces the dimensionality of the text vector space, which speeds up the calculation, and improves the representation of the texts' semantics.
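
As a rough illustration of step (3), the sketch below compares documents once they have been mapped into a shared topic space. The abstract does not say which similarity measure the authors use, so cosine similarity is an assumption here. The function name `cosine_similarity`, the three document variables, and the topic proportions are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length topic vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_p * norm_q)


# Hypothetical topic proportions over 4 shared topics (illustrative values,
# not results from the paper). Each vector is what a topic model might
# assign to one news text after step (2).
tibetan_doc = [0.70, 0.10, 0.10, 0.10]
chinese_doc_same_story = [0.65, 0.15, 0.10, 0.10]
chinese_doc_other_story = [0.05, 0.05, 0.10, 0.80]

sim_same = cosine_similarity(tibetan_doc, chinese_doc_same_story)
sim_other = cosine_similarity(tibetan_doc, chinese_doc_other_story)

# The pair covering the same story should score higher than the unrelated pair.
print(sim_same > sim_other)
```

Because each text is represented by a handful of topic proportions rather than a vocabulary-sized term vector, the comparison touches only as many numbers as there are topics, which is the dimensionality reduction the abstract refers to.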
