
Article Information

  • Title: Improving the Accuracy of English-Arabic Statistical Sentence Alignment
  • Authors: Mohammad Salameh; Rached Zantout; Nashat Mansour
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Publication year: 2011
  • Volume: 8
  • Issue: 2
  • Publisher: Zarqa Private University
  • Abstract: Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step that is applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed sentences.
  • Keywords: Word alignment; sentence alignment; parallel corpora; statistical natural language processing.