Article Information

Title: Improving the Accuracy of English-Arabic Statistical Sentence Alignment
Authors: Mohammad Salameh; Rached Zantout; Nashat Mansour; et al.
Journal: The International Arab Journal of Information Technology
Print ISSN: 1683-3198
Year: 2011
Volume: 8
Issue: 2
Publisher: Zarqa Private University
Abstract: Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training statistical natural language processing systems and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed unaligned sentences.
Keywords: Word alignment; sentence alignment; parallel corpora; statistical natural language processing.
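The abstract does not name the statistical word-alignment system it uses or describe the Arabic preprocessing step, so the sketch below reproduces neither. It is a minimal illustration of one standard statistical word-alignment technique, IBM Model 1 trained with expectation-maximization (EM), to show how such a system learns word-translation probabilities t(f | e) from unaligned sentence pairs. The function names `train_ibm_model1` and `viterbi_align`, the `NULL` empty-word token, and the three-pair corpus of romanized Arabic and English tokens are all hypothetical, chosen for illustration only.

```python
from collections import defaultdict

NULL = "NULL"  # hypothetical empty-word token: lets an f word align to nothing


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with EM, as in IBM Model 1 (illustrative sketch).

    pairs: list of (f_tokens, e_tokens); f is the side being aligned
    (here romanized Arabic) and e is the other side (here English).
    Returns a dict mapping (f, e) -> t(f | e).
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every (f, e)

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        # E-step: split each f token's unit of mass over every e in its pair
        for fs, es in pairs:
            candidates = [NULL] + es
            for f in fs:
                z = sum(t[(f, e)] for e in candidates)
                for e in candidates:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts so t(. | e) sums to 1 per e
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


def viterbi_align(t, fs, es):
    """Link each f token to its most probable e token (or NULL)."""
    candidates = [NULL] + es
    return [(f, max(candidates, key=lambda e: t.get((f, e), 0.0))) for f in fs]


# Hypothetical toy corpus: romanized Arabic (noun before adjective) vs English
TOY_CORPUS = [
    (["bayt", "kabir"], ["big", "house"]),
    (["bayt"], ["house"]),
    (["kitab", "kabir"], ["big", "book"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    print(viterbi_align(table, ["bayt", "kabir"], ["big", "house"]))
```

Because "bayt" co-occurs with "house" in two pairs and "kabir" co-occurs with "big" in two pairs, EM shifts probability mass toward those links even though the two languages order the noun and adjective differently. Real systems add further models (distortion, fertility) on top of this lexical table.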