Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: This paper discusses a new metric for verifying translation quality between sentence pairs in Arabic-English parallel corpora. The metric combines two techniques, one based on sentence length and the other based on compression code length. Experiments on sample Arabic-English parallel test corpora indicate that combining the two techniques identifies satisfactory and unsatisfactory sentence pairs more accurately than either sentence length or compression code length alone. The proposed method is effective at filtering noise and reducing mistranslations, markedly improving corpus quality.
Keywords: Parallel Corpus; Sentence Alignment for Machine Translation; Prediction by Partial Matching; Compression
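
The abstract does not say how the two signals are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each signal is a min/max ratio between the two sentences and that a pair is kept only when both ratios clear a threshold. The function names and the 0.5 thresholds are hypothetical, and `zlib` stands in for a Prediction by Partial Matching (PPM) compressor, which the Python standard library does not provide.

```python
import zlib


def _ratio(x: float, y: float) -> float:
    """Return min/max of two non-negative numbers (1.0 when both are zero)."""
    hi = max(x, y)
    return min(x, y) / hi if hi else 1.0


def length_ratio(src: str, tgt: str) -> float:
    """Sentence-length signal: ratio of the character counts."""
    return _ratio(len(src), len(tgt))


def code_length(text: str) -> int:
    """Compressed size in bytes. zlib is an assumed stand-in for PPM."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def code_ratio(src: str, tgt: str) -> float:
    """Compression signal: ratio of the two compressed sizes."""
    return _ratio(code_length(src), code_length(tgt))


def is_satisfactory(src: str, tgt: str,
                    min_len_ratio: float = 0.5,    # hypothetical threshold
                    min_code_ratio: float = 0.5    # hypothetical threshold
                    ) -> bool:
    """Keep a pair only if both signals agree (an assumed combination rule)."""
    return (length_ratio(src, tgt) >= min_len_ratio
            and code_ratio(src, tgt) >= min_code_ratio)
```

In practice the thresholds would need tuning on labelled sentence pairs. On very short sentences, zlib's fixed header overhead pushes the code-length ratio toward 1, which is one reason a statistical compressor such as PPM may be a better fit for this signal.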