Basic Article Information

Title: English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text
Authors: Tarek A. Elghazaly; Aly A. Fahmy
Journal: Communications of the IBIMA
Electronic ISSN: 1943-7765
Publication year: 2009
Volume: 2009
Publisher: IBIMA Publishing
Abstract: In this paper, a novel model for Query Translation and Expansion that enables English/Arabic CLIR for both normal and OCR-Degraded Arabic Text has been proposed, implemented, and tested. First, an English/Arabic Word Collocations Dictionary has been established, and three English/Arabic Single Words Dictionaries have been reproduced. Second, a modern Arabic Corpus has been built. Third, a model for simulating Arabic OCR errors has been proposed. Fourth, a comprehensive model for Query Translation and Expansion is proposed. The model translates the Query from English to Arabic by detecting and translating collocations, translating single words, and transliterating names. It resolves the replacement ambiguity and then expands the Arabic Query to handle the expected Arabic OCR errors. The proposed model gives high accuracy in translating Queries from English to Arabic and in resolving the translation and transliteration ambiguities; with orthographic query expansion, it also gives a high degree of accuracy in handling OCR errors.
Keywords: Cross Language Information Retrieval; CLIR; Arabic OCR-Degraded Text; Arabic Corpus