Article Information

  • Title: English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text
  • Authors: Tarek A. Elghazaly; Aly A. Fahmy
  • Journal: Communications of the IBIMA
  • Electronic ISSN: 1943-7765
  • Publication year: 2009
  • Volume: 2009
  • Pages: 208-218
  • Publisher: IBIMA Publishing
  • Abstract: In this paper, a novel model for query translation and expansion that enables English/Arabic CLIR over both normal and OCR-degraded Arabic text is proposed, implemented, and tested. First, an English/Arabic word collocations dictionary was established, and three English/Arabic single-word dictionaries were reproduced. Second, a modern Arabic corpus was built. Third, a model for simulating Arabic OCR errors was proposed. Fourth, a comprehensive model for query translation and expansion was proposed. The model translates the query from English to Arabic by detecting and translating collocations, translating single words, and transliterating names. It then resolves the replacement ambiguity and expands the Arabic query to handle the expected Arabic OCR errors. The proposed model translates queries from English to Arabic with high accuracy, resolving translation and transliteration ambiguities, and, with orthographic query expansion, it also handles OCR errors with a high degree of accuracy.
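The translation stage described in the abstract (detect collocations first, then translate single words, then transliterate names) can be sketched as a greedy longest-match pass over the English query. This is a minimal sketch, not the authors' implementation: the toy dictionaries, the crude letter map in `TRANSLIT`, and the corpus-frequency rule in the hypothetical `pick_sense` helper are all illustrative assumptions, since the abstract does not say how the replacement ambiguity is actually resolved.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionaries; the paper's real dictionaries are not reproduced.
COLLOCATIONS: Dict[Tuple[str, ...], str] = {
    ("information", "retrieval"): "استرجاع المعلومات",
    ("prime", "minister"): "رئيس الوزراء",
}
SINGLE_WORDS: Dict[str, List[str]] = {
    "arabic": ["عربي"],
    "text": ["نص", "متن"],  # two candidate senses -> replacement ambiguity
    "minister": ["وزير"],
}
# Hypothetical corpus counts, used here only as a stand-in disambiguation signal.
CORPUS_FREQ: Dict[str, int] = {"نص": 120, "متن": 15}
# Hypothetical, deliberately crude letter map; unmapped letters are dropped.
TRANSLIT: Dict[str, str] = {
    "a": "ا", "b": "ب", "d": "د", "f": "ف", "g": "ج", "h": "ه",
    "i": "ي", "k": "ك", "l": "ل", "m": "م", "n": "ن", "o": "و",
    "r": "ر", "s": "س", "t": "ت", "u": "و", "y": "ي", "z": "ز",
}

MAX_COLLOCATION_LEN = max(len(k) for k in COLLOCATIONS)


def transliterate(word: str) -> str:
    """Letter-by-letter transliteration for out-of-vocabulary words (e.g. names)."""
    return "".join(TRANSLIT.get(ch, "") for ch in word.lower())


def pick_sense(senses: List[str]) -> str:
    """Pick the candidate translation that is most frequent in the toy corpus."""
    return max(senses, key=lambda s: CORPUS_FREQ.get(s, 0))


def translate_query(query: str) -> List[str]:
    """Greedy translation: longest collocation, else single word, else transliteration."""
    words = query.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try collocations from longest to shortest (length >= 2) starting at i.
        for n in range(min(MAX_COLLOCATION_LEN, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in COLLOCATIONS:
                out.append(COLLOCATIONS[key])
                i += n
                break
        else:
            # No collocation matched: fall back to the single-word dictionary.
            word = words[i]
            if word in SINGLE_WORDS:
                out.append(pick_sense(SINGLE_WORDS[word]))
            else:
                out.append(transliterate(word))
            i += 1
    return out
```

For example, `translate_query("Arabic information retrieval")` returns `["عربي", "استرجاع المعلومات"]`: "information retrieval" is matched as one collocation instead of being translated word by word. Trying collocations before single words is what keeps multi-word terms from being split into unrelated single-word translations.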
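The abstract's OCR components (simulating Arabic OCR errors and expanding the Arabic query orthographically) can be illustrated together. The sketch below assumes that OCR errors are single-letter substitutions within groups of letters that share a base shape and differ mainly in dots or hamza. The confusion groups and the function names `simulate_ocr_error`, `expand_word`, and `expand_query` are hypothetical and do not represent the paper's actual error model.

```python
import random
from typing import Dict, List, Set

# Hypothetical confusion classes: letters sharing a base shape that differ
# mainly in dots or hamza. Illustrative only, not the paper's measured errors.
CONFUSION_CLASSES: List[str] = [
    "بتثني", "جحخ", "دذ", "رز", "سش", "صض", "طظ", "عغ", "فق", "هة", "اأإآ",
]

# Map each letter to the other members of its confusion class.
CONFUSABLE: Dict[str, str] = {
    ch: cls.replace(ch, "") for cls in CONFUSION_CLASSES for ch in cls
}


def simulate_ocr_error(word: str, rng: random.Random) -> str:
    """Degrade a word by swapping one randomly chosen confusable letter."""
    positions = [i for i, ch in enumerate(word) if ch in CONFUSABLE]
    if not positions:
        return word  # nothing confusable: the word survives OCR unchanged
    i = rng.choice(positions)
    return word[:i] + rng.choice(CONFUSABLE[word[i]]) + word[i + 1:]


def expand_word(word: str) -> Set[str]:
    """Return the word plus every variant with exactly one confusable letter swapped."""
    variants = {word}
    for i, ch in enumerate(word):
        for alt in CONFUSABLE.get(ch, ""):
            variants.add(word[:i] + alt + word[i + 1:])
    return variants


def expand_query(words: List[str]) -> List[Set[str]]:
    """Expand each Arabic query term into its set of orthographic variants."""
    return [expand_word(w) for w in words]
```

Under this assumption, every single-error degradation produced by `simulate_ocr_error` is contained in the expansion of the clean word, so each expanded set can serve as an OR-group of alternatives for its term. Limiting expansion to one substitution per word keeps each set linear in word length. Allowing several substitutions would catch more errors, but the number of variants grows combinatorially and precision drops.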