
Article Information

  • Title: WORD SENSE DISAMBIGUATION BASED ON YAROWSKY APPROACH IN ENGLISH QURANIC INFORMATION RETRIEVAL SYSTEM
  • Authors: OMAR JAMAL MOHAMED ; SABRINA TIUN
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Online ISSN: 1817-3195
  • Publication year: 2015
  • Volume: 82
  • Issue: 1
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Word sense disambiguation (WSD) is the process of removing the ambiguity of a word by identifying the exact sense in which it is used. In natural languages, many words can carry multiple meanings depending on the context, and WSD aims to identify the most accurate sense in such cases. Ambiguity can arise in particular when translating from one language to another. The Quran, the holy book of over a billion Muslims, was originally written in Arabic, and researchers have identified several semantic issues in its English translations. These issues stem from ambiguous expressions such as “ليلا ونهارا” and “يوم الحساب”, which are translated as “day and night” and “judgment day”. Such ambiguity must be eliminated by determining the exact sense of the translated word. Several studies have attempted to disambiguate word senses in the translated Quran; however, identifying an appropriate WSD method for the translated Quran remains a challenging task owing to the complexity of Arabic morphology. Hence, this study proposes an adaptation of the Yarowsky algorithm as a WSD method for Quranic translation. In addition, it develops an IR prototype based on the proposed adaptation in order to evaluate the method in terms of retrieval effectiveness. The dataset used in this study is a collection of Quranic content. Several pre-processing tasks were performed to eliminate irrelevant data such as stop words, numbers and punctuation. Next, two lists of senses, together with their contexts, are created for each ambiguous word so that the Yarowsky algorithm can be trained on this example set. The Yarowsky algorithm then constructs a decision list that assigns a sense label to each word.
The evaluation uses three standard IR metrics: precision, recall and F-measure. The experimental results show an F-measure of 77%. This result is weaker than those reported for the Yarowsky algorithm in the open domain, owing to the lack of examples that could be extracted from the Quran for both senses. Nevertheless, it appears competitive for WSD of Quranic translations. Finally, the study concludes that WSD has a significant impact on the IR system, improving its retrieval effectiveness.
  • Keywords: Word Sense Disambiguation; Yarowsky Algorithm; Information Retrieval; Natural Language Processing; Quran
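The abstract mentions that stop words, numbers and punctuation were removed before training. A minimal Python sketch of that kind of cleanup follows, assuming simple regex-based tokenization; the function `preprocess` and the small `STOP_WORDS` set are hypothetical, since the paper's actual tokenizer and stop-word list are not described here.

```python
import re

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "by", "for", "who", "they"}


def preprocess(text: str) -> list[str]:
    """Lowercase text, strip numbers and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers (e.g. chapter and verse indices)
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


print(preprocess("Line 7: They worked by night and by day."))
# ['line', 'worked', 'night', 'day']
```

Because Python's `\w` is Unicode-aware for `str` patterns, the same function would keep Arabic letters intact if applied to the original text.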
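The decision-list step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' adaptation: it covers only decision-list construction from two sense-labelled example lists (Yarowsky's seed-based bootstrapping over unlabelled text is omitted), it assumes bag-of-words context features rather than the positional collocations of Yarowsky's original work, and the names `build_decision_list` and `classify`, the smoothing constant `alpha`, and the training contexts are hypothetical. Each feature's strength is the absolute log-likelihood ratio |log((c_A(f) + α) / (c_B(f) + α))| of its counts under the two senses, and a context is labelled by the strongest rule that matches it.

```python
import math
from collections import Counter


def build_decision_list(examples_a, examples_b, label_a, label_b, alpha=0.1):
    """Build a Yarowsky-style decision list from two sense-labelled example lists.

    Each example is a list of context tokens for one occurrence of the ambiguous word.
    Returns (strength, token, sense) rules sorted from strongest to weakest.
    """
    # Count in how many examples of each sense a token occurs.
    count_a = Counter(tok for ctx in examples_a for tok in set(ctx))
    count_b = Counter(tok for ctx in examples_b for tok in set(ctx))
    rules = []
    for tok in count_a.keys() | count_b.keys():
        # Add-alpha smoothing keeps the ratio finite when a token is seen under one sense only.
        llr = math.log((count_a[tok] + alpha) / (count_b[tok] + alpha))
        if llr == 0:
            continue  # equally frequent under both senses: no evidence either way
        rules.append((abs(llr), tok, label_a if llr > 0 else label_b))
    # Strongest evidence first; ties broken alphabetically for a deterministic order.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    return rules


def classify(context, rules, default=None):
    """Label a context with the sense of the first (strongest) rule whose token it contains."""
    tokens = set(context)
    for _, tok, sense in rules:
        if tok in tokens:
            return sense
    return default


# Hypothetical training contexts for the ambiguous word "day".
daytime = [["night", "day", "sun"], ["day", "night", "rest"]]
judgment = [["judgment", "day", "reckoning"], ["day", "resurrection", "judgment"]]
rules = build_decision_list(daytime, judgment, "daytime", "judgment")
print(classify(["night", "prayer"], rules))   # daytime
print(classify(["reckoning", "day"], rules))  # judgment
```

In this toy setup the shared token `day` carries no evidence and is dropped, while tokens such as `night` and `judgment` become the strongest rules. With only a handful of examples per sense, most rules rest on one or two counts, which illustrates why scarce training examples limit this kind of classifier.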
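The evaluation relies on precision, recall and F-measure. A minimal sketch of these standard definitions for a single query follows: precision is the fraction of retrieved items that are relevant, recall is the fraction of relevant items that are retrieved, and the F-measure is their harmonic mean 2PR/(P + R). The function name `evaluate` and the verse identifiers are hypothetical and do not reproduce the paper's data.

```python
def evaluate(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one query's retrieved and relevant sets."""
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure


# Hypothetical verse IDs: 4 retrieved, 5 relevant, 3 in common.
p, r, f = evaluate({"v1", "v2", "v3", "v4"}, {"v1", "v2", "v3", "v5", "v6"})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.6 0.67
```

The abstract does not say how scores were aggregated across queries, so this sketch stops at a single query; averaging per-query values and pooling counts across queries can give different figures.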
National Center for Philosophy and Social Sciences Documentation. All rights reserved.