Publisher: University of Malaya * Faculty of Computer Science and Information Technology
Abstract: The nature of the Quran and its translations as classical Arabic and English texts reduces the accuracy of ordinary natural language processing tools such as pronominal anaphora resolution systems. Pronominal anaphora resolution involves finding an antecedent for each anaphoric pronoun, where pronouns are the referring expressions of the discourse. The performance of a pronominal anaphora resolution system depends heavily on the efficiency of the preprocessing tools that analyze and prepare the input data for the resolution algorithm. This paper proposes a novel preprocessing approach for pronoun extraction and pronoun mapping in a pronominal anaphora resolution system for English translations of the Quran. The approach facilitates anaphora resolution, specifically for English pronouns without an explicit antecedent, which account for close to 50% of the anaphoric relations in the Quran. It uses morphological, statistical, and anaphoric knowledge extracted from the Arabic corpus of the Quran. To evaluate the approach, 1% of an English translation was annotated, labeling all anaphoric and non-anaphoric English pronouns. These pronouns were aligned to the equivalent Arabic pronouns and linked to the concepts in the Arabic text. The statistical results show that our rule-based preprocessing tools perform well. The precision, recall, and accuracy of the pronoun extraction stage are 96.38%, 100%, and 99.5%, respectively. The results of the mapping algorithm are promising: 85.51% precision, 96.32% recall, and 82.81% accuracy.
Keywords: Pronominal anaphora resolution; Anaphora resolution preprocessing; Pronoun resolution; Word alignment; Quran English translation; Rule-based approach; Natural language processing
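
The abstract reports precision, recall, and accuracy for both the extraction and the mapping stages. As a reminder of how these three figures relate to one another, the sketch below computes them from confusion counts using the standard definitions. The `Confusion` class and the counts in the usage example are hypothetical and chosen only for illustration; they are not the paper's data, and the paper may define its evaluation differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for confusion counts of a binary decision,
    e.g. 'is this token a pronoun?' in an extraction stage."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def precision(self) -> float:
        # Share of predicted positives that are correct.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # Share of actual positives that were found.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def accuracy(self) -> float:
        # Share of all decisions that are correct.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0


# Illustrative counts only (an assumption, not taken from the paper).
example = Confusion(tp=90, fp=10, fn=0, tn=100)
print(f"precision={example.precision():.2%}")
print(f"recall={example.recall():.2%}")
print(f"accuracy={example.accuracy():.2%}")
```

Under these definitions, a recall of 100% with a precision below 100% means that no true item was missed while some spurious items were extracted, which is the pattern of the extraction figures quoted in the abstract.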