Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Coreference resolution plays an important role in Information Extraction. This paper
investigates two strategies based on a mention-pair resolver that uses a Decision Tree
classifier on structured and unstructured datasets, targeting coreference resolution in the Dari language.
The strategies are (1) training separate models, each specialized in particular categories
(e.g., lexical, syntactic, and semantic) and types of mentions (e.g., pronouns, proper nouns), and
(2) using a structured dataset with a machine learning library designed to classify numerical
values. Moreover, these modifications and comparative models characterize the contribution of the various
factors involved in the coreference resolution of texts. We also developed the first Dari corpus
('DariCoref'), based on the OntoNotes and WikiCoref schemes. Both strategies achieve
state-of-the-art F-scores.
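
The mention-pair setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mention` fields, the tag set (`pronoun`, `proper`), the five features, and the English placeholder mentions are all assumptions made for illustration. The sketch only builds the numeric pair table that a decision tree learner could consume (strategy 2) and splits it by anaphor type so that separate models could be trained (strategy 1); the classifier itself is left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    idx: int        # position of the mention in the document
    text: str       # surface string (English placeholders, not real Dari data)
    kind: str       # hypothetical tag set: "pronoun" or "proper"
    sentence: int   # index of the sentence containing the mention
    entity: int     # gold cluster id, used only to label training pairs


def pair_features(antecedent, anaphor):
    """Encode one (antecedent, anaphor) pair as a row of numeric features."""
    return [
        int(antecedent.text.lower() == anaphor.text.lower()),  # exact string match
        int(anaphor.kind == "pronoun"),                        # anaphor is a pronoun
        int(antecedent.kind == "proper" and anaphor.kind == "proper"),
        anaphor.sentence - antecedent.sentence,                # sentence distance
        anaphor.idx - antecedent.idx,                          # mention distance
    ]


def build_pairs(mentions):
    """Pair every mention with each earlier mention; label 1 if coreferent."""
    ordered = sorted(mentions, key=lambda m: m.idx)
    rows, labels = [], []
    for antecedent, anaphor in combinations(ordered, 2):
        rows.append(pair_features(antecedent, anaphor))
        labels.append(int(antecedent.entity == anaphor.entity))
    return rows, labels


def split_by_anaphor_kind(mentions):
    """Group pair rows by anaphor type, one training table per type."""
    ordered = sorted(mentions, key=lambda m: m.idx)
    tables = {}
    for antecedent, anaphor in combinations(ordered, 2):
        rows, labels = tables.setdefault(anaphor.kind, ([], []))
        rows.append(pair_features(antecedent, anaphor))
        labels.append(int(antecedent.entity == anaphor.entity))
    return tables


# Toy document with two entities (placeholder data for illustration only).
mentions = [
    Mention(0, "Ahmad", "proper", 0, entity=0),
    Mention(1, "he", "pronoun", 1, entity=0),
    Mention(2, "Kabul", "proper", 1, entity=1),
    Mention(3, "Ahmad", "proper", 2, entity=0),
]

rows, labels = build_pairs(mentions)
by_kind = split_by_anaphor_kind(mentions)
```

Each entry of `rows` is a purely numeric vector, so the table could be passed to any decision tree learner that expects numerical inputs, and each entry of `by_kind` could be used to fit one specialized model per mention type.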