Article Information

Title: Probabilistic Reference to Suspect or Victim in Nationality Extraction from Unstructured Crime News Documents
Authors: Mohammad Darwich; Masnizah Mohd
Journal: Information and Knowledge Management
Print ISSN: 2224-5758
Electronic ISSN: 2224-896X
Publication year: 2015
Volume: 5
Issue: 9
Pages: 64-75
Language: English
Publisher: International Institute for Science, Technology Education
Abstract: Unstructured crime news documents contain valuable information that crime analysts must search for manually. Several information extraction models have been implemented to address this problem, all of which leave room for improvement. This gap motivates an enhanced information extraction model that uses named entity recognition to extract the nationality from crime news documents and coreference resolution to associate the nationality with either the suspect or the victim. After the proposed model extracts the nationality, it assigns it to the suspect or the victim by looking up all of the victim-related and suspect-related keywords within the text and measuring their distances from the position of the nationality keyword. Based on the total distances, a probability score algorithm decides whether the nationality is more likely to belong to the victim or to the suspect. Two experiments were conducted to evaluate the nationality extractor component and the reference identification component of the model. The first experiment achieved 90%, 94%, and 91% for precision, recall, and F-measure respectively. The second experiment achieved 65%, 68%, and 66% for precision, recall, and F-measure respectively. The authors describe these results as promising.
Keywords: information extraction; named entity recognition; coreference resolution; crime domain
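
The distance-based reference step described in the abstract can be illustrated with a rough sketch. The record does not give the paper's actual scoring formula or keyword lists, so everything below is an assumption: the keyword sets are hypothetical, the nationality is assumed to be a single token, and inverse-distance weighting normalized to sum to 1 stands in for the unspecified probability score.

```python
import re

# Hypothetical keyword lists; the record does not state the ones used in the paper.
SUSPECT_KEYWORDS = {"suspect", "accused", "attacker", "robber"}
VICTIM_KEYWORDS = {"victim", "deceased", "injured"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def role_probabilities(text, nationality):
    """Score how likely a nationality mention refers to the suspect or the victim.

    Assumed scoring (not the paper's formula): each role keyword contributes
    1 / distance to the nearest nationality mention, measured in tokens, and
    the two role totals are normalized so that they sum to 1.
    Returns None when the nationality or any role keyword is absent.
    """
    tokens = tokenize(text)
    target = nationality.lower()
    nat_positions = [i for i, tok in enumerate(tokens) if tok == target]
    if not nat_positions:
        return None

    def proximity(keywords):
        total = 0.0
        for i, tok in enumerate(tokens):
            if tok in keywords:
                nearest = min(abs(i - p) for p in nat_positions)
                total += 1.0 / max(nearest, 1)  # guard against a zero distance
        return total

    suspect_score = proximity(SUSPECT_KEYWORDS)
    victim_score = proximity(VICTIM_KEYWORDS)
    combined = suspect_score + victim_score
    if combined == 0:
        return None
    return {
        "suspect": suspect_score / combined,
        "victim": victim_score / combined,
    }


if __name__ == "__main__":
    sentence = ("Police said the Malaysian suspect fled after "
                "the victim was found near the river.")
    print(role_probabilities(sentence, "Malaysian"))
```

In the sample sentence the suspect keyword sits one token from the nationality and the victim keyword five tokens away, so the sketch favors the suspect. A real system would take the nationality position from a named entity recognizer rather than from exact string matching.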