Article information

Title: A Hybrid Approach for Urdu Sentence Boundary Disambiguation
Authors: Zobia Rehman; Waqas Anwar
Journal: The International Arab Journal of Information Technology
Print ISSN: 1683-3198
Publication year: 2012
Volume: 9
Issue: 3
Publisher: Zarqa Private University
Abstract: Sentence boundary identification is a preliminary step in preparing a text document for natural language processing tasks such as machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation that combines a unigram statistical model with a rule-based algorithm. With different training and test data, the approach achieved 99.48% precision, 86.35% recall, and a 92.45% F1-measure; with the same data used for both training and testing, it achieved 99.36% precision, 96.45% recall, and a 97.89% F1-measure.
Keywords: Sentence boundary disambiguation; unigram model
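
Note: The abstract reports precision, recall, and F1-measure but does not state how F1 was computed. The sketch below assumes the standard definition (the harmonic mean of precision and recall) and recomputes F1 from the reported precision and recall for both settings. The function name `f1_score` is chosen here for illustration and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (output uses the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 (all in percent) as reported in the abstract.
reported = {
    "different training and test data": (99.48, 86.35, 92.45),
    "same training and test data": (99.36, 96.45, 97.89),
}

for setting, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{setting}: recomputed F1 = {f1:.2f}%, reported F1 = {f1_reported:.2f}%")
```

Under this assumption the recomputed values agree with the reported F1 figures to within about 0.01 percentage points. Any small gap is presumably due to rounding of the published precision and recall.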