Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2006
Volume: 2006
Publisher: ACL Anthology
Abstract: This paper reports the current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of the Persian language through automatic analysis of corpora. We used a Minimum Description Length (MDL) based algorithm with several improvements and applied it to a Persian corpus.
Our improvements include enhancing the cost function with heuristics, preventing high-frequency chunks from being split, applying a penalty to first and last letters, and distinguishing pre-parts from post-parts. The improved approach raised the precision, recall and F-measure of morpheme discovery by 32%, 17% and 23%, respectively.
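The abstract does not spell out the cost function. To illustrate the general MDL idea it builds on, the sketch below scores a segmented word list as a lexicon cost (spelling out each distinct morph letter by letter) plus a corpus cost (coding each morph token by its maximum-likelihood probability). This is a simplified textbook two-part code, not the authors' formulation; the function name, the coding scheme and the transliterated toy words are assumptions.

```python
import math
from collections import Counter

def description_length(segmented_words, alphabet_size):
    """Hypothetical simplified MDL score, in bits, of a segmentation.

    segmented_words: list of words, each given as a list of morph strings.
    alphabet_size: number of letters; one extra symbol marks the end of a morph.
    """
    tokens = [morph for word in segmented_words for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Corpus cost: each morph token is coded with -log2 of its relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    # Lexicon cost: each distinct morph is spelled out once, plus an end marker.
    bits_per_symbol = math.log2(alphabet_size + 1)
    lexicon_cost = sum((len(m) + 1) * bits_per_symbol for m in counts)
    return lexicon_cost + corpus_cost

# Toy transliterated corpus; 32 is the size of the standard Persian alphabet.
unsplit = [["ketab"], ["ketabha"], ["daftar"], ["daftarha"]]
split = [["ketab"], ["ketab", "ha"], ["daftar"], ["daftar", "ha"]]
print(description_length(unsplit, alphabet_size=32))
print(description_length(split, alphabet_size=32))
```

On this toy input, splitting off the shared plural ending "ha" gives the lower total, because the lexicon no longer stores "ketab" and "daftar" twice. A lower total is the signal an MDL learner uses to accept a split.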
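The improvements are only named in the abstract, not specified. One hypothetical way to fold three of them into a split decision is sketched below: a frequency threshold that keeps common chunks whole, an extra cost when a split isolates the first or last letter of a word, and separate count tables for pre-parts and post-parts. The constants `FREQ_THRESHOLD` and `EDGE_PENALTY` and both helper functions are illustrative assumptions, not the paper's values or code.

```python
import math
from collections import Counter

FREQ_THRESHOLD = 50  # hypothetical: chunks seen at least this often are never split
EDGE_PENALTY = 5.0   # hypothetical: extra bits when a split isolates a first or last letter

def adjusted_split_cost(word, cut, base_cost, chunk_freq):
    """Heuristic-adjusted cost of splitting `word` into word[:cut] + word[cut:].

    base_cost: cost (lower is better) that the underlying MDL model assigns to the split.
    chunk_freq: a Counter of chunk frequencies (missing chunks count as 0).
    Returns math.inf when the split is forbidden.
    """
    if not 0 < cut < len(word):
        raise ValueError("cut must fall strictly inside the word")
    if chunk_freq[word] >= FREQ_THRESHOLD:
        return math.inf  # high-frequency chunk: keep it whole
    cost = base_cost
    if cut == 1 or cut == len(word) - 1:
        cost += EDGE_PENALTY  # discourage a lone first or last letter
    return cost

def record_split(word, cut, pre_counts, post_counts):
    """Count the two halves in separate inventories, so a string used as a
    pre-part and the same string used as a post-part are tracked independently."""
    pre_counts[word[:cut]] += 1
    post_counts[word[cut:]] += 1

freq = Counter({"ketabha": 3, "ast": 120})
print(adjusted_split_cost("ketabha", 5, 2.0, freq))  # ordinary split
print(adjusted_split_cost("ketabha", 6, 2.0, freq))  # isolates the last letter
print(adjusted_split_cost("ast", 1, 2.0, freq))      # frequent chunk, forbidden
```

Returning `math.inf` for a protected chunk lets the caller keep a simple "pick the cheapest option" loop, since a forbidden split can never win against leaving the word intact.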
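The abstract reports gains in precision, recall and F-measure without stating the unit of evaluation. Assuming, as is common for morpheme segmentation, that scores are computed over predicted versus gold-standard split points, the sketch below computes micro-averaged boundary precision, recall and balanced F-measure. The helper names and the transliterated examples are hypothetical.

```python
def boundaries(segments):
    """Internal split positions of a segmented word, e.g. ["ketab", "ha"] -> {5}."""
    positions, offset = set(), 0
    for seg in segments[:-1]:
        offset += len(seg)
        positions.add(offset)
    return positions

def boundary_prf(predicted, gold):
    """Micro-averaged precision, recall and F-measure over morpheme boundaries."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same words")
    tp = fp = fn = 0
    for pred_segs, gold_segs in zip(predicted, gold):
        p, g = boundaries(pred_segs), boundaries(gold_segs)
        tp += len(p & g)  # boundaries found and correct
        fp += len(p - g)  # boundaries found but wrong
        fn += len(g - p)  # boundaries missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

predicted = [["ketab", "ha"], ["daf", "tar"], ["mi", "ravam"]]
gold = [["ketab", "ha"], ["daftar"], ["mi", "rav", "am"]]
print(boundary_prf(predicted, gold))
```

Here the spurious split in "daftar" lowers precision and the missed boundary before "am" lowers recall, so the two error types are tracked separately before being combined in the F-measure.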