Publisher: Information and Media Technologies Editorial Board
Abstract: To improve readability, we often segment a mail text into smaller paragraphs than necessary. However, this oversegmentation is a problem for mail text processing: it can negatively affect discourse analysis, information extraction, information retrieval, and similar tasks. To solve this problem, we propose methods for estimating the connectivity between paragraphs in a mail. In this paper, we compare paragraph connectivity estimation based on machine learning methods (SVM and ME) with a rule-based method and show that the machine learning methods outperform the rule-based method.