Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2020
Volume: 98
Issue: 18
Pages: 3853-3869
Publisher: Journal of Theoretical and Applied
Abstract: Text is one of the useful sources of knowledge for humans. Each element of a text has to be analyzed to identify the information and knowledge it carries. The elementary discourse unit (EDU) is important for NLP applications that need a processing unit smaller than a sentence, such as text summarization, information extraction, and question answering. An EDU can therefore be more appropriate than a sentence for extracting knowledge and information from text. This paper presents a pipeline for Thai EDU segmentation that runs from word segmentation to EDU segmentation. A shallow parser chunks the non-recursive phrases in a text to reveal partial syntactic information, which is then used to identify and reconstruct the EDUs in the text. In the experiment, the precision, recall, and F1 score are 0.88865, 0.91577, and 0.90200, respectively.
Keywords: Word Segmentation; EDU Segmentation; Conditional Random Field; Shallow Parser; Natural Language Processing
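
The three scores in the abstract can be cross-checked against each other. This is a minimal sketch, assuming the paper uses the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not state the definition. The function name f1_score and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1; assumed here)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.88865
reported_recall = 0.91577
reported_f1 = 0.90200

# Recompute F1 from the reported precision and recall and compare.
recomputed_f1 = f1_score(reported_precision, reported_recall)
print(f"recomputed F1 = {recomputed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Under this assumption the recomputed value agrees with the reported F1 to four decimal places. Any difference in the fifth decimal may come from rounding of the reported precision and recall.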