Publisher: The Japanese Society for Artificial Intelligence
Abstract: To summarize and visualize the trend information in documents, we need a method for automatically extracting statistical information from them. In this paper, we investigate the automated extraction of statistical information, focusing on the expressions that name statistical information. First, we classify these expressions into three categories: the action type, the attribute type, and the definition type. Second, we examine their internal structures. Based on these structures, we define an XML tag set for annotating each part of a name of statistical information. As a feasibility study of automated extraction, we conducted an experiment in which the parts of names of statistics were extracted using a standard chunking algorithm. The experimental results show that the parts defined by the tag set can be extracted with good accuracy when a training corpus from a domain similar to that of the target documents is available. When no such training corpus is available, however, extraction accuracy degrades.
Keywords: MuST (Multimodal Summarization for Trend Information); statistical expressions; information extraction