Article Information

Title: Automated Extraction of Statistical Expressions from Text for Information Compilation
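Authors: Tatsunori MORI; Atsushi FUJIOKA; Ichiro MURATA et al.
Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
Print ISSN: 1346-0714
Electronic ISSN: 1346-8030
Publication year: 2008
Volume: 23
Issue: 5
Pages: 310-318
DOI: 10.1527/tjsai.23.310
Publisher: The Japanese Society for Artificial Intelligence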
Abstract: To summarize and visualize trend information in documents, we need a method for automatically extracting statistical information from them. In this paper, we investigate the automated extraction of statistical information, focusing on expressions that name statistics. First, we classify these expressions into three categories: the action type, the attribute type, and the definition type. Second, we examine their internal structures and, based on those structures, define an XML tag set for annotating each part of the name of a statistic. As a feasibility study, we conducted an experiment in which the parts of names of statistics were extracted using a standard chunking algorithm. The results show that the parts defined by the tag set can be extracted with good accuracy when a training corpus from a domain similar to the target documents is available. Conversely, extraction accuracy degrades when no such training corpus can be prepared.
Keywords: MuST (Multimodal Summarization for Trend Information); statistical expressions; information extraction