Basic article information

Title: Wikipedia-Based Semantic Interpreter Using Approximate Top-k Processing and Its Application
Authors: Jong Wook Kim (Teradata Corporation, USA); Ashwin Kashyap (Technicolor, USA); Sandilya Bhamidipati (Technicolor, USA); et al.
Journal: Journal of Universal Computer Science
Print ISSN: 0948-6968
Publication year: 2012
Volume: 18
Issue: 5
Publisher: Graz University of Technology and Know-Center
Abstract: Proper representation of the meaning of texts is crucial for enhancing many data mining and information retrieval tasks, including clustering, computing semantic relatedness between texts, and searching. Representing texts in the concept space derived from Wikipedia has received growing attention recently. This concept-based representation is capable of extracting semantic relatedness between texts that cannot be deduced with the bag-of-words model. A key obstacle, however, to using Wikipedia as a semantic interpreter is that the sheer number of concepts derived from Wikipedia makes it hard to efficiently map texts into the concept space. In this paper, we develop an efficient and effective algorithm which is able to represent the meaning of a text by using the concepts that best match it. In particular, our approach first computes the approximate top-k Wikipedia concepts that are most relevant to the given text. We then leverage these concepts for representing the meaning of the given text. The experimental results show that the proposed technique provides significant gains in execution time without causing significant reduction in precision. We then explore the effectiveness of the proposed algorithm on a real-world problem. In particular, we show that this novel scheme could be leveraged to boost the effectiveness in finding topic boundaries in a news video.
Keywords: Wikipedia, concept, semantic interpretation
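
Illustration: the abstract describes mapping a text to its k most relevant Wikipedia concepts. The record gives no algorithmic detail, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses exact scoring over a tiny inverted index and is not the paper's approximate top-k method. The concept names, term weights, and the function names `build_inverted_index` and `top_k_concepts` are all hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: each concept maps terms to weights. In a real
# system these would be derived from Wikipedia articles (e.g. TF-IDF).
CONCEPTS = {
    "Computer science": {"algorithm": 0.9, "data": 0.6, "search": 0.4},
    "Journalism": {"news": 0.9, "video": 0.3, "topic": 0.5},
    "Information retrieval": {"search": 0.9, "text": 0.7, "data": 0.3},
}


def build_inverted_index(concepts):
    """Map each term to the list of (concept, weight) pairs containing it."""
    index = defaultdict(list)
    for concept, weights in concepts.items():
        for term, weight in weights.items():
            index[term].append((concept, weight))
    return index


def top_k_concepts(text, index, k):
    """Return the k highest-scoring concepts for a text.

    A concept's score is the sum of its weights over the text's terms.
    This is exact scoring; an approximate scheme would avoid reading
    every posting list in full.
    """
    scores = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in index.get(term, ()):
            scores[concept] += weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    index = build_inverted_index(CONCEPTS)
    for concept, score in top_k_concepts("search text data", index, k=2):
        print(f"{concept}: {score:.2f}")
```

Keeping only the k best concepts yields a sparse concept vector for each text, which is the kind of representation the abstract says is then used for tasks such as finding topic boundaries.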