Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Publication year: 2015
Volume: 10
Issue: 12
Pages: 341-354
DOI: 10.14257/ijmue.2015.10.12.33
Publisher: SERSC
Abstract: Query subtopic mining aims to find aspects that represent users' potential intents for a query. Clustering query reformulations is currently the most common approach to subtopic mining. However, existing approaches face challenges such as term mismatch and data sparseness when trying to find subtopics that are both relevant and diverse. In this paper, novel semantic representations for query subtopics are introduced, including a phrase embedding representation and a query category distributional representation, to address these problems. Furthermore, we combine multiple semantic representations in a vector space model and compute a joint similarity for clustering query reformulations. To evaluate our theory, an experiment is conducted on a public dataset offered by the NTCIR subtopic mining project. The experimental results show that the phrase embedding representation is the most effective single representation, while combining multiple semantic representations benefits short text clustering and improves the performance of query subtopic mining.
Keywords: subtopic mining; query understanding; semantic representation; information retrieval