Article Information

  • Title: A Comparative Study on Effective Context Selection for Distributional Similarity
  • Authors: Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2008
  • Volume: 3
  • Issue: 4
  • Pages: 907-938
  • DOI: 10.11185/imt.3.907
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: Distributional similarity is a widely adopted concept for capturing the semantic relatedness of words based on their context in various NLP tasks. While accurate similarity calculation requires a huge number of context types and co-occurrences, the contribution to the similarity calculation depends on the individual context type, and some context types even act as noise. To select well-performing contexts and alleviate the high computational cost, we propose and investigate the effectiveness of three context selection schemes: category-based, type-based, and co-occurrence-based selection. Category-based selection is the conventional and simplest selection method, which limits the context types based on their syntactic category. Finer-grained, type-based selection assigns an importance score to each context type, which we make possible by proposing a novel formalization of distributional similarity as a classification problem and applying feature selection techniques. The finest-grained, co-occurrence-based selection assigns an importance score to each co-occurrence of a word and a context type. We evaluate the effectiveness of these schemes and the trade-off between co-occurrence data size and synonym acquisition performance. Our experiments show that, on the whole, the finest-grained, co-occurrence-based selection achieves better performance, although some of the simple category-based selections show a comparable performance/cost trade-off.
  • Keywords: Feature Selection; Contextual Information; Distributional Similarity; Synonym Acquisition
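
The category-based selection described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy co-occurrence counts, the category labels (`dobj`, `amod`, `subj`), the helper names `select_by_category`, `cosine`, and `data_size`, and the use of cosine similarity over raw counts as the similarity measure.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: word -> counts of (syntactic category, context word) pairs.
COOC = {
    "car": Counter({("dobj", "drive"): 4, ("amod", "fast"): 2, ("subj", "stop"): 1}),
    "automobile": Counter({("dobj", "drive"): 3, ("amod", "fast"): 1}),
    "banana": Counter({("dobj", "eat"): 5, ("amod", "ripe"): 2}),
}


def select_by_category(cooc, categories):
    """Keep only the context types whose syntactic category is in `categories`."""
    return {
        word: Counter({ctx: n for ctx, n in contexts.items() if ctx[0] in categories})
        for word, contexts in cooc.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[ctx] * v[ctx] for ctx in u.keys() & v.keys())
    norm = sqrt(sum(n * n for n in u.values())) * sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0


def data_size(cooc):
    """Number of distinct (word, context type) co-occurrences that are stored."""
    return sum(len(contexts) for contexts in cooc.values())


# Restrict the contexts to direct-object relations only.
selected = select_by_category(COOC, {"dobj"})
print(data_size(COOC), "->", data_size(selected))
print(cosine(selected["car"], selected["automobile"]))
print(cosine(selected["car"], selected["banana"]))
```

Comparing `data_size` before and after selection against the resulting similarities gives a rough feel for the size/performance trade-off the abstract evaluates; the type-based and co-occurrence-based schemes would replace the category filter with per-type or per-co-occurrence importance scores.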