Article Information

  • Title: A Word Selection Method for Producing Interpretable Distributional Semantic Word Vectors
  • Authors: Atefe Pakzad; Morteza Analoui
  • Journal: Journal of Artificial Intelligence Research
  • Print ISSN: 1076-9757
  • Year of publication: 2021
  • Volume: 72
  • Pages: 1-25
  • DOI: 10.1613/jair.1.13353
  • Language: English
  • Publisher: American Association of Artificial
  • Abstract: Distributional semantic models represent the meaning of words as vectors. We introduce a selection method that learns a vector space in which each dimension is a natural word. The method starts from the most frequent words and selects the subset that gives the best performance. Because every dimension of the resulting space is itself a word, the space is directly interpretable; this is the main advantage of the method over fusion methods such as NMF and neural embedding models. We apply the method to the ukWaC corpus and train a vector space of N=1500 basis words. We report test results on word similarity tasks for the MEN, RG-65, SimLex-999, and WordSim353 gold datasets. The results also show that reducing the number of basis vectors from 5000 to 1500 reduces accuracy by only about 1.5-2%, so good interpretability is achieved without a large penalty. Interpretability evaluation results indicate that the word vectors obtained by the proposed method with N=1500 are more interpretable than those of word embedding models and the baseline method. We report the top 15 of the 1500 selected basis words in this paper.
  • Keywords: distributional semantic vectors; basis vectors; basis words; interpretable; word selection method; projection function
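The abstract describes choosing basis words from the most frequent words so that the chosen subset performs best on word similarity. The sketch below illustrates that general idea only. It assumes a greedy forward selection scored by Spearman correlation against gold similarity ratings, which is a stand-in and not necessarily the authors' procedure. The co-occurrence counts, the gold pairs, and the names `score` and `select_basis` are all hypothetical toy values introduced for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ranks(xs):
    """1-based ranks of xs, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks (0.0 if undefined)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def score(cooc, basis, gold):
    """Correlation between gold ratings and cosine similarity in the basis-word space."""
    preds = [
        cosine([cooc[w1].get(b, 0) for b in basis],
               [cooc[w2].get(b, 0) for b in basis])
        for w1, w2, _ in gold
    ]
    return spearman(preds, [g for _, _, g in gold])

def select_basis(cooc, candidates, gold, n):
    """Greedy forward selection (an assumption): add the candidate that most improves the score."""
    basis, remaining = [], list(candidates)
    while remaining and len(basis) < n:
        best = max(remaining, key=lambda w: score(cooc, basis + [w], gold))
        basis.append(best)
        remaining.remove(best)
    return basis

# Hypothetical toy data: target word -> co-occurrence counts with candidate basis words.
cooc = {
    "cat":   {"the": 50, "pet": 8, "engine": 0, "fur": 6},
    "dog":   {"the": 48, "pet": 9, "engine": 1, "fur": 5},
    "car":   {"the": 52, "pet": 0, "engine": 9, "fur": 0},
    "truck": {"the": 49, "pet": 1, "engine": 8, "fur": 0},
}
candidates = ["the", "pet", "engine", "fur"]  # assumed sorted by corpus frequency
gold = [("cat", "dog", 9.0), ("car", "truck", 8.5),
        ("cat", "car", 2.0), ("dog", "truck", 1.5)]

basis = select_basis(cooc, candidates, gold, n=2)
print(basis)
```

Each selected dimension is a word from `candidates`, so a vector can be read directly as "how strongly this word co-occurs with each basis word". In a realistic setting the counts would come from a corpus such as ukWaC and the gold pairs from a dataset such as MEN or SimLex-999.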