首页    期刊浏览 2024年07月08日 星期一
登录注册

文章基本信息

  • 标题:Fundamental Vocabulary Selection Based on Word Familiarity
  • 本地全文:下载
  • 作者:Hiroshi Sato ; Kaname Kasahara ; Tomoko Kanasugi
  • 期刊名称:人工知能学会論文誌
  • 印刷版ISSN:1346-0714
  • 电子版ISSN:1346-8030
  • 出版年度:2004
  • 卷号:19
  • 期号:6
  • 页码:502-510
  • DOI:10.1527/tjsai.19.502
  • 出版社:The Japanese Society for Artificial Intelligence
  • 摘要:This paper proposes a new method for selecting fundamental vocabulary. We are presently constructing the Fundamental Vocabulary Knowledge-base of Japanese that contains integrated information on syntax, semantics and pragmatics, for the purposes of advanced natural language processing. This database mainly consists of a lexicon and a treebank: Lexeed (a Japanese Semantic Lexicon) and the Hinoki Treebank. Fundamental vocabulary selection is the first step in the construction of Lexeed. The vocabulary should include sufficient words to describe general concepts for self-expandability, and should not be prohibitively large to construct and maintain. There are two conventional methods for selecting fundamental vocabulary. The first is intuition-based selection by experts. This is the traditional method for making dictionaries. A weak point of this method is that the selection strongly depends on personal intuition. The second is corpus-based selection. This method is superior in objectivity to intuition-based selection, however, it is difficult to compile a sufficiently balanced corpora. We propose a psychologically-motivated selection method that adopts word familiarity as the selection criterion. Word familiarity is a rating that represents the familiarity of a word as a real number ranging from 1 (least familiar) to 7 (most familiar). We determined the word familiarity ratings statistically based on psychological experiments over 32 subjects. We selected about 30,000 words as the fundamental vocabulary, based on a minimum word familiarity threshold of 5. We also evaluated the vocabulary by comparing its word coverage with conventional intuition-based and corpus-based selection over dictionary definition sentences and novels, and demonstrated the superior coverage of our lexicon. Based on this, we conclude that the proposed method is superior to conventional methods for fundamental vocabulary selection.
  • 关键词:Lexeed ; fundamental vocabulary ; word familiarity ; semantics ; ontology
国家哲学社会科学文献中心版权所有