
Article Information

  • Title: Reducing the Word Length-Token Frequency Function to an Equation
  • Author: Michael Gradoville
  • Journal: Divergencias : Revista de Estudios Linguisticos y Literarios
  • Print ISSN: 1555-7596
  • Year of publication: 2006
  • Volume: 4
  • Issue: 02
  • Publisher: University of Arizona
  • Abstract: The fact that language structure is affected by usage is a cornerstone of functional linguistics. One specific idea that is generally accepted is that the words with the greatest token frequency are also the shortest (e.g. Bybee, 2002). The purpose of this paper is to outline a statistical method that may be used to perform tests on corpus data related to the word length-token frequency function. The data used to develop this method come from the spoken portion of Davies' (2005) Corpus del español, a 100 million word corpus of the Spanish language including sources from eight centuries. A rank-order list that includes the number of occurrences of each form was extracted from the Corpus del español, and the 1000 most frequent forms were then tagged for length in terms of number of syllables. Using linear regression analysis, equations were created from the data presenting word length as a function of rank in the list in one case and of frequency of occurrence in the other. These equations represent an approximate average word length at any point in the rank-order list. Details for selecting data are discussed and possible future applications of this method are outlined.
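
The regression step described in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of syllable count on rank, and separately on raw frequency. The abstract does not say whether the paper transforms either variable, so this is one plausible reading and not the author's procedure. The function name `fit_line` and the five-item word list are hypothetical, and the counts are invented for illustration. They are not taken from the Corpus del español.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data (invented counts, NOT corpus figures):
# (form, occurrences, length in syllables), sorted by descending frequency.
toy_forms = [
    ("de", 500, 1),
    ("la", 400, 1),
    ("que", 300, 1),
    ("para", 200, 2),
    ("porque", 100, 2),
]

ranks = list(range(1, len(toy_forms) + 1))
freqs = [count for _, count, _ in toy_forms]
lengths = [syllables for _, _, syllables in toy_forms]

# Case 1: word length as a function of rank in the list.
slope_rank, intercept_rank = fit_line(ranks, lengths)

# Case 2: word length as a function of frequency of occurrence.
slope_freq, intercept_freq = fit_line(freqs, lengths)

print(f"length ~ {slope_rank:.3f} * rank + {intercept_rank:.3f}")
print(f"length ~ {slope_freq:.4f} * frequency + {intercept_freq:.3f}")
```

In this toy setup the slope on rank is positive (forms further down the list are longer) and the slope on frequency is negative, which is the direction the abstract's hypothesis predicts. The actual coefficients reported in the paper would come from the 1000 tagged forms, not from data like this.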