Article Information

  • Title: Syntactic Ngrams as Keystructures Reflecting Typical Syntactic Patterns of Corpora in Finnish
  • Authors: Veronika Laippala; Jenna Kanerva
  • Journal: Procedia - Social and Behavioral Sciences
  • Print ISSN: 1877-0428
  • Publication year: 2015
  • Volume: 198
  • Pages: 233-241
  • DOI: 10.1016/j.sbspro.2015.07.441
  • Language: English
  • Publisher: Elsevier
  • Abstract: This article studies syntactic ngrams, i.e., small subtrees of dependency syntax analyses, as keystructures reflecting syntactic characteristics of corpora. While traditional keywords correspond to statistically more or less frequent words of a corpus and are often informative on the corpus topic and style, the unlexicalized syntactic ngrams applied in this study extend the level of description beyond individual words to sequences of syntactic elements. The article analyzes the utility of these sequences in corpus description and gives first results on the structural characteristics they reflect in the studied texts, which include Finnish literature, Internet forum discussions from the major Finnish social networking website, and Internet discussions following the news and editorials of the major Finnish newspaper's website. The syntactic ngrams are produced with the freely available Finnish Dependency Parser and Ngram Builder, and the keystructures are analyzed with a linear classifier. The results suggest that syntactic ngrams illustrate both topical features, such as names and Internet URLs discussed in the corpora, and structural characteristics, such as subject-verb combinations, negations and informal sentence structures, thus both generalizing the information given by traditional keywords from individual words to concepts and providing new knowledge about typical constructions not reached by lexemes.
  • Keywords: Keyness; syntactic ngrams; computer-mediated communication
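
To make the notion of an unlexicalized syntactic ngram in the abstract concrete, the following is a minimal sketch that extracts the smallest such subtrees (head-dependent pairs) from a dependency parse and counts them. It is not the Finnish Dependency Parser or the Ngram Builder named in the abstract: the `Token` type, the `syntactic_bigrams` helper, the toy English sentence and its tag and relation labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    form: str    # surface word; dropped when the ngram is unlexicalized
    pos: str     # part-of-speech tag (illustrative labels, not the parser's tag set)
    head: int    # 1-based index of the head token, 0 for the root
    deprel: str  # dependency relation to the head (illustrative labels)


def syntactic_bigrams(sentence):
    """Return unlexicalized head-dependent pairs as 'HEADPOS >deprel> DEPPOS'."""
    grams = []
    for tok in sentence:
        if tok.head == 0:
            continue  # the root token has no head, so it yields no pair
        head = sentence[tok.head - 1]
        grams.append(f"{head.pos} >{tok.deprel}> {tok.pos}")
    return grams


# Hand-annotated toy parse of "She does not sleep" (hypothetical annotation).
toy = [
    Token("She", "PRON", 4, "nsubj"),
    Token("does", "AUX", 4, "aux"),
    Token("not", "PART", 4, "neg"),
    Token("sleep", "VERB", 0, "root"),
]

# Frequency table of structures; word forms play no role in the keys.
counts = Counter(syntactic_bigrams(toy))
print(counts.most_common())
```

Because the word forms are discarded, any subject-verb or negation construction maps to the same key regardless of vocabulary. Frequency tables of this kind, built per corpus, are the sort of feature a linear classifier could then weight to find structures typical of one corpus; that step is not shown here.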