Basic article information

  • Title: ReaderBench: Multilevel analysis of Russian text characteristics
  • Authors: Dragos Corlatescu; Ștefan Ruseti; Mihai Dascalu
  • Journal: Russian Journal of Linguistics
  • Print ISSN: 2687-0088
  • Electronic ISSN: 2686-8024
  • Publication year: 2022
  • Volume: 26
  • Issue: 2
  • Pages: 342-370
  • DOI: 10.22363/2687-0088-30145
  • Language: English
  • Publisher: Peoples’ Friendship University of Russia (RUDN University)
  • Abstract: This paper introduces an adaptation of the open-source ReaderBench framework that now supports multilevel analyses of Russian text characteristics, integrating both textual complexity indices and state-of-the-art language models, namely Bidirectional Encoder Representations from Transformers (BERT). The proposed processing pipeline was evaluated on a dataset containing Russian texts from two language levels for foreign learners (A - Basic user and B - Independent user). Our experiments showed that the ReaderBench complexity indices are statistically significant in differentiating between the two language-level classes, from two perspectives: a) a statistical perspective, where a Kruskal-Wallis analysis was performed and features such as the “nmod” dependency tag or the number of nouns at the sentence level proved to be the most predictive; and b) a neural network perspective, where our model combining textual complexity indices and contextualized embeddings obtained an accuracy of 92.36% in a leave-one-text-out cross-validation, outperforming the BERT baseline. ReaderBench can be employed by designers and developers of educational materials to evaluate and rank materials based on their difficulty, as well as by a larger audience for assessing text complexity in different domains, including law, science, or politics.
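
The abstract mentions a Kruskal-Wallis analysis over complexity indices for the two language levels. As a rough illustration only, the sketch below computes the tie-corrected Kruskal-Wallis H statistic for a single index across two groups, using only the Python standard library. It is not the authors' code or the ReaderBench API: the function names are hypothetical, and the p-value shortcut via `math.erfc` is valid only for two groups (chi-square with one degree of freedom).

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over the run of values equal to the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected H statistic and p-value (chi-square, df = 1) for two samples.

    Hypothetical helper for illustration; not part of ReaderBench.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])

    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie group sizes t.
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


if __name__ == "__main__":
    # Made-up toy values standing in for one index (e.g. nouns per sentence).
    level_a = [1, 2, 3]
    level_b = [4, 5, 6]
    h, p = kruskal_wallis_two_groups(level_a, level_b)
    print(f"H = {h:.3f}, p = {p:.4f}")
```

In practice one would repeat such a test for every index and rank the indices by H; with more than two groups, a general chi-square survival function (for example from a statistics library) would be needed instead of the `erfc` shortcut.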