
Article Information

  • Title: Evaluation Methods for Statistically Dependent Text
  • Authors: Sarvnaz Karimi; Jie Yin; Jiri Baum
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2015
  • Volume: 41
  • Issue: 3
  • Pages: 539-548
  • DOI: 10.1162/COLI_a_00230
  • Language: English
  • Publisher: MIT Press
  • Abstract: In recent years, many studies have been published on data collected from social media, especially microblogs such as Twitter. However, rather few of these studies have considered evaluation methodologies that take into account the statistically dependent nature of such data, which breaks the theoretical conditions for using cross-validation. Despite concerns raised in the past about using cross-validation for data of similar characteristics, such as time series, some of these studies evaluate their work using standard k-fold cross-validation. Through experiments on Twitter data collected during a two-year period that includes disastrous events, we show that by ignoring the statistical dependence of the text messages published in social media, standard cross-validation can result in misleading conclusions in a machine learning task. We explore alternative evaluation methods that explicitly deal with statistical dependence in text. Our work also raises concerns for any other data for which similar conditions might hold.
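
The abstract does not name the alternative evaluation methods it explores. As a rough illustration of the concern it raises, the sketch below contrasts a shuffled k-fold split with a time-ordered (forward-chaining) split, a scheme often used for time series. It assumes item indices are already sorted by timestamp; the function names `shuffled_kfold`, `forward_chaining`, and `trains_on_future` are hypothetical and are not taken from the paper.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def shuffled_kfold(n_items: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Standard k-fold: indices are shuffled, so time order is ignored."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for fold in range(k):
        test = sorted(indices[fold::k])
        test_set = set(test)
        train = [i for i in range(n_items) if i not in test_set]
        yield train, test


def forward_chaining(n_items: int, k: int) -> Iterator[Split]:
    """Time-ordered split: each test block follows all of its training data."""
    block = n_items // (k + 1)
    for fold in range(1, k + 1):
        train_end = fold * block
        # The last fold absorbs any remainder so every late item is tested.
        test_end = n_items if fold == k else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))


def trains_on_future(train: List[int], test: List[int]) -> bool:
    """True if any training item is newer than the oldest test item."""
    return max(train) > min(test)


if __name__ == "__main__":
    n, k = 20, 4
    print("shuffled k-fold :",
          [trains_on_future(tr, te) for tr, te in shuffled_kfold(n, k)])
    print("forward chaining:",
          [trains_on_future(tr, te) for tr, te in forward_chaining(n, k)])
```

In the shuffled split, most folds train on messages posted after the ones being tested, which is one way dependence between nearby messages can inflate scores. The forward-chaining split avoids that by construction, at the cost of smaller training sets in the early folds.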