
Basic Article Information

  • Title: Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19
  • Authors: Muhammad Abdul-Mageed; AbdelRahim Elmadany; El Moatez Billah Nagoudi
  • Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 3402-3420
  • DOI: 10.18653/v1/2021.eacl-main.298
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 268 countries), longitudinal (goes as far back as 2007), multilingual (comes in 100+ languages), and has a significant number of location-tagged tweets (~169M tweets). We release tweet IDs from the dataset. We also develop two powerful models, one for identifying whether or not a tweet is related to the pandemic (best F1=97%) and another for detecting misinformation about COVID-19 (best F1=92%). A human annotation study reveals the utility of our models on a subset of Mega-COV. Our data and models can be useful for studying a wide host of phenomena related to the pandemic. Mega-COV and our models are publicly available.