Article Information

  • Title: H-Prop and H-Prop-News: Computational Propaganda Datasets in Hindi
  • Authors: Deptii Chaudhari; Ambika Vishal Pawar; Alberto Barrón-Cedeño
  • Journal: Data
  • Print ISSN: 2306-5729
  • Publication year: 2022
  • Volume: 7
  • Issue: 3
  • Pages: 1-11
  • DOI: 10.3390/data7030029
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: In this digital era, people rely on the internet for their news consumption. As people are free to express their opinions on social media, much information shared on the internet is loaded with propaganda. Propagandist contents are intended to influence public opinion. In the mainstream media or prominent news agencies, the authors' and news agencies' own bias may impact in the news contents. Hence, it is required to detect such propaganda spread through news articles. Detection and classification of propagandist text require standard, high-quality, annotated datasets. A few datasets are available for propaganda classification. However, these datasets are mostly in English. Hindi is the most spoken language in India, and efforts are needed to detect its propagandist contents. This research work introduces two new datasets: H-Prop and H-Prop-News, which consist of news articles in Hindi annotated as propaganda or non-propaganda. The H-Prop dataset is generated by translating 28,630 news articles from the QProp dataset. The H-Prop-News dataset contains 5500 news articles collected from 32 prominent Hindi news websites. We experiment with the proposed datasets using four supervised machine learning models combined with different feature vectors and word embeddings. Our experiments achieve 87% accuracy using Logistic Regression with TF-IDF feature vectors. The datasets provide high-quality labeled news articles in Hindi and open new avenues for researchers to explore techniques for analyzing and classifying propaganda in Hindi text.
  • Keywords: propaganda identification; news articles analysis; Hindi text processing
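
The abstract names Logistic Regression over TF-IDF feature vectors as the best-performing combination. As a rough illustration of what that pairing involves, here is a minimal pure-Python sketch. It is not the authors' code: the record gives no implementation details, so the tokenization (whitespace split), the smoothed IDF formula, the learning rate, the epoch count, and every function name are assumptions. The six training sentences are invented English placeholders, not samples from H-Prop or H-Prop-News, and the sketch makes no attempt to reproduce the reported 87% accuracy.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents.

    Assumed IDF form: ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Map one tokenized document to an L2-normalized TF-IDF vector."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(x, w, b):
    """Return 1 (propaganda) or 0 (non-propaganda) for one feature vector."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy corpus (invented placeholders, NOT from the datasets).
train_texts = [
    "enemy traitors destroy nation",
    "traitors betray nation shame",
    "enemy shame destroy traitors",
    "council approves road budget",
    "rainfall expected across region",
    "council budget meeting region",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = propaganda, 0 = non-propaganda

tokenized = [text.split() for text in train_texts]
vocab, idf = fit_tfidf(tokenized)
X = [transform(doc, vocab, idf) for doc in tokenized]
w, b = train_logreg(X, train_labels)

# Classify an unseen placeholder headline with the fitted model.
new_vec = transform("traitors destroy nation".split(), vocab, idf)
print(predict(new_vec, w, b))
```

On real Hindi text a proper tokenizer, a held-out test split, and a tuned library implementation would be needed; the sketch only shows how term weighting feeds a linear classifier.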