
Article Information

  • Title: A Model for Processing Arabic Text on Twitter
  • Authors: Mohamed Osman Hegazi; Yasser Al-Dossari; Abdullah Al-Yahya
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Publication year: 2020
  • Volume: 20
  • Issue: 5
  • Pages: 150-157
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: This paper proposes a model that can be used as a framework for preprocessing Arabic text on Twitter for data analysis and information extraction. The model collects Arabic text from Twitter online and stores it in a structured database. The source data are then preprocessed to derive clean, meaningful Arabic text from which information can be extracted. The paper presents new methods and algorithms for preprocessing unstructured Arabic text on social media, and it provides solutions that address the difficulties of working with Arabic text on social media, including unclean, informal, and dialectal language. The preprocessed Arabic text is stored in structured database tables to provide a useful dataset to which information selection and data analysis algorithms can be applied. The implementation of the model yields a useful and full-featured dataset in which the text is presented as the source data, the cleaned data, and separate Arabic words with their stems, roots, and morphologies, among other forms. In addition, the model shows how information can be selected and extracted from this dataset.
  • Keywords: Information retrieval; Natural Language Processing; Database; Data Analysis; Text Mining; Arabic Text