Article information

  • Title: Clustering Students’ Arabic Tweets using Different Schemes
  • Authors: Hamed Al-Rubaiee; Khalid Alomar
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Online ISSN: 2156-5570
  • Year: 2017
  • Volume: 8
  • Issue: 4
  • DOI: 10.14569/IJACSA.2017.080438
  • Publisher: Science and Information Society (SAI)
  • Abstract: In this paper, Twitter is used as a platform for clustering the topics mentioned by King Abdulaziz University students, in order to understand students’ behaviours and answer their inquiries. The aim of the study is to propose a model for the cluster analysis of Saudi Arabian tweets (in standard Arabic and the Arabian Gulf dialect) that segments the topics contained in students’ posts. Natural language processing (NLP) and machine learning (ML) methods are combined to build models that cluster tweets according to their text similarity. The k-means algorithm is utilised with different vector representation schemes, such as TF-IDF (term frequency-inverse document frequency) and BTO (binary term occurrence). Different preprocessing steps are explored to obtain N-gram terms from the tokens. A cluster distance performance measure is applied to determine the average centroid distance of the clusters. In addition, the clusters are evaluated by humans, who inspect the source data to make sure that the clusters make sense in an educational domain. At this stage, each cluster has been identified, and students’ Twitter accounts have been characterised by their faculties or by their educational system, such as e-learning. The results show that BTO gave the best vector representation, and that it would be more useful than the TF-IDF scheme for clustering students’ text.
  • Keywords: Twitter; Arabic tweets; Saudi Arabia; King Abdulaziz University; data mining; data preparation
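The abstract names the main ingredients (word N-grams, BTO and TF-IDF vectors, k-means, and an average centroid distance) without giving implementation details. The following is a minimal, dependency-free sketch of that kind of pipeline under stated assumptions, not the authors' implementation. Assumptions: whitespace tokenisation into unigrams and bigrams with no Arabic normalisation, stop-word removal or stemming; TF-IDF computed as raw count × log(N/df) and then L2-normalised (one common variant); Euclidean distance; the first k tweets used as initial centroids; and the distance measure taken as the mean distance from each tweet to its own cluster centroid. The four example tweets are invented for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter

def ngram_terms(text, n_max=2):
    """Whitespace tokens plus contiguous word n-grams up to n_max.
    Assumption: no Arabic normalisation, stop-word removal or stemming."""
    tokens = text.split()
    terms = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms

def bto_vectors(docs, vocab):
    """Binary term occurrence: 1.0 if the term occurs in the tweet, else 0.0."""
    rows = []
    for d in docs:
        present = set(d)
        rows.append([1.0 if t in present else 0.0 for t in vocab])
    return rows

def tfidf_vectors(docs, vocab):
    """Raw term count * log(N / df), then L2-normalised (assumed variant)."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        row = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        rows.append([x / norm for x in row])
    return rows

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [min(range(len(centroids)), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors]

def kmeans(vectors, k, max_iter=50):
    """Plain Lloyd's k-means; assumption: the first k vectors are the initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(max_iter):
        labels = assign(vectors, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return assign(vectors, centroids), centroids

def avg_centroid_distance(vectors, labels, centroids):
    """Mean distance from each vector to the centroid of its own cluster."""
    total = sum(euclidean(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return total / len(vectors)

# Hypothetical toy tweets (not from the paper's data set).
tweets = [
    "التسجيل في الجامعة",            # registration at the university
    "نظام التعليم الالكتروني",       # the e-learning system
    "موعد التسجيل في الجامعة",       # university registration date
    "مشكلة في التعليم الالكتروني",   # a problem with e-learning
]
docs = [ngram_terms(t) for t in tweets]
vocab = sorted({term for d in docs for term in d})

for name, vectors in (("BTO", bto_vectors(docs, vocab)),
                      ("TF-IDF", tfidf_vectors(docs, vocab))):
    labels, centroids = kmeans(vectors, k=2)
    print(name, labels, round(avg_centroid_distance(vectors, labels, centroids), 3))
```

With only four toy tweets, this shows the mechanics (here both schemes separate the registration tweets from the e-learning tweets) and says nothing about which scheme works better on real data. The average centroid distances are also not directly comparable across the two schemes, because the BTO vectors are unnormalised while the TF-IDF vectors have unit length.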