期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2021
卷号:12
期号:8
DOI:10.14569/IJACSA.2021.0120889
语种:English
出版社:Science and Information Society (SAI)
摘要:The aim of this research is to detect and classify websites based on their content if it encourages spreading hate speech toward Islam and Muslims, or Islamophobia using sentiment analysis and web text mining techniques. In this research, a large dataset corpus has been collected, to identify and classify anti-Islamic online contents. Our target is to automatically detect the content of those websites that are hostile to Islam and transmitting extremist ideas against it. The main purpose is to reduce the spread of those webpages that give the wrong idea about Islam. The proper dataset is collected from different sources, and the two datasets for the Arabic language (balanced and unbalanced) have been produced. The framework of the proposed approach has been described. The approach used in this framework is based on supervised Machine Learning (ML) approach using Support Vector Machines (SVM) and Multinomial Naive Bayes (MNB) models as classifiers, and Term Frequency-Inverse Document Frequency (TF-IDF) as feature extraction. Different experiments including word level and tri-gram level on the two datasets have been conducted, and compared the obtained results. The experimental results shows that the supervised ML approach using word level is the finest approach for both datasets that produce high accuracy with 97% applied on the balanced Arabic dataset using SVM algorithm with TF-IDF as feature extraction. Finally, an interactive web-application prototype has been developed and built in order to detect and classify toxic language such as anti-Islamic online text-contents.
关键词:Web Text mining; text classification; Arabic computational linguistics; natural language processing; SVM; MNB; opinion mining; hate speech; toxic language detection