Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2020
Volume: 11
Issue: 9
DOI: 10.14569/IJACSA.2020.0110972
Publisher: Science and Information Society (SAI)
Abstract: Social media networks such as Twitter are increasingly utilized to propagate hate speech while facilitating mass communication. Recent studies have highlighted a strong correlation between hate speech propagation and hate crimes such as xenophobic attacks. Given the scale of social media and the consequences of hate speech in society, it is essential to develop automated methods for hate speech detection across social media platforms. Several studies have investigated the application of different machine learning algorithms to hate speech detection. However, the performance of these algorithms is generally hampered by inefficient sequence transduction. Vanilla recurrent neural networks and recurrent neural networks with attention have been established as state-of-the-art methods for sequence modeling and sequence transduction tasks. Unfortunately, these methods suffer from intrinsic problems such as long-term dependency and lack of parallelization. In this study, we investigated a transformer-based method and tested it on a publicly available multiclass hate speech corpus containing 24,783 labeled tweets. The DistilBERT transformer was compared against attention-based recurrent neural networks and other transformer baselines for hate speech detection in Twitter documents. The results show that the DistilBERT transformer outperformed the baseline algorithms while also allowing parallelization.
Keywords: Attention transformer; deep learning; neural network; recurrent network; sequence transduction
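The parallelization advantage the abstract attributes to transformers comes from self-attention, which scores every pair of positions in a sequence with one matrix product instead of stepping through tokens one at a time as a recurrent network must. A minimal NumPy sketch of scaled dot-product attention (illustrative only; not the paper's implementation, and the shapes are assumed for the example):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq): all pairs at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.standard_normal((seq_len, d_model))
out = scaled_dot_product_attention(X, X, X)         # self-attention: Q = K = V = X
print(out.shape)                                    # (5, 8)
```

Because the `(seq, seq)` score matrix is computed in a single batched operation, all positions are processed in parallel, whereas an RNN's hidden state at step t cannot be computed before step t-1 finishes.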