Article information

Title: OffTamil@DravideanLangTech-EASL2021: Offensive Language Identification in Tamil Text
Authors: Disne Sivalingam; Sajeetha Thavareesan
Journal name: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2021
Volume: 2021
Pages: 346-351
Language: English
Publisher: ACL Anthology
Abstract: In the last few decades, code-mixed offensive texts have been used pervasively in social media posts. Social media platforms and online communities have shown much interest in offensive text identification in recent years. Consequently, the research community is also interested in identifying such content and has contributed to the development of corpora. Many publicly available corpora exist for research on identifying offensive text written in English, but they are rare for low-resourced languages like Tamil. This study uses the first code-mixed offensive text dataset for Dravidian languages, developed by the shared task organizers. It focuses on offensive language identification in code-mixed text in the low-resourced Dravidian language Tamil, using four classifiers (Support Vector Machine, Random Forest, k-Nearest Neighbour and Naive Bayes) with the χ² (chi-square) feature selection technique, along with BoW and TF-IDF feature representations over different combinations of n-grams. The proposed model achieved an accuracy of 76.96% using a linear SVM with TF-IDF feature representation.
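The abstract combines n-gram Bag-of-Words features with χ² feature selection before classification. The following is a minimal, dependency-free sketch of that selection step, not the authors' code: it assumes whitespace tokenization, word n-grams of length 1 to 2, and hypothetical helper names (`ngrams`, `bow`, `chi2_scores`, `select_k_best`) introduced only for illustration. The χ² score follows the same formulation as scikit-learn's `chi2` (observed = summed feature counts per class, expected = class share times total feature count). The example sentences and labels are made up.

```python
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Word n-grams of length n_min..n_max from a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bow(docs, n_min=1, n_max=2):
    """Bag-of-Words: one n-gram count table per document."""
    return [Counter(ngrams(d, n_min, n_max)) for d in docs]


def chi2_scores(counts, labels):
    """Chi-square statistic per feature against the class labels.

    observed[c] = total count of the feature in documents of class c
    expected[c] = (fraction of documents in class c) * total count of the feature
    """
    classes = sorted(set(labels))
    n_docs = len(labels)
    class_prob = {c: labels.count(c) / n_docs for c in classes}
    vocab = sorted(set().union(*counts))
    scores = {}
    for feat in vocab:
        observed = {c: 0 for c in classes}
        for doc, y in zip(counts, labels):
            observed[y] += doc[feat]
        total = sum(observed.values())  # > 0, since feat occurs in some document
        score = 0.0
        for c in classes:
            expected = class_prob[c] * total
            score += (observed[c] - expected) ** 2 / expected
        scores[feat] = score
    return scores


def select_k_best(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: "OFF" = offensive, "NOT" = not offensive.
docs = ["you are stupid", "stupid movie", "nice movie", "you are nice"]
labels = ["OFF", "OFF", "NOT", "NOT"]
print(select_k_best(chi2_scores(bow(docs), labels), 2))
```

On this toy data the sketch prints `['nice', 'stupid']`: those unigrams occur only in one class, while n-grams spread evenly across both classes (such as "you" or "movie") score 0. In a full pipeline, a classifier such as a linear SVM would then be trained on only the selected features; libraries such as scikit-learn provide `chi2` and `SelectKBest` for this step, and TF-IDF weighting could replace the raw counts.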