Title: HYBRID MODEL FOR TWITTER DATA SENTIMENT ANALYSIS BASED ON ENSEMBLE OF DICTIONARY BASED CLASSIFIER AND STACKED MACHINE LEARNING CLASSIFIERS-SVM, KNN AND C5.0
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2020
Volume: 98
Issue: 4
Pages: 624-635
Publisher: Journal of Theoretical and Applied
Abstract: Social networking sites such as Twitter and Facebook allow users to express their opinions on various topics and events. Opinion mining is a technique for finding people's sentiment about these topics, which can be useful in decision support. Government policies can also be monitored through sentiment analysis of related tweets. The objective of this research is to improve the accuracy of Twitter sentiment classification. The paper proposes a framework for a hybrid approach that combines an ensemble of stacked machine learning algorithms with a dictionary-based classifier. The sentiment score extracted from the dictionary-based classifier is added to the feature set as an additional feature. Three machine learning algorithms, SVM, KNN and C5.0, are stacked to build an ensemble using two meta-learners, RF and GLM. Real-time, manually labeled tweets about the “Clean India Mission”, an Indian government policy, are used to implement the model. The proposed model is compared with different machine learning and ensemble classifiers. The proposed hybrid model recorded a higher accuracy of 0.9066377 for 5-fold cross-validation and 0.9124793 for 10-fold cross-validation, compared with 0.8667328 for the stacked ensemble of SVMRadial, KNN and C5.0 using RF as the meta-classifier. The RF meta-classifier performed better than GLM in all stacking-based ensembles. The proposed model also recorded higher accuracy than the individual machine learning classifiers SVM, Naïve Bayes, Decision Tree, Random Forest and Maximum Entropy. The contribution of the research is to improve the accuracy of stacking-based ensemble classifiers for Twitter sentiment classification by using the additional sentiment score provided by the dictionary-based classifier.
Keywords: Clean India Mission;C5.0;KNN;Sentiment Analysis;Stack Ensemble;SVM;Swatch Bharat.
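
Note: the abstract describes adding a dictionary-based sentiment score to the feature set as an extra feature. The following is a minimal sketch of that single step, not the authors' implementation. The lexicon, the tokenizer, the scoring rule (positive hits minus negative hits), the bag-of-words representation and all function names are assumptions made for illustration; the abstract does not specify any of them. The stacking stage (SVM, KNN and C5.0 under an RF or GLM meta-learner) is not shown.

```python
from collections import Counter

# Hypothetical toy lexicon; the dictionary used in the paper is not given in the abstract.
POSITIVE = {"clean", "good", "great", "proud", "success"}
NEGATIVE = {"dirty", "bad", "fail", "garbage", "poor"}


def tokenize(tweet):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?#@\"'").lower() for word in tweet.split())
    return [t for t in tokens if t]


def sentiment_score(tokens):
    """Assumed scoring rule: count of positive words minus count of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def build_features(tweets):
    """Return (vocabulary, rows): bag-of-words counts with the score as the last column."""
    tokenized = [tokenize(t) for t in tweets]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[word] for word in vocabulary]
        row.append(sentiment_score(toks))  # the additional dictionary-based feature
        rows.append(row)
    return vocabulary, rows


# Made-up example tweets, not taken from the paper's dataset.
tweets = ["Proud of our clean streets!", "Garbage everywhere, bad planning"]
vocabulary, rows = build_features(tweets)
```

Each row in `rows` could then be passed to any base classifier; the last column carries the dictionary score alongside the word counts.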