Article Information

Title: Comparative Analysis of Lexicon and Machine Learning Approach for Sentiment Analysis
Authors: Roopam Srivastava; P. K. Bharti; Parul Verma et al.
Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Publication year: 2022
Volume: 13
Issue: 3
DOI: 10.14569/IJACSA.2022.0130312
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Sentiment analysis is also known as opinion mining or text analysis. The fundamental objective is to extract meaningful information from unstructured text using natural language processing, statistical, and linguistic methods. That information is then used to derive qualitative and quantitative results on a scale of 'positive', 'neutral', or 'negative', yielding the overall sentiment. In this research, we worked with both approaches, supervised machine learning and an unsupervised lexicon-based algorithm, to compute sentiment and compare model performance. Stochastic gradient descent (SGD) is used to optimize the support vector machine (SVM) and logistic regression classifiers. The AFINN and VADER lexicons are used for the lexicon models. Both TF-IDF and bag-of-words (BoW) features are used for classification. The dataset is "Trip advisor hotel reviews", which contains around 20k reviews. The data were cleaned and preprocessed before the classifiers were trained and evaluated. Classifier accuracy is measured using evaluation metrics. With TF-IDF features, the SVM is the more accurate of the two machine learning classifiers. The SVM reached an accuracy of 95.2% with BoW and 96.3% with TF-IDF. Among the lexicon models, VADER performs best with an accuracy of 88.7%, whereas AFINN reaches 86.0%. Comparing the supervised and unsupervised approaches, the SVM with TF-IDF (96.3% accuracy) outperforms the best lexicon model, VADER (88.7%).
Keywords: NLP; sentiment analysis; SGD (stochastic gradient descent); machine learning; TFIDF; BoW; VADER; SVM; AFINN
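
The two approaches compared in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the lexicon below is a hypothetical toy word list (not the real AFINN or VADER data, and without VADER's negation or intensifier rules), the four training sentences are invented, and all function names and hyperparameters (`epochs`, `lr`, `reg`) are assumptions chosen for illustration. The first part labels text by summing word valences; the second builds TF-IDF vectors and fits a linear SVM (hinge loss with an L2 penalty) by plain SGD.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy lexicon for illustration; NOT the real AFINN or VADER word lists.
TOY_LEXICON = {"great": 3, "clean": 2, "friendly": 2,
               "dirty": -2, "rude": -2, "terrible": -3}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Unsupervised scoring: sum word valences, then map the sign to a label."""
    score = sum(lexicon.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def vectorize(text, idf):
    """L2-normalised sparse TF-IDF vector; out-of-vocabulary tokens are dropped."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def decision(w, b, x):
    """Signed distance-like score of sparse vector x under weights w and bias b."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items()) + b


def train_sgd_svm(vectors, labels, epochs=200, lr=0.1, reg=0.001, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by plain SGD; labels are +1 or -1."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            violated = y * decision(w, b, x) < 1.0
            for tok in w:  # gradient step on the L2 penalty
                w[tok] *= 1.0 - lr * reg
            if violated:  # hinge-loss subgradient is non-zero only inside the margin
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (positive) or -1 (negative)."""
    return 1 if decision(w, b, x) >= 0.0 else -1


# Invented toy reviews; the study itself used roughly 20k hotel reviews.
train_docs = ["great clean room friendly staff",
              "friendly staff great location",
              "dirty room rude staff",
              "terrible service dirty bathroom"]
train_labels = [1, 1, -1, -1]

idf = fit_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]
w, b = train_sgd_svm(train_vectors, train_labels)
```

Replacing the hinge-loss update with the gradient of the logistic loss would give an SGD-trained logistic regression, the second classifier named in the abstract. Dropping the IDF weighting in `vectorize` (using raw counts) would give the bag-of-words variant.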