期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2022
卷号:13
期号:1
DOI:10.14569/IJACSA.2022.0130112
语种:English
出版社:Science and Information Society (SAI)
摘要:In recent times, one of the most emerging sub-dimensions of natural language processing is sentiment analysis which refers to analyzing opinion on a particular subject from plain text. Drug sentiment analysis has become very significant in present times as classifying medicines based on their effectiveness through analyzing reviews from users can assist potential future consumers in gaining knowledge and making better decisions about a particular drug. The objective of this proposed research is to measure the effectiveness level of a particular drug. Currently most of the text mining researches are based on unsupervised machine learning methods to cluster data. When supervised learning methods are used for text mining, the usual primary concern is to classify the data into two classes. Lack of technical terms in similar datasets make the categorization even more challenging. The proposed research focuses on finding out the keywords through tokenization and lemmatization so that better accuracy can be achieved for categorizing the drugs based on their effectiveness using different algorithms. Such categorization can be instrumental for treating illness as well as improve one’s health and well-being. Four machine learning algorithms have been applied for binary classification and one for multiclass classification on the drug review dataset acquired from the UCI machine learning repository. The machine learning algorithms used for binary classification are naive Bayes classifier, random forest, support vector classifier (SVC), and multilayer perceptron; among these machine learning algorithms, linear SVC was used for multiclass classification. Results obtained from these four classifier algorithms have been analyzed to evaluate their performances. The random forest has been proven to have the best performance among these four algorithms. However, multiclass classification was found to have low performance when applied to natural language processing. On the contrary, the applied linear SVC algorithm performed better for class 2 with AUC 0.82 in this research.
关键词:Machine Learning Algorithms; natural language processing; drugs sentiment analysis; text mining