Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Publication year: 2021
Volume: 12
Issue: 3
Pages: 743-751
DOI: 10.21817/indjcse/2021/v12i3/211203298
Publisher: Engg Journals Publications
Abstract: The expansion of the internet has led to a steady increase in the number of cyber-attacks. One of the most common cybersecurity attacks is social engineering, which exploits human psychology. Phishing is the most popular form of social engineering. Phishing attacks take many forms, but the traditional form is message-based. Techniques are needed to protect against these attacks; awareness, usage policies, and other procedures are not enough. Therefore, in this paper we propose using natural language processing (NLP) together with machine learning techniques for text phishing detection. We started with 6,224 emails from an existing dataset that contains both phishing and legitimate emails. NLP was used to prepare the data before feature extraction; the extracted features were then used to train and test classification models built with machine learning algorithms. The features were extracted using the Continuous Bag of Words (CBOW) model of the Word2Vec algorithm. We trained four models using four machine learning algorithms: k-nearest neighbors (KNN), Multinomial Naive Bayes (MNB), Decision Tree, and AdaBoost. The models classify text messages into two categories, phishing and legitimate. Because the dataset is unbalanced, we used performance measurements suited to unbalanced data in the evaluation process. Three of our models, trained with the KNN, Decision Tree, and AdaBoost algorithms, obtained considerable values, while the MNB model obtained an insignificant value.
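The abstract does not say which performance measurements for unbalanced data were used. As an illustration only, the sketch below assumes the commonly used ones (precision, recall, F1, and balanced accuracy) and shows why plain accuracy can mislead when legitimate emails outnumber phishing emails. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(y_true, y_pred, positive="phishing"):
    """Compute accuracy alongside metrics that are less sensitive to class skew."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # true positive rate
    specificity = safe_div(tn, tn + fp)     # true negative rate
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy labels, made up for illustration: 8 legitimate, 2 phishing.
y_true = ["legitimate"] * 8 + ["phishing"] * 2
# A degenerate classifier that always answers "legitimate".
y_pred = ["legitimate"] * 10

metrics = imbalance_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy case the always-legitimate classifier still scores high on accuracy while its recall and F1 for the phishing class are zero, which is the kind of gap that motivates evaluating unbalanced data with measures other than accuracy.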