Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2021
Volume: 12
Issue: 7
DOI: 10.14569/IJACSA.2021.0120733
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Phishing is one of the most popular and dangerous cyber-attacks on the internet. One of the most common attacks in cyber security is obtaining internet users' personal information through a "phishing website". The main vehicle for this attack is the URL: the attacker creates a near-replica of the original URL that differs from it only slightly, a difference that usually goes unnoticed without careful observation. By pipelining various machine learning algorithms, the proposed model aims to identify the features most important for classifying a URL, using recursive feature elimination. For this work, a dataset of URL records with 112 features, including one target value, was collected. A machine learning based model is proposed to identify the significant features used to classify a URL; the wrapper method, recursive feature elimination, is used to compare different bagging and boosting approaches. Ensemble algorithms (bootstrap aggregation, boosting, and stacking) are used for feature selection. The proposed work has five stages: pre-processing, finding the relations between the features of the dataset, automatically selecting the number of features using an Extra Trees classifier, comparing the various ensemble algorithms, and finally generating the best features for URL analysis. The paper designs a meta learner with an XGBoost classifier as the base classifier, which achieves an accuracy of 93%. Through an extensive comparative study of feature selection, the model identified 29 of the 112 features as core features for URL analysis.
Keywords: Recursive feature elimination; principal component analysis; standard scaler transformation; eXtreme gradient boosting classifier; correlation matrix
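As a rough illustration of the kind of pipeline the abstract describes (standard scaling, recursive feature elimination in which an Extra Trees classifier ranks features and cross-validation chooses how many to keep, then a stacked ensemble), the following Python sketch uses scikit-learn. It is not the authors' code. Several things are assumptions: synthetic data from make_classification stands in for the 112-feature URL dataset, GradientBoostingClassifier stands in for XGBoost, and the choice of base learners, the logistic-regression meta learner, and the helper names build_pipeline and run_demo are hypothetical.

```python
# Minimal sketch under stated assumptions (not the paper's implementation):
# - synthetic data replaces the 112-feature URL dataset
# - GradientBoostingClassifier stands in for XGBoost
# - build_pipeline / run_demo are hypothetical helper names
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(random_state=0):
    """Scale, select features with RFECV + Extra Trees, then classify with a stacked ensemble."""
    selector = RFECV(
        # Extra Trees feature_importances_ are used to rank features
        estimator=ExtraTreesClassifier(n_estimators=50, random_state=random_state),
        step=1,             # drop the least important feature each round
        cv=3,               # cross-validation decides how many features to keep
        scoring="accuracy",
    )
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
            ("gb", GradientBoostingClassifier(random_state=random_state)),  # XGBoost stand-in
        ],
        final_estimator=LogisticRegression(max_iter=1000),  # meta learner (assumed choice)
        cv=3,
    )
    return Pipeline([("scale", StandardScaler()), ("select", selector), ("stack", stack)])


def run_demo(random_state=0):
    """Fit on synthetic data; return kept feature indices and held-out accuracy."""
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=6, n_redundant=4,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = build_pipeline(random_state).fit(X_train, y_train)
    selector = model.named_steps["select"]
    kept = [i for i, keep in enumerate(selector.support_) if keep]
    return kept, model.score(X_test, y_test)


if __name__ == "__main__":
    kept, acc = run_demo()
    print(f"kept {len(kept)} of 20 features: {kept}")
    print(f"held-out accuracy: {acc:.3f}")
```

Running the script prints which feature indices RFECV kept and the held-out accuracy. These numbers depend entirely on the synthetic data and say nothing about the 93% accuracy or the 29 core features reported for the real URL dataset. Scaling has no effect on tree-based models; it is included only to mirror the standard scaler step named in the keywords.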