Basic Article Information

Title: An Associative Classification Data Mining Approach for Detecting Phishing Websites
Authors: Suzan Wedyan; Fadi Wedyan
Journal: Journal of Emerging Trends in Computing and Information Sciences
Electronic ISSN: 2079-8407
Publication year: 2014
Volume: 4
Issue: 12
Pages: 888-899
Publisher: ARPN Publishers
Abstract: Phishing websites are fake websites created by dishonest people to mimic the webpages of real websites. Victims of phishing attacks may expose sensitive financial information to an attacker, who might use it for financial and criminal activities. Various approaches have been proposed to detect phishing websites, among which those that use data mining techniques have been shown to be more effective. The main goal of data mining is to analyze a large set of data to identify unsuspected relationships and extract understandable, useful patterns. Associative Classification (AC) is a promising data mining approach that integrates two known data mining tasks: association rule mining and classification. This paper proposes a new AC algorithm, called Phishing Associative Classification (PAC), for detecting phishing websites. PAC employs a novel methodology for constructing the classifier, which results in moderate-size classifiers. The algorithm improves the effectiveness and efficiency of a known algorithm called MCAR by introducing a new prediction procedure and adopting a different rule pruning procedure. The experiments compare PAC with four well-known data mining algorithms: a covering algorithm (Prism), a decision tree (C4.5), associative classification (CBA), and MCAR. The experiments are performed on a dataset of 1010 websites. Each website is represented by 17 features categorized into 4 sets. The features are extracted from the website contents and URL. The results on each feature set show that PAC is either equivalent to or more effective than the compared algorithms. When all features are considered, PAC outperforms the compared algorithms and correctly identifies 99.31% of the tested websites. Furthermore, PAC produces fewer rules than MCAR and is therefore more efficient.
Keywords: Associative Classification; Data Mining; Phishing Websites; Machine Learning