Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year of publication: 2014
Volume: 67
Issue: 2
Publisher: Journal of Theoretical and Applied
Abstract: In spam filtering, machine learning methods are prone to a high-dimensional feature space. Many feature selection methods have been introduced to overcome this problem. Even so, the number of features passed to the machine learning classifier remains high, which delays the delivery of incoming emails to the user's inbox. We therefore introduce a two-stage feature selection approach that uses the Taguchi method to reduce the high dimensionality of the features while still obtaining good results. First, the Gini Index is used to reduce dimensionality and select the best subset of features; the Taguchi method is then applied to assist Gini Index and PSO-SVM in selecting the best combination of parameter settings. In addition, the impact of population size on different classifiers is investigated, as it strongly affects classifier performance. The method is trained and tested on the Ling-spam email dataset. The experimental results show that a hybrid Gini PSO-SVM feature selection with the Taguchi method produces good classification results even when the population size is less than 10.
Keywords: Spam Email; High Dimensionality; Feature Selection; Orthogonal Array; Population Size