Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year of publication: 2018
Volume: 6
Issue: 5
Pages: 5918-5922
DOI: 10.15680/IJIRCCE.2018.0605115
Publisher: S&S Publications
Abstract: A primary problem in spam filtering, as in text classification in natural language
processing more generally, is the very large size of the vector space caused by the many feature terms, which
commonly leads to extensive computation and slow classification. A Support Vector Machine (SVM) takes a set of
input data and predicts which of two classes each input belongs to, i.e., it classifies the data into the possible
classes. SVMs have a strong ability to generalize, which is the goal in statistical learning. Statistical learning
theory provides a framework for studying the problem of gaining knowledge, making predictions, and making
decisions from a set of data. In existing work, an SVM is used for training and testing on the datasets. This approach
has several drawbacks that degrade performance. Although SVMs have good generalization performance, they can be
very slow in the test phase. Another limitation is speed and size, in both training and testing. The feature vector of
every email is extracted by the feature selection module. Because most of the features are redundant or inconsistent,
we adopt a feature selection method based on information gain (IG). Specifically, we compute the IG for
every feature vector, regardless of whether it corresponds to a spam or a regular email. These feature vectors are then
ordered by their IG values, in decreasing order.
Keywords: Information gain; Support Vector Machine; Spam; E-mails
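
The IG-based ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how features are represented, so this assumes binary "term occurs in the email" features and uses the standard definition IG = H(class) - H(class | feature). The function names (`entropy`, `information_gain`, `rank_terms`) and the four-email toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the email' (an assumed encoding)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Class entropy after splitting on the feature, weighted by split size.
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Return (term, IG) pairs sorted by IG in decreasing order."""
    vocab = sorted(set().union(*docs))
    scored = [(t, information_gain(docs, labels, t)) for t in vocab]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: each email is reduced to its set of terms.
emails = [
    {"free", "offer", "click"},
    {"free", "prize", "click"},
    {"meeting", "agenda", "click"},
    {"meeting", "report"},
]
labels = ["spam", "spam", "ham", "ham"]

ranking = rank_terms(emails, labels)
for term, ig in ranking:
    print(f"{term}: {ig:.4f}")
```

In this toy corpus, "free" and "meeting" separate the two classes perfectly and therefore rank first; a classifier such as an SVM would then be trained on only the top-ranked features, which is one way to reduce the vector-space size the abstract identifies as the bottleneck.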