Abstract: Filtering malicious network information is a binary classification problem. A support vector machine (SVM) can serve as the filtering model, but in practice negative samples are difficult to obtain, so the training set is imbalanced: the classifier's predictions lean toward the majority class and the filtering error increases. This paper generates virtual samples using K-means clustering and a genetic algorithm (GA) to reduce the imbalance between the two classes, and then trains the SVM classifier on the new sample set. On this basis, a new SVM-based model for filtering malicious network information is built. Experiments on UCI data sets and on web pages collected from the Internet demonstrate the validity of the new model, which is well suited to filtering malicious network information.
Keywords: network information filtering; malicious information; SVM; genetic crossover; K-means clustering; virtual sample construction
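
The virtual-sample idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the scarce class is first grouped by K-means and that new samples are produced by crossing two parents drawn from the same cluster. The arithmetic (convex-combination) crossover operator, the function names `kmeans`, `crossover`, and `make_virtual_samples`, and the toy data are all hypothetical choices made for illustration. SVM training on the rebalanced set is omitted.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k clusters (lists of points)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster became empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return clusters

def crossover(parent_a, parent_b, rng):
    """Assumed operator: arithmetic crossover, a random convex combination."""
    w = rng.random()
    return tuple(w * a + (1 - w) * b for a, b in zip(parent_a, parent_b))

def make_virtual_samples(minority, n_new, k=2, seed=0):
    """Create n_new virtual samples by crossing parents from the same cluster.

    Assumes the minority set holds at least two points.
    """
    rng = random.Random(seed)
    clusters = [c for c in kmeans(minority, k, seed=seed) if len(c) >= 2]
    if not clusters:  # fall back to the whole minority set
        clusters = [minority]
    virtual = []
    while len(virtual) < n_new:
        a, b = rng.sample(rng.choice(clusters), 2)
        virtual.append(crossover(a, b, rng))
    return virtual

# Toy minority class with two visible groups (made-up data).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
virtual = make_virtual_samples(minority, n_new=6)
balanced_minority = minority + virtual  # would then be passed to an SVM trainer
```

Because each virtual sample is a convex combination of two real samples, it always lies on the segment between its parents. Restricting parents to one cluster is meant to keep new points inside regions the scarce class already occupies.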