Article Information

Title: Feature Selection for Intrusion Detection Using Random Forest
Authors: Md. Al Mehedi Hasan; Mohammed Nasser; Shamim Ahmad; et al.
Journal: Journal of Information Security
Print ISSN: 2153-1234
Electronic ISSN: 2153-1242
Publication year: 2016
Volume: 07
Issue: 03
Pages: 129-140
DOI: 10.4236/jis.2016.73009
Language: English
Publisher: Scientific Research Publishing
Abstract: An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, including threats from both outside and inside the organization. It deals with a large amount of data, which contains various irrelevant and redundant features and results in increased processing time and a low detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve overall system performance significantly when mining huge datasets. In this paper, we focus on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD’99 intrusion detection datasets, which are based on the DARPA 98 dataset and provide labeled data for researchers working in the field of intrusion detection. An important deficiency of the KDD’99 data set is its huge number of redundant records, as observed earlier. Therefore, we have derived a data set, RRE-KDD, by eliminating redundant records from the KDD’99 train and test datasets, so that the classifiers and the feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets, used for training and testing, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features for classification, which in turn not only reduces the number of input features and the processing time but also increases the classification accuracy.
Keywords: Feature Selection; KDD’99 Dataset; RRE-KDD Dataset; Random Forest; Permuted Importance Measure
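
Note: The abstract's first step ranks features by a permuted importance measure. The general idea can be sketched as below. This is a minimal, model-agnostic illustration and not the authors' implementation: it shuffles each feature column on a labeled sample and measures the drop in accuracy, whereas a Random Forest would typically compute this on its out-of-bag samples. The names `accuracy`, `permutation_importance`, `rank_features`, and the `predict` callable are hypothetical, introduced only for this sketch.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    `predict` is any callable mapping one row (a list of feature values)
    to a predicted label; it stands in for a trained classifier.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(importances):
    """Feature indices ordered from most to least important."""
    return sorted(range(len(importances)),
                  key=lambda j: importances[j], reverse=True)


# Toy usage: the label depends only on feature 0, so feature 1 is irrelevant.
X = [[i % 2, i % 3] for i in range(30)]
y = [row[0] for row in X]
scores = permutation_importance(lambda row: row[0], X, y)
top_features = rank_features(scores)
```

In a two-step scheme like the one the abstract describes, a ranking of this kind would only seed the second-step search. How that search proceeds is not specified in the abstract and is left out here.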