Journal: International Journal of Signal Processing, Image Processing and Pattern Recognition
Print ISSN: 2005-4254
Publication year: 2015
Volume: 8
Issue: 11
Pages: 433-444
DOI: 10.14257/ijsip.2015.8.11.39
Publisher: SERSC
Abstract: With the rapid development of the Internet, data mining is being applied to Internet data ever more widely. However, complex and redundant features in the data sources make the data mining process inefficient and complicated, so research on feature selection is essential to making data mining simpler and more efficient. In this paper, we propose a new way to measure the degree of correlation among the internal features of a dataset, based on a variant of mutual information. We also introduce the Hoeffding inequality as a constraint in constructing the algorithm. In the experiments, we use the C4.5 classification algorithm as the test algorithm and compare HSF with BIF (a feature selection algorithm based on mutual information). The experimental results show that HSF performs better than BIF [1] in TP and FP rate; moreover, the feature subset obtained by HSF significantly improves the TP rate, FP rate, and memory usage of the C4.5 classification algorithm.
Keywords: Hoeffding inequality; data-stream; feature selection; mutual information
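
The abstract names two building blocks, mutual information and the Hoeffding inequality, without giving formulas. The following is a minimal sketch of the standard textbook versions of both, not the paper's HSF algorithm or its mutual-information variant, which this record does not describe. The function names, the `value_range` parameter (the range R of the score being estimated), and the top-two comparison rule are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences.

    Standard definition, NOT the variant proposed in the paper.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability at least 1 - delta, the mean of n independent
    observations with range R lies within epsilon of the true mean.
    """
    if n <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def best_is_reliable(scores, value_range, delta, n):
    """Hypothetical rule: accept the top-scoring feature only if it beats
    the runner-up by more than the Hoeffding bound (an assumption here,
    borrowed from how the bound is commonly used on data streams)."""
    if len(scores) < 2:
        raise ValueError("need at least two feature scores")
    top, second = sorted(scores, reverse=True)[:2]
    return (top - second) > hoeffding_bound(value_range, delta, n)
```

For a binary class label, the mutual information between a feature and the class is at most 1 bit, so `value_range=1.0` is a natural choice in that setting; with more classes the range grows to the base-2 logarithm of the number of classes.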