Abstract: Feature selection is based on the notion that redundant and/or irrelevant variables bring no additional information about the data classes and can be considered noise for the predictor. As a result, the full feature set of a dataset can be reduced to only a few features that carry the maximum discriminative information about the class. Classification accuracy is used as the evaluation measure guiding the feature selection process. However, such a measure does not take into account the privacy of the resulting dataset. In this work, we introduce E(S), a multi-dimensional, privacy-aware evaluation function for automatic feature selection that enables the data holder (DH) to select and eventually release the best feature subset according to the desired efficacy (e.g., accuracy), privacy, and dimensionality of the resulting dataset.
Keywords: Feature selection; privacy; data mining; classification; evaluation measure