Journal name: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2018
Volume: 96
Issue: 6
Publisher: Journal of Theoretical and Applied
Abstract: Sanitizing big data before it is mined or published is essential for privacy. Although sanitization is not new, sanitizing big data on the basis of a measurability score is a novel idea. We propose a framework, M-Sanit, to realize this idea; it sanitizes big data before the data is processed. We propose an extended misusability score function that returns the misuse probability of a given dataset. This score determines the level of sanitization needed. Sanitization at that level provides the expected degree of anonymity and protects the data from privacy attacks. The rationale is that outsourced data may be misused by insiders. To address this problem, the data is sanitized after its measurability score is computed. Our contributions are twofold. First, we provide a mathematical model for the extended measurability score. Second, we propose an algorithm that uses the measurability score to determine the level of sanitization. We built a prototype application on a locally configured Hadoop cluster as a proof of concept. Our results demonstrate the utility of M-Sanit for protecting big data from privacy problems.