Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Online ISSN: 0976-5166
Year: 2017
Volume: 8
Issue: 3
Pages: 201-209
Publisher: Engg Journals Publications
Abstract: We live in an era in which Internet-based data storage, retrieval and processing prevail, and there is increasing demand for knowledge discovery from big data. MapReduce is the programming paradigm used to obtain comprehensive business intelligence from big data. Its computations run economically on clusters of large numbers of commodity computers. However, the source code written for the mapper and reducer is untrusted and can leak sensitive data. This is a significant and challenging privacy problem. Anonymization techniques such as k-anonymity, t-closeness and l-diversity are used to anonymize sensitive information, but anonymization alone cannot provide the level of privacy needed. This paper proposes a methodology to guarantee privacy in distributed computing frameworks such as MapReduce. The methodology is realized using the MapReduce framework on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). An empirical study revealed that our methodology is useful for privacy-preserving big data mining.
Keywords: Big data; big data mining; MapReduce framework; untrusted mapper and reducer; privacy of big data