Basic Article Information

Title: S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle
Authors: Satish B Basapur; B S Shylaja; Venkatesh et al.
Journal: Journal of Computer Science
Print ISSN: 1549-3636
Publication Year: 2022
Volume: 18
Issue: 3
Pages: 138-150
DOI: 10.3844/jcssp.2022.138.150
Language: English
Publisher: Science Publications
Abstract: Timely analysis of large and varied data sets can unveil valuable information and new insights. The results of such analysis can be used to open new avenues in health care, business, e-services, and other domains. However, releasing sensitive data to third parties, or storing and reusing it, can breach the privacy of individuals. To combat privacy breaches, privacy-preserving techniques such as suppression, generalization, and encryption-based privacy models have been proposed in the literature. The widely used k-anonymity model prevents record-linkage attacks but fails to satisfy the monotonicity property. It also introduces more data distortion and fails to defend against semantic-similarity, closeness, and nearest-neighborhood privacy breaches. Moreover, existing approaches do not scale to large data sets. This paper proposes a semantic-similarity, two-phase cluster-based privacy preservation model. The proposed model considers both numerical and categorical attribute values for data anonymization. In the first phase, a t-centroid clustering algorithm is designed and used to partition the transaction records of data set D into t centroid-based clusters according to the Euclidean distance between transaction records. In the second phase, a neighborhood-aware hierarchical clustering algorithm is designed and used to split the transaction records within each cluster based on neighborhood-aware attribute values. The two-phase clustering operations are carried out in parallel and scale to Big Data sets. The proposed privacy model relies on cell generalization to combat record-linkage, semantic-similarity, closeness, and nearest-neighborhood privacy breaches. All experiments are carried out on two datasets: Income-Census (KDD) and a Bank Credit Card dataset. The experimental results demonstrate that the proposed privacy model can combat privacy breaches using the cell generalization principle. The proposed privacy model is scalable and time-efficient for large-scale data sets.
Keywords: Privacy Preservation Model; Cell Generalization; Transaction Records; Clusters; Quasi-Identifiers and Sensitive Attributes
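
Illustrative sketch: the abstract describes partitioning records around t centroids by Euclidean distance and then generalizing cells within each cluster. The Python below is a minimal, hedged illustration of that idea and is not the authors' algorithm. It assumes a plain k-means-style loop as a stand-in for the paper's t-centroid clustering, omits the second (neighborhood-aware hierarchical) phase because the abstract does not specify it, and assumes that cell generalization can be approximated by replacing each quasi-identifier cell with the value range (numeric) or value set (categorical) of its cluster. All function names, field names, and sample records are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def t_centroid_partition(points, t, iterations=10):
    """Assumed stand-in for phase 1: k-means-style partition into at most t clusters."""
    centroids = [list(p) for p in points[:t]]  # deterministic seed: first t records
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points]

def generalize_cells(records, labels, numeric_keys, categorical_keys):
    """Assumed reading of cell generalization: widen each quasi-identifier
    cell to the range or value set observed inside the record's cluster."""
    clusters = {}
    for rec, lab in zip(records, labels):
        clusters.setdefault(lab, []).append(rec)
    result = []
    for rec, lab in zip(records, labels):
        group = clusters[lab]
        new = dict(rec)  # sensitive attributes are copied unchanged
        for key in numeric_keys:
            values = [r[key] for r in group]
            new[key] = (min(values), max(values))
        for key in categorical_keys:
            new[key] = "|".join(sorted({r[key] for r in group}))
        result.append(new)
    return result

# Hypothetical records: age, hours, job are quasi-identifiers; income is sensitive.
records = [
    {"age": 25, "hours": 40, "job": "clerk", "income": "<=50K"},
    {"age": 27, "hours": 38, "job": "sales", "income": "<=50K"},
    {"age": 52, "hours": 45, "job": "manager", "income": ">50K"},
    {"age": 55, "hours": 50, "job": "manager", "income": ">50K"},
]
points = [(r["age"], r["hours"]) for r in records]
labels = t_centroid_partition(points, t=2)
anonymized = generalize_cells(records, labels, ["age", "hours"], ["job"])
for row in anonymized:
    print(row)
```

In this sketch every record in a cluster ends up with identical quasi-identifier cells, so records cannot be told apart within their cluster. Defending against semantic-similarity or closeness breaches would also require constraints on the sensitive values in each cluster, which this illustration does not attempt.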