Article Information

  • Title: S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle
  • Authors: Satish B Basapur; B S Shylaja; Venkatesh
  • Journal: Journal of Computer Science
  • Print ISSN: 1549-3636
  • Year: 2022
  • Volume: 18
  • Issue: 3
  • Pages: 138-150
  • DOI: 10.3844/jcssp.2022.138.150
  • Language: English
  • Publisher: Science Publications
  • Abstract: Timely analysis of a wide variety and a large volume of data unveils valuable information and new insights. The analysis results can be used to open new avenues in health care services, business, e-services and similar areas. However, releasing, storing and reusing sensitive data with third parties can breach the data privacy of individuals. To combat privacy breaches, privacy-preserving techniques such as suppression, generalization and encryption-based privacy models have been proposed in the literature. The widely used k-anonymity privacy preservation model prevents record-linkage attacks but fails to satisfy the monotonicity property. It also introduces more data distortion and fails to defend against semantic-similarity, closeness and nearest-neighborhood privacy breaches. Moreover, existing approaches do not scale to large data sets. The paper proposes a semantic-similarity, two-phase cluster based privacy preservation model. The proposed model considers both numerical and categorical attribute values for data anonymization. In the first phase, a t-centroid clustering algorithm is designed and used to partition the transaction records of data set D into a set of t-centroids based on the Euclidean distance between transaction records. In the second phase, a neighborhood-aware hierarchical clustering algorithm is designed and used to split the transaction records within clusters based on neighborhood-aware attribute values. The two-phase clustering operations are carried out in parallel and scale to Big Data sets. The proposed privacy model relies on cell generalization to combat record-linkage, semantic-similarity, closeness and nearest-neighborhood privacy breaches. All experiments are carried out on two datasets: Income-Census (KDD) and Bank Credit Card. The experimental results demonstrate that the proposed privacy model can combat privacy breaches with cell generalization principles, and that it is scalable and time efficient for large-scale data sets.
  • Keywords: Privacy Preservation Model; Cell Generalization; Transaction Records; Clusters; Quasi-Identifiers and Sensitive Attributes
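
The abstract describes partitioning records by Euclidean distance to centroids and then generalizing cell values within each cluster. The following is a minimal sketch of that general idea only, not the paper's algorithm: the abstract gives no details of S-CPM's t-centroid selection, its neighborhood-aware second phase, or its exact generalization rule. The function names, the fixed centroids, the sample records, and the use of [min, max] range replacement as the generalization are all assumptions made for illustration.

```python
import math

def nearest_centroid(record, centroids):
    """Return the index of the centroid closest to record (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(record, centroids[i]))

def partition(records, centroids):
    """Group records by nearest centroid (an assumed stand-in for phase one)."""
    clusters = {i: [] for i in range(len(centroids))}
    for rec in records:
        clusters[nearest_centroid(rec, centroids)].append(rec)
    return clusters

def generalize_cells(cluster):
    """Replace every numeric cell with the (min, max) range of its column
    within the cluster. Range replacement is an assumed generalization rule."""
    if not cluster:
        return []
    ranges = [(min(col), max(col)) for col in zip(*cluster)]
    return [list(ranges) for _ in cluster]

# Hypothetical quasi-identifier records: (age, income).
records = [(25, 30000), (27, 32000), (52, 88000), (55, 91000)]
# Hypothetical fixed centroids; a real method would compute these.
centroids = [(26, 31000), (53, 90000)]

clusters = partition(records, centroids)
anonymized = {i: generalize_cells(c) for i, c in clusters.items()}
```

In this sketch, records in the same cluster become indistinguishable on their quasi-identifiers, which is the basic intuition behind linkage protection. Real data would also need attribute scaling before computing distances, since income dominates age here.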