首页    期刊浏览 2024年07月07日 星期日
登录注册

文章基本信息

  • 标题:Differential Privacy for Coverage Analysis of Software Traces
  • 本地全文:下载
  • 作者:Hao, Yu ; Latif, Sufian ; Zhang, Hailong
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2021
  • 卷号:194
  • DOI:10.4230/LIPIcs.ECOOP.2021.8
  • 语种:English
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:This work considers software execution traces, where a trace is a sequence of run-time events. Each user of a software system collects the set of traces covered by her execution of the software, and reports this set to an analysis server. Our goal is to report the local data of each user in a privacy-preserving manner by employing local differential privacy, a powerful theoretical framework for designing privacy-preserving data analysis. A significant advantage of such analysis is that it offers principled "built-in" privacy with clearly-defined and quantifiable privacy protections. In local differential privacy, the data of an individual user is modified using a local randomizer before being sent to the untrusted analysis server. Based on the randomized information from all users, the analysis server computes, for each trace, an estimate of how many users have covered it. Such analysis requires that the domain of possible traces be defined ahead of time. Unlike in prior related work, here the domain is either infinite or, at best, restricted to many billions of elements. Further, the traces in this domain typically have structure defined by the static properties of the software. To capture these novel aspects, we define the trace domain with the help of context-free grammars. We illustrate this approach with two exemplars: a call chain analysis in which traces are described through a regular language, and an enter/exit trace analysis in which traces are described by a balanced-parentheses context-free language. Randomization over such domains is challenging due to their large size, which makes it impossible to use prior randomization techniques. To solve this problem, we propose to use count sketch, a fixed-size hashing data structure for summarizing frequent items. We develop a version of count sketch for trace analysis and demonstrate its suitability for software execution data. In addition, instead of randomizing separately each contribution to the sketch, we develop a much-faster one-shot randomization of the accumulated sketch data. One important client of the collected information is the identification of high-frequency ("hot") traces. We develop a novel approach to identify hot traces from the collected randomized sketches. A key insight is that the very large domain of possible traces can be efficiently explored for hot traces by using the frequency estimates of a visited trace and its prefixes and suffixes. Our experimental study of both call chain analysis and enter/exit trace analysis indicates that the frequency estimates, as well as the identification of hot traces, achieve high accuracy and high privacy.
  • 关键词:Trace Profiling;Differential Privacy;Program Analysis
国家哲学社会科学文献中心版权所有