Delta-Presence #322

JimAchterbergLUMC · 2025-01-31T13:28:05Z

Description

The Delta-Presence privacy metric computes the ratio of real to synthetic samples in "similar groups" (i.e., k-means clusters). A high ratio means there are many real samples per synthetic sample in a group, and thus low disclosure risk from the synthetic data. That is also why the goal is to maximize this metric.

However, the code takes the maximum of the computed ratios over all clusters. This only indicates whether there is low privacy risk (i.e., high delta-presence) for some group. Wouldn't it be more sensible to take the minimum ratio over all clusters, thereby indicating whether there is high privacy risk (i.e., low delta-presence) for some group? After all, in privacy assessments we usually want to consider the worst case rather than the best case.
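To make the difference concrete, here is a minimal sketch. It is not the library's actual implementation: the function name `cluster_ratios`, the cluster labels, and the sample counts are all made up for illustration, and it assumes cluster assignments have already been computed for both datasets.

```python
from collections import Counter


def cluster_ratios(real_labels, synthetic_labels):
    """Per-cluster ratio of real to synthetic sample counts (hypothetical helper)."""
    real_counts = Counter(real_labels)
    synth_counts = Counter(synthetic_labels)
    ratios = {}
    for cluster, n_synth in synth_counts.items():
        # The ratio is only defined for clusters that contain synthetic samples.
        ratios[cluster] = real_counts.get(cluster, 0) / n_synth
    return ratios


# Made-up cluster assignments: cluster 2 has few real samples per synthetic sample.
real_labels = [0] * 40 + [1] * 10 + [2] * 2
synthetic_labels = [0] * 4 + [1] * 5 + [2] * 4

ratios = cluster_ratios(real_labels, synthetic_labels)
best_case = max(ratios.values())   # what taking the maximum reports
worst_case = min(ratios.values())  # what taking the minimum would report

print(ratios)
print("max over clusters:", best_case)
print("min over clusters:", worst_case)
```

In this toy example the maximum is driven entirely by cluster 0, while the minimum exposes cluster 2, where synthetic samples outnumber real ones.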

I'd be glad to hear your thoughts, or to learn whether my interpretation of this metric is incorrect.
