Delta-Presence #322

Open
JimAchterbergLUMC opened this issue Jan 31, 2025 · 0 comments


Description

The Delta-Presence privacy metric computes the ratio of real to synthetic samples within "similar groups" (i.e., k-means clusters). A high ratio means there are many real samples per synthetic sample in a group, which implies low disclosure risk from the synthetic data. This is why the goal is to maximize the metric.

However, the code takes the maximum ratio over all clusters. This only tells us whether some group has low privacy risk (i.e., high delta-presence). Wouldn't it be more sensible to take the minimum ratio over all clusters, which would tell us whether some group has high privacy risk (i.e., low delta-presence)? In privacy assessments we usually want to consider the worst case rather than the best case.
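
To make the difference concrete, here is a minimal sketch of the two aggregations on toy data. It is not the library's actual implementation: `cluster_ratios` and the label arrays are hypothetical, and the cluster assignments are assumed to come from a k-means model fitted beforehand.

```python
import numpy as np


def cluster_ratios(real_labels, synth_labels, n_clusters):
    """Real-to-synthetic count ratio per cluster (hypothetical helper)."""
    real_counts = np.bincount(real_labels, minlength=n_clusters)
    synth_counts = np.bincount(synth_labels, minlength=n_clusters)
    # The ratio is only defined for clusters with at least one synthetic sample.
    mask = synth_counts > 0
    return real_counts[mask] / synth_counts[mask]


# Toy cluster assignments (assumed, not real data):
# cluster 0 is well covered by real samples, cluster 2 is not.
real_labels = np.array([0] * 8 + [1] * 4 + [2] * 1)
synth_labels = np.array([0] * 2 + [1] * 2 + [2] * 4)

ratios = cluster_ratios(real_labels, synth_labels, n_clusters=3)

best_case = ratios.max()   # what taking the maximum reports
worst_case = ratios.min()  # what taking the minimum would report

print(f"per-cluster ratios: {ratios}")
print(f"max (best case): {best_case}, min (worst case): {worst_case}")
```

In this toy case the maximum is driven entirely by cluster 0, so the sparsely covered cluster 2 has no influence on the reported score, whereas the minimum would surface it.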

Glad to hear your thoughts, or whether my interpretation of this metric is incorrect.
