Question
How is splitting the real input dataset into train and test parts beneficial to model evaluation?
Further Information
Based on line 130 and lines 232-273 in eval.py, all evaluation metrics except DOMINAS are passed the real test data and the generated data, but not the train dataset. I am wondering whether this hinders effective evaluation with respect to the metrics' ability to detect privacy violations and the generator's ability to generalize, and what benefit this decision brings to the statistical fidelity metrics. The only metric for which splitting off a test dataset that the generator is not allowed to see makes sense to me is TSTR. More specifically, I'm unsure about the following:
How does a privacy metric check for privacy violations if it does not know the training dataset it is supposed to protect? (See the sketch below.)
How is a metric supposed to detect the generator's ability to generalize if the training dataset is unknown?
What problems do you see when statistical fidelity metrics or the discriminative score, for instance, compare the train dataset with the generated dataset? (In my opinion, this would let us make the test dataset for TSTR and the like smaller and the train dataset bigger.)
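To make the privacy point concrete, here is a minimal, hypothetical sketch of a distance-to-closest-record (DCR) style check; it is not code from eval.py, and the arrays and generator are invented for illustration. If the metric only ever sees the test split, a generator that memorizes training rows can look perfectly private:

import numpy as np

def dcr(generated: np.ndarray, reference: np.ndarray) -> float:
    """Mean distance from each generated row to its nearest reference row.

    A very small value against the *train* set suggests memorization of
    training records; the same value against the *test* set carries no such
    meaning, because the generator never saw those rows.
    """
    # Pairwise Euclidean distances, shape (n_generated, n_reference)
    dists = np.linalg.norm(generated[:, None, :] - reference[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))
test = rng.normal(size=(100, 4))

# A "leaky" generator that simply copies (and slightly perturbs) train rows.
generated = train[rng.integers(0, len(train), size=150)] + rng.normal(scale=0.01, size=(150, 4))

print("DCR vs train:", dcr(generated, train))  # near 0 -> memorization is visible
print("DCR vs test: ", dcr(generated, test))   # looks harmless, the leak goes undetected

The same train-vs-test comparison is what I would expect a generalization check to rely on, which is why I am unsure how these metrics can work without the train split.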
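For contrast, TSTR is the one metric where a withheld real split clearly helps: a downstream model is fit on the generated data and scored on real data the generator never saw. A hypothetical sketch of that setup (again not the project's code; the downstream classifier, features, and synthetic target are assumptions for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_data(n):
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y

X_gen, y_gen = make_data(500)    # stands in for generator output
X_test, y_test = make_data(300)  # real rows the generator never saw

# Train on Synthetic, Test on Real: the real test split is used only for
# scoring, which is exactly why it must be withheld from the generator.
clf = LogisticRegression().fit(X_gen, y_gen)
print("TSTR AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))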
Screenshots
Not applicable
System Information
Not applicable
Additional Context
Not applicable