[SPIKE] Investigate deleting all files at the end of a read job #428

Open
Aryex opened this issue Jun 16, 2022 · 0 comments
Labels
enhancement (New feature or request), Normal Priority

Comments


Aryex (Collaborator) commented Jun 16, 2022

Description

Currently, we clean up exported files at two points:

  1. When a reader closes. At this point, the reader deletes any files whose portions have all been read.
  2. When the application shuts down. At this point, all non-external data folders that were created are deleted.

However, if the user has a long-running Spark application (e.g. a Jupyter notebook) and Spark does not read all of the exported files (for example, df.show() only reads 20 rows by default), then the unread files are never deleted.
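For illustration, a minimal sketch of that scenario (the format name here is a placeholder, not the connector's actual identifier):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Long-running session, e.g. a Jupyter notebook kernel that stays up for hours.
val df = spark.read
  .format("example.datasource") // placeholder, not the connector's real format name
  .load()

// show() only materializes the first 20 rows by default, so only the partitions
// backing those rows are fully read and have their files deleted on reader close;
// every other exported file lingers until the application shuts down.
df.show()
```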

If we could detect when a read job has finished, deleting all of the exported files at that point would greatly simplify our cleanup procedure.
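One possible direction to investigate is Spark's QueryExecutionListener, which runs on the driver after each action completes. This is only a rough sketch under assumptions: `deleteAllExportedFiles` is a hypothetical stand-in for our existing cleanup routine, and a real implementation would need to confirm the finished query actually scanned one of our relations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical stand-in for whatever routine deletes the connector's exported files.
def deleteAllExportedFiles(): Unit = ???

class ExportCleanupListener extends QueryExecutionListener {
  // Fired on the driver after an action (show, collect, count, ...) succeeds.
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // A real implementation would first check that qe's plan scanned one of our
    // relations before deleting anything.
    deleteAllExportedFiles()
  }

  // Also fired when the action fails, so aborted reads do not leak files.
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    deleteAllExportedFiles()
  }
}

// Registered once per session, e.g. when the connector builds its first scan.
val spark = SparkSession.builder().getOrCreate()
spark.listenerManager.register(new ExportCleanupListener)
```

A lower-level org.apache.spark.scheduler.SparkListener with onJobEnd would give job-level granularity instead, but the query-level hook is closer to "the read has finished" from the user's point of view.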

Reason: Improves user experience by removing superfluous files
