[SPIKE] Investigate deleting all files at the end of a read job #428

Open
Aryex opened this issue Jun 16, 2022 · 0 comments
Labels
enhancement (New feature or request), Normal Priority

Comments


Aryex (Collaborator) commented Jun 16, 2022

Description

Currently, we clean up exported files at two points:

  1. When a reader closes. At this point, the reader deletes any files whose portions have all been read.
  2. When the application shuts down. At this point, all non-external data folders that were created are deleted.

However, if the user has a long-running Spark application (e.g. a Jupyter notebook) and Spark does not read all of the exported files (for example, df.show() only reads 20 rows by default), then the unread files are never deleted.
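For illustration, a minimal sketch of that scenario (the format name here is a placeholder, not the connector's actual identifier):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Long-running session, e.g. a Jupyter notebook kernel that stays up for hours.
val df = spark.read
  .format("example.datasource") // placeholder, not the connector's real format name
  .load()

// show() only materializes the first 20 rows by default, so only the partitions
// backing those rows are fully read and have their files deleted on reader close;
// every other exported file lingers until the application shuts down.
df.show()
```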

If we could detect when a read job has finished, deleting all of the exported files at that point would greatly simplify our cleanup procedure.
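One possible direction to investigate is Spark's QueryExecutionListener, which runs on the driver after each action completes. This is only a rough sketch under assumptions: `deleteAllExportedFiles` is a hypothetical stand-in for our existing cleanup routine, and a real implementation would need to confirm the finished query actually scanned one of our relations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical stand-in for whatever routine deletes the connector's exported files.
def deleteAllExportedFiles(): Unit = ???

class ExportCleanupListener extends QueryExecutionListener {
  // Fired on the driver after an action (show, collect, count, ...) succeeds.
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // A real implementation would first check that qe's plan scanned one of our
    // relations before deleting anything.
    deleteAllExportedFiles()
  }

  // Also fired when the action fails, so aborted reads do not leak files.
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    deleteAllExportedFiles()
  }
}

// Registered once per session, e.g. when the connector builds its first scan.
val spark = SparkSession.builder().getOrCreate()
spark.listenerManager.register(new ExportCleanupListener)
```

A lower-level org.apache.spark.scheduler.SparkListener with onJobEnd would give job-level granularity instead, but the query-level hook is closer to "the read has finished" from the user's point of view.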

Reason: Improves user experience by removing superfluous files
