Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove full-depth traversal - only list items in the requested 'directory' #871

Merged
merged 2 commits into from
Nov 29, 2023

Conversation

sjuls
Copy link
Contributor

@sjuls sjuls commented Nov 24, 2023

Currently the azure container storage implementation of list_bucket does a depth-first traversal of the blob directory while the aws s3 implementation only lists the items in the target directory. Since doing the full traversal can be VERY time-consuming when a lot of wal segments are archived in the bucket this degrades recovery operations.

We are currently experiencing 10 minute download times for a 300 byte .history file because the tree walk lists all wal segments on all timelines.

This PR removes the recursive calls and only returns first-level content similar to aws s3 implementation.

@sjuls
Copy link
Contributor Author

sjuls commented Nov 28, 2023

Hi @mikewallace1979,

Any chance I can get a review of this PR? This should bring list_bucket implementation of azure blob storage in-line with the aws s3 implementation so it should be safe.

Happy to make any change you deem necessary.

@mikewallace1979
Copy link
Contributor

Hi @sjuls - thanks for doing the analysis and proposing a fix, we will take a closer look this week.

Copy link
Contributor

@mikewallace1979 mikewallace1979 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjuls I think your proposed fix is correct.

Recursing through the "directories" when fetching the .history file occurs during the download_wal function where the bucket prefix will be the wals directory of the server and the .history files will be directly under it. Recursing through the common prefixes containing the WAL segments themselves is indeed completely unnecessary here, since when a WAL is required from a prefix Barman will have already determined the correct path under which to look.

I checked the other calls to list_bucket to verify nothing depends on recursing through the directories. The following places do need to iterate through all the objects under the prefix, however they call list_bucket with no delimiter and therefore will get all the object keys:

The remaining calls to list_bucket (in get_backup_list and get_backup_files) are where the default delimiter of / is used but the code has already determined the correct prefix under which the target objects will be:

If you could add a commit removing the now unused import of BlobPrefix on L47 then I think we should be able to merge this PR.

@sjuls
Copy link
Contributor Author

sjuls commented Nov 29, 2023

@mikewallace1979 Ah right, good catch. I've added a commit to remove the import.

Thanks for the review 🙏

@mikewallace1979 mikewallace1979 merged commit 933dd65 into EnterpriseDB:master Nov 29, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants