Remove full-depth traversal - only list items in the requested 'directory' #871

sjuls · 2023-11-24T09:42:10Z

Currently, the Azure container storage implementation of list_bucket does a depth-first traversal of the blob directory, while the AWS S3 implementation only lists the items in the target directory. Because the full traversal can be VERY time-consuming when many WAL segments are archived in the bucket, it degrades recovery operations.

We are currently experiencing 10-minute download times for a 300-byte .history file because the tree walk lists all WAL segments on all timelines.

This PR removes the recursive calls and returns only first-level content, similar to the AWS S3 implementation.
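To illustrate the cost difference described above, here is a minimal sketch that uses an in-memory stand-in for a blob container. `FakeStore`, `list_level`, `walk_all`, and the key names are hypothetical and are not taken from Barman or the Azure SDK; the sketch only counts how many listing requests each strategy would issue.

```python
from typing import Iterator, List

# Hypothetical flat key space, loosely modelled on a WAL archive layout.
KEYS = [
    "server/wals/00000002.history",
    "server/wals/0000000100000000/000000010000000000000001",
    "server/wals/0000000100000000/000000010000000000000002",
    "server/wals/0000000200000000/000000020000000000000003",
]


class FakeStore:
    """In-memory stand-in for a blob container that counts listing requests."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)
        self.requests = 0

    def list_level(self, prefix: str, delimiter: str = "/") -> Iterator[str]:
        """Yield entries directly under prefix; deeper keys collapse to 'dir/'."""
        self.requests += 1
        seen = set()
        for key in self.keys:
            if not key.startswith(prefix):
                continue
            head, sep, _ = key[len(prefix):].partition(delimiter)
            entry = prefix + head + sep  # ends with the delimiter for a "directory"
            if entry not in seen:
                seen.add(entry)
                yield entry


def walk_all(store: FakeStore, prefix: str, delimiter: str = "/") -> Iterator[str]:
    """Depth-first traversal: one extra listing request per "directory"."""
    for entry in store.list_level(prefix, delimiter):
        yield entry
        if entry.endswith(delimiter):
            yield from walk_all(store, entry, delimiter)
```

With these four keys, a first-level listing of `server/wals/` issues one request and already contains the .history file, while `walk_all` issues three requests (one for the prefix plus one per timeline "directory"). In a real archive the number of extra requests and listed segments grows with the size of the WAL archive.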


sjuls · 2023-11-28T14:13:02Z

Hi @mikewallace1979,

Any chance I can get a review of this PR? This should bring the list_bucket implementation for Azure Blob Storage in line with the AWS S3 implementation, so it should be safe.

Happy to make any change you deem necessary.

mikewallace1979 · 2023-11-28T14:55:45Z

Hi @sjuls - thanks for doing the analysis and proposing a fix. We will take a closer look this week.

mikewallace1979

@sjuls I think your proposed fix is correct.

The recursion through the "directories" when fetching the .history file happens in the download_wal function, where the bucket prefix is the wals directory of the server and the .history files sit directly under it. Recursing through the common prefixes containing the WAL segments themselves is indeed completely unnecessary here: when a WAL is required from a prefix, Barman will have already determined the correct path under which to look.

I checked the other calls to list_bucket to verify nothing depends on recursing through the directories. The following places do need to iterate through all the objects under the prefix, however they call list_bucket with no delimiter and therefore will get all the object keys:

The remaining calls to list_bucket (in get_backup_list and get_backup_files) are where the default delimiter of / is used but the code has already determined the correct prefix under which the target objects will be:

If you could add a commit removing the now unused import of BlobPrefix on L47 then I think we should be able to merge this PR.
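The delimiter behaviour discussed in this review can be modelled with a small sketch. `list_bucket_model` and `EXAMPLE_KEYS` are hypothetical names used for illustration only; this is not Barman's list_bucket, just an assumption-laden model of how a listing with a `/` delimiter differs from one with no delimiter.

```python
from typing import List

# Hypothetical keys: a .history file directly under the wals prefix and
# WAL segments one level deeper.
EXAMPLE_KEYS = [
    "server/wals/00000002.history",
    "server/wals/0000000100000000/000000010000000000000001",
    "server/wals/0000000100000000/000000010000000000000002",
]


def list_bucket_model(keys: List[str], prefix: str, delimiter: str = "/") -> List[str]:
    """Illustrative model of a listing call (an assumption, not Barman's code)."""
    matching = [k for k in keys if k.startswith(prefix)]
    if not delimiter:
        # No delimiter: every object key under the prefix is returned.
        return sorted(matching)
    entries = set()
    for key in matching:
        # With a delimiter, deeper keys collapse into a single common prefix.
        head, sep, _ = key[len(prefix):].partition(delimiter)
        entries.add(prefix + head + sep)
    return sorted(entries)
```

Under this model, a caller that needs every object passes `delimiter=""` and gets all three keys, while a caller looking for .history files directly under `server/wals/` uses the default delimiter and gets the .history key plus one common prefix, without the segments beneath it being listed.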

sjuls · 2023-11-29T14:38:23Z

@mikewallace1979 Ah right, good catch. I've added a commit to remove the import.

Thanks for the review 🙏

Remove full-depth traversal - only list items in the requested 'directory'

8ee2ab2

mikewallace1979 requested changes Nov 29, 2023


Remove unused import BlobPrefix

ec6eebd

sjuls requested a review from mikewallace1979 November 29, 2023 14:39

mikewallace1979 approved these changes Nov 29, 2023


mikewallace1979 merged commit 933dd65 into EnterpriseDB:master Nov 29, 2023
7 checks passed
