Remove full-depth traversal - only list items in the requested 'directory' (#871)

Currently the Azure container storage implementation of list_bucket
performs a depth-first traversal of the blob directory, whereas the AWS S3
implementation lists only the items in the target directory. The full
traversal can be very time-consuming when many WAL segments are archived
in the bucket, which degrades recovery operations.

This commit removes the recursive calls and returns only first-level
content, matching the AWS S3 implementation.

Signed-off-by: Michael Wallace <[email protected]>
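
The difference described in the message can be modelled without the Azure SDK. The sketch below is an illustration only, not barman's code: `KEYS`, `list_first_level` and `list_full_depth` are hypothetical names, and the key layout is invented for the example. It assumes a flat key space in which "directories" exist only as delimiter-separated prefixes.

```python
from typing import Iterator, List

# Hypothetical flat key space standing in for the blobs in a container.
KEYS = [
    "server/base/20231129T000000/backup.info",
    "server/wals/0000000100000000/000000010000000000000001.gz",
    "server/wals/0000000100000000/000000010000000000000002.gz",
    "server/wals/0000000100000001/000000010000000100000000.gz",
]


def list_first_level(
    keys: List[str], prefix: str = "", delimiter: str = "/"
) -> Iterator[str]:
    """Yield only the entries directly under ``prefix`` (the new behaviour)."""
    seen = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        # A delimiter in the remainder means the key lives in a
        # "subdirectory": report that subdirectory once, do not descend.
        entry = prefix + head + sep
        if entry not in seen:
            seen.add(entry)
            yield entry


def list_full_depth(
    keys: List[str], prefix: str = "", delimiter: str = "/"
) -> Iterator[str]:
    """Yield every directory and file below ``prefix`` (the old behaviour)."""
    for entry in list_first_level(keys, prefix, delimiter):
        yield entry
        if entry.endswith(delimiter):
            # Recurse into each subdirectory, as the removed helper did.
            yield from list_full_depth(keys, entry, delimiter)


first_level = list(list_first_level(KEYS, "server/"))
full_depth = list(list_full_depth(KEYS, "server/"))
```

In this model the first-level listing returns one entry per immediate child, while the full-depth listing grows with the total number of archived objects, which is the cost the commit message attributes to buckets holding many WAL segments.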
sjuls authored Nov 29, 2023
1 parent b62ab54 commit 933dd65
Showing 1 changed file with 3 additions and 22 deletions.
25 changes: 3 additions & 22 deletions barman/cloud_providers/azure_blob_storage.py
@@ -44,7 +44,6 @@

 try:
     from azure.storage.blob import (
-        BlobPrefix,
         ContainerClient,
         PartialBatchErrorException,
     )
@@ -290,26 +289,6 @@ def _create_bucket(self):
         # the storage account level in Azure)
         self.container_client.create_container()

-    def _walk_blob_tree(self, obj, ignore=None):
-        """
-        Walk a blob tree in a directory manner and return a list of directories
-        and files.
-        :param ItemPaged[BlobProperties] obj: Iterable response of BlobProperties
-          obtained from ContainerClient.walk_blobs
-        :param str|None ignore: An entry to be excluded from the returned list,
-          typically the top level prefix
-        :return: List of objects and directories in the tree
-        :rtype: List[str]
-        """
-        if obj.name != ignore:
-            yield obj.name
-        if isinstance(obj, BlobPrefix):
-            # We are a prefix and not a leaf so iterate children
-            for child in obj:
-                for v in self._walk_blob_tree(child):
-                    yield v
-
     def list_bucket(self, prefix="", delimiter=DEFAULT_DELIMITER):
         """
         List bucket content in a directory manner
@@ -322,7 +301,9 @@ def list_bucket(self, prefix="", delimiter=DEFAULT_DELIMITER):
         res = self.container_client.walk_blobs(
             name_starts_with=prefix, delimiter=delimiter
         )
-        return self._walk_blob_tree(res, ignore=prefix)
+
+        for item in res:
+            yield item.name

     def download_file(self, key, dest_path, decompress=None):
         """