High-level Python wrapper for the Azure CLI to download or upload files in batches from or to Azure Blob Storage containers. This project aims to fill a gap in the Azure Storage Python SDK, which offers no way to download or upload batches of files from or to containers. The only option in the SDK is downloading file by file, which takes a lot of time.
Besides batch loads, since version 0.0.5 it is possible to set method="single", which uses the Azure Python SDK to process files one by one.
pip install azurebatchload
See PyPI for the package index.
Note: For batch uploads (method="batch") the Azure CLI has to be installed and configured.
Check from the terminal whether the Azure CLI is installed:
az --version
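The same check can be made from Python before choosing a method. The sketch below is not part of azurebatchload: `cli_available` and `choose_method` are hypothetical helper names, and the only assumption is that the Azure CLI executable is called `az` and is found through the PATH. It tells you whether the CLI is installed, not whether it is configured.

```python
import shutil


def cli_available(executable: str = "az") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


def choose_method(executable: str = "az") -> str:
    """Pick "batch" when the CLI is present, otherwise fall back to "single"."""
    return "batch" if cli_available(executable) else "single"


if __name__ == "__main__":
    # Prints "batch" or "single" depending on the local machine.
    print(choose_method())
```

The returned string could then be passed as the method argument shown in the examples further down.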
The Azure Storage connection string has to be set as the environment variable AZURE_STORAGE_CONNECTION_STRING. Alternatively, set the separate environment variables AZURE_STORAGE_KEY and AZURE_STORAGE_NAME, which will be used to create the connection string.
Azure-batch-load automatically checks for the environment variables AZURE_STORAGE_CONNECTION_STRING, AZURE_STORAGE_KEY and AZURE_STORAGE_ACCOUNT.
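As an illustration of how credentials might be resolved from these variables, here is a minimal sketch. It is not the package's actual implementation: `resolve_connection_string` is a hypothetical helper, the precedence (full connection string first) is an assumption, and the account name is read from either AZURE_STORAGE_ACCOUNT or AZURE_STORAGE_NAME because both names are mentioned above. The string layout is the usual Azure Storage connection string format for the public cloud.

```python
import os
from typing import Mapping, Optional


def resolve_connection_string(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a connection string built from environment variables."""
    env = os.environ if env is None else env

    # Assumption: a full connection string takes precedence when present.
    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if connection_string:
        return connection_string

    key = env.get("AZURE_STORAGE_KEY")
    # Assumption: accept either variable name for the storage account.
    account = env.get("AZURE_STORAGE_ACCOUNT") or env.get("AZURE_STORAGE_NAME")
    if not (key and account):
        raise ValueError(
            "Set AZURE_STORAGE_CONNECTION_STRING, or both AZURE_STORAGE_KEY "
            "and the storage account name."
        )

    # Assemble the connection string from the separate key and account.
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account};"
        f"AccountKey={key};"
        "EndpointSuffix=core.windows.net"
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test without touching real credentials.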
So if the connection string, or the storage key and storage account, are set as environment variables, we can leave the arguments connection_string, account_key and account_name empty:
from azurebatchload import Download

Download(
    destination='../pdfs',
    source='blobcontainername',
    extension='.pdf'
).download()
We can skip the Azure CLI and use only the Python SDK by setting method="single":
from azurebatchload import Download

Download(
    destination='../pdfs',
    source='blobcontainername',
    extension='.pdf',
    method='single'
).download()
We can download a folder by setting the folder argument. This works for both single and batch.
from azurebatchload import Download

Download(
    destination='../pdfs',
    source='blobcontainername',
    folder='uploads/invoices/',
    extension='.pdf',
    method='single'
).download()
We can give a list of files to download with the list_files argument.
Note that this only works with method='single'.
from azurebatchload import Download

Download(
    destination='../pdfs',
    source='blobcontainername',
    folder='uploads/invoices/',
    list_files=["invoice1.pdf", "invoice2.pdf"],
    method='single'
).download()
# Upload all files matching *.pdf from ../pdf to the container.
from azurebatchload import Upload

Upload(
    destination='blobcontainername',
    source='../pdf',
    extension='*.pdf'
).upload()

# Same upload, processed file by file through the Python SDK.
from azurebatchload import Upload

Upload(
    destination='blobcontainername',
    source='../pdf',
    extension='*.pdf',
    method="single"
).upload()

# Upload only the listed files.
from azurebatchload import Upload

Upload(
    destination='blobcontainername',
    source='../pdf',
    list_files=["invoice1.pdf", "invoice2.pdf"],
    method="single"
).upload()
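When the list of files is not known in advance, it can be collected from the local source folder first. The sketch below uses only the standard library; `local_file_names` is a hypothetical helper, not part of azurebatchload, and it assumes that list_files expects bare file names relative to the source folder, as in the example above.

```python
from pathlib import Path
from typing import List


def local_file_names(source: str, pattern: str = "*.pdf") -> List[str]:
    """Return the sorted names of files in `source` that match `pattern`."""
    folder = Path(source)
    # Keep regular files only and drop the directory part of each path.
    return sorted(p.name for p in folder.glob(pattern) if p.is_file())
```

The result could then be passed on, for example list_files=local_file_names('../pdf').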
With the Utils.list_blobs method we can do advanced listing of blobs in a container or in a specific folder in a container.
There are several arguments we can use to define the scope of the information:
name_starts_with: Filters files by a certain prefix, or selects certain folders, for example name_starts_with=folder1/subfolder/lastfolder/
dataframe: Defines whether a pandas DataFrame object is returned.
extended_info: Returns either just the blob names or more extended information such as size, creation date and modified date.
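To make the prefix behaviour concrete, the sketch below applies the same idea to a plain Python list. It does not call Azure or azurebatchload: `filter_by_prefix` and the blob names are hypothetical, and it assumes that folders are simply leading parts of the blob name, which is why a trailing slash selects a folder.

```python
from typing import Iterable, List


def filter_by_prefix(blob_names: Iterable[str], name_starts_with: str) -> List[str]:
    """Keep only the names that begin with the given prefix."""
    return [name for name in blob_names if name.startswith(name_starts_with)]


# Hypothetical blob names, for illustration only.
blobs = [
    "folder1/subfolder/lastfolder/a.pdf",
    "folder1/subfolder/b.pdf",
    "folder2/c.pdf",
]

# Selects only the blob inside folder1/subfolder/lastfolder/.
selected = filter_by_prefix(blobs, "folder1/subfolder/lastfolder/")
```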
# List the blobs in a container.
from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

# Return the listing as a pandas DataFrame.
from azurebatchload import Utils

df_blobs = Utils(
    container='containername',
    dataframe=True
).list_blobs()

# List only the blobs under a folder prefix.
from azurebatchload import Utils

list_blobs = Utils(
    container='containername',
    name_starts_with="foldername/"
).list_blobs()

# Get extended information for the blobs under a folder prefix.
from azurebatchload import Utils

dict_blobs = Utils(
    container='containername',
    name_starts_with="foldername/",
    extended_info=True
).list_blobs()

# Get extended information for a folder prefix as a pandas DataFrame.
from azurebatchload import Utils

df_blobs = Utils(
    container='containername',
    name_starts_with="foldername/",
    extended_info=True,
    dataframe=True
).list_blobs()