Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instantiation contacts minio server when auto_create_bucket enabled #94

Open
jayvdb opened this issue May 30, 2020 · 7 comments
Open

Instantiation contacts minio server when auto_create_bucket enabled #94

jayvdb opened this issue May 30, 2020 · 7 comments

Comments

@jayvdb
Copy link

jayvdb commented May 30, 2020

The constructor of django-minio-storage with auto_create_bucket enabled contacts the server and can raise an exception if it hasnt started yet.

It would be nice if this could be delayed until the storage is being used. I do want the auto-create, but not during the constructor where delays ad exceptions like this are not expected.

  File "/usr/lib/python3.8/site-packages/sorl/thumbnail/default.py", line 24, in _setup
    self._wrapped = get_module_class(settings.THUMBNAIL_STORAGE)()
  File "/usr/lib/python3.8/site-packages/minio_storage/storage.py", line 360, in __init__
    super().__init__(
  File "/usr/lib/python3.8/site-packages/minio_storage/storage.py", line 75, in __init__
    self._init_check()
  File "/usr/lib/python3.8/site-packages/minio_storage/storage.py", line 81, in _init_check
    if self.auto_create_bucket and not self.client.bucket_exists(
  File "/usr/lib/python3.8/site-packages/minio/api.py", line 453, in bucket_exists
    self._url_open('HEAD', bucket_name=bucket_name)
  File "/usr/lib/python3.8/site-packages/minio/api.py", line 2035, in _url_open
    region = self._get_bucket_region(bucket_name)
  File "/usr/lib/python3.8/site-packages/minio/api.py", line 1965, in _get_bucket_region
    region = self._get_bucket_location(bucket_name)
  File "/usr/lib/python3.8/site-packages/minio/api.py", line 1998, in _get_bucket_location
    response = self._http.urlopen(method, url,
  File "/usr/lib/python3.8/site-packages/urllib3/poolmanager.py", line 330, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 747, in urlopen
    return self.urlopen(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 747, in urlopen
    return self.urlopen(
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 747, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "/usr/lib/python3.8/site-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3.8/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /media?location= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f936a1dc790>: Failed to establish a new connection: [Errno 111] Connection refused'))
@thomasf
Copy link
Collaborator

thomasf commented May 30, 2020

I would prefer to defer such duties to service deamons, configure it with a restart policy for django to auto restart a few times if you can't make it depend on some external condition that minio is up. Would that work for you?

I prefer if application start up fails fast as much as possible.

@jayvdb
Copy link
Author

jayvdb commented May 31, 2020

Startup is almost completely unrelated, but I will touch on that later. My apologies for mentioning startup which is only one scenario, and I can appreciate that multi-service startup sequence and health-check during start is a more complicated topic which is a distraction.

fwiw, I already have health checks on my minio service, and mechanisms in place to bring it back up in the event of failure. I do not have my Django killed because my minio is problematic. I expect my Django to gracefully degrade wherever possible until problems in other services have been rectified.

The thumbnailer class instantiation and default storage instantiation happen whenever they are needed, which can occur at any time, for any reason, after Django and the storage has been operating normally.

This is about constructors being constructors , not being time consuming health-checks. This is especially important for the storage class, because they are used like a factory and thus are instantiated all the time, and running this bucket-exists health-check every time is wasteful and slows down the page render time. Bucket auto-creation should occur as a reaction to failure, not as a check before any use.

Exceptions should occur on the reasonable code path where the caller is expecting them. In my case there is an exception handler surrounding this use of storage, which isnt working as expected because it is expecting the exception when using the storage, not when instantiating the default storage from the settings. Where the exception should occur, the context handler has the bucket and name of the object which is causing the problem, and a fallback mechanism.

If you insist that the backend constructor should be where buckets are checked and created, what might make sense is for this check to occur once for each minio storage backend accessed, ideally with some opt-in or opt-out, either global or via a constructor arg. i.e. a private module level variable which ensures each new unique endpoint/bucket seen by any backend is verified to be operational and the bucket is ready.

I would still like to opt-out of that, as I do want bucket auto-creation but I do not want it during instantiation of a class.

Also constructor arg assume_bucket_exists seems to merely make auto_create_bucket and auto_create_policy and policy_type in-operative. I can understand why having an envvar which blocks other envars might help, but it makes no sense for a class constructor arg to only exist in order to force three other constructor args to be ignored. The caller shouldnt provide the args with values if they do not want those values to be used.

Fwiw, I can code, could help, but ideally we agree on how it should work before coding.
If we cant agree, everyone is poorer for it, but at least then I know I should go add auto-bucket-creation to one of the other two minio storage providers, as at first glance they seem to be missing that feature, and it is an important feature IMO.

@jayvdb
Copy link
Author

jayvdb commented May 31, 2020

To make this a bit more concrete, one backwards compatible solution would be to add a 'delay_init' arg and setting, which could be False by default to maintain the existing behaviour.

@thomasf
Copy link
Collaborator

thomasf commented May 31, 2020

Storage classes are instanced once, when they are created which is on Django start up (for media/static-storage, example: https://github.com/django/django/blob/17752003a8c115ff79f5f21655f5e9b8b2af67f4/django/core/files/storage.py#L367 ). The FileStorage class is the class that instantiates StorageFile instances as a factory but not the FileStorage instance is a (per processs?) singleton instance.

I think that was the initial reason why I wanted to avoid postponing checks like this to avoid a potential on every request-scenario. I guess that some care as to be taken where delayed init should run, because generating urls, maybe only on save/write since get/list will fail naturally with a bucket-not-exist error and url() doens't really require the storage back end at all.

I consider auto creation of bucket and policies to be more of a local development convenience utility, I always have it turned off for staging and production deployments and let service provisioning take care of those things.

But sure maybe we can come up with an alternative configuration as well, I mostly wanted a real justification before considering adding something.

@thomasf
Copy link
Collaborator

thomasf commented May 31, 2020

If assume_bucket_exists is a little bit weird it's probably because it was added later in a pull request.

@jayvdb
Copy link
Author

jayvdb commented May 31, 2020

Storage classes are instanced once, ..

Not guaranteed. sorl.thumbnails has its own instance.
And it wouldnt surprise me to find an app instantiating multiple instances of the same storage class, or keeping it alive only for a short period when it is needed, and creating another one if it is needed again.

when they are created which is on Django start up (for media/static-storage ..

Django may start its instances of those classes during its startup, but other apps are free to do it whenever it suits them. Usually they are created using LazyObject (as sorl.thumbnails does), which means the constructor will run when they are used (for sorl.thumbnail, this typically happens on template render).

I consider auto creation of bucket and policies to be more of a local development convenience utility ..

Yup; same here. But it is an important convenience IMO.

Initialising the bucket before a write seems like a good spot. I could even be part of the exception handler for a failed write, detecting the error was missing bucket, create the bucket and re-attempt the write.

@thomasf
Copy link
Collaborator

thomasf commented May 31, 2020

I would argue that if a django app create multiuple instances of the FileStorage it's using the system soft wrong but sure I guess it's not bad to handle even when people are using it wrong as well (at the very least wasting resources for no reason). sorl seems to use the same strategy as Django by using a singleton https://github.com/jazzband/sorl-thumbnail/blob/b6358d234d7de3927a2666a2a5ab3d7870c0e1d3/sorl/thumbnail/default.py#L30

I guess that there might be some use case if you need to dynamically create a lot of instances (even if every separate instance could still be stored in a dict) and maybe not use them. I have never seen such usage in practise though. Maybe something along the lines of supporting end user configurable s3 endpoints or something where it's bad to have a dict with hundreds of thousands of FileStorage instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants