Support detection of shadowed built-in python modules #120
Comments
Hi @asfaltboy, thanks for taking the time to provide feedback! 🤗 🎉 So, the idea would be that rather than checking the content of the files, we would check the names of the files themselves and compare them to the list of builtin modules, right? 🤔 That's a very good idea IMHO! ✨ A quick look at the code though shows that usually I'm happy to review PRs in that direction, if you feel like looking into it! 🌟 |
Hi @gforcada nice to meet you 👋 my apologies for not getting back to you sooner. I haven't actually thought about how we'd solve it, but what you say does make sense. Thank you for setting me on the right course. I have checked out the and wrote a test, and now that I'm thinking about the implementation I have a couple of questions:
I'm leaning a bit towards (1) as this approach is simpler to start with, and would protect against future import or improper usage downstream (see here for an example). |
@asfaltboy hi! 👋🏾 No need to apologize, I'm not especially known for replying fast on GitHub 😅 Your idea (1) sounds good, can we achieve that within Otherwise my idea would be to check the imports, and if the import is not a stdlib import, i.e. It might be tricky though, as some distributions might want to do that on purpose, but as you were saying, those distributions surely know how to turn off a Let me know if you need help; otherwise, unfortunately, I would not push it myself, at least not right now |
This is the first thing I'm going to check. I'm hoping that filename is already being passed from flake8 according to the property in
My fear is that even if we don't import the package ourselves, things will break when we use a library that wants to import the builtin. I certainly experienced this with at least a couple of libraries. I think it was with transformers and scipy, but I will try to provide a 'Minimal reproducible example' so we can reason about it more easily.
No worries dude. I have suddenly come across a bit of extra time, but also this feature is not very urgent either. I will do my best to do small bits of incremental work on this. |
And Thanks for contributing!! ✨ |
Hi all, thanks for this wonderful library and feature. Much appreciated. I like the idea of checking module names for conflicts with Python built-in modules. However, A005 works in a way that seems unintended for me. I'm developing the sqllineage package, where I provide the following modules:
from sqllineage.core.parser.sqlfluff.extractors import copy
from sqllineage.core.parser.sqlfluff.extractors import select
from sqllineage import io
They conflict with builtin modules, true. But I would argue that these modules, in particular I'm turning off A005 for now and would love to know your thoughts. Thanks. |
Hi @reata, thanks for using and providing feedback about @asfaltboy provided a The way |
Thanks both, I'll try to add more details. This new A005 rule is intended more for application devs than library devs, as it occurs when modules in the python path shadow the built-in modules. For instance, a user of sqllineage library could create two modules in their current dir:
When a user runs
❯ python run.py
Traceback (most recent call last):
  File "/home/asfaltboy/Code/oss/temp/run.py", line 1, in <module>
    from sqllineage import runner
  File "/home/asfaltboy/Code/oss/sqllineage/sqllineage/runner.py", line 4, in <module>
    from typing import Dict, List, Optional, Tuple
ImportError: cannot import name 'Dict' from 'typing' (/home/asfaltboy/Code/oss/temp/typing.py)
With the A005 rule, the user can check their project for this issue and identify the root cause before the code even runs. Note that automated testing with a tool like pytest won't necessarily catch this, as it changes the way modules are imported (e.g
❯ flake8 ./
./typing.py:0:1: A005 the module is shadowing a Python builtin module "typing" |
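For anyone who wants to reproduce the effect without touching a real project, here is a rough sketch. It assumes the default behaviour where the script's directory comes first on `sys.path`, and it uses `secrets` instead of `typing` only to keep the demo harmless. The helper name `is_shadowed_at_runtime` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def is_shadowed_at_runtime(module_name: str) -> bool:
    """Run a script next to a same-named local module and report which file got imported."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp).resolve()
        # Empty local module that shares its name with a stdlib module.
        (tmp_path / f"{module_name}.py").write_text("")
        script = tmp_path / "run.py"
        script.write_text(f"import {module_name}\nprint({module_name}.__file__)\n")
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # safe-path mode would hide the effect
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, check=True, env=env,
        )
        imported_from = Path(result.stdout.strip()).resolve()
        # True when the import resolved to the local file, not the stdlib one.
        return imported_from.parent == tmp_path


print(is_shadowed_at_runtime("secrets"))
```

The script is started in a child interpreter on purpose, so the shadowing never leaks into the process that runs the check.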
Note, while testing this out just now, I noticed that Python also comes with a bunch of frozen modules that are necessary for the import system to function in the first place:
>>> import sys
>>> from importlib.machinery import FrozenImporter
>>> [k for k, m in sys.modules.items() if m.__loader__ is FrozenImporter]
['_frozen_importlib', '_frozen_importlib_external', 'zipimport', 'codecs', 'abc', 'io', 'stat', '_collections_abc', 'genericpath', 'posixpath', 'os.path', 'os', '_sitebuiltins', 'site', 'importlib._bootstrap', 'importlib._bootstrap_external', 'importlib.machinery']
As you can see, However, I'm not sure how stable this list of frozen modules is over time, and I'm not sure if it's a good idea to complicate the rule's behaviour and exclude the packages in this list, as it might be confusing to end users |
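If the rule ever wanted to treat such names differently, one possible check is sketched below. It asks the built-in and frozen importers, which normally sit ahead of the path-based finder on `sys.meta_path`, whether they can resolve a name. The function name is invented for this example, and the result for a given name depends on the Python version and build, so this is an illustration rather than a proposal for the plugin.

```python
from importlib.machinery import BuiltinImporter, FrozenImporter


def resolved_before_path_search(name: str) -> bool:
    """True if the built-in or frozen importer can find `name`."""
    return (
        BuiltinImporter.find_spec(name) is not None
        or FrozenImporter.find_spec(name) is not None
    )


# The answer for names such as "io" or "os" varies between Python versions.
for candidate in ("sys", "io", "os", "secrets"):
    print(candidate, resolved_before_path_search(candidate))
```

A name that only the path-based finder can resolve, like `secrets`, is the kind a local file can shadow.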
Thanks both for the detailed information. I second @asfaltboy that A005 is intended more for application devs than library devs. So I would prefer turning off A005 to I'd be very interested to know if/when you decide to adapt A005 for libraries so that BTW: nice to know about this 'frozen modules' thing, and that there's a safeguard. I always thought we were empowered to shadow every built-in module and should tread carefully. |
Based on how I understand what this lint rule is supposed to check for, we get a few false-positives in Open-MSS/MSS#2312. E.g. we get
because the file is named time.py, but an import of that module would be I think this lint rule should consider the full names by which a module is actually importable, and not just check the file name. |
I think excluding nested modules, assuming flake8 runs from the root of the project, would prevent most of these false positives in library packages (like with @reata's report above). Though we might introduce false negatives for application projects that choose to nest their scripts out of convenience. E.g: Is the false positive issue significantly more frequent than the above pattern 🤔? If it is, then I think we should go ahead with this change. If not, then perhaps we need another solution. @gforcada what are your thoughts? Side note: I think it would be hard to support all plausible project structures and import patterns. My hope was that the new rule would discourage people from using the built-in names entirely. That said, I can see that the common pattern of namespaced/nested modules likely avoids the issue, so it's reasonable to support it |
Sorry to be late to the party 😓 My thoughts:
|
Similar to how check_import() compares to names of builtins, but this time covering packages like logging or secrets, which is a common Python gotcha. The list of these packages exists as the sys.stdlib_module_names frozenset from 3.10, so I hope 🙏 we could trivially support this rule in Python 3.10+ to begin with. Some example references: