Potential data model change #53

Open
CJ-Wright opened this issue Oct 24, 2020 · 5 comments

Comments

@CJ-Wright
Collaborator

CJ-Wright commented Oct 24, 2020

It might be good to change the depfinder data model to better reflect a data-report relationship.
This could include:

  1. Express the import search as a dict whose keys are the imports and whose values contain metadata about each import: for instance, its line number, whether it is in a try/except block, and whether it is mutually exclusive/collectively exhaustive (MECE) with other imports
  2. Generate reports from that data set
    1. turn all the imports into conda package names (as we have now)
    2. turn all the imports into pypi names
    3. conda package names with version constraints (based on libcfgraph data)
    4. conda package names with delineations between MECE packages
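A minimal sketch of the data/report split, assuming the import data is a dict keyed by top-level import name. The function name package_report and both mappings are hypothetical illustrations, not depfinder's actual mapping. The mapping is passed in as a parameter, so the conda and PyPI reports (or anyone's own mapping) reuse the same code.

```python
from typing import Iterable, Mapping


def package_report(imports: Iterable[str], mapping: Mapping[str, str]) -> list[str]:
    """Turn top-level import names into package names using a caller-supplied mapping.

    Imports missing from the mapping fall back to the import name itself.
    """
    return sorted({mapping.get(name, name) for name in imports})


# Hypothetical hand-written mappings, for illustration only.
CONDA_NAMES = {"stdlib_list": "stdlib-list", "yaml": "pyyaml"}
PYPI_NAMES = {"stdlib_list": "stdlib-list", "yaml": "PyYAML"}

imports = {"stdlib_list": {}, "requests": {}, "yaml": {}}  # metadata omitted
print(package_report(imports, CONDA_NAMES))  # ['pyyaml', 'requests', 'stdlib-list']
print(package_report(imports, PYPI_NAMES))   # ['PyYAML', 'requests', 'stdlib-list']
```

Each report is just another function over the same data, so adding a version-constrained or MECE-aware report would not touch the parsing code.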

Thoughts? @ericdill @ocefpaf @jkarp314

@ericdill
Owner

My impression is that your (1) is the change to the data model and (2) - (6) are downstream functions that take in the new data model and return those things.

Regarding what that data model looks like, in my head it's something like this:

{
    "stdlib_list": {
        # keys are (file, line number) tuples, one per place the import occurs
        "occurrences": {
            ("depfinder/main.py", 41): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "from stdlib_list import stdlib_list",
            },
        },
        "conda": "stdlib-list",
        "pypi": "stdlib-list",
    },
    "requests": {
        "occurrences": {
            ("depfinder/main.py", 44): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "import requests",
            },
        },
        "conda": "requests",
        "pypi": "requests",
    },
}

Then you can take that data structure and do whatever you want with it downstream after the code parsing has completed. I imagine that every time an import is encountered you could add it to the occurrences dict, with the (file, line number) tuple providing unique keys in that dict. Keeping track of all of the places where each import occurs inside of depfinder is something that I've wanted to do for a while, so this seems like a good opportunity to do that.
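The "add it every time an import is encountered" step can be sketched with the standard-library ast module. This is an illustrative assumption rather than how depfinder's ImportFinder currently works: the function name collect_occurrences is hypothetical, it maps each top-level package straight to its per-location entries (leaving out the surrounding keys and the try/if/class/function flags), and it stores only the first physical line of each import statement.

```python
import ast


def collect_occurrences(source: str, path: str) -> dict:
    """Map each top-level imported package to {(path, lineno): {"exact_line": ...}}."""
    lines = source.splitlines()
    found: dict = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports ("from . import x") are skipped
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            # Keyed by (file, line) so repeated imports of one package stay distinct;
            # exact_line is the first physical line of a multi-line statement.
            found.setdefault(top_level, {})[(path, node.lineno)] = {
                "exact_line": lines[node.lineno - 1].strip(),
            }
    return found


src = "import os.path\nfrom stdlib_list import stdlib_list\nimport requests, os\n"
print(collect_occurrences(src, "depfinder/main.py"))
```

Here "os" ends up with two entries, one for line 1 and one for line 3, which is exactly the "all of the places each import occurs" information.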

Not sure if conda and pypi belong in this dict or not. Probably not, now that I think on it a bit more.

Anyway, thoughts?

@CJ-Wright
Collaborator Author

Yeah, sorry, Markdown didn't render the tabs properly.

@CJ-Wright
Collaborator Author

I would have the conda/pypi part done independently, since a 3rd party may want to use their own mappings.

One of my concerns is how to get enough detail into the questionable-imports piece (try, if, etc.). I would want to know which libs are part of a "pick one" set (for instance, try: import PyQt4; except ImportError: import PyQt5). This would let us make certain that you had at least some of the libs depfinder thought you needed. For tooling that generates the requirements from scratch, I'm not certain how you would include that, but it would be good to have for other use cases.
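A rough sketch of detecting the simplest "pick one" case with the ast module, assuming alternatives appear only as direct imports in a try body and in except ImportError / ModuleNotFoundError (or bare except) handlers. The helper names are hypothetical, and other patterns (an if/else on a version check, nested try blocks, imports buried deeper in a branch) are not handled.

```python
import ast

IMPORT_ERRORS = {"ImportError", "ModuleNotFoundError"}


def _catches_import_error(handler: ast.ExceptHandler) -> bool:
    """True for a bare except or one naming ImportError/ModuleNotFoundError."""
    if handler.type is None:
        return True
    types = handler.type.elts if isinstance(handler.type, ast.Tuple) else [handler.type]
    return any(isinstance(t, ast.Name) and t.id in IMPORT_ERRORS for t in types)


def _direct_imports(body: list) -> set:
    """Top-level package names imported directly (not nested) in a block."""
    names = set()
    for stmt in body:
        if isinstance(stmt, ast.Import):
            names.update(alias.name.split(".")[0] for alias in stmt.names)
        elif isinstance(stmt, ast.ImportFrom) and stmt.level == 0 and stmt.module:
            names.add(stmt.module.split(".")[0])
    return names


def pick_one_sets(source: str) -> list:
    """Find try/except ImportError blocks whose branches import alternative packages."""
    groups = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        handlers = [h for h in node.handlers if _catches_import_error(h)]
        if not handlers:
            continue
        alternatives = _direct_imports(node.body)
        for handler in handlers:
            alternatives |= _direct_imports(handler.body)
        if len(alternatives) > 1:  # a lone optional import is not a "pick one" set
            groups.append(alternatives)
    return groups


src = """
try:
    import PyQt4
except ImportError:
    import PyQt5
"""
print([sorted(group) for group in pick_one_sets(src)])  # [['PyQt4', 'PyQt5']]
```

An optional import whose handler just sets a fallback (for example numpy = None) yields no group, since only one package is involved.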

@ericdill
Owner

Agreed on the conda/pypi part.

Regarding the "pick one" set stuff, I'm pretty sure you could do that via the ast module. Not exactly sure how, but I don't imagine it would take too much poking around. It would probably require some reworking of how the ImportFinder works, and we might need to design a state machine to track where we are in the hierarchy. On second thought, this is probably a bit trickier than I originally thought.
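One lightweight alternative to a full state machine is a stack of enclosing block types, pushed and popped while walking the tree with ast.NodeVisitor. The sketch below is hypothetical (the class name, the flag set, and the tuple output are illustrative assumptions, not depfinder's ImportFinder). It flags whether an import sits anywhere inside a try, if, class, or function, matching the boolean keys in the proposed data model, but it does not by itself identify "pick one" groups.

```python
import ast

# Hypothetical flag names chosen to match the boolean keys in the proposed data model.
FLAGS = ("try", "if", "class", "function")
CONTEXTS = {
    ast.Try: "try",
    ast.If: "if",
    ast.ClassDef: "class",
    ast.FunctionDef: "function",
    ast.AsyncFunctionDef: "function",
}


class ContextImportFinder(ast.NodeVisitor):
    """Record each import together with flags for the blocks that enclose it."""

    def __init__(self) -> None:
        self.stack: list = []  # labels of enclosing blocks, innermost last
        self.found: list = []  # (top-level name, line number, flags) tuples

    def generic_visit(self, node: ast.AST) -> None:
        # Push a label on the way into a tracked block and pop it on the way out.
        label = CONTEXTS.get(type(node))
        if label:
            self.stack.append(label)
        super().generic_visit(node)
        if label:
            self.stack.pop()

    def _record(self, node: ast.stmt, names: list) -> None:
        for name in names:
            flags = {flag: flag in self.stack for flag in FLAGS}
            self.found.append((name.split(".")[0], node.lineno, flags))

    def visit_Import(self, node: ast.Import) -> None:
        self._record(node, [alias.name for alias in node.names])

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.level == 0 and node.module:  # ignore relative imports
            self._record(node, [node.module])


src = (
    "import os\n"
    "try:\n"
    "    import ujson as json\n"
    "except ImportError:\n"
    "    import json\n"
    "\n"
    "def load():\n"
    "    import requests\n"
)
finder = ContextImportFinder()
finder.visit(ast.parse(src))
for entry in finder.found:
    print(entry)
```

Imports in the except handler are flagged "try" as well, because the handler is a child of the Try node.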

@CJ-Wright
Collaborator Author

Right, I'm hopeful that we could do it in the code, but I'm not certain what data model would support it. Maybe we associate shielded imports with an ID, so that all imports with (or within) that ID are grouped together.
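A minimal sketch of the ID idea, assuming the parser tags each shielded import with a group value (here, hypothetically, the file and line of the enclosing try statement) and that unshielded imports carry None. With that in place, the "pick one" sets fall out of a simple group-by, and checking that an environment has at least one alternative per set is a set intersection. The function names and the ID format are illustrative only.

```python
from collections import defaultdict


def shielded_groups(occurrences: dict) -> dict:
    """Collect imports sharing a (hypothetical) "group" ID into one alternatives set.

    occurrences maps import name -> {(path, lineno): metadata}, where metadata
    may carry a "group" key identifying the try/except block it was found in.
    """
    groups = defaultdict(set)
    for name, locations in occurrences.items():
        for meta in locations.values():
            if meta.get("group") is not None:
                groups[meta["group"]].add(name)
    return dict(groups)


def unsatisfied_groups(groups: dict, available: set) -> list:
    """Return the IDs of groups with no alternative among the available import names."""
    return sorted(gid for gid, names in groups.items() if not names & available)


occurrences = {
    "PyQt4": {("app/gui.py", 2): {"group": "app/gui.py:1"}},
    "PyQt5": {("app/gui.py", 4): {"group": "app/gui.py:1"}},
    "requests": {("app/net.py", 1): {"group": None}},
}
groups = shielded_groups(occurrences)
print(unsatisfied_groups(groups, {"PyQt5", "requests"}))  # []
print(unsatisfied_groups(groups, {"requests"}))           # ['app/gui.py:1']
```

Keeping the ID in the per-location metadata means the same import can be shielded in one file and required outright in another without conflict.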


2 participants