Potential data model change #53

CJ-Wright · 2020-10-24T13:37:14Z

It might be good to change the depfinder data model to better reflect a data-report relationship.
This could include:

1. Express the import search as a dict whose keys are the imports and whose values contain metadata about each import: for instance, its line number, whether it is inside a try/except block, and whether it is mutually exclusive/collectively exhaustive (MECE) with other imports.
2. Generate reports from that data set:
    1. turn all the imports into conda package names (as we have now)
    2. turn all the imports into pypi names
    3. conda package names with version constraints (based on libcfgraph data)
    4. conda package names with delineations between MECE packages

ericdill · 2020-10-24T17:18:45Z

My impression is that your (1) is the change to the data model and (2) - (6) are downstream functions that take in the new data model and return those things.

Regarding what that data model looks like, in my head it's something like this:

# Python dict literal; the keys of "occurances" are (filename, line number) tuples
{
    "stdlib_list": {
        "occurances": {
            ("depfinder/main.py", 41): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "from stdlib_list import stdlib_list",
            },
        },
        "conda": "stdlib-list",
        "pypi": "stdlib-list",
    },
    "requests": {
        "occurances": {
            ("depfinder/main.py", 44): {
                "try": False,
                "if": False,
                "class": False,
                "function": False,
                "exact_line": "import requests",
            },
        },
        "conda": "requests",
        "pypi": "requests",
    },
}

Then you can take that data structure and do whatever you want with it downstream after the code parsing has completed. I imagine that every time an import is encountered you could add it to the occurances dict with the file / line number tuple providing unique keys in that dict. Keeping track of all of the places that each import occurs inside of depfinder is something that I've wanted to do for a while, so this seems like a good opportunity to do that.
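To make the "add it to the occurances dict" step concrete, here is a minimal sketch. The function name `record_import`, its keyword arguments, and the second file path are hypothetical and only for illustration; none of this is existing depfinder code.

```python
def record_import(total_imports, name, filename, lineno, exact_line,
                  in_try=False, in_if=False, in_class=False, in_function=False):
    """Add one occurrence of `name` to the data model (hypothetical helper)."""
    # Create the entry for this import the first time it is seen.
    entry = total_imports.setdefault(name, {"occurances": {}})
    # The (filename, lineno) tuple is the unique key for each occurrence.
    entry["occurances"][(filename, lineno)] = {
        "try": in_try,
        "if": in_if,
        "class": in_class,
        "function": in_function,
        "exact_line": exact_line,
    }
    return total_imports


total_imports = {}
record_import(total_imports, "requests", "depfinder/main.py", 44, "import requests")
# Hypothetical second occurrence in another file, inside a try block.
record_import(total_imports, "requests", "example/other.py", 3, "import requests",
              in_try=True)
```

Because each occurrence is keyed by file and line number, recording the same import from several places accumulates entries rather than overwriting them.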

Not sure if conda and pypi belong in this dict or not. Probably not, now that I think on it a bit more.

Anyway, thoughts?

CJ-Wright · 2020-10-24T17:20:56Z

Yeah, sorry markdown didn't render the tabs properly.

CJ-Wright · 2020-10-24T17:24:29Z

I would have the conda/pypi part done independently, since a 3rd party may want to use their own mappings.
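A rough sketch of what keeping the mapping separate could look like: the report function takes the parsed data plus whatever mapping the caller supplies. The name `to_package_names` and the fallback to the bare import name are assumptions for illustration, not how depfinder currently resolves names.

```python
def to_package_names(total_imports, mapping):
    """Translate import names to package names with a caller-supplied mapping."""
    # Assumption: an import missing from the mapping falls back to its own name.
    return sorted({mapping.get(name, name) for name in total_imports})


# Parsed data, kept free of any conda/pypi information.
total_imports = {
    "stdlib_list": {"occurances": {}},
    "requests": {"occurances": {}},
}

# A third party could pass in its own mapping here.
conda_mapping = {"stdlib_list": "stdlib-list"}
conda_names = to_package_names(total_imports, conda_mapping)
print(conda_names)  # ['requests', 'stdlib-list']
```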

One of my concerns is how to get enough detail into the questionable imports piece (try, if, etc.). I would want to know which libs are part of a "pick one" set (for instance try: import pyqt4; except ImportError: import pyqt5). This would enable us to make certain that you had at least some of the libs depfinder thought you needed. For tooling around generating the requirements from scratch, I'm not certain how you would include that but it would be good to have for other use cases.

ericdill · 2020-10-24T18:10:52Z

Agreed on the conda/pypi part.

Regarding the "pick one" set stuff, I'm pretty sure you could do that via ast. Not exactly sure how, but I don't imagine it would take too much poking around. Would probably require some reworking of how the ImportFinder works. Might need to consider designing a state machine to track where we are in the hierarchy. On second thought, this is probably a bit trickier than I originally thought.
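One possible way to track "where we are in the hierarchy" is a stack of enclosing block kinds maintained by an `ast.NodeVisitor`. This is only a sketch under that assumption: `ContextImportVisitor` is a hypothetical class, not the existing ImportFinder, and it treats the body and the except handlers of a try statement alike.

```python
import ast


class ContextImportVisitor(ast.NodeVisitor):
    """Record imports together with the kinds of blocks that enclose them."""

    def __init__(self, filename):
        self.filename = filename
        self.context = []  # stack of enclosing block kinds
        self.found = {}

    def _visit_block(self, node, kind):
        # Push the block kind, visit children, then pop on the way out.
        self.context.append(kind)
        self.generic_visit(node)
        self.context.pop()

    def visit_Try(self, node):
        self._visit_block(node, "try")

    def visit_If(self, node):
        self._visit_block(node, "if")

    def visit_ClassDef(self, node):
        self._visit_block(node, "class")

    def visit_FunctionDef(self, node):
        self._visit_block(node, "function")

    visit_AsyncFunctionDef = visit_FunctionDef

    def _record(self, name, node):
        top_level = name.split(".")[0]
        flags = {kind: kind in self.context
                 for kind in ("try", "if", "class", "function")}
        entry = self.found.setdefault(top_level, {"occurances": {}})
        entry["occurances"][(self.filename, node.lineno)] = flags

    def visit_Import(self, node):
        for alias in node.names:
            self._record(alias.name, node)

    def visit_ImportFrom(self, node):
        # Skip relative imports, which have no external package name.
        if node.module and node.level == 0:
            self._record(node.module, node)


source = """
import os
try:
    import pyqt4
except ImportError:
    import pyqt5
def f():
    import json
"""
visitor = ContextImportVisitor("example.py")
visitor.visit(ast.parse(source))
```

After the visit, `visitor.found["pyqt4"]` and `visitor.found["pyqt5"]` both carry `"try": True`, while `os` has every flag set to `False`. This marks imports as questionable but does not yet say which ones belong to the same "pick one" set.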

CJ-Wright · 2020-10-24T18:16:56Z

Right, I'm hopeful that we could do it in the code, but I'm not certain what the data model would be that supports it. Maybe we associate shielded imports with an ID, so that all imports with (or within) that ID are associated together.
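A sketch of the ID idea, assuming each try statement gets its own integer ID and every import found anywhere inside it (body or handlers) joins that group. The names `shielded_groups` and `satisfied` are hypothetical, and nested try statements would produce overlapping groups under this scheme.

```python
import ast
from itertools import count


def shielded_groups(source):
    """Map a group ID to the top-level names imported inside one try statement."""
    ids = count()
    groups = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        names = set()
        # Collect every import in the try body, handlers, else, and finally.
        for child in ast.walk(node):
            if isinstance(child, ast.Import):
                names.update(alias.name.split(".")[0] for alias in child.names)
            elif isinstance(child, ast.ImportFrom) and child.module and child.level == 0:
                names.add(child.module.split(".")[0])
        if names:
            groups[next(ids)] = names
    return groups


def satisfied(groups, installed):
    """True if at least one library from every group is installed."""
    return all(names & installed for names in groups.values())


source = (
    "try:\n"
    "    import pyqt4\n"
    "except ImportError:\n"
    "    import pyqt5\n"
    "import requests\n"
)
groups = shielded_groups(source)
print(groups == {0: {"pyqt4", "pyqt5"}})  # True
print(satisfied(groups, {"pyqt5"}))       # True
print(satisfied(groups, {"requests"}))    # False
```

The `satisfied` check covers the "at least some of the libs" use case; how such a group should appear in generated requirements is left open here, as it is in the discussion.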

CJ-Wright mentioned this issue Nov 9, 2020

create new data model in total_imports and supply a secondary report using conda-forge graphs, also split up main into a bunch of modules #54

Merged
