Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Add repository directory to path when importing files from repository #1805

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

b-bondurant
Copy link
Contributor

@b-bondurant b-bondurant commented Dec 17, 2021

ARTIQ Pull Request

Description of Changes

When importing experiments from the repository, add the repository path to sys.path if it contains an __init__.py. Very much a draft at this point, quickly thrown together as a proof-of-concept in order to get feedback.

Related Issue

Closes #1543

Type of Changes

Type
✨ New feature
🔨 Refactoring

Steps (Choose relevant, delete irrelevant before submitting)

All Pull Requests

  • Use correct spelling and grammar.
  • Update RELEASE_NOTES.rst if there are noteworthy changes, especially if there are changes to existing APIs.
  • Close/update issues.
  • Check the copyright situation of your changes and sign off your patches (git commit --signoff, see copyright).

Code Changes

  • Run flake8 to check code style (follow PEP-8 style). flake8 has issues with parsing Migen/gateware code, ignore as necessary.
  • Test your changes or have someone test them. Mention what was tested and how.
  • Add and check docstrings and comments
  • Check, test, and update the unittests in /artiq/test/ or gateware simulations in /artiq/gateware/test

Existing tests pass, but this probably warrants the addition of a few tests. I've done some local testing and imports seem to work for in-repo experiments (with git and file system repos).

Documentation Changes

  • Check, test, and update the documentation in doc/. Build documentation (cd doc/manual/; make html) to ensure no errors.

Git Logistics

  • Split your contribution into logically separate changes (git rebase --interactive). Merge/squash/fixup commits that just fix or amend previous commits. Remove unintended changes & cleanup. See tutorial.
  • Write short & meaningful commit messages. Review each commit for messages (git show). Format:
    topic: description. < 50 characters total.
    
    Longer description. < 70 characters per line
    

Licensing

See copyright & licensing for more info.
ARTIQ files that do not contain a license header are copyrighted by M-Labs Limited and are licensed under LGPLv3+.

@b-bondurant
Copy link
Contributor Author

@lriesebos by the way this actually (kind of) allows git integration with our directory structure if you add a top-level __init__.py and start the master with something along the lines of artiq_master -g -r /path/to/bare_git_repo --experiment-subdir repository. Currently the above command wouldn't pick up the device db since the tools.file_import call for that does not currently provide a repository directory, but that's a relatively easy addition to this PR.

@b-bondurant
Copy link
Contributor Author

Here's what I've done to test so far (repos are public):

# running in a dev shell with the changes I've made
# setup
mkdir tmp && cd tmp
git clone --recursive https://github.com/b-bondurant/experiments-test-outer.git
git clone https://github.com/b-bondurant/experiments-test.git
mkdir bare-repo && cd bare-repo
git init --bare
cd ../experiments-test
git remote add bare-repo ~/tmp/bare-repo
git push bare-repo master
# run commands from here for easy device-db access
cd ../experiments-test-outer

# filesystem-based repo
artiq_session -m=-r -m experiments-test
# running TestExp prints the local paths as expected
# trying to open out_of_repo_test.py fails as expected
# quit the session

# git-based repo
artiq_session -m=-g -m=-r -m ../bare-repo
# running TestExp prints the tmpdir paths as expected
# trying to open out_of_repo_test.py fails as expected

# with the above session still running, and working from experiments-test-outer/ in another dev shell session
artiq_client submit ./experiments-test/test_exp.py
# prints the *local* foo/bar paths, which is probably desired behavior but I actually didn't expect it to work -- I probably forgot about some change I made
# same behavior when submitting the local/filesystem version of test_exp.py via the dashboard

Some potential points for discussion:

  • Also including the repository dir in the path when importing the device db -- I don't see much of a downside to adding this feature and it would be nice for the directory structure we use at duke (and more generally for having all-encompassing git integration)
  • Behavior for artiq_run and out-of-repo experiments -- for artiq_run I think it's reasonable that users should manage the path themselves, but for adding out-of-repo experiments via the gui maybe there should be some kind of additional field
  • It's rather worrying that I didn't expect the last couple of runs in the above example to actually work -- hopefully that's just an oversight or dumb moment on my part

@charlesbaynham
Copy link
Contributor

Hi @b-bondurant, thanks for taking some steps forwards on this!

One comment: my suggestion of using an __init__.py file to mark the repository as a package was based on the idea that the whole repository folder would be imported: in this scenario, using the usual python conventions means that IDEs and code liners would treat imports correctly.

If instead we're adding the repository path to sys.path then having an __init__.py file is slightly unidiomatic: this file implies that the folder containing it is importable, but actually only subdirectories are. That'll confuse autocompletion in IDEs, since in a folder structure like

repo  <-- this is the root of the ARTIQ repository
    |
    |--- __init__.py
    |--- my_experiment.py
    |--- libs
         |--- __init__.py
         |--- lib1.py
         |--- lib2.py

, linters and IDEs will expect from repo.libs import lib1 whereas we'll actually use from libs import lib1. So if we're using the sys.path method instead of importing repo itself, maybe we should use another marker filename (e.g. __artiq_root__.py).

@b-bondurant
Copy link
Contributor Author

How about something like this instead?

def file_import(filename, prefix="file_import_", repository_path=None):
    filename = pathlib.Path(filename)
    modname = prefix + filename.stem

    if repository_path:
        # find any importable modules
        for path in os.listdir(repository_path):
            if os.path.isdir(path) and (pathlib.Path(path) / "__init__.py").exists():
                sys.path.insert(0, path)

@b-bondurant
Copy link
Contributor Author

Okay just kidding, nothing that I did in a151b57 actually fixed anything and I wasn't actually testing the thing that needed to be tested (i.e. the original scenario presented in the issue). This is why we don't start new work late on a Friday afternoon.

But, tweaking the way that imports are done for in-repo experiments seems to do the trick at first glance, and doesn't actually require any extraneous __init__.pys to mimic a package structure.

Using the original example file structure from #1543 (minus the __init__.py):

experiments/
    foo/
        bar.py
    baz/
        qux.py

For importing bar.py, this change would be equivalent to:
old structure:

$ cd experiments/foo
$ python -c "import bar"

new structure:

$ cd experiments
$ python -c "import foo.bar"

IMO it seems like an intuitive change, and all existing unit tests still pass so hopefully I haven't broken anything.

@sbourdeauducq
Copy link
Member

How about something like this instead?

sys.path grows indefinitely every time you call that function (though that's a fixable bug), and also there's a major discrepancy with experiment submitted outside repos.

@charlesbaynham
Copy link
Contributor

LGTM. You're exploiting python's namespaced packages mechanism for the import of modules in subdirectories; I can't think of a problem with that.

With

experiments/
    foo/
        bar.py
        other.py
    baz/
        qux.py

bar can now call from . import other which is nice. It can also call from baz import qux which is great.

I still personally think that from ..baz import qux is more explicit than from baz import qux, and would also be understood by automatic error checkers / linters. For that, we need experiments to be a package too and to exist in sys.modules, as suggested in #1543 (comment). For me, the extra line or two of code is worth it for these features. This PR as-is is already a big step forwards in usability, however.

sys.path grows indefinitely every time you call that function (though that's a fixable bug), ...

Does it? The path-altering code is unaltered from before and protected by a try/finally (and runs in the worker).

@b-bondurant
Copy link
Contributor Author

and would also be understood by automatic error checkers / linters

FWIW, that's relatively easy to get around with most tools - e.g. in PyCharm, a simple right click on experiments and Mark Directory as -> Sources Root picks up everything correctly. It might be a little more complicated for tools like mypy though.

sys.path grows indefinitely every time you call that function (though that's a fixable bug), ...

Does it? The path-altering code is unaltered from before and protected by a try/finally (and runs in the worker).

Yeah, originally I forgot to add the sys.path.remove() calls in the finally block. I added them locally before switching to the namespace method, but never committed that change.

and also there's a major discrepancy with experiment submitted outside repos.

Still true with my most recent commit, but I'd argue that it's kind of an intuitive discrepancy. In the case of an in-repo experiment, I think it makes sense that files would be imported with "knowledge" of the surrounding repository, or to put it in a different way, imported with the repository as the working directory. Whereas for standalone files submitted from outside the repository, I would have no such expectation. However, we could mitigate that discrepancy by adding an extra field in the out-of-repo file browser to allow users to specify an effective "repository" directory for the file. For artiq_run, I think both solutions that were discussed are reasonable -- either doing nothing and expecting the user to modify their PYTHONPATH appropriately, or adding a -r option.

As for modifying sys.modules, I'm not sure what my opinion is. Seems like there's been a lot of discussion on that topic without a definitive conclusion.

@sbourdeauducq
Copy link
Member

I don't understand the user workflow with patches like this. How do you debug an experiment that relies on this? Do you commit it every time?
The "import outside repository" feature was designed for quick iteration/ debugging and without polluting the git history - you just have to save the file and submit and it takes the version on disk.

@b-bondurant
Copy link
Contributor Author

b-bondurant commented Dec 23, 2021

^ Meh? Obviously the UI could use some work but the basic functionality seems to be there.

edit: Never mind, I got ahead of myself and didn't test well enough. Will try to see if it's fixable.

@b-bondurant
Copy link
Contributor Author

Hm. So, the above commit works for the examine call when the file is opened in the dashboard, but I don't see a way to do the same thing for the actual submission of the experiment. wd is never specified in the scheduler interface -- it either retrieves it from the repo backend or sets it to None, all done internally. And that doesn't seem fixable regardless of the method we use for the import structure -- e.g. even if we modify sys.modules for out-of-repo experiments, that would also have to be done from the dashboard and therefore wouldn't make a difference since the master is running in a different process.

To make things work for submission as well, I think we'd need to consider bumping up the resolution of wd to happen in e.g. submit rather than in RunPool. Maybe something like: wd is an optional parameter, and if the exp is out-of-repo, then "repo_rev" isn't in expid, so we can fall back on the wd parameter if it's provided. For in-repo experiments, the default behavior wouldn't change -- "repo_rev" will be in the expid so we can just resolve wd via the repo backend and pass that along to RunPool.

@sbourdeauducq
Copy link
Member

edit: Never mind, I got ahead of myself and didn't test well enough. Will try to see if it's fixable.

I'm not sure if there's a solution other than (1) adding an option in the GUI and elsewhere to specify a "repository folder" when submitting outside repos or (2) adding a repository path that is discovered based on the file being submitted or its location.

@charlesbaynham
Copy link
Contributor

I'm not sure if there's a solution other than (1) adding an option in the GUI and elsewhere to specify a "repository folder" when submitting outside repos or (2) adding a repository path that is discovered based on the file being submitted or its location.

My vote here would be (1). (2) seems like it could introduce strange failures which are confusing to debug.

@b-bondurant That sounds pretty inevitable: if (for out-of-repo experiments) the root directory is a controllable parameter, it has to be part of the information passed to the scheduler (via expid). Your algorithm of

             <is repo_rev present?>
          /                    \
       yes                     no
      /                          \
wd = the repo root      <is wd present?>
                      /                 \
                     no                 yes
                    /                     \
                wd = parent of file     wd = wd

Sounds spot on to me.

@sbourdeauducq I guess the workflow for testing would be one of

  1. Single user - just switch to non-git mode and edit the repository directly.
  2. Work on a named feature branch on your git repo, but push it to the ARTIQ master branch for testing. When you're done, squash all your noisy commits as you usually would for a git repo.

A better workflow would be to push your feature branch to the ARTIQ bare repo, and then call your experiments by setting the "Revision" in the experiment call to the name of your branch. That doesn't currently work, but could be another PR.

@charlesbaynham
Copy link
Contributor

P.S. expid is growing again. In #1641 we discussed its structure and @hartytp suggested a namedtuple rather than a dict, to make it easier to use. This is a conversation for elsewhere, but we could consider doing this, or even using a light-weight validator like pydantic to make expid less error-prone.

@b-bondurant
Copy link
Contributor Author

Additionally, adding "repo_dir" (IMO more expressive than "wd") to the expid would probably make #1571 redundant.

@charlesbaynham
Copy link
Contributor

Happy Christmas, BTW 🙂

@b-bondurant
Copy link
Contributor Author

b-bondurant commented Dec 24, 2021

Okay I think that takes care of things for submissions from the dashboard (functionally, at least). Now, for other interfaces:

  • artiq_client: seeing as this goes through the master, adding an option for specifying a "repository" directory seems like a no-brainer in terms of consistency, and the infrastructure is already in place. Unfortunately, the --repository option is already taken, so I'm not sure what the best solution is in terms of semantics.
  • artiq_run: not super important since users can easily modify their local PYTHONPATH, but it would be trivial to add a --repository option that gets passed to the file_import call
  • artiq_compile: equally trivial, equally unimportant
  • anything I'm missing?

@sbourdeauducq
Copy link
Member

Single user - just switch to non-git mode and edit the repository directly.

Perhaps the master could support different sources at the same time then, and configuring them without restarting? Then you'd set up one in git mode and one in non-git mode.

@b-bondurant
Copy link
Contributor Author

I could see that being really useful. If, instead of having the -g as a store_true type argument, we switch it to being a path like -r (or just add a -G or something if we don't want to make breaking changes), then users could start the master with both a git backend for stable code and a file backend for development/testing. Then in the dashboard we could just provide an extra explorer tab for the second backend. Seems much more user-friendly than the current "open file from outside repository" workflow (not that adding this would require removing that feature).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Experiment package imports with git integration
3 participants