Issue with building PyTorch 2.1.2 using EasyBuild 5 #3570

Open
lcniel opened this issue Jan 28, 2025 · 6 comments

@lcniel
Contributor

lcniel commented Jan 28, 2025

There seems to be an issue with the new handling of use_pip and buildcmd. Basically, for some reason the code ends up here:

        if self.use_setup_py:
            ...
            if not build_cmd:
                build_cmd = 'build'  # Default value for setup.py
            build_cmd = f"{self.python_cmd} setup.py {build_cmd}"

But in the easyconfig, e.g. PyTorch-2.1.2-foss-2023a-CUDA-12.1.1.eb, buildcmd is already set to a full command:

buildcmd = '%(python)s setup.py build'  # Run the (long) build in the build step

This leads to an incorrect command being pieced together (the setup.py prefix is added to a value that already contains it) and the build failing. From the log, it seems like use_pip is set correctly in the PyTorch easyblock, but it does not appear to actually be used... Not sure what's going on here.
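
To make the failure concrete, here is a minimal sketch of how the command ends up doubled. It is not the easyblock code: `compose_build_cmd` is a hypothetical helper that mirrors the branch quoted above, `python` stands in for the resolved `%(python)s` template, and passing the value through unchanged when `use_setup_py` is off is an assumption made for illustration.

```python
def compose_build_cmd(python_cmd, build_cmd, use_setup_py):
    """Hypothetical stand-in for the quoted branch of the build step."""
    if use_setup_py:
        if not build_cmd:
            build_cmd = 'build'  # default target for setup.py
        # build_cmd is treated as the *argument* to setup.py here
        build_cmd = f"{python_cmd} setup.py {build_cmd}"
    # Assumption: without setup.py handling, the value is used as-is
    return build_cmd


# Value from the easyconfig, with %(python)s assumed to resolve to 'python'
easyconfig_buildcmd = 'python setup.py build'

broken = compose_build_cmd('python', easyconfig_buildcmd, use_setup_py=True)
intended = compose_build_cmd('python', easyconfig_buildcmd, use_setup_py=False)
print(broken)    # python setup.py python setup.py build
print(intended)  # python setup.py build
```

The doubled prefix only appears when the easyconfig supplies a full command while the easyblock still believes it is driving setup.py itself.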

@lcniel
Contributor Author

lcniel commented Jan 28, 2025

ping @Micket

boegel transferred this issue from easybuilders/easybuild-framework Jan 29, 2025
boegel added this to the 5.0 milestone Jan 29, 2025
@boegel
Member

boegel commented Jan 29, 2025

@Micket
Contributor

Micket commented Feb 3, 2025

build_cmd is a dangerous name. In some easyblocks, it means the first, primary, executable to run for the build step. But in this easyblock it seems to mean the first argument, i.e. the "target" for python setup.py.

Now, I would very much like to figure out how and where this changed, so we can track down whether we have a larger systematic error with EB5 or whether it just affects PyTorch somehow.

Flamefire added a commit to Flamefire/easybuild-easyblocks that referenced this issue Feb 4, 2025
When changing `use_pip` after `PythonPackage.__init__` called
`determine_install_command` the change is not honored.
Call it again after the change.

This also requires making it idempotent, so all member variables changed
in that function need to be set in all cases.

Fixes easybuilders#3570
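
The approach described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the class names, the `new_enough` parameter, and the install command strings are hypothetical placeholders; only `use_pip`, `use_setup_py`, and `determine_install_command` are names taken from this thread.

```python
class PythonPackageSketch:
    """Hypothetical, heavily simplified stand-in for PythonPackage."""

    def __init__(self, use_pip=None):
        self.use_pip = use_pip
        self.determine_install_command()

    def determine_install_command(self):
        # Idempotent: every derived attribute is assigned on every call,
        # so no value from an earlier call can survive a later one.
        if self.use_pip:
            self.use_setup_py = False
            self.install_cmd = 'pip install .'  # placeholder command
        else:
            self.use_setup_py = True
            self.install_cmd = 'python setup.py install'  # placeholder command


class PyTorchSketch(PythonPackageSketch):
    """Hypothetical stand-in for the PyTorch easyblock."""

    def __init__(self, use_pip=None, new_enough=True):
        super().__init__(use_pip)
        if self.use_pip is None:
            # Decide late, e.g. based on the PyTorch version (assumed condition)
            self.use_pip = new_enough
            # Re-derive everything that depends on use_pip
            self.determine_install_command()


pkg = PyTorchSketch()
print(pkg.use_pip, pkg.use_setup_py)  # True False
```

Because both branches assign both attributes, calling the function a second time fully overwrites the state left by the first call.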
@Flamefire
Contributor

One major issue is that the attempt, introduced in #3079, to use pip only for the latest PyTorch versions does not work as expected:

  • We set the default of use_pip to None
  • We call PythonPackage.__init__
  • That calls determine_install_command
  • Then we change the current value of use_pip
  • In build_step, use_setup_py has not been updated, so we run into the branch that appends the build_cmd to setup.py
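
The ordering problem in the steps above can be reproduced with a small sketch. The classes below are hypothetical stand-ins, not the real easyblocks, and deriving `use_setup_py` as the negation of `use_pip` is a simplifying assumption.

```python
class BasePackage:
    """Hypothetical stand-in for PythonPackage (not the real class)."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.determine_install_command()

    def determine_install_command(self):
        # Derived once, from whatever use_pip is at this moment
        self.use_setup_py = not self.cfg['use_pip']


class TorchPackage(BasePackage):
    """Hypothetical stand-in for the PyTorch easyblock."""

    def __init__(self, cfg):
        super().__init__(cfg)           # use_pip is still None here
        if self.cfg['use_pip'] is None:
            self.cfg['use_pip'] = True  # changed too late for use_setup_py


pkg = TorchPackage({'use_pip': None})
print(pkg.cfg['use_pip'], pkg.use_setup_py)  # True True
```

Both flags end up true at once, which is the inconsistent state that sends the build step into the setup.py branch.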

As I agree with @Micket that buildcmd is wrongly named, I opened a PR to use a better name: #3575

The issue here should be fixed by #3574, which is related and is required independently of the other PR, although that one helps too.

@boegel
Member

boegel commented Feb 5, 2025

This may be introduced via #3539, but not sure...

@Flamefire
Contributor

> This may be introduced via #3539, but not sure...

No, that didn't change the behavior. It would be the same without it, except that the template would possibly not be resolved, so it would fail for that reason as well.

The issue is rather easybuilders/easybuild-easyconfigs#20004, which removed use_pip = True from the EC, causing use_setup_py to be set by PythonPackage, which is never reset.

Projects
Status: Blockers