GH-44855: [Python][Packaging] Use delvewheel to repair Windows wheels #35323

Merged: 24 commits into apache:main, Jan 4, 2025

Conversation

raulcd
Member

@raulcd raulcd commented Apr 25, 2023

Rationale for this change

We need to ship the C++ standard library with our Windows wheels, because a recent enough version is not guaranteed to be present on the system. However, some other Python libraries may require an even more recent version than the one we ship. This may cause crashes when PyArrow is imported before such a library, because the older version of the C++ standard library that we ship would then be loaded first and used by both.

What changes are included in this PR?

Use a fixed-up version of delvewheel that allows us to name-mangle an individual DLL, and name-mangle msvcp140.dll to ensure that other Python libraries do not reuse the version we ship.
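
To make the renaming idea concrete, here is a minimal Python sketch of what name-mangling a DLL involves: derive a unique file name from the DLL's contents, then point each dependent binary's import entry at the new name. The helper names, the choice of SHA-256, and the 32-character suffix are assumptions made for illustration. This thread does not state how delvewheel computes its suffix, and real import-table patching operates on PE files rather than on lists of strings.

```python
import hashlib
from pathlib import PurePath


def mangled_dll_name(dll_name: str, content: bytes, digest_chars: int = 32) -> str:
    """Return a name such as 'msvcp140-<hex>.dll' for 'msvcp140.dll'.

    Hypothetical helper: the hash function and suffix length are assumptions,
    not delvewheel's documented behaviour.
    """
    path = PurePath(dll_name)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return f"{path.stem}-{digest}{path.suffix}"


def patch_import_names(imports: list[str], renames: dict[str, str]) -> list[str]:
    """Rewrite import entries, matching case-insensitively as Windows does."""
    lowered = {old.lower(): new for old, new in renames.items()}
    return [lowered.get(name.lower(), name) for name in imports]


if __name__ == "__main__":
    new_name = mangled_dll_name("msvcp140.dll", b"placeholder DLL bytes")
    print(new_name)
    # Only the msvcp140 entry is redirected; other imports are left alone.
    print(patch_import_names(["MSVCP140.dll", "KERNEL32.dll"],
                             {"msvcp140.dll": new_name}))
```

Because the mangled name is unique to the wheel, the Windows loader no longer treats PyArrow's bundled copy as "the" msvcp140.dll, so another package that imports msvcp140.dll later resolves its own copy instead.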

Are these changes tested?

By regular wheel build tests.

@github-actions

⚠️ GitHub issue #33981 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting committer review label Apr 25, 2023
@raulcd
Member Author

raulcd commented Apr 25, 2023

@github-actions crossbow submit wheel-windows-cp311-amd64

@github-actions

Revision: 408852a

Submitted crossbow builds: ursacomputing/crossbow @ actions-63c44cbef0

Task Status
wheel-windows-cp311-amd64 Github Actions

@raulcd
Member Author

raulcd commented Apr 25, 2023

@github-actions crossbow submit wheel-windows-cp311-amd64

@github-actions

Revision: 17610b7

Submitted crossbow builds: ursacomputing/crossbow @ actions-41e9fa7976

Task Status
wheel-windows-cp311-amd64 Github Actions

@raulcd
Member Author

raulcd commented Apr 27, 2023

@github-actions crossbow submit wheel-windows-cp39-amd64

@github-actions

Revision: 3dcc67c

Submitted crossbow builds: ursacomputing/crossbow @ actions-732aa4049a

Task Status
wheel-windows-cp39-amd64 Github Actions

@raulcd
Member Author

raulcd commented Apr 27, 2023

@github-actions crossbow submit wheel-windows-cp39-amd64

@github-actions

Revision: c9beb9e

Submitted crossbow builds: ursacomputing/crossbow @ actions-f42319b210

Task Status
wheel-windows-cp39-amd64 Github Actions

@raulcd
Member Author

raulcd commented Apr 27, 2023

@github-actions crossbow submit wheel-windows-cp39-amd64

@github-actions

Revision: 1d9c22f

Submitted crossbow builds: ursacomputing/crossbow @ actions-ed01c3db03

Task Status
wheel-windows-cp39-amd64 Github Actions

@raulcd raulcd removed the awaiting committer review label Jun 22, 2023
kou pushed a commit that referenced this pull request Jun 25, 2023
### Rationale for this change

Those labels are unnecessary once the PRs are merged. There was a conversation on Zulip about removing them in the past.

### What changes are included in this PR?

Once we merge a PR, we remove labels that start with the PR workflow prefix `awaiting`.

### Are these changes tested?

I have tested the code against an old testing PR I have here: #35323
The label was removed successfully.

### Are there any user-facing changes?

No
* Closes: #36243

Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@raulcd
Member Author

raulcd commented Dec 18, 2024

A different approach seems required as per #44855

@raulcd raulcd closed this Dec 18, 2024
@raulcd raulcd deleted the GH-33981 branch December 18, 2024 09:36
@pitrou
Member

pitrou commented Dec 18, 2024

@raulcd Can you recreate this PR? This should be the preferred approach in the short term according to #44855 (comment)

for /f %%i in ('dir dist\pyarrow-*.whl /B') do set WHEEL_NAME=dist\%%i || exit /B 1
echo "Wheel name: %WHEEL_NAME%"
delvewheel repair %WHEEL_NAME% --add-path C:\arrow\python\build\bdist.win-amd64\wheel\pyarrow -w repaired_wheels || exit /B 1
delvewheel show %WHEEL_NAME% || exit /B 1
Member

@pitrou pitrou Dec 18, 2024

It would probably be more informative to call delvewheel show before delvewheel repair?

@raulcd raulcd restored the GH-33981 branch December 18, 2024 15:58
@raulcd raulcd reopened this Dec 18, 2024
@raulcd
Member Author

raulcd commented Dec 18, 2024

> @raulcd Can you recreate this PR?

Reopened, will rebase and fix conflicts and try to get back to it

pip install delvewheel || exit /B 1
for /f %%i in ('dir dist\pyarrow-*.whl /B') do set WHEEL_NAME=dist\%%i || exit /B 1
echo "Wheel name: %WHEEL_NAME%"
delvewheel repair %WHEEL_NAME% --add-path C:\arrow\python\build\bdist.win-amd64\wheel\pyarrow -w repaired_wheels || exit /B 1
Member

Is the --add-path actually necessary?

@raulcd
Member Author

raulcd commented Dec 19, 2024

@github-actions crossbow submit wheel-windows-cp39-amd64


github-actions bot commented Jan 2, 2025

Revision: ea22e58

Submitted crossbow builds: ursacomputing/crossbow @ actions-f74951e72c

Task Status
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@pitrou
Member

pitrou commented Jan 2, 2025

@github-actions crossbow submit wheel-windows*


github-actions bot commented Jan 2, 2025

Revision: ea22e58

Submitted crossbow builds: ursacomputing/crossbow @ actions-40f1ed5d9d

Task Status
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp313-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@pitrou
Member

pitrou commented Jan 2, 2025

Ok, I got this to work using the delvewheel changes in adang1345/delvewheel#59

@raulcd

@pitrou pitrou marked this pull request as ready for review January 2, 2025 18:04
@pitrou pitrou changed the title GH-33981: [Python][Packaging] Use delvewheel to repair Windows wheels GH-44855: [Python][Packaging] Use delvewheel to repair Windows wheels Jan 2, 2025
@amoeba
Member

amoeba commented Jan 2, 2025

I did a quick test of the wheels crossbow has built here and this seems to fix the issue.

From reading the related issues, the only package I identified that ships msvcp140.dll was pyopenms==3.2.0 though I'm sure there are more. Other packages (pandas, numpy) ship renamed versions. pyopenms has since shipped a newer package release (3.2.0-1) which fixes the issue.
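
One way to look for other packages that bundle an unrenamed copy is to scan an environment's site-packages directory. The following is a rough sketch under stated assumptions: the function names are hypothetical, the scan only looks at file names (it does not check which copy actually gets loaded), and names such as msvcp140_1.dll also match the prefix.

```python
import sysconfig
from pathlib import Path


def find_msvcp_copies(site_packages: Path) -> dict[str, list[str]]:
    """Map each top-level entry in site-packages to the msvcp140*.dll files it bundles."""
    found: dict[str, list[str]] = {}
    for dll in sorted(site_packages.rglob("*.dll")):
        if dll.name.lower().startswith("msvcp140"):
            package = dll.relative_to(site_packages).parts[0]
            found.setdefault(package, []).append(dll.name)
    return found


def unmangled_only(found: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the entries that ship a file named exactly msvcp140.dll."""
    return {
        package: names
        for package, names in found.items()
        if any(name.lower() == "msvcp140.dll" for name in names)
    }


if __name__ == "__main__":
    # "platlib" is where binary wheels are installed for the running interpreter.
    root = Path(sysconfig.get_paths()["platlib"])
    for package, names in unmangled_only(find_msvcp_copies(root)).items():
        print(f"{package}: {', '.join(names)}")
```

Packages reported by `unmangled_only` are candidates for the kind of conflict described in this PR; packages that appear only in `find_msvcp_copies` ship a renamed copy.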

On Windows 11 (x86_64), I did the following to reproduce the issue:

  • Installed Python 3.12 (pyopenms doesn't have 3.13 wheels) with winget install Python.Python.3.12
  • Installed pyarrow (18.1.0) and pyopenms==3.2.0 using pip
  • Ran the following:
    PS C:\Users\Bryce> python
    Python 3.12.8 (tags/v3.12.8:2dc476b, Dec  3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyarrow
    >>> import pyopenms
    
    ======================================================================
    Error when loading pyOpenMS libraries!
    Libraries could not be found / could not be loaded.
    
    To debug this error, please run ldd (on linux), otool -L (on macOS) or dependency walker (on windows) on
    
    C:\Users\Bryce\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyopenms\pyopenms*.so
    
    ======================================================================
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Bryce\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyopenms\__init__.py", line 109, in <module>
        raise e
      File "C:\Users\Bryce\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyopenms\__init__.py", line 67, in <module>
        from ._all_modules import *  # pylint: disable=wildcard-import; lgtm(py/polluting-import)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\Bryce\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyopenms\_all_modules.py", line 1, in <module>
        from ._pyopenms_1 import *  # pylint: disable=wildcard-import; lgtm(py/polluting-import)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    ImportError: DLL load failed while importing _pyopenms_1: A dynamic link library (DLL) initialization routine failed.
    

On the same system, I did the following to fix the issue:

  • Kept my Python 3.12 install as-is

  • Uninstalled pyarrow

  • Installed pyarrow from https://github.com/ursacomputing/crossbow/actions/runs/12585968702/artifacts/2379754483 (pyarrow-19.0.0.dev261-cp312-cp312-win_amd64.whl, MD5 eb6d57717f756bfd1f5d92f781c30d0b)

  • Ran the following:

    PS C:\Users\Bryce\Downloads\wheel> pip install .\pyarrow-19.0.0.dev261-cp312-cp312-win_amd64.whl
    Processing c:\users\bryce\downloads\wheel\pyarrow-19.0.0.dev261-cp312-cp312-win_amd64.whl
    Installing collected packages: pyarrow
    Successfully installed pyarrow-19.0.0.dev261
    PS C:\Users\Bryce\Downloads\wheel> import ^C
    PS C:\Users\Bryce\Downloads\wheel> python
    Python 3.12.8 (tags/v3.12.8:2dc476b, Dec  3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyarrow
    >>> import pyopenms
    

I'll note the system I tested on does have a system-wide copy of msvcp140.dll but its presence didn't seem to affect anything here which makes sense (I think...).

Member Author

@raulcd raulcd left a comment

This is great! Thanks @pitrou

ci/scripts/python_wheel_windows_build.bat
@github-actions github-actions bot added awaiting changes and removed awaiting review labels Jan 3, 2025
@pitrou
Member

pitrou commented Jan 3, 2025

> I'll note the system I tested on does have a system-wide copy of msvcp140.dll but its presence didn't seem to affect anything here which makes sense (I think...).

I tested on a Docker container without a system-wide msvcp140.dll, and importing the produced wheel works fine.

@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Jan 3, 2025
@pitrou
Member

pitrou commented Jan 3, 2025

@github-actions crossbow submit wheel-windows*


github-actions bot commented Jan 3, 2025

Revision: 1405d04

Submitted crossbow builds: ursacomputing/crossbow @ actions-181f327725

Task Status
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp313-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@pitrou
Member

pitrou commented Jan 3, 2025

@kou Would you like to take a look at this PR?

Member

@kou kou left a comment

+1

We rename msvcp140.dll and use it so that we avoid msvcp140.dll conflict, right?

I confirmed the following:

  • pyarrow-19.0.0.dev263-cp39-cp39-win_amd64.whl includes pyarrow/msvcp140-32679400f4881c06b606906b1620c877.dll not msvcp140.dll
  • pyarrow/*.dll use msvcp140-32679400f4881c06b606906b1620c877.dll not msvcp140.dll

    $ unzip pyarrow-19.0.0.dev263-cp39-cp39-win_amd64.whl
    $ find pyarrow -name 'msvc*.dll'
    pyarrow/msvcp140-32679400f4881c06b606906b1620c877.dll
    $ for x in pyarrow/*.dll; do echo $x; LANG=C objdump -p $x | grep -i msvcp; done
    pyarrow/arrow.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_acero.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_dataset.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_flight.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_python.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_python_flight.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_python_parquet_encryption.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/arrow_substrait.dll
    pyarrow/msvcp140-32679400f4881c06b606906b1620c877.dll
    pyarrow/msvcp140-32679400f4881c06b606906b1620c877.dll:     file format pei-x86-64
    Name 				0000000000066e46 MSVCP140.dll
    (format RSDS signature 44a344afaacb486aad1c45e382407415 age 1 pdb D:\a\_work\1\s\binaries\amd64ret\bin\amd64\\msvcp140.amd64.pdb)
    pyarrow/parquet.dll
    	DLL Name: msvcp140-32679400f4881c06b606906b1620c877.dll
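
The same check can be approximated in Python without unpacking the wheel. This is only a sketch: the function names are hypothetical, and it scans the raw bytes of each binary for msvcp140*.dll strings instead of parsing the PE import table the way objdump does, so treat a hit as a hint and not as proof of an import. The bundled runtime itself is skipped because, as the objdump output above shows, it still embeds its original name.

```python
import re
import sys
import zipfile

# Matches both the plain name and a mangled one such as msvcp140-<hex>.dll.
MSVCP_NAME = re.compile(rb"msvcp140[-\w]*\.dll", re.IGNORECASE)


def msvcp_references(wheel) -> dict[str, set[str]]:
    """For each .dll/.pyd in the wheel, collect msvcp140*.dll names found in its bytes."""
    refs: dict[str, set[str]] = {}
    with zipfile.ZipFile(wheel) as zf:
        for member in zf.namelist():
            if member.lower().endswith((".dll", ".pyd")):
                hits = MSVCP_NAME.findall(zf.read(member))
                names = {hit.decode("ascii").lower() for hit in hits}
                if names:
                    refs[member] = names
    return refs


def members_using_unmangled(refs: dict[str, set[str]]) -> list[str]:
    """List binaries that still mention plain msvcp140.dll."""
    offenders = []
    for member, names in sorted(refs.items()):
        filename = member.rsplit("/", 1)[-1].lower()
        if filename.startswith("msvcp140"):
            continue  # the bundled runtime embeds its own original name
        if "msvcp140.dll" in names:
            offenders.append(member)
    return offenders


if __name__ == "__main__":
    # Usage: python check_wheel.py path/to/pyarrow-*.whl
    references = msvcp_references(sys.argv[1])
    for member, names in sorted(references.items()):
        print(member, sorted(names))
    print("still using msvcp140.dll:", members_using_unmangled(references))
```

For a wheel repaired as described in this PR, the expectation is that the final list is empty.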

@github-actions github-actions bot added awaiting merge and removed awaiting change review labels Jan 4, 2025
@pitrou
Member

pitrou commented Jan 4, 2025

> We rename msvcp140.dll and use it so that we avoid msvcp140.dll conflict, right?

Yes.

@pitrou
Member

pitrou commented Jan 4, 2025

Thanks for the reviews, let's merge now!

@pitrou pitrou merged commit 3752109 into apache:main Jan 4, 2025
15 of 16 checks passed
@pitrou pitrou removed the awaiting merge label Jan 4, 2025
4 participants