Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45129: [Python][C++] Fix usage of deprecated C++ functionality on pyarrow #45189

Merged
merged 4 commits into from
Jan 21, 2025

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented Jan 7, 2025

Rationale for this change

We are using two C++ deprecated APIs:

  • we are using decimal instead of smallest_decimal
  • we are using Arrow:Status on GetRecordBatchReader instead of Arrow::Result

What changes are included in this PR?

Update code to use non deprecated functions.

Are these changes tested?

Yes via CI with existing tests.

Are there any user-facing changes?

No

Copy link

github-actions bot commented Jan 7, 2025

⚠️ GitHub issue #45129 has been automatically assigned in GitHub to PR creator.

@raulcd
Copy link
Member Author

raulcd commented Jan 7, 2025

@github-actions crossbow submit -g python

Copy link

github-actions bot commented Jan 7, 2025

Revision: c1deb19

Submitted crossbow builds: ursacomputing/crossbow @ actions-d40040ac1a

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@raulcd
Copy link
Member Author

raulcd commented Jan 7, 2025

The cython2 failure is related:

  In file included from /build/python/pyarrow/src/arrow/python/async.h:22,
                   from /build/python/build/temp.linux-x86_64-cpython-310/_parquet.cpp:830:
  /build/python/pyarrow/src/arrow/python/common.h: In instantiation of 'arrow::py::SmartPtrNoGIL<SmartPtr, Ts>& arrow::py::SmartPtrNoGIL<SmartPtr, Ts>::operator=(V&&) [with V = std::unique_ptr<arrow::RecordBatchReader>&; SmartPtr = std::unique_ptr; Ts = {arrow::RecordBatchReader}]':
  /build/python/build/temp.linux-x86_64-cpython-310/_parquet.cpp:25556:56:   required from here
  /build/python/pyarrow/src/arrow/python/common.h:263:20: error: use of deleted function 'std::unique_ptr<_Tp, _Dp>& std::unique_ptr<_Tp, _Dp>::operator=(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::RecordBatchReader; _Dp = std::default_delete<arrow::RecordBatchReader>]'
    263 |     Base::operator=(std::forward<V>(v));
        |     ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
  In file included from /opt/conda/envs/arrow/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/memory:78,
                   from /build/python/build/temp.linux-x86_64-cpython-310/_parquet.cpp:770:
  /opt/conda/envs/arrow/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/unique_ptr.h:523:19: note: declared here
    523 |       unique_ptr& operator=(const unique_ptr&) = delete;
        |                   ^~~~~~~~

the following patch would work but I don't think this should be the fix, as I am pretty sure std::forward should be used here for other operators:

diff --git a/python/pyarrow/src/arrow/python/common.h b/python/pyarrow/src/arrow/python/common.h
index 4a7886695e..30acf0d6ed 100644
--- a/python/pyarrow/src/arrow/python/common.h
+++ b/python/pyarrow/src/arrow/python/common.h
@@ -260,7 +260,7 @@ class SmartPtrNoGIL : public SmartPtr<Ts...> {
   template <typename V>
   SmartPtrNoGIL& operator=(V&& v) {
     auto release_guard = optional_gil_release();
-    Base::operator=(std::forward<V>(v));
+    Base::operator=(std::move(v));
     return *this;
   }

@raulcd
Copy link
Member Author

raulcd commented Jan 7, 2025

@github-actions crossbow submit test-conda-python-3.10-cython2

Copy link

github-actions bot commented Jan 7, 2025

Revision: 07f7acd

Submitted crossbow builds: ursacomputing/crossbow @ actions-3db166d478

Task Status
test-conda-python-3.10-cython2 GitHub Actions

@raulcd
Copy link
Member Author

raulcd commented Jan 7, 2025

@github-actions crossbow submit -g python

@raulcd raulcd marked this pull request as ready for review January 7, 2025 15:22

This comment was marked as outdated.

@raulcd
Copy link
Member Author

raulcd commented Jan 7, 2025

@github-actions crossbow submit -g python

Copy link

github-actions bot commented Jan 7, 2025

Revision: babbe9f

Submitted crossbow builds: ursacomputing/crossbow @ actions-f876b9ef9d

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@raulcd raulcd requested a review from pitrou January 7, 2025 16:38
python/pyarrow/_parquet.pxd Outdated Show resolved Hide resolved
@raulcd
Copy link
Member Author

raulcd commented Jan 8, 2025

@github-actions crossbow submit test-conda-python-3.10-cython2

Copy link

github-actions bot commented Jan 8, 2025

Revision: f292cc1

Submitted crossbow builds: ursacomputing/crossbow @ actions-58dca82b3c

Task Status
test-conda-python-3.10-cython2 GitHub Actions

@pitrou
Copy link
Member

pitrou commented Jan 13, 2025

Ok, so the shared_ptr version works but not the unique_ptr one? In this case, we can keep using the shared_ptr approach.

@pitrou
Copy link
Member

pitrou commented Jan 13, 2025

By the way, why do we want to support Cython 2? @jorisvandenbossche

@jorisvandenbossche
Copy link
Member

I would say we can drop the cython 2 support.

@raulcd
Copy link
Member Author

raulcd commented Jan 13, 2025

I would say we can drop the cython 2 support.

I created an issue to track that as it is not the first time we say the same. I'll do and then I'll rebase this one. I'll ping to merge once CI is green again.

@raulcd
Copy link
Member Author

raulcd commented Jan 17, 2025

@github-actions crossbow submit -g python

Copy link

Revision: 0a1d214

Submitted crossbow builds: ursacomputing/crossbow @ actions-74244302df

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@raulcd
Copy link
Member Author

raulcd commented Jan 17, 2025

CI failures are unrelated, they are failing on main and I've opened individual issues to track them.

@pitrou I've rebased after the required cython bump. This should be ready to review/merge now that there's no CI failure.

@raulcd raulcd requested a review from pitrou January 17, 2025 12:44
@raulcd raulcd merged commit 984519d into apache:main Jan 21, 2025
14 checks passed
@raulcd raulcd removed the awaiting committer review Awaiting committer review label Jan 21, 2025
@raulcd raulcd deleted the GH-45129 branch January 21, 2025 17:26
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 984519d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants