Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg: add returning_id option to parallel_execute #181

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cawo-odoo
Copy link
Contributor

The default behavior is unchanged.

This adds the possibility to parallelize modifying queries that have a RETURNING id clause. For those, return the resulting ids (in a defined order) instead of the affected row count.

To avoid misuse add a warning to the docstring and try to detect queries other than the ones of the intended form. Raise an error if such are found.

@robodoo
Copy link
Contributor

robodoo commented Jan 2, 2025

Pull request status dashboard

@cawo-odoo cawo-odoo force-pushed the master-imp_pg_add_returning_id_to_parallel_execute-cawo branch 2 times, most recently from 66f85ef to 5f07404 Compare January 2, 2025 09:49
The default behavior is unchanged.

This adds the possibility to parallelize modifying queries that have a
`RETURNING id` clause. For those, return the resulting ids (in a defined order)
instead of the affected row count.

To avoid misuse add a warning to the docstring and try to detect queries other
than the ones of the intended form. Raise an error if such are found.
@cawo-odoo cawo-odoo force-pushed the master-imp_pg_add_returning_id_to_parallel_execute-cawo branch from 5f07404 to 343aa7e Compare January 2, 2025 09:54
Copy link
Contributor

@Pirols Pirols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's a draft, couldn't help peeking.

Could you further elaborate on the need to limit this to returning ids only?

@@ -154,15 +154,20 @@ def parallel_execute(cr, queries, logger=_logger):

:param list(str) queries: list of queries to execute concurrently
:param `~logging.Logger` logger: logger used to report the progress
:return: the sum of `cr.rowcount` for each query run
:param bool returning_id: wether to return a tuple of affected ids (default: return affected row count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param bool returning_id: wether to return a tuple of affected ids (default: return affected row count)
:param bool returning_id: whether to return a tuple of affected ids (default: return affected row count)

- As a side effect, the cursor will be committed.
- It would not be generally safe to use this function for selecting queries. Because of this,
`returning_id=True` is only accepted for `UPDATE/DELETE/INSERT/MERGE [...] RETURNING id` queries. Also, the
caller cannot influnce the order of the returned result tuples, it is always sorted in ascending order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
caller cannot influnce the order of the returned result tuples, it is always sorted in ascending order.
caller cannot influence the order of the returned result tuples, it is always sorted in ascending order.

raise ValueError("The returning_id parameter can only be used with certain queries.")

res = parallel_execute_impl(cr, queries, logger, returning_id)
return tuple(sorted([id for (id,) in res])) if returning_id else res
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return tuple(sorted([id for (id,) in res])) if returning_id else res
return tuple(id for (id,) in res) if returning_id else res

Why sorting these?

Comment on lines +181 to +184
if returning_id:
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+\S*\.?id\s*$")
if not all((bool(returning_id_re.search(q)) for q in queries)):
raise ValueError("The returning_id parameter can only be used with certain queries.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if returning_id:
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+\S*\.?id\s*$")
if not all((bool(returning_id_re.search(q)) for q in queries)):
raise ValueError("The returning_id parameter can only be used with certain queries.")
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+\S*\.?id\s*$")
returning_id = all((bool(returning_id_re.search(q)) for q in queries))

How about we automatically detect whether the queries at hand are returning something?

return parallel_execute_impl(cr, queries, logger=_logger)

if returning_id:
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+\S*\.?id\s*$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+\S*\.?id\s*$")
returning_id_re = re.compile(r"(?s)(?:UPDATE|DELETE|INSERT|MERGE).*RETURNING\s+(?:\w+\.)?id\s*$")

Change \S*\.? to (?:\w+\.)? to avoid false positive like "UPDATE ... RETURNING name,id" and "DELETE FROM ... RETURNING report_id".

@aj-fuentes
Copy link
Contributor

For the record: I think this is not necessary. See more comments in the linked PR in upgrades.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants