Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwanted rows modified when using UPDATE on partitioned table : CTID are not unique in partitioned tables #271

Open
2 tasks done
sco-atolcd opened this issue Nov 12, 2024 · 0 comments

Comments

@sco-atolcd
Copy link

What happens?

When Duckdb updates a Postgres partitioned table, it will also update rows that are not wanted.
After a bit of digging, i found that currently, ctid are used by Duckdb to target the rows to update.

https://github.com/duckdb/postgres_scanner/blob/03eaed75f0ec5500609b7a97aa05468493b229d1/src/storage/postgres_update.cpp#L49-L70

For standard table it works perfectly but not for partionned tables. Unfortunately, ctid are not unique for partitioned table. Instead, ctid are unique for each table that compose the partitioned table.

Here is my uneducated guess : This issue could be solved if the partionned keys are added to the final update statement.
It might be a bit tricky with multi level partition but doable.

To Reproduce

The postgresql partitioned table :

CREATE SCHEMA IF NOT EXISTS ctid_duplicate;

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table (
    id integer,
    category text,
    data text,
    PRIMARY KEY (id, category)
) PARTITION BY LIST (category);

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_a
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('A');

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_b
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('B');

CREATE TABLE IF NOT EXISTS ctid_duplicate.my_table_category_c
    PARTITION OF ctid_duplicate.my_table
    FOR VALUES IN ('C');

INSERT INTO ctid_duplicate.my_table (id, category, data)
VALUES
    (1, 'A', 'Data for category A'),
    (2, 'B', 'Data for category B'),
    (3, 'C', 'Data for category C'),
    (4, 'A', 'Another data for category A');

A simple simple select let us see that ctid aren't unique

SELECT ctid, id, category, "data"
FROM ctid_duplicate.my_table;

image

Now to actually test the issue with DuckDB :

ATTACH if not exists 'postgresql://sco:sco@localhost:5432/rece' as postgres_rece (type POSTGRES);

update postgres_rece.ctid_duplicate.my_table 
set data = 'CTID are duplicated'
where category = 'A';

Let's check again the table, only 2 rows should have been updated :

select * from postgres_rece.ctid_duplicate.my_table;

The results :
image

OS:

ubuntu 22.04

PostgreSQL Version:

15.8

DuckDB Version:

1.2.1

DuckDB Client:

Python

Full Name:

Simon Cotte

Affiliation:

AtolCD

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant