Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disappearing data when copying a join result. #211

Closed
2 tasks done
mima-hlavacek opened this issue Apr 11, 2024 · 1 comment
Closed
2 tasks done

Disappearing data when copying a join result. #211

mima-hlavacek opened this issue Apr 11, 2024 · 1 comment

Comments

@mima-hlavacek
Copy link
Contributor

What happens?

When I insert a result of a join containing some particular column configuration from duckdb to postgres, it ends in data not making it through to the postgres side.

To Reproduce

The following script is about as minimal as I could reasonably get it. I tried to annotate it so that you can see what can be changed and what can not.

create table other_properties(
    an_id text, -- This can be of any type
    has_to_be_there int[] -- This has to be a list, doesn't matter of what
)
;
                                            
create table some_properties(
    an_id text, 
    disappearing_data int -- ^ this can be of any type and it still disappears
)
;
                                            
insert into other_properties values
('2', null)
;

insert into some_properties values
('2', null),
('1', 1),
;

attach 'dbname=elephant port=5656' as postgres (type postgres);

call postgres_execute(
    'postgres', 
    '
        drop table if exists foo;
        create table foo(
            an_id text,
            has_to_be_there int[],
            disappearing_data int,
            mac macaddr -- This has to be macaddr or inet, otherwise the bug goes away
        )
    '
);

-- This is how it should look like:
select
    some_properties.an_id,
    other_properties.has_to_be_there,
    some_properties.disappearing_data,
    null::varchar as mac
from
    some_properties
    left join other_properties on
        some_properties.an_id = other_properties.an_id
;

-- This is how it actually looks:
insert into postgres.foo
select
    some_properties.an_id,
    other_properties.has_to_be_there, -- If this is made constant, the bug goes away
    some_properties.disappearing_data,
    null::varchar as mac
from
    some_properties
    left join other_properties on
        some_properties.an_id = other_properties.an_id
;

select *    
from postgres.foo
;

The "this is how it should look like" query gives

┌─────────┬─────────────────┬───────────────────┬─────────┐
│  an_id  │ has_to_be_there │ disappearing_data │   mac   │
│ varchar │     int32[]     │       int32       │ varchar │
├─────────┼─────────────────┼───────────────────┼─────────┤
│ 2       │                 │                   │         │
│ 1       │                 │                 1 │         │
└─────────┴─────────────────┴───────────────────┴─────────┘

, but the select * from postgres.foo results in

┌─────────┬─────────────────┬───────────────────┬─────────┐
│  an_id  │ has_to_be_there │ disappearing_data │   mac   │
│ varchar │     int32[]     │       int32       │ varchar │
├─────────┼─────────────────┼───────────────────┼─────────┤
│ 2       │                 │                   │         │
│ 1       │                 │                   │         │
└─────────┴─────────────────┴───────────────────┴─────────┘

The value of disappearing data is nowhere to be found.

Interestingly you can make the bug go away by changing all bunch of stuff. You can change the data type of mac to something ordinary, you can put null instead of referencing the has_to_be_there column. On the other hand, the bug stays regardless of the type of disappearing_data and regardless of the type of an_id. The type of has_to_be_there matters a bit - as long as it is a list, the bug manifests.

Lastly, you can make the bug go away by materializing the join result first like so:

insert into postgres.foo
with
    d as materialized (
        select
            some_properties.an_id,
            other_properties.has_to_be_there,
            some_properties.disappearing_data,
            null::varchar as mac
        from
            some_properties
            left join other_properties on
                some_properties.an_id = other_properties.an_id
    )
select *
from d
;

OS:

Ubuntu x64 in WSL on Windows 11

PostgreSQL Version:

15.3

DuckDB Version:

v0.10.1

DuckDB Client:

cli and python

Full Name:

Míma Hlaváček

Affiliation:

Blindspot.ai

Have you tried this on the latest main branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@Mytherin
Copy link
Contributor

Mytherin commented Sep 3, 2024

Thanks for the report!

This seems to be the same underlying issue as #242 - I've run your test and can confirm it is fixed after #254 as well

@Mytherin Mytherin closed this as completed Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants