What happens?
I have a simple Parquet file with two columns (bigint and varchar[] in Postgres; INT64 and BYTE_ARRAY in Parquet).
When I try to write the data to Postgres using the Postgres connector, data loss occurs and not all of the rows make it into Postgres.
I am able to query the Parquet file in DuckDB itself without issue. (Even the CSV export works well.)
To Reproduce
ATTACH 'dbname=<dbname> port=<port> user=<user> host=<host> password=<pass>' AS db (TYPE POSTGRES);
SELECT * FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet' where npi = 1003000126;
CREATE OR REPLACE TABLE db.public.my_table as FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet';
SELECT * FROM db.public.my_table where npi = 1003000126;
COPY (SELECT * FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet') TO 'output.csv' (HEADER, DELIMITER ',');
SELECT * FROM 'output.csv' WHERE npi = 1003000126;
Thanks for the report! I've pushed a fix in #254 - the issue was that we were not resetting an intermediate state correctly, leading to additional NULL values creeping in.
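For intuition, here is a minimal Python sketch of the failure mode described in the fix: an intermediate per-chunk NULL mask that is reused across chunks without being reset, so stale entries from an earlier chunk turn valid rows into NULLs. All names are hypothetical illustrations, not the actual duckdb-postgres internals.

```python
# Toy chunked writer. The NULL mask is intermediate state that must be
# cleared between chunks; if it is not, index i for the current chunk
# reads a mask entry left over from a previous chunk, and valid values
# are silently written as NULL (the "data loss" seen in the report).

class ChunkedWriter:
    def __init__(self, reset_state):
        self.reset_state = reset_state
        self.null_mask = []   # intermediate per-chunk state
        self.rows = []        # rows as they would arrive in the target table

    def write_chunk(self, values, nulls):
        if self.reset_state:
            self.null_mask = []        # the fix: clear stale state per chunk
        self.null_mask.extend(nulls)
        for i, value in enumerate(values):
            # With stale entries still present, self.null_mask[i] may
            # belong to an earlier chunk rather than this one.
            self.rows.append(None if self.null_mask[i] else value)


buggy = ChunkedWriter(reset_state=False)
fixed = ChunkedWriter(reset_state=True)
for writer in (buggy, fixed):
    writer.write_chunk([1, 2], nulls=[False, True])
    writer.write_chunk([3, 4], nulls=[False, False])

print(buggy.rows)  # [1, None, 3, None] -- row 4 silently became NULL
print(fixed.rows)  # [1, None, 3, 4]    -- correct
```

In the buggy variant, the second chunk's row at index 1 picks up the first chunk's NULL flag, which matches the symptom in the report: rows present in the Parquet file arriving in Postgres as NULL.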
OS: Ubuntu
DuckDB Version: 1.0.0
DuckDB Client: CLI tool
Full Name: Arpit Aggarwal
Affiliation: Candor Health
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?