Wrong and inconsistent behavior when reading many NULLs #21
Comments
Thanks for the report! I would like to investigate this, but that is difficult without a reproduction. If you can't make a reproduction with a local MySQL instance, could you perhaps provide the dataset that triggers this on Aurora? This might depend on the exact MySQL version or configuration combined with the exact values in the column.
Thank you for the quick response! Yeah, no, unfortunately I can't share the data, as it is company-owned. Instead, I'll try to spin up an RDS Aurora database in my private AWS account that mimics the one the error occurs in. Hopefully I can reproduce it and will also be able to share credentials for you to explore yourself. I'll keep you posted :)
Okay so I was able to reproduce it. The steps I took:
It's also noteworthy that I tried the steps above on both a local MySQL Server and AWS RDS Aurora, which gave the same results. This means the issue is probably not related to the type of MySQL server. I attach the file with the data for you to explore: strange_data.csv

Here are the queries I ran to load the CSV into my localhost server:

LOAD DATA LOCAL INFILE 'C:/path/to/strange_data.csv'
INTO TABLE strange_data
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- This was necessary because MySQL loads empty entries as 0 instead of NULL
SET SQL_SAFE_UPDATES = 0;
UPDATE strange_data
SET deleted_at = NULL
WHERE deleted_at = 0;

Hope this helps :)
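The shape of the data described above can be sketched in pure Python (column names, row counts, and timestamp values are assumptions for illustration, not taken from strange_data.csv): a header row, many rows with an empty deleted_at field, and a handful with real timestamps. The empty fields are what MySQL's LOAD DATA imports as 0, hence the UPDATE fix-up above.

```python
import csv
import io

# Hypothetical CSV in the shape described above: mostly-empty deleted_at
# fields plus a few real timestamps (all values assumed for illustration).
rows = [("id", "deleted_at")]
rows += [(str(i), "") for i in range(1, 101)]                       # NULL-ish rows
rows += [(str(i), "2021-06-01 12:00:00") for i in range(101, 115)]  # 14 non-null

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
data = buf.getvalue()

# Count non-empty deleted_at fields, skipping the header line.
non_null = sum(1 for line in data.splitlines()[1:] if line.split(",")[1])
```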
Thanks for the reproduction! This has been very helpful - I've managed to track it down and have pushed what I believe to be a fix in #23.
Ooh that was super quick, thanks so much for working on this so promptly 😊 And aah, so it was about the large number of NULLs in the data and not about the datetime parsing. I'll try to reflect that in the issue title. Fingers crossed for the solution 🤞
Fix #21: correctly call Reset on the varchar chunk in the MySQL scan to prevent setting non-NULL values to NULL
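The fix title above points at a stale-state bug class: a scan that reuses an output chunk across reads without resetting it. A hypothetical Python model of that class (illustrative only, not the extension's actual C++ code) shows how a NULL-heavy chunk can leave validity flags behind that turn the next chunk's real values into NULLs:

```python
# Hypothetical model of the bug class named in the fix title: a reused
# output chunk whose validity mask is not Reset between reads.
class Chunk:
    def __init__(self, size):
        self.values = [""] * size
        self.valid = [False] * size  # validity mask: False means NULL

    def reset(self):
        self.valid = [True] * len(self.valid)

def scan(chunk, rows, call_reset=True):
    if call_reset:
        chunk.reset()  # the fix: clear leftover state from the previous read
    for i, value in enumerate(rows):
        if value is None:
            chunk.valid[i] = False   # mark the slot as NULL
        else:
            chunk.values[i] = value  # relies on reset() having set valid[i]
    return [v if ok else None for v, ok in zip(chunk.values, chunk.valid)]

chunk = Chunk(3)
first = scan(chunk, [None, None, None])  # a chunk full of NULLs
# Without the reset, stale NULL flags make real values come back as NULL:
buggy = scan(chunk, ["2021-01-01", "2021-02-02", "2021-03-03"], call_reset=False)
# With the reset, the same values are read correctly:
fixed = scan(chunk, ["2021-01-01", "2021-02-02", "2021-03-03"])
```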
This should be fixed now in the dev build, give it a shot:

force install mysql from 'http://nightly-extensions.duckdb.org';

If it works, let me know and I will release a new version of the extension with the fix.
I just tried and it works as expected now. Thanks again for resolving 😊
What happens?
We have a MySQL database running on AWS RDS Aurora. When I make queries that filter on a DATETIME column, the results are wrong and even inconsistent.

To me it seems that duckdb somehow fails to convert the MySQL DATETIME entries correctly and just converts them to NULLs, even when there actually are values in there.

Thanks in advance for your help, and let me know if/how I can support.
To Reproduce
First of all, my sincere apologies. I have not been able to reproduce the behavior on a local MySQL instance. But the behavior is so strange that I think it warrants an investigation.
Here is what I did so far:
Connect:
We have a table named some_table that looks like the following:

In the actual MySQL database, the deleted_at column is of type DATETIME. Generally, we use the deleted_at column to determine whether an entry is still relevant, so whether it is NULL or not is an important filter condition.
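The NULL semantics this filter relies on can be sketched with Python's built-in sqlite3 as a stand-in for MySQL (the table name comes from the issue; the row contents are assumed for illustration):

```python
import sqlite3

# Minimal stand-in for the MySQL table (sqlite3 instead of MySQL; the
# row contents are assumed for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (id INTEGER, deleted_at TEXT)")
con.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(1, None), (2, "2021-06-01 12:00:00"), (3, None)],
)

# "Still relevant" entries are those with a NULL deleted_at.
active = con.execute(
    "SELECT COUNT(*) FROM some_table WHERE deleted_at IS NULL").fetchone()[0]
# COUNT(column) counts only the non-NULL entries.
deleted = con.execute(
    "SELECT COUNT(deleted_at) FROM some_table").fetchone()[0]
```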
This is where duckdb seems to get confused:

COUNT(*) indicates that there are zero non-null deleted_at entries in our table.
But SELECT * delivers 14 rows that clearly have non-null entries in the deleted_at column.
And if I only SELECT deleted_at, the entries are gone again.

If we run the same queries with a different MySQL client (I tried polars and MySQL Workbench), we learn that there are, in reality, 113 non-null deleted_at entries in our table, which is something else entirely 😵
(I show polars here, but other MySQL clients give the same result.)
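The inconsistency above can be framed as an invariant that any correct client must satisfy: COUNT over the column, SELECT * filtered on IS NOT NULL, and a plain SELECT of the column must all agree on the number of non-null entries. A sketch of that cross-check on an in-memory SQLite stand-in (table contents assumed, sized to match the 113 non-null entries reported above... shortened here to 14 for brevity):

```python
import sqlite3

# Stand-in table with many NULLs and 14 non-null deleted_at entries
# (contents assumed; sqlite3 used instead of MySQL for a runnable sketch).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE some_table (id INTEGER, deleted_at TEXT)")
con.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(i, None) for i in range(100)]
    + [(100 + i, "2021-06-01") for i in range(14)],
)

# Three ways of counting non-null entries; a correct client returns the
# same number from all of them.
count_agg = con.execute(
    "SELECT COUNT(deleted_at) FROM some_table").fetchone()[0]
count_star = len(con.execute(
    "SELECT * FROM some_table WHERE deleted_at IS NOT NULL").fetchall())
count_col = sum(1 for (v,) in con.execute(
    "SELECT deleted_at FROM some_table") if v is not None)
```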
I also tried to export to parquet directly:
With mixed results:
OS: Windows 10 / Linux Ubuntu 22.04.3
MySQL Version: 8.0
DuckDB Version: 0.9.2
DuckDB Client: Python, CLI
Full Name: Simon Sommer
Affiliation: Simpego
Have you tried this on the latest main branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?