We are generating queries to read very wide parquet files as part of a Census data extraction service. Through experimentation I've determined that the limit on named columns in a SELECT clause is 16256. When this is exceeded, we get an assertion failure.
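For context, here is a minimal sketch of the kind of query we generate. The column names (`v0`, `v1`, ...) and the one-row stand-in table are hypothetical placeholders, not our real data:

```rust
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // Stand-in for our real parquet data: a one-row table with 16257
    // integer columns (v0 .. v16256), one more than the observed limit.
    let defs: Vec<String> = (0..=16_256).map(|i| format!("{i} AS v{i}")).collect();
    conn.execute_batch(&format!("CREATE TABLE wide AS SELECT {};", defs.join(", ")))?;

    // Name every column explicitly, the way our extraction service does.
    let cols: Vec<String> = (0..=16_256).map(|i| format!("v{i}")).collect();
    let sql = format!("SELECT {} FROM wide", cols.join(", "));

    // Somewhere in here, 1.1.1 aborts on the internal assertion for us
    // instead of returning Err(...).
    let mut stmt = conn.prepare(&sql)?;
    let _rows = stmt.query([])?;
    Ok(())
}
```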
We realize this is a very large number of columns, and 99.5% of our workload is well under 16256 and runs very well. It would be nice to get a clear error message, or to have a setting that increases the maximum width of a result.
It looks like there's a problem in the memory allocation code: the physical memory of our machines isn't anywhere near used up when I monitor the process. At 14000 columns, for example, memory use is quite low.
I can reproduce the issue on version 1.1.1 (the one bundled with the Rust package). With the CLI I get a similar problem on v1.1.1 and v1.1.3: it tries to dump to a temp file and then just gets stuck, and the temp file in .tmp is only 256 KB.
When I test on the nightly build downloaded today, I get a different error:
ccd@gp2000:~/nhgis-extract-engine/rust$ ~/duckdb/duckdb < test_query.sql
Floating point exception (core dumped)
ccd@gp2000:~/nhgis-extract-engine/rust$
Due to the nature of the problem, the query and data file needed to reproduce the issue are both extremely large, but I'm willing to share them.
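Alternatively, a sketch like the following should synthesize a comparably wide parquet file without the real Census data, assuming the COPY itself doesn't trip the same limit. The column count, names, and output path are all placeholders:

```rust
use duckdb::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // One row, 17000 integer columns, written out as parquet so the
    // file-based (CLI) reproduction can run against it.
    let cols: Vec<String> = (0..17_000).map(|i| format!("{i} AS v{i}")).collect();
    conn.execute_batch(&format!(
        "COPY (SELECT {}) TO 'wide.parquet' (FORMAT PARQUET);",
        cols.join(", ")
    ))?;
    Ok(())
}
```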
After looking at the other duckdb-rs issues, I realized this may belong in the main DuckDB issue tracker. The main difference with Rust is that we get a hard stop on the assertion failure, which we wouldn't see when using the CLI application.
Attachment: test_query.txt