Support specifying dtypes in delete_from_iceberg for empty/null columns #3077

Open
tlinkin opened this issue Jan 21, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug

When calling wr.athena.delete_from_iceberg_table(...) with a DataFrame that contains one or more columns with entirely null values, AWS Wrangler cannot infer the Athena data type and raises an UndetectedType exception. This happens because delete_from_iceberg_table first writes the DataFrame to a temporary table, and columns containing only null values have the pandas object dtype, which carries no further type information.

How to Reproduce

Traceback (most recent call last):
  ...
  File "/home/ray/anaconda3/lib/python3.12/site-packages/awswrangler/_data_types.py", line 663, in athena_types_from_pandas
    raise exceptions.UndetectedType(
awswrangler.exceptions.UndetectedType: Impossible to infer the equivalent Athena data type for the accounting_document column. It is completely empty (only null values) ...
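
The condition behind this traceback can be illustrated with pandas alone, without touching AWS. The following is a minimal sketch, not the original reproduction script: only the accounting_document column name comes from the traceback, while the id column and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: only the "accounting_document" name is taken from the
# traceback; the "id" column and all values are illustrative assumptions.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "accounting_document": pd.Series([None, None, None], dtype=object),
    }
)

# Columns that are entirely null and still carry the generic object dtype
# give a type-inference step nothing to work with.
all_null_object_cols = [
    col
    for col in df.columns
    if df[col].dtype == object and df[col].isna().all()
]

print(df.dtypes)
print(all_null_object_cols)
```

Passing a frame shaped like this to wr.athena.delete_from_iceberg_table(...) is what appears to trigger the UndetectedType error shown above.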

Expected behavior

Allow passing an explicit dtype (or a similar schema specification) to delete_from_iceberg_table so that AWS Wrangler can set the column types of the temporary table even when columns are completely null. This would mirror other Wrangler functions (e.g. wr.s3.to_parquet) that accept a dtype argument.

Your project

No response

Screenshots

No response

OS

Ray

Python version

3.12

AWS SDK for pandas version

3.10.1

Additional context

Why is this needed?

  • Currently, when a DataFrame has columns that are completely null, delete_from_iceberg_table fails because it cannot infer their data types automatically.
  • Being able to specify dtype would let users tell Wrangler the correct column type, preventing the UndetectedType error.
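
Until something like a dtype argument exists, one possible stopgap is to cast the all-null columns to an explicit pandas dtype before calling the function. The sketch below is an assumption-laden illustration: cast_all_null_columns is a hypothetical helper, not part of AWS Wrangler, and whether the installed Wrangler version maps the resulting pandas dtype to the intended Athena type should be verified before relying on it.

```python
import pandas as pd


def cast_all_null_columns(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Hypothetical helper (not part of AWS Wrangler).

    Returns a copy of df in which each entirely-null column listed in
    dtypes is cast to the given pandas dtype. Other columns are untouched.
    """
    out = df.copy()
    for col, dtype in dtypes.items():
        if col in out.columns and out[col].isna().all():
            out[col] = out[col].astype(dtype)
    return out


# Illustrative data; only the column name comes from the traceback.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "accounting_document": pd.Series([None, None], dtype=object),
    }
)

# "string" is the pandas nullable string dtype; the choice is an assumption
# about what the real column should hold.
typed = cast_all_null_columns(df, {"accounting_document": "string"})
print(typed.dtypes)
```

The cast keeps every value null but replaces the untyped object dtype with a concrete one, which is the information the requested dtype argument would otherwise supply.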