How can I re-write column names when reading from an Excel file? #242

tboddyspargo · 2024-01-30T15:28:28Z

We would like to apply some "column name normalization" logic to the headers in an Excel sheet, similar to the logic applied by read_csv_auto (e.g. spaces become underscores). However, I'm not sure that st_read or GDAL provides any option for achieving this.

With CSV files, I can do: SELECT * FROM read_csv_auto('s3://bucket/path.csv', names=['custom_name1', 'custom_name2']) to set my own column names. Without that, the column names would still be normalized according to read_csv_auto's own logic (which I like).

However, with SELECT * FROM st_read('s3://bucket/path.xlsx', layer='sheet 1', open_options=['HEADERS=force']), no normalization logic is applied and I don't see an obvious way for me to provide my own custom column names.

Any suggestions for how to enforce some column name normalization logic would be greatly appreciated!

Maxxen (Maintainer) · 2024-01-30T15:57:03Z

No, this is not possible in the underlying GDAL driver. But you could provide your own column names by simply aliasing the table expression:

SELECT *
FROM st_read('./google_sheets.xlsx', max_batch_size = 100)
  AS my_table(custom_name1, custom_name2, custom_name3);

┌──────────────┬──────────────┬──────────────┐
│ custom_name1 │ custom_name2 │ custom_name3 │
│    double    │   varchar    │   varchar    │
├──────────────┼──────────────┼──────────────┤
│        123.0 │ 500,60       │ ABC          │
│        456.0 │ 300,40       │ some text    │
└──────────────┴──────────────┴──────────────┘
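
If the goal is normalization rather than hand-picked names, the alias list could be generated from the sheet's original headers. The following is a minimal Python sketch, not part of DuckDB or the spatial extension: normalize_column_name and build_aliased_query are hypothetical helpers, and the rule used (lowercase, runs of non-alphanumeric characters become one underscore) only approximates the behaviour described in the question, not read_csv_auto's exact logic. It assumes you already know the original headers, for example from running DESCRIBE on the st_read query first.

```python
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical normalizer: lowercase, non-alphanumeric runs -> '_'.

    This is an approximation, not DuckDB's own normalization rules.
    """
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # Fall back to a placeholder when nothing usable is left.
    return cleaned or "col"


def build_aliased_query(source_sql: str, headers: list[str],
                        alias: str = "my_table") -> str:
    """Build 'SELECT * FROM <source> AS alias(col, ...)' with normalized names."""
    names = []
    used = set()
    for header in headers:
        base = normalize_column_name(header)
        candidate, suffix = base, 1
        # Add a numeric suffix until the name is unique within the alias list.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        names.append(candidate)
    # Double-quote each identifier so reserved words stay valid column names.
    column_list = ", ".join(f'"{n}"' for n in names)
    return f"SELECT * FROM {source_sql} AS {alias}({column_list});"


query = build_aliased_query(
    "st_read('s3://bucket/path.xlsx', layer='sheet 1', "
    "open_options=['HEADERS=force'])",
    ["Custom Name 1", "Custom Name 2"],
)
print(query)
```

The alias list is positional, so the headers must be passed in the same order as the columns in the sheet.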

Maxxen (Maintainer) · 2025-02-06T09:00:24Z

Hello!

As of DuckDB 1.2.0, the DuckDB excel extension supports reading and writing xlsx files with greater efficiency and many more options. We are therefore going to deprecate xlsx reading in the spatial extension in the future. I would recommend that you try out the excel extension, and if you run into any problems, please open an issue over at the duckdb-excel repository.
