I encountered an issue when trying to use JWT authentication with pandas file read methods such as read_parquet. The problem arises because pandas and fsspec expect headers for HTTP(S) URLs to be specified in different ways.
Typically, the storage options for a request with JWT authentication look like this:
{
"headers": {"Authorization": "Token XXXXXXX"}
}
When accessing files, storage_options is used to send these headers. However, pandas and fsspec handle them inconsistently: fsspec expects the headers to be nested under a "headers" key, as shown above, whereas pandas expects the header key-value pairs to be passed directly, without the "headers" key.
According to the pandas.read_parquet documentation for storage_options (the same wording applies to all read methods): "For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options."
Thus, for HTTP(S) URLs, the storage options should be formatted as follows:
{
"Authorization": "Token XXXXXXX"
}
This discrepancy causes errors in the pandas-based read methods in file_io.py, such as read_parquet_file_to_pandas and ``.
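To illustrate why the nested form fails on the pandas path, the snippet below builds urllib.request.Request objects directly with each style of options, mimicking how the documentation quoted above says pandas forwards storage_options for HTTP(S) URLs. It does not call pandas itself, the URL is a placeholder, and no request is sent.

```python
import urllib.request

url = "https://example.com/data.parquet"  # placeholder URL; nothing is fetched

fsspec_style = {"headers": {"Authorization": "Token XXXXXXX"}}
pandas_style = {"Authorization": "Token XXXXXXX"}

# Constructing a Request only records the headers; no network I/O happens here.
nested = urllib.request.Request(url, headers=fsspec_style)
flat = urllib.request.Request(url, headers=pandas_style)

# urllib treats every top-level key as a header name (normalized with
# str.capitalize()), so the nested form produces a bogus "Headers" header
# whose value is a dict, and no Authorization header at all.
print(nested.get_header("Authorization"))  # None
print(nested.get_header("Headers"))        # {'Authorization': 'Token XXXXXXX'}
print(flat.get_header("Authorization"))    # Token XXXXXXX
```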
Suggested Solution
To resolve this issue, I suggest adding a few lines before each method that reads with pandas, to correct only the headers in storage_options. Here's the suggested code snippet:
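A minimal sketch of such a correction, written as a reusable helper so the same pattern is not repeated in every read method. The name to_pandas_storage_options is hypothetical, and dropping any non-header fsspec keys for HTTP(S) URLs is an assumption (pandas would otherwise send them as HTTP headers).

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlparse


def to_pandas_storage_options(
    path: str, storage_options: Optional[Mapping[str, Any]]
) -> Optional[dict]:
    """Hypothetical helper: convert fsspec-style options to the pandas form.

    fsspec style: {"headers": {"Authorization": "Token ..."}}
    pandas style: {"Authorization": "Token ..."}   (HTTP(S) URLs only)
    """
    if storage_options is None:
        return None
    options = dict(storage_options)  # copy so the caller's dict is not mutated
    scheme = urlparse(path).scheme.lower()
    if scheme not in ("http", "https"):
        # For other URLs pandas forwards storage_options to fsspec unchanged.
        return options
    headers = options.get("headers")
    if headers is None:
        # Already in the flat form pandas expects.
        return options
    # Assumption: only the nested headers are kept; other fsspec-only keys
    # would otherwise be sent as HTTP headers by pandas.
    return dict(headers)
```

Usage, assuming the caller holds fsspec-style options: `pd.read_parquet(url, storage_options=to_pandas_storage_options(url, storage_options))`.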
I tested this locally and it works.
I'm not sure whether it would be better to create a function so the pattern isn't repeated.