Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Timestamp to align with ORC spec #26

Open
Jefffrey opened this issue Mar 31, 2024 · 3 comments
Open

Fix Timestamp to align with ORC spec #26

Jefffrey opened this issue Mar 31, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request medium Medium priority

Comments

@Jefffrey
Copy link
Collaborator

ORC timestamp type is not straightforward, as though it apparently represents a timestamp without a timezone, its encoding & decoding is still dependent on the writer timezone (encoded in the stripe) and the reader timezone.

This is why the 1900 & 2038 integration tests are currently disabled as our implementation is incorrect:

https://github.com/datafusion-contrib/datafusion-orc/blob/3369b5d7bb110dafde64288416c86bc8c996e465/tests/integration/main.rs#L155-L165

@Jefffrey Jefffrey self-assigned this Mar 31, 2024
@Jefffrey
Copy link
Collaborator Author

Jefffrey commented Mar 31, 2024

FYI I've raised a PR to update ORC spec for timestamp: apache/orc#1867

Since that was what caused initial confusion for me, as some key implementation details were not mentioned in the spec leading to the initial inaccurate implementation

Also worth noting how arrow deals with this:

https://github.com/apache/arrow/blob/24feab091ab5a05b1cec234f51bd0223e2c41487/cpp/src/arrow/adapters/orc/adapter.cc#L178-L181

apache/arrow#34590

@Jefffrey
Copy link
Collaborator Author

Jefffrey commented Mar 31, 2024

Note that the 2038 test is dependent on chronotope/chrono-tz#155 as it tests timestamps 2100 and beyond, which chrono_tz currently has inaccurate DST for

Edit: chronotope/chrono-tz#135

@Jefffrey
Copy link
Collaborator Author

Partially resolved by datafusion-contrib/datafusion-orc@60288cd

Still pending chrono_tz fix for large date DST

@Jefffrey Jefffrey added enhancement New feature or request medium Medium priority labels Apr 1, 2024
@waynexia waynexia transferred this issue from datafusion-contrib/datafusion-orc Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request medium Medium priority
Projects
None yet
Development

No branches or pull requests

1 participant