All notable changes to this project will be documented in this file.
- Define Hudi error types across hudi-core (#124) by @gohalo
- Support filter pushdown for datafusion (#203) by @jonathanc-n
- Add demo app and integration tests (#226) by @xushiyan
- Add `TimelineSelector` to support timeline loading (#233) by @xushiyan
- Add `hoodie.read.listing.parallelism` config (#235) by @xushiyan
- Support row filters for `FileGroupReader` (#237) by @xushiyan
- Implement incremental query for COW tables (#236) by @xushiyan
- Implement log file reader for parquet log block (#244) by @xushiyan
- Implement basic record merge semantics (#249) by @xushiyan
- Add APIs for MOR snapshot reads (#247) by @xushiyan
- Support time travel query for MOR tables (#256) by @xushiyan
- Support incremental read MOR tables (#258) by @xushiyan
- Support MOR read-optimized query (#259) by @xushiyan
- Support reading MOR with rollback (#264) by @xushiyan
- Align python table APIs with rust (#267) by @xushiyan
- Add APIs to support incremental query impl (#272) by @xushiyan
- Simplify partition filter format by taking tuple of strings (#170) by @kazdy
- Improve api to get file slices splits (#185) by @xushiyan
- Handle schema retrieval for datafusion api (#187) by @xushiyan
- Include commit_seqno for merge order (#250) by @xushiyan
- Format Hudi config enum should show the full config key (#254) by @Kunal-Singh-Dadhwal
- Derive record merge strategy based on table configs (#260) by @xushiyan
- Handle as-of timestamp for excluding file groups (#268) by @xushiyan
- Build up incremental file groups (#273) by @xushiyan
- Reorganize custom error types (#215) by @xushiyan
- Add API stubs for performing incremental queries (#220) by @xushiyan
- Enhance `Filter` and related structs (#221) by @xushiyan
- Improve `TimelineSelector` API (#234) by @xushiyan
- Improve `BaseFile` APIs (#239) by @xushiyan
- Improve file system view's listing flow (#251) by @xushiyan
- Use static MetaField schema for incr query (#252) by @xushiyan
- Rename crate `hudi-tests` to `hudi-test` (#262) by @xushiyan
- Remove use of `Filter` from public APIs (#266) by @xushiyan
- Update README examples (#194) by @xushiyan
- Update release and dev guides (#195) by @xushiyan
- Add example to `hudi-datafusion` crate (#202) by @jonathanc-n
- Add `CREATE EXTERNAL TABLE` example in datafusion crate (#213) by @jonathanc-n
- Clarify issues in the dev guide (#224) by @xushiyan
- Add in-code docs for `FileGroup` (#269) by @xushiyan
- Update `README.md` to show table API examples (#274) by @xushiyan
- (deps) Bump codecov/codecov-action from 4 to 5 (#184) by @dependabot[bot]
- (deps) Upgrade datafusion and object store (#182) by @kazdy
- (deps) Upgrade datafusion to 42.2.0 (#192) by @xushiyan
- (deps) Upgrade Datafusion, Arrow, and Rust versions (#197) by @jonathanc-n
- (deps) Update pyo3 requirement from 0.22.2 to 0.22.4 (#212) by @jonathanc-n
- (deps) Clean up dependencies (#240) by @xushiyan
- (dep) Upgrade rustc, arrow, and tarpaulin setting (#276) by @xushiyan
- Update release script and guide (#200) by @xushiyan
- Update changelog for 0.2.0 (#201) by @xushiyan
- Update pull request guidelines for contributors (#204) by @jonathanc-n
- Add more dev commands and update the project's short description (#217) by @xushiyan
- Update codecov threshold (#222) by @xushiyan
- Update codecov config (#245) by @xushiyan
- Update codecov-action to v5 (#248) by @K-dash
- (ci) Add rust dependency caching with rust-cache action (#265) by @K-dash
- Fix src verify script (#279)
- Update release guide and issue templates (#282)
- @K-dash made their first contribution in #265
- @Kunal-Singh-Dadhwal made their first contribution in #254
- @jonathanc-n made their first contribution in #203
- Support loading hudi global configs (#118) by @zzhpro
- Add base file records' in-memory size to `FileStats` (#140) by @xushiyan
- Support partition prune api (#119) by @KnightChess
- Add partition filter arg in Python APIs (#153) by @xushiyan
- Add `HudiFileGroupReader` with consolidated APIs to read records (#164) by @xushiyan
- Add `TableBuilder` API for creating `Table` instances (#163) by @kazdy
- Implement datafusion `TableProviderFactory` (#162) by @kazdy
- Register object store with datafusion (#107) by @abyssnlp
- Handle validating table when `DropsPartitionFields` not present (#142) by @xushiyan
- Make partition loading more efficient (#152) by @xushiyan
- Simplify partition filter format by taking tuple of strings (#170)
- Improve api to get file slices splits (#185)
- Handle schema retrieval for datafusion api (#187)
- Extract common test code for creating table (#117) by @gohalo
- Improve APIs for handling options (#161) by @xushiyan
- Improve `TableBuilder` API for taking single option (#171) by @xushiyan
- Minor improvement to fix coverage report status (#173) by @xushiyan
- Update readme logo and example (#65) by @xushiyan
- Update in-code comments (#132) by @KnightChess
- Add hudi core API docs with examples (#113) by @KnightChess
- Add in-code docs to hudi-core APIs (#166) by @xushiyan
- Add python binding docstrings (#169) by @kazdy
- Add step-by-step release guide (#66) by @xushiyan
- Enforce Python code style (#101) by @muyihao
- Use exact versions for arrow and datafusion (#105) by @xushiyan
- Bump up datafusion to version 41, arrow to 52.2 (#120) by @yjshen
- (deps) Update zip-extract requirement from 0.1.3 to 0.2.1 (#130) by @dependabot[bot]
- (deps) Upgrade datafusion, pyarrow, pyo3, python versions (#149) by @kazdy
- (deps) Upgrade arrow dependencies (#168) by @kazdy
- (release) Bump version to 0.2.0-rc.1
- (deps) Upgrade datafusion and object store (#182)
- (deps) Upgrade datafusion to 42.2.0 (#192)
- (release) Bump version to 0.2.0-rc.2
- Improve release scripts (#68) by @xushiyan
- Add `CHANGELOG.md` with git-cliff config (#69) by @xushiyan
- Configure labeler for PRs from forked repos (#83) by @xushiyan
- Fix labeler config (#85) by @xushiyan
- Fix labeler config for dev-x (#87) by @xushiyan
- Merge python code coverage report with rust (#67) by @xushiyan
- Add pull request template (#89) by @xushiyan
- Enable dependabot (#94) by @xushiyan
- Add path ignore files for ci workflow (#93) by @abyssnlp
- Improve workflows for code checking and PR (#110) by @xushiyan
- Disable labeler due to permission and policy (#115) by @xushiyan
- (ci) Fix PR title linting to support change scope (#138) by @kazdy
- Add feature request template for GH issues (#167) by @kazdy
- @KnightChess made their first contribution in #119
- @gohalo made their first contribution in #117
- @zzhpro made their first contribution in #118
- @yjshen made their first contribution in #120
- @abyssnlp made their first contribution in #107
- @muyihao made their first contribution in #101
- Initial rust implementation to integrate with datafusion (#1) by @xushiyan
- Add python binding (#21) by @xushiyan
- Implement `HudiTable` as python API (#23) by @xushiyan
- Use `object_store` for common storage APIs (#25) by @xushiyan
- Implement Rust and Python APIs to read file slices (#28) by @xushiyan
- Add APIs for time-travel read (#33) by @xushiyan
- Implement datafusion API using ParquetExec (#35) by @xushiyan
- Add `HudiConfigs` for parsing and managing named configs (#37) by @xushiyan
- Add config validation when creating table (#49) by @xushiyan
- Add internal config to skip validation (#51) by @xushiyan
- Support time travel with read option (#52) by @xushiyan
- Support taking env vars for cloud storages (#55) by @xushiyan
- Handle replacecommit for loading file slices (#53) by @xushiyan
- Use `anyhow` for generic errors (#26) by @xushiyan
- Use `object_store` API for Timeline (#27) by @xushiyan
- Make APIs async (#31) by @xushiyan
- Improve thread safety and error handling (#32) by @xushiyan
- Improve error handling in storage module (#34) by @xushiyan
- Adjust table APIs to skip passing options (#56) by @xushiyan
- Update readme, contributing guide, and issue template (#57) by @xushiyan
- Update CONTRIBUTING with minor changes (#58) by @codope
- Enforce rust code style (#14) by @xushiyan
- Clean up and trim down dependencies (#54) by @xushiyan
- Add info for rust and python artifacts (#60) by @xushiyan
- Add release workflow (#63) by @xushiyan
- Add tests crate and adopt testing tables (#30) by @xushiyan
- Add test cases for different table setup (#36) by @xushiyan
- Setup ci for license file and headers (#2) by @xushiyan
- Fix failing check and test case (#10) by @xushiyan
- Fix asf notification (#11) by @xushiyan
- Add commit linting (#12) by @xushiyan
- Use cargo tarpaulin to generate code coverage (#15) by @xushiyan
- Remove codecov to keep ci green (#17) by @xushiyan
- Fix codecov setup (#20) by @xushiyan
- Configure codecov (#50) by @xushiyan
- Add scripts to streamline source release (#64) by @xushiyan
- @codope made their first contribution in #58
- @xushiyan made their first contribution in #1