Minor update to documentation #97

D3SL · 2024-10-14T08:01:21Z

For "other tools": Polars, Python's equivalent to data.table but with a more tidyverse-like syntax, directly supports Parquet, including "lazy" operations.

For limitations: this may apply to Parquet in general, but loading a Parquet file with Arrow seems to keep it locked. Attempting to write to that same file from Python while the data frame still exists in R causes problems. Sometimes the lock is so persistent that I need to rm() the data frame in R, force a garbage collection, and then delete the Parquet file before new data will load into R again.

If nanoparquet doesn't have this limitation, that would be a very significant quality-of-life improvement; if it does, it's worth mentioning.

gaborcsardi · 2025-01-23T16:57:35Z

Thanks! It seems to me that Polars internally uses arrow for at least some of the parquet functionality. Am I missing something here?
https://github.com/pola-rs/polars/blob/ca21bd7f06c88954e9c1d647c35413fec6121d22/crates/polars-parquet/Cargo.toml#L17
We can still mention Polars of course.

As for the locking, we don't explicitly lock the file. Windows might lock it implicitly while we are reading from it, but after that it should be unlocked. Maybe arrow only locks it if you are using ALTREP, which is the default now? In any case, this seems too technical to me to mention up front in a README.

gaborcsardi added the documentation label Jan 23, 2025
