For "other tools": Polars, Python's equivalent to data.table but with more tidy-like syntax, directly supports parquet including "lazy" operations.
For limitations: this may be true of parquet in general, but loading a parquet file with Arrow keeps it locked somehow. Attempting to write to that same file from Python while the data frame still exists in R causes problems. Sometimes the lock is so stubborn that I need to rm() the data frame in R, force a garbage collection, and then delete the parquet file before new data will load into R again.
If nanoparquet doesn't have this limitation, that would be a very significant quality-of-life improvement; if it does, it's worth mentioning.
As for the locking: we don't explicitly lock the file, although Windows might lock it implicitly while we are reading from it; after that it should be unlocked. Maybe arrow only keeps it locked if you are using ALTREP, which is the default now? In any case, this seems too technical to mention up front in a README.