-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for loading files in BZip2 and XZ formats #62
Add support for loading files in BZip2 and XZ formats #62
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Overall it looks quite ready, but I think we would need to make CodecXz.jl and CodecBzip2.jl dependencies optional. Otherwise we will introduce a lot of unnecessary and unexpected dependencies to a lot of packages and workflows (e.g. to RDatasets.jl, which is required by test suites of many other packages).
Ideally, it would be nice if FileIO.jl or TranscodingStreams.jl have implemented the mechanism of automatically using the appropriate decompressor stream on the file (or reporting if it's not available). But AFAIK there's nothing like that at the moment (see also JuliaIO/FileIO.jl#225).
We can implement our own mechanism using Requires.jl. You can take a look how PlotlyBase.jl is using Requires.jl for an optional Colors.jl dependency. You can define e.g.
abstract type ArchiveFormat{S} end
function decompressor_stream(::Type{ArchiveFormat{S}}, io::IOStream) where S
if S == "BZIP2"
error("CodecBzip2 package is required to read Bzip2-compressed RData files. Use Pkg.add(\"CodecBzip2\").")
elseif S=="XZ"
error("CodecXz package is required to read XZ-compressed RData files. Use Pkg.add(\"CodecXz\")")
else
throw(ArgumentError("Unsupported archive format: $S"))
end
end
decompressor_stream(::Type{ArchiveFormat{"GZIP"}}, io::IOStream) = GzipDecompressorStream(io)
and decompressor_stream()
for BZip2 and Xz in __init__()
. Then you would be calling decompress_stream(ArchiveFormat{format}, io)
from decompress(io)
. Of course, this code is just a sketch and it's quite likely that you can come up with a better solution :)
@alyst I've had a go at implementing your request. It appears to work but now I get the following message when running tests in my code that uses RData: Is that expected? (I have done From a design perspective I tried several approaches with regard to placement of the What do you think? Let me know your suggestions from here 👍 |
3d28a3e
to
d7abaca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks great, I've just added some cosmetics and noted a few minor things:
- tests for other archives (zip)
- whether we really need additional tiny submodule
- whether we need
using CodecXXX
in the user code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! We're almost there. There's one typo in the method name + some comments.
I'm sorry that optional dependencies are causing so much work. I think the only requirements for merging this PR is adding zip test and all green for CI.
Just pushed some commits. Super tired right now due to travelling. Will check everything over tomorrow to see what's left and reply properly. |
bf2f4ef
to
16d1b88
Compare
Also introduces a custom abstract error type RDataException, which should serve as a starting point for issue JuliaData#14.
16d1b88
to
987e7b1
Compare
Hi @alyst Sorry for the time it's taken to finish this. All the issues we discussed have been addressed, so it's ready for another review. Cheers 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for you hard work!
I'm approving it, but could you please change the indentation of the new files to use 4 spaces before I merge the PR? No need to re-squash as I'll squash it upon merging.
src/errors.jl
Outdated
end | ||
|
||
function Base.showerror(io::IO, e::CodecMissingError) | ||
print(io, string( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please change it to 4-space indent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh 🤦♂ there's always something. Thanks for spotting.
test/errors.jl
Outdated
@@ -0,0 +1,16 @@ | |||
module TestErrors | |||
using RData, Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4-space indent here as well
Ok done. Thank you 👍 |
Failures in AppVeyor for julia-latest seem unrelated, so I'm merging. |
Hi. This is my first PR to any Julia project so happy to be guided to a more idiomatic solution. But this is my first pass at making it work. Thanks.
TODO list:
MissingCodecError
or check the error message explicitly.