Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for loading files in BZip2 and XZ formats #62

Merged
merged 2 commits into from
Sep 21, 2019

Conversation

munckymagik
Copy link
Contributor

@munckymagik munckymagik commented Aug 10, 2019

Hi. This is my first PR to any Julia project so happy to be guided to a more idiomatic solution. But this is my first pass at making it work. Thanks.

TODO list:

  • Test error handling for unsupported compressors (e.g. zip)
  • Silence the error output from FileIO for tests that we intend to cause failures
  • Throw a custom MissingCodecError or check the error message explicitly.
  • Squash commits

@coveralls
Copy link

coveralls commented Aug 10, 2019

Coverage Status

Coverage increased (+1.9%) to 75.526% when pulling c6b1d12 on munckymagik:support_all_compression_formats into 1f4f7e2 on JuliaData:master.

@nalimilan nalimilan requested a review from alyst August 19, 2019 20:50
Copy link
Collaborator

@alyst alyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Overall it looks quite ready, but I think we would need to make CodecXz.jl and CodecBzip2.jl dependencies optional. Otherwise we will introduce a lot of unnecessary and unexpected dependencies to a lot of packages and workflows (e.g. to RDatasets.jl, which is required by test suites of many other packages).

Ideally, it would be nice if FileIO.jl or TranscodingStreams.jl have implemented the mechanism of automatically using the appropriate decompressor stream on the file (or reporting if it's not available). But AFAIK there's nothing like that at the moment (see also JuliaIO/FileIO.jl#225).

We can implement our own mechanism using Requires.jl. You can take a look how PlotlyBase.jl is using Requires.jl for an optional Colors.jl dependency. You can define e.g.

abstract type ArchiveFormat{S} end

function decompressor_stream(::Type{ArchiveFormat{S}}, io::IOStream) where S
    if S == "BZIP2"
       error("CodecBzip2 package is required to read Bzip2-compressed RData files. Use Pkg.add(\"CodecBzip2\").")
    elseif S=="XZ"
       error("CodecXz package is required to read XZ-compressed RData files. Use Pkg.add(\"CodecXz\")")
    else
       throw(ArgumentError("Unsupported archive format: $S"))
    end
end

decompressor_stream(::Type{ArchiveFormat{"GZIP"}}, io::IOStream) = GzipDecompressorStream(io)

and decompressor_stream() for BZip2 and Xz in __init__(). Then you would be calling decompress_stream(ArchiveFormat{format}, io) from decompress(io). Of course, this code is just a sketch and it's quite likely that you can come up with a better solution :)

@munckymagik
Copy link
Contributor Author

@alyst I've had a go at implementing your request. It appears to work but now I get the following message when running tests in my code that uses RData:

image

Is that expected? (I have done Pkg.resolve().

From a design perspective I tried several approaches with regard to placement of the __init__ function. What I settled on in the end was to put it in a sub-module dedicated to defining the decompressor_stream functions, so everything related is together.

What do you think? Let me know your suggestions from here 👍

@munckymagik munckymagik force-pushed the support_all_compression_formats branch 2 times, most recently from 3d28a3e to d7abaca Compare August 31, 2019 13:36
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@alyst alyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks great, I've just added some cosmetics and noted a few minor things:

  • tests for other archives (zip)
  • whether we really need additional tiny submodule
  • whether we need using CodecXXX in the user code

src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Outdated Show resolved Hide resolved
test/decompression.jl Outdated Show resolved Hide resolved
test/decompression.jl Show resolved Hide resolved
test/decompression.jl Outdated Show resolved Hide resolved
Copy link
Collaborator

@alyst alyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! We're almost there. There's one typo in the method name + some comments.
I'm sorry that optional dependencies are causing so much work. I think the only requirements for merging this PR is adding zip test and all green for CI.

src/decompression.jl Outdated Show resolved Hide resolved
src/decompression.jl Show resolved Hide resolved
src/decompression.jl Show resolved Hide resolved
test/decompression.jl Outdated Show resolved Hide resolved
@munckymagik
Copy link
Contributor Author

Just pushed some commits. Super tired right now due to travelling. Will check everything over tomorrow to see what's left and reply properly.

@munckymagik munckymagik force-pushed the support_all_compression_formats branch 2 times, most recently from bf2f4ef to 16d1b88 Compare September 10, 2019 07:01
Also introduces a custom abstract error type RDataException, which
should serve as a starting point for issue JuliaData#14.
@munckymagik munckymagik force-pushed the support_all_compression_formats branch from 16d1b88 to 987e7b1 Compare September 18, 2019 20:07
@munckymagik
Copy link
Contributor Author

Hi @alyst

Sorry for the time it's taken to finish this.

All the issues we discussed have been addressed, so it's ready for another review.

Cheers 👍

Copy link
Collaborator

@alyst alyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for you hard work!
I'm approving it, but could you please change the indentation of the new files to use 4 spaces before I merge the PR? No need to re-squash as I'll squash it upon merging.

src/errors.jl Outdated
end

function Base.showerror(io::IO, e::CodecMissingError)
print(io, string(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please change it to 4-space indent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh 🤦‍♂ there's always something. Thanks for spotting.

test/errors.jl Outdated
@@ -0,0 +1,16 @@
module TestErrors
using RData, Test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4-space indent here as well

@munckymagik
Copy link
Contributor Author

Ok done. Thank you 👍

@alyst alyst merged commit 3dc4e3f into JuliaData:master Sep 21, 2019
@alyst
Copy link
Collaborator

alyst commented Sep 21, 2019

Failures in AppVeyor for julia-latest seem unrelated, so I'm merging.
Thanks again, @munckymagik, Xz and Bzip2 support is quite important!

@alyst alyst mentioned this pull request Sep 21, 2019
@munckymagik munckymagik deleted the support_all_compression_formats branch September 22, 2019 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants