Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error downloading zip from GitHub: "stream reading entries with data descriptors" #1080

Closed
mtkennerly opened this issue May 22, 2023 · 22 comments · Fixed by #1942
Closed

Error downloading zip from GitHub: "stream reading entries with data descriptors" #1080

mtkennerly opened this issue May 22, 2023 · 22 comments · Fixed by #1942

Comments

@mtkennerly
Copy link

mtkennerly commented May 22, 2023

Hi! I was trying to add cargo-binstall support to a project of mine, but I got an error about the zip file on GitHub:

$ cargo binstall ludusavi --manifest-path Cargo.toml --disable-strategies compile
 INFO resolve: Resolving package: 'ludusavi'
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: feature not supported: 'stream reading entries with data descriptors (planned to be reintroduced)'
ERROR Fatal error:

  x For crate ludusavi: Fallback to cargo-install is disabled
  `-> Fallback to cargo-install is disabled

I'm using cargo-binstall 0.23.0 on Windows 11. Here's the excerpt from my Cargo.toml:

[package.metadata.binstall]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-v{ version }-{ target }{ archive-suffix }"
bin-dir = "{ bin }{ binary-ext }"
pkg-fmt = "zip"

[package.metadata.binstall.overrides.x86_64-pc-windows-msvc]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-v{ version }-win64{ archive-suffix }"

[package.metadata.binstall.overrides.i686-pc-windows-msvc]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-v{ version }-win32{ archive-suffix }"

[package.metadata.binstall.overrides.x86_64-unknown-linux-gnu]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-v{ version }-linux{ archive-suffix }"

[package.metadata.binstall.overrides.x86_64-apple-darwin]
pkg-url = "{ repo }/releases/download/v{ version }/{ name }-v{ version }-mac{ archive-suffix }"

Log excerpt showing the URL:

DEBUG Checking for package at: 'https://github.com/mtkennerly/ludusavi/releases/download/v0.18.2/ludusavi-v0.18.2-win64.zip'
DEBUG Using GitHub Restful API to check for existence of artifact, which will also cache the API response
DEBUG Winning URL is https://github.com/mtkennerly/ludusavi/releases/download/v0.18.2/ludusavi-v0.18.2-win64.zip, with pkg_fmt Zip
DEBUG resolve: Downloading package from: 'https://github.com/mtkennerly/ludusavi/releases/download/v0.18.2/ludusavi-v0.18.2-win64.zip' dst:C:\Users\mtken\.cargo\bin\cargo-binstall1fEfoU\bin-ludusavi-x86_64-pc-windows-msvc-GhCrateMeta fmt:Zip
@mtkennerly
Copy link
Author

Looks like this is a limitation of the async_zip dependency:

https://github.com/Majored/rs-async-zip/blob/ff0d985ef54cf00d73c497dbca0beea7541e37dc/src/base/read/mod.rs#L217

@mtkennerly mtkennerly changed the title Error downloading zip from GitHub Error downloading zip from GitHub: "stream reading entries with data descriptors" May 22, 2023
@NobodyXu
Copy link
Member

NobodyXu commented May 22, 2023

Yeah, I also think it is an async-zip limitation as I don't remember adding this to binstalk-downloader.

@mtkennerly can you try creating zips without data descriptors?

@NobodyXu
Copy link
Member

@mtkennerly Thank you for filing this bug and finding out the root case!

@mtkennerly
Copy link
Author

can you try creating zips without data descriptors?

Sorry, I'm actually not sure how to do that. It was just created by actions/upload-artifact@v1.

However, I just tried the async_zip examples with my zip, and it actually works:

git clone [email protected]:Majored/rs-async-zip.git
cd rs-async-zip
cargo run --example file_extraction --features full

I even went back a few versions, and it was still working. I have no idea why it works in that example but not in cargo-binstall.

@NobodyXu
Copy link
Member

@mtkennerly I think that's because you use the AsyncSeek API which does support data descriptorsc whereas binstalk-downloader uses the streaming API which does not support this.

@NobodyXu
Copy link
Member

Sorry, I'm actually not sure how to do that. It was just created by actions/upload-artifact@v1.

Can you manually create the zip, then use https://github.com/softprops/action-gh-release to upload it release or just manually create and upload the zip?

@NobodyXu
Copy link
Member

According to wikipedia page for zip format:

If the bit at offset 3 (0x08) of the general-purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written.

That basically explains what data descriptor is and the challenges for implementing this in streaming mode.

@NobodyXu
Copy link
Member

From man zip(1):

In addition, streamed archives, entries encrypted with standard encryption, or split archives created with the pause option may not be compatible with PKZIP as data descriptors are used and PKZIP at the time of this writing does not support data descriptors (but recent changes in the PKWare published zip standard now include some support for the data descriptor format zip uses).

and

zip -b /tmp stuff *

will put the temporary zip archive in the directory /tmp, copying over stuff.zip to the current directory when done. This option is useful when updating an existing archive and the file system containing this old archive does not have enough space to hold both old and new archives at the same time. It may also be useful when streaming in some cases to avoid the need for data descriptors. Note that using this option may require zip take additional time to copy the archive file when done to the destination file system.

@

@NobodyXu
Copy link
Member

@mtkennerly I think if you use zip directly to create a new achieve from scratch, then it will not use data descriptors.

@mtkennerly
Copy link
Author

I think that's because you use the AsyncSeek API

Ah, yeah, you're right, the example uses the other API.

Can you manually create the zip, then use https://github.com/softprops/action-gh-release to upload it release or just manually create and upload the zip?

Sure! Created a zip through Windows Explorer and made a test release. It works:

[package.metadata.binstall.overrides.x86_64-pc-windows-msvc]
pkg-url = "https://github.com/mtkennerly/playground/v0.1.0/ludusavi-async-zip-test{ archive-suffix }"
DEBUG Checking for package at: 'https://github.com/mtkennerly/playground/v0.1.0/ludusavi-async-zip-test.zip'
DEBUG resolve: Downloading package from: 'https://github.com/cargo-bins/cargo-quickinstall/releases/download/ludusavi-0.18.2/ludusavi-0.18.2-x86_64-pc-windows-msvc.tar.gz'

[...]

DEBUG resolve: extracted_files = ExtractedFiles(
    {
        "ludusavi.exe": File,
        ".": Dir(
            {
                "ludusavi.exe",
            },
        ),
    },
)
DEBUG Found a binary install source: QuickInstall (x86_64-pc-windows-msvc)
 WARN The package ludusavi v0.18.2 will be downloaded from third-party source QuickInstall
 INFO This will install the following binaries:
 INFO   - ludusavi.exe (ludusavi.exe -> C:\Users\mtken\.cargo\bin\ludusavi.exe)
 INFO Dry-run: Not proceeding to install fetched binaries

@mtkennerly
Copy link
Author

I'm not familiar with all the trade-offs between the seek and stream APIs, but do you think it could make sense for cargo-binstall to use the seek API if it has broader compatibility? I'd guess that a lot of GitHub projects could be affected by this if they're also using actions/upload-artifact@v1 during builds.

@NobodyXu
Copy link
Member

I'm not familiar with all the trade-offs between the seek and stream APIs, but do you think it could make sense for cargo-binstall to use the seek API if it has broader compatibility? I'd guess that a lot of GitHub projects could be affected by this if they're also using actions/upload-artifact@v1 during builds.

Yeah I think this makes sense, but let me create an issue at the upstream first to see if it's going to be implemented in next version and how long it will take.

If they are going to support this soon, then we might not need to switch to AsyncSeek API.

@NobodyXu
Copy link
Member

I've opened Majored/rs-async-zip#94 in upstream

@mtkennerly
Copy link
Author

I noticed that Majored/rs-async-zip#94 is closed and that cargo-binstall is now using async_zip v0.0.17, so I gave this a try with cargo-binstall v1.9.0 and got a different error:

$ cargo binstall ludusavi --manifest-path ./Cargo.toml --disable-strategies compile
 INFO resolve: Resolving package: 'ludusavi'
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: a computed CRC32 value did not match the expected value
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: a computed CRC32 value did not match the expected value
ERROR Fatal error:
  x For crate ludusavi: Fallback to cargo-install is disabled
  `-> Fallback to cargo-install is disabled

I also get that error using the zip link from #1080 (comment) , which had worked when I made that comment.

@NobodyXu
Copy link
Member

NobodyXu commented Aug 9, 2024

It seems the pre-built binary for that crate is broken, the crc32 checksum doesn't match.

@mtkennerly
Copy link
Author

I think it's caused by this issue: Majored/rs-async-zip#141

@NobodyXu
Copy link
Member

Thanks, once async-zip fixed it I would bump it and cut a new release

@polarathene
Copy link

I could open a separate issue for this if you'd like? This seems to be another failure example with Failed to extract zipfile, error Encountered an unexpected header:

$ cargo binstall qsv

 INFO resolve: Resolving package: 'qsv'
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: Encountered an unexpected header (actual: 0x6501040c, expected: 0x4034b50).
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: Encountered an unexpected header (actual: 0x6501040c, expected: 0x4034b50).
 WARN The package qsv v0.134.0 will be installed from source (with cargo)

The qsv project does publish assets to GH releases, but it looks like there is no metadata in their Cargo.toml for cargo-binstall to reference. According to that link, binstall will still try to use the GH releases (_which might or fallback to QuickInstall repo if it has qsv built?

Building from source uses over 10GB in RAM I think (due to their release profile having lto = true when lto = thin is probably not that much of a loss? 🤷‍♂️ ), can that be customized when such performance / optimizations aren't as important for the tradeoff?


UPDATE: Ran with verbose output, it found the correct asset from the qsv repo release, then after failing tried checking the QuickInstall repo but found no result, so fallback to building from source:

Verbose output scoped from check to warning
DEBUG Checking for package at: 'https://github.com/jqnatividad/qsv/releases/download/0.134.0/qsv-0.134.0-x86_64-unknown-linux-musl.zip'
DEBUG Using GitHub API to check for existence of artifact, which will also cache the API response
DEBUG has_release_artifact{release=GhRelease { repo: GhRepo { owner: "jqnatividad", repo: "qsv" }, tag: "0.134.0" } artifact_name="qsv-0.134.0-x86_64-unknown-linux-musl.zip"}: return=Ok(Some(GhReleaseArtifactUrl(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/jqnatividad/qsv/releases/assets/191876960", query: None, fragment: None })))
DEBUG Winning URL found! resolved=Resolved { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/jqnatividad/qsv/releases/download/0.134.0/qsv-0.134.0-x86_64-unknown-linux-musl.zip", query: None, fragment: None }, pkg_fmt: Zip, archive_suffix: Some(".zip"), repo: Some("https://github.com/jqnatividad/qsv"), subcrate: None, gh_release_artifact_url: Some(GhReleaseArtifactUrl(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("api.github.com")), port: None, path: "/repos/jqnatividad/qsv/releases/assets/191876960", query: None, fragment: None })), is_repo_private: false }
DEBUG resolve: Downloading package url=https://github.com/jqnatividad/qsv/releases/download/0.134.0/qsv-0.134.0-x86_64-unknown-linux-musl.zip dst=/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta fmt=Zip
DEBUG resolve:and_extract{fmt=Zip path=/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta}: Downloading from: 'https://github.com/jqnatividad/qsv/releases/download/0.134.0/qsv-0.134.0-x86_64-unknown-linux-musl.zip'
DEBUG resolve:and_extract{fmt=Zip path=/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta}: Downloading and extracting to: '/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta'
DEBUG resolve:and_extract{fmt=Zip path=/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta}: Decompressing from zip archive to `/root/.cargo/bin/cargo-binstallGcNyjb/bin-qsv-x86_64-unknown-linux-musl-GhCrateMeta`
 WARN resolve: Error while downloading and extracting from fetcher github.com: Failed to extract zipfile: Encountered an unexpected header (actual: 0x6501040c, expected: 0x4034b50).

Since the archive has more than one file, funzip cannot be used with stdin, so archive needs to be downloaded and extracted like this:

curl -o qsv.zip -fsSL https://github.com/jqnatividad/qsv/releases/download/0.134.0/qsv-0.134.0-x86_64-unknown-linux-musl.zip && unzip qsv.zip qsv

From this rs-async-zip issue:

if compressed_size is 0, which is the case for streamed zip files, the reader ignores the self-delimited file contents (most compression algorithms are self-delimited) and does a take(0) instead. This results in either a

unexpected BufError

or

Encountered an unexpected header

astral/uv implemented a fallback that downloads archive to disk to use Seek instead of streaming. Apparently this is resolved now upstream at rs-async-zip but needs to push out a new release.


@mtkennerly off-topic, but you may want to consider dropping the version from your published releases (see this issue for justification), cargo-binstall releases do this already👍

@mtkennerly
Copy link
Author

off-topic, but you may want to consider dropping the version from your published releases

I think your reasoning in the other ticket is valid, but I do like having the version as a personal preference. I'm the type to hang onto old installers and manage programs manually, so I like when the installer file names are sorted and won't conflict with each other if I download multiple versions. (Yes, I use Windows 🙃)

@NobodyXu
Copy link
Member

NobodyXu commented Sep 17, 2024

We could fork the upstream and publish the latest main.

or maybe we could consider switching to rc-zip, it seems to have more activity and also supports async, but not sure if it reaches feature parity with rs-async-zip

@NobodyXu
Copy link
Member

The upstream has not responded yet and I think rc-zip is probably good enough, so we should switch to it.

@NobodyXu NobodyXu removed the Blocked: upstream Fix or feature is needed to be implemented upstream (in a dependency) label Oct 24, 2024
NobodyXu added a commit that referenced this issue Oct 28, 2024
Fixed #1080

In this commit, binstalk-downloader is updated to
- first download the zip into a temporary file, since
  there is no correct way to extract zip from a stream.
- then use rc-zip-sync to read from the zip and extract
  it to filesystem.

Signed-off-by: Jiahao XU <[email protected]>
NobodyXu added a commit that referenced this issue Oct 28, 2024
Fixed #1080

In this commit, binstalk-downloader is updated to
- first download the zip into a temporary file, since
  there is no correct way to extract zip from a stream.
- then use rc-zip-sync to read from the zip and extract
  it to filesystem.

Signed-off-by: Jiahao XU <[email protected]>
NobodyXu added a commit that referenced this issue Oct 28, 2024
Fixed #1080

In this commit, binstalk-downloader is updated to
- first download the zip into a temporary file, since
  there is no correct way to extract zip from a stream.
- then use rc-zip-sync to read from the zip and extract
  it to filesystem.

Signed-off-by: Jiahao XU <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Oct 30, 2024
* Use rc-zip-sync for zip extraction

Fixed #1080

In this commit, binstalk-downloader is updated to
- first download the zip into a temporary file, since
  there is no correct way to extract zip from a stream.
- then use rc-zip-sync to read from the zip and extract
  it to filesystem.

Signed-off-by: Jiahao XU <[email protected]>

* Fix returned `ExtractedFiles` in `do_extract_zip`

Signed-off-by: Jiahao XU <[email protected]>

* Fix clippy in zip_extraction.rs

Signed-off-by: Jiahao XU <[email protected]>

---------

Signed-off-by: Jiahao XU <[email protected]>
Signed-off-by: Jiahao XU <[email protected]>
@mtkennerly
Copy link
Author

This works for me now in 1.10.10 🎉 Thanks!

tmeijn pushed a commit to tmeijn/dotfiles that referenced this issue Nov 19, 2024
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [cargo-bins/cargo-binstall](https://github.com/cargo-bins/cargo-binstall) | patch | `v1.10.8` -> `v1.10.13` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>cargo-bins/cargo-binstall (cargo-bins/cargo-binstall)</summary>

### [`v1.10.13`](https://github.com/cargo-bins/cargo-binstall/releases/tag/v1.10.13)

[Compare Source](cargo-bins/cargo-binstall@v1.10.12...v1.10.13)

*Binstall is a tool to fetch and install Rust-based executables as binaries. It aims to be a drop-in replacement for `cargo install` in most cases. Install it today with `cargo install cargo-binstall`, from the binaries below, or if you already have it, upgrade with `cargo binstall cargo-binstall`.*

##### In this release:

-   Use git credential helper for github token auto discovery (can be disabled via `--no-discover-github-token`) ([#&#8203;1871](cargo-bins/cargo-binstall#1871))

##### Other changes:

-   Upgrade transitive dependencies

### [`v1.10.12`](https://github.com/cargo-bins/cargo-binstall/releases/tag/v1.10.12)

[Compare Source](cargo-bins/cargo-binstall@v1.10.11...v1.10.12)

*Binstall is a tool to fetch and install Rust-based executables as binaries. It aims to be a drop-in replacement for `cargo install` in most cases. Install it today with `cargo install cargo-binstall`, from the binaries below, or if you already have it, upgrade with `cargo binstall cargo-binstall`.*

##### In this release:

-   Speedup installation script by avoiding additional network trip

##### Other changes:

-   Upgrade dependencies (hickory-dns, thiserror, file-format, etc).

### [`v1.10.11`](https://github.com/cargo-bins/cargo-binstall/releases/tag/v1.10.11)

[Compare Source](cargo-bins/cargo-binstall@v1.10.10...v1.10.11)

*Binstall is a tool to fetch and install Rust-based executables as binaries. It aims to be a drop-in replacement for `cargo install` in most cases. Install it today with `cargo install cargo-binstall`, from the binaries below, or if you already have it, upgrade with `cargo binstall cargo-binstall`.*

##### In this release:

-   Upgrade dependencies to fix compilation failures due to yanked dependency fs4 0.10

### [`v1.10.10`](https://github.com/cargo-bins/cargo-binstall/releases/tag/v1.10.10)

[Compare Source](cargo-bins/cargo-binstall@v1.10.9...v1.10.10)

*Binstall is a tool to fetch and install Rust-based executables as binaries. It aims to be a drop-in replacement for `cargo install` in most cases. Install it today with `cargo install cargo-binstall`, from the binaries below, or if you already have it, upgrade with `cargo binstall cargo-binstall`.*

##### In this release:

-   Use rc-zip-sync for zip extraction ([#&#8203;1080](cargo-bins/cargo-binstall#1080) [#&#8203;1942](cargo-bins/cargo-binstall#1942))
-   Better UI: default to "yes" for installation prompt, and display more readable message for the prompt ([#&#8203;1943](cargo-bins/cargo-binstall#1943) [#&#8203;1948](cargo-bins/cargo-binstall#1948) [#&#8203;1950](cargo-bins/cargo-binstall#1950) )

##### Other changes:

-   Upgrade dependencies ([#&#8203;1949](cargo-bins/cargo-binstall#1949))

### [`v1.10.9`](https://github.com/cargo-bins/cargo-binstall/releases/tag/v1.10.9)

[Compare Source](cargo-bins/cargo-binstall@v1.10.8...v1.10.9)

*Binstall is a tool to fetch and install Rust-based executables as binaries. It aims to be a drop-in replacement for `cargo install` in most cases. Install it today with `cargo install cargo-binstall`, from the binaries below, or if you already have it, upgrade with `cargo binstall cargo-binstall`.*

##### In this release:

-   Upgrade dependencies

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40NDAuNyIsInVwZGF0ZWRJblZlciI6IjM3LjQ0MC43IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants