[BUG] cuVS ANN benchmarks failing due to HTTP 403 Forbidden error fetching data #724

Open
jakirkham opened this issue Jan 11, 2025 · 0 comments
Labels: "? - Needs Triage" (Need team to review and classify), "bug" (Something isn't working)

Comments

@jakirkham
Member

Describe the bug

It looks like the nightly builds of the Docker images have started failing due to an error when fetching data for the cuVS benchmarks. This appears to happen in all Docker images.

The snippet below is taken from this GHA log, but similar errors can be seen in the others:

 > [cuvs-bench-datasets 3/3] RUN /home/rapids/cuvs-bench/get_datasets.sh:
0.214     return self._call_chain(*args)
0.214            ^^^^^^^^^^^^^^^^^^^^^^^
0.214   File "/opt/conda/lib/python3.12/urllib/request.py", line 492, in _call_chain
0.214     result = func(*args)
0.214              ^^^^^^^^^^^
0.214   File "/opt/conda/lib/python3.12/urllib/request.py", line 639, in http_error_default
0.214     raise HTTPError(req.full_url, code, msg, hdrs, fp)
0.214 urllib.error.HTTPError: HTTP Error 403: Forbidden
0.214 downloading http://ann-benchmarks.com/deep-image-96-angular.hdf5 -> /home/rapids/preloaded_datasets/deep-image-96-angular.hdf5...
0.214 Cannot download http://ann-benchmarks.com/deep-image-96-angular.hdf5

Steps/Code to reproduce bug

Run the script cuvs-bench/get_datasets.sh. It appears to fail on the first dataset (see the command below), though the later ones may have the same issue:

python -m cuvs_bench.get_dataset --dataset deep-image-96-angular --normalize --dataset-path /home/rapids/preloaded_datasets
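
To separate the download failure from the Docker build, the request can be reproduced on its own. The following is a minimal sketch using only the standard library; `probe_status` is a hypothetical helper name and is not part of cuvs_bench. It reports the HTTP status code instead of raising, so a 403 shows up as a return value.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=30):
    """Return the HTTP status code for `url`, including error codes such as 403.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url)
    try:
        # The body is not read; the response is closed as soon as headers arrive.
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want to report.
        return err.code
```

Calling `probe_status("http://ann-benchmarks.com/deep-image-96-angular.hdf5")` from the same environment as the CI job would show whether the 403 is independent of the cuvs-bench script.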

Expected behavior

The benchmark datasets are retrieved.

Environment details (please complete the following information):

  • Environment location: Docker (on CI)
  • Method of cuDF install: Conda in Docker build (reproducible with any image or just the script above)
    • If method of install is [Docker], provide docker pull & docker run commands used
  • Please run and attach the output of the cudf/print_env.sh script to gather relevant environment details

Not seeing where cudf/print_env.sh is run on CI. Where should we be looking? Or should we add this to our CI scripts?

In any event, there is a bunch of diagnostic information in the log. I suspect this is as simple as the URL changing or us needing some additional authentication to get the data.
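
One way to narrow down those possibilities is to compare how the server responds to slightly different requests. The sketch below only builds the request variants; the helper name `candidate_requests`, the `https` variant, and the User-Agent string are all assumptions made for triage, and none of them is known to resolve the 403.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def candidate_requests(url, user_agent="cuvs-bench-triage/0.0"):
    """Build request variants to compare; purely diagnostic, not a known fix."""
    parts = urlsplit(url)
    # Assumption: the host may also serve the file over https.
    https_url = urlunsplit(parts._replace(scheme="https"))
    return {
        "original": urllib.request.Request(url),
        "https": urllib.request.Request(https_url),
        # Assumption: the server may treat the default Python User-Agent differently.
        "custom-user-agent": urllib.request.Request(
            url, headers={"User-Agent": user_agent}
        ),
    }
```

Each value can be passed to `urllib.request.urlopen`; if one variant succeeds where the original returns 403, that points at the cause.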

Additional context

Not that I can think of

@jakirkham jakirkham added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jan 11, 2025
rapids-bot bot pushed a commit that referenced this issue Jan 15, 2025
Separates the workflows for building the RAPIDS end user images and the cuVS images. The cuVS images do not depend on the RAPIDS end user images, so they can be built in parallel. This also allows for finer grained retries in case of failures.

Also switches to using `rapids-mamba-retry` for installing conda packages.

Finally, disables building the `cuvs-bench-datasets` images which are consistently failing (#724) until a better solution than the workaround in #723 is ready. 5adab54 can be reverted to re-enable this.

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #725