Add use of zstd compression on compute services #336

Open: wants to merge 11 commits into main


ianmkenney
Member

This PR closes #220. It modifies the behavior of set_task_result in the compute client and API to use compressed keyed chain representations of ProtocolDAGResults, instead of plain JSON serialization, as the intermediate format.

- Update env files to include zstandard

- Update set_task_result in the compute API and client to handle
base64-encoded data. Rather than JSON-serializing the ProtocolDAGResult
(PDR) and using that as the intermediate format, instead:

1) create a keyed chain representation of the PDR

2) JSON serialize this representation

3) compress the utf-8 encoded bytes with zstandard

4) encode with base64

- Use the resulting base64-encoded data as the intermediate format, and
reverse the operations above to recover the PDR.
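The four steps and their reversal can be sketched in Python as follows. This is an illustration, not the PR's code: encode_result and decode_result are hypothetical names, and a small hand-written list of [key, dict] pairs stands in for the keyed chain that gufe would build from a real PDR (its exact structure here is an assumption). The sketch uses the zstandard package the PR adds; so that it still runs where zstandard is not installed, it falls back to zlib, which is not what the PR does.

```python
import base64
import json

try:
    import zstandard  # the compression dependency this PR adds to the env files

    def _compress(data: bytes) -> bytes:
        return zstandard.ZstdCompressor().compress(data)

    def _decompress(data: bytes) -> bytes:
        return zstandard.ZstdDecompressor().decompress(data)

except ImportError:
    # Sketch-only fallback so the example runs without zstandard installed;
    # the PR itself uses zstd, not zlib.
    import zlib

    _compress, _decompress = zlib.compress, zlib.decompress


def encode_result(keyed_chain: list) -> str:
    """Hypothetical helper: keyed chain -> JSON -> UTF-8 -> compress -> base64."""
    json_text = json.dumps(keyed_chain)                  # step 2: JSON serialize
    compressed = _compress(json_text.encode("utf-8"))    # step 3: compress UTF-8 bytes
    return base64.b64encode(compressed).decode("ascii")  # step 4: base64 encode


def decode_result(encoded: str) -> list:
    """Hypothetical helper: undo each step of encode_result in reverse order."""
    compressed = base64.b64decode(encoded)
    json_text = _decompress(compressed).decode("utf-8")
    return json.loads(json_text)


# Hypothetical stand-in for step 1: an ordered list of [key, keyed_dict]
# pairs; a real keyed chain would be generated from a PDR by gufe.
chain = [
    ["ProtocolUnitResult-abc", {"outputs": {"dG": -1.5}}],
    ["ProtocolDAGResult-def", {"unit_results": ["ProtocolUnitResult-abc"]}],
]

encoded = encode_result(chain)
restored = decode_result(encoded)
```

Base64 at the end keeps the payload as plain ASCII text, so it can travel inside a JSON request body; the cost is roughly a one-third size increase over the raw compressed bytes.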

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 80.35714% with 11 lines in your changes missing coverage. Please review.

Project coverage is 80.17%. Comparing base (c32f00d) to head (32bef06).

Files with missing lines         Patch %   Lines
alchemiscale/compute/api.py      33.33%    8 Missing ⚠️
alchemiscale/interface/api.py     0.00%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   80.15%   80.17%   +0.01%     
==========================================
  Files          26       27       +1     
  Lines        3472     3505      +33     
==========================================
+ Hits         2783     2810      +27     
- Misses        689      695       +6     


Use more bytes

Move compression and decompression functions to new module

Use latin-1 decoded bytes
If a decompression error is raised, assume that the original data was
never compressed.
ianmkenney force-pushed the feature/220-zstd-compression-compute-services branch from 4541b72 to cce6e8d on December 30, 2024 06:21
Test getting "extends" ProtocolDAGResults as if they had been stored
in the old pdr.to_dict() -> JSON -> UTF-8 encoded format. This new test
can be removed in the next major release that drops the old format.
ianmkenney force-pushed the feature/220-zstd-compression-compute-services branch from cce6e8d to b32f62d on December 30, 2024 06:22
To allow better and clearer testing of result pushing and pulling,
executing a task and pushing its results are now separate steps.
Code coverage was artificially low due to test run order. Resetting
and reinitializing the s3os_server gives the correct results.
It's more robust to parameterize the old tests to use the legacy kwarg
for pushing results than to write a new test that covers less of
the codebase.
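The parameterization idea can be sketched with a plain loop; in pytest this would normally be written as @pytest.mark.parametrize("legacy", [False, True]). Everything here is hypothetical: push_result, pull_result, the dict used as a store, and the legacy keyword are stand-ins for the real client, object store, and kwarg, and the zlib fallback only exists so the sketch runs without zstandard.

```python
import json

try:
    import zstandard  # dependency added by this PR

    def _compress(data: bytes) -> bytes:
        return zstandard.ZstdCompressor().compress(data)

    def _decompress(data: bytes) -> bytes:
        return zstandard.ZstdDecompressor().decompress(data)

    _DecompressionError = zstandard.ZstdError

except ImportError:
    # Sketch-only fallback; the PR uses zstd.
    import zlib

    _compress, _decompress = zlib.compress, zlib.decompress
    _DecompressionError = zlib.error


def push_result(store: dict, key: str, result: dict, legacy: bool = False) -> None:
    """Hypothetical push: legacy=True writes the old uncompressed format."""
    raw = json.dumps(result).encode("utf-8")
    store[key] = raw if legacy else _compress(raw)


def pull_result(store: dict, key: str) -> dict:
    """Hypothetical pull that reads either format."""
    stored = store[key]
    try:
        raw = _decompress(stored)
    except _DecompressionError:
        raw = stored  # legacy data was never compressed
    return json.loads(raw.decode("utf-8"))


def check_push_pull(legacy: bool) -> None:
    # One test body, exercised against both storage paths.
    store = {}
    result = {"task": "Task-123", "ok": True}
    push_result(store, "Task-123", result, legacy=legacy)
    assert pull_result(store, "Task-123") == result


for legacy in (False, True):
    check_push_pull(legacy)
```

Running the same assertions for both values of the flag keeps the full push/pull path covered for the old format, rather than testing it through a narrower standalone test.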
ianmkenney changed the title from "[WIP] Add use of zstd compression on compute services" to "Add use of zstd compression on compute services" on Jan 2, 2025
ianmkenney marked this pull request as ready for review on January 2, 2025 18:46
ianmkenney requested a review from dotsdl on January 2, 2025 19:27
Labels: none yet · Projects: none yet · 1 participant