Support .json.gz files #114

Merged
merged 5 commits into from
Jan 6, 2025

Member

@victorlin victorlin commented Jan 4, 2025

Description of proposed changes

Auspice JSONs can be large, and in such cases they are often compressed using gzip. Adding the ability to load these files directly removes the need for users to manually decompress the file.

Done using the Compression Streams API¹.

¹ https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API
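
As a rough illustration of the approach described above, the sketch below reads a dropped file and decompresses it with `DecompressionStream` when the name ends in `.gz`. The helper name `readJsonFile` and its signature are assumptions for illustration only; this is not the code from this PR.

```javascript
// Sketch only: readJsonFile is a hypothetical helper, not the function in this PR.
// It accepts a File or Blob plus its filename and resolves to the parsed JSON.
async function readJsonFile(blob, filename) {
  let stream = blob.stream();
  if (filename.endsWith(".gz")) {
    // DecompressionStream is provided by the Compression Streams API.
    stream = stream.pipeThrough(new DecompressionStream("gzip"));
  }
  // Response collects the whole stream and decodes it as UTF-8 text.
  const text = await new Response(stream).text();
  return JSON.parse(text);
}
```

In a drop handler this could be called as `readJsonFile(file, file.name)` for each dropped `File`. The whole file is still held in memory after decompression, so this removes the manual step rather than the memory cost.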

Related issue(s)

Closes #112

Checklist

  • Tested locally with files downloaded by `nextstrain remote download https://next.nextstrain.org/ncov/gisaid/global/6m` and compressed with `gzip`
  • Checks pass

File format detection based on file extension facilitates support for
.json.gz files in a future commit.
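
One way extension-based detection might look is sketched below. The table contents, the format labels, and the name `detectFormat` are all assumptions for illustration; the actual extension list in the repository may differ.

```javascript
// Hypothetical extension table; the real list used by the app may differ.
// Longer extensions come first so ".json.gz" is not mistaken for another type.
const FORMATS = [
  { format: "auspice-json-gz", extensions: [".json.gz"] },
  { format: "auspice-json", extensions: [".json"] },
  { format: "narrative", extensions: [".md"] },
  { format: "newick", extensions: [".nwk", ".newick"] },
];

// Returns the format label for a filename, or null if no extension matches.
function detectFormat(filename) {
  const lower = filename.toLowerCase();
  for (const { format, extensions } of FORMATS) {
    if (extensions.some((ext) => lower.endsWith(ext))) return format;
  }
  return null;
}
```

With a table like this, supporting a new file type is a matter of adding one entry rather than changing content-sniffing logic.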
@victorlin victorlin self-assigned this Jan 4, 2025
@nextstrain-bot nextstrain-bot temporarily deployed to auspice-us-victorlin-su-8diu77 January 4, 2025 01:55 Inactive
@victorlin victorlin force-pushed the victorlin/support-json-gz branch from 0c20fec to 15ca8fd Compare January 4, 2025 02:00
@victorlin victorlin temporarily deployed to auspice-us-victorlin-su-8diu77 January 4, 2025 02:00 Inactive
Member

@jameshadfield jameshadfield left a comment

I didn't know browsers now exposed decompression APIs - that's great!

We should also support gzipped sidecar files as part of this PR. For instance, loading a pair of gzipped main + frequencies JSONs doesn't work and the console shows:

Read measurements-panel_flu_seasonal_h3n2_ha_tip-frequencies.json.gz as a main dataset JSON file
Read measurements-panel_flu_seasonal_h3n2_ha.json.gz as a main dataset JSON file

I'd suggest an implementation whereby the linking of main & sidecars is compression agnostic, e.g. we allow a gzipped main and non-gzipped sidecar, or vice versa. However, I'm not wedded to this, so let me know if you think there's good reason not to.

auspice_client_customisation/handleDroppedFiles.js (outdated, resolved)
Prepare for supporting .json.gz files. The file extension should not be
used in the key since it can be either .json or .json.gz. Simply use the
filename without extension. This is simplified with a new function to
get the dataset name from any Auspice JSON file.
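
A minimal sketch of such a function, assuming it only needs to strip a trailing `.json` or `.json.gz`. The name `getDatasetName` is a guess at what the commit describes, not a confirmed identifier.

```javascript
// Hypothetical helper: derive an extension-free key from an Auspice JSON filename.
// Both "flu.json" and "flu.json.gz" map to the same key, "flu".
function getDatasetName(filename) {
  return filename.replace(/\.json(\.gz)?$/, "");
}

// Example: files keyed by dataset name rather than by full filename.
const filesByName = {};
for (const filename of ["flu.json.gz", "zika.json"]) {
  filesByName[getDatasetName(filename)] = filename;
}
```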
Auspice JSONs can be large, and in such cases they are often compressed
using gzip. Adding the ability to load these files directly removes the
need for users to manually decompress the file.

Done using the Compression Streams API¹.

¹ <https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API>
Narratives and Newick file descriptions specify file extensions. Do the
same for Auspice JSONs.
@victorlin victorlin force-pushed the victorlin/support-json-gz branch from 15ca8fd to f89f81e Compare January 6, 2025 20:53
@victorlin victorlin temporarily deployed to auspice-us-victorlin-su-8diu77 January 6, 2025 20:53 Inactive
@victorlin
Member Author

> I'd suggest an implementation whereby the linking of main & sidecars is compression agnostic, e.g. we allow a gzipped main and non-gzipped sidecar, or vice versa. However, I'm not wedded to this, so let me know if you think there's good reason not to.

Thanks for flagging sidecar files. I just force-pushed some changes to allow any combination of .json and .json.gz.
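
For readers curious how compression-agnostic linking could work, here is a hedged sketch that pairs a main dataset with its sidecars by name alone. The suffix list and the function `groupDatasetFiles` are assumptions for illustration and are not taken from the force-pushed commits.

```javascript
// Assumed sidecar suffixes, for illustration only.
const SIDECAR_SUFFIXES = ["_tip-frequencies", "_root-sequence", "_measurements"];

// Groups filenames into { main, sidecars } records keyed by dataset name.
// The ".json" or ".json.gz" extension is dropped before matching, so a
// gzipped main can be linked to a plain sidecar and vice versa.
function groupDatasetFiles(filenames) {
  const groups = new Map();
  const groupFor = (key) => {
    if (!groups.has(key)) groups.set(key, { main: null, sidecars: {} });
    return groups.get(key);
  };
  for (const filename of filenames) {
    const name = filename.replace(/\.json(\.gz)?$/, "");
    const suffix = SIDECAR_SUFFIXES.find((s) => name.endsWith(s));
    if (suffix) {
      // Sidecar: attach it to the dataset whose name precedes the suffix.
      const mainName = name.slice(0, -suffix.length);
      groupFor(mainName).sidecars[suffix.slice(1)] = filename;
    } else {
      groupFor(name).main = filename;
    }
  }
  return groups;
}
```

Applied to the two filenames from the console output above, with either one gzipped, this yields a single group whose `main` is the dataset JSON and whose `sidecars` holds the tip-frequencies file.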

Member

@jameshadfield jameshadfield left a comment

Thanks!

@jameshadfield jameshadfield merged commit c802708 into master Jan 6, 2025
1 check passed
@jameshadfield jameshadfield deleted the victorlin/support-json-gz branch January 6, 2025 23:01
Development

Successfully merging this pull request may close these issues.

allow to drop .json.gz files
3 participants