Improve blosc efficiency #5

Open: wants to merge 1 commit into main
Conversation

eschnett

This is a continuation of braingram#1. This PR discusses two improvements to the current blosc interface (sketched in code after the list):

  1. The compressor does not currently pass the element type size to blosc. Knowing the type size lets the shuffle filter regroup bytes by their position within each element, exposing more regularity, which allows the compression algorithm to compress better.
  2. The decompressor gains the ability to write into a preallocated buffer instead of allocating its own output buffer. This saves memory bandwidth and should improve decompression speed slightly.
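
For concreteness, here is a minimal sketch of what the two changes look like in terms of the c-blosc1 C API (blosc_compress and blosc_decompress are the actual library entry points; the wrapper names, compression level, and buffer handling below are illustrative, not the code in this PR; blosc_init()/blosc_destroy() and error handling are omitted for brevity):

  #include <blosc.h>

  #include <cstddef>
  #include <vector>

  // 1. Pass the element type size so the shuffle filter can regroup
  //    bytes by their position within each element.
  std::vector<char> compress_doubles(const double* src, std::size_t nbytes) {
    std::vector<char> dest(nbytes + BLOSC_MAX_OVERHEAD);
    int csize = blosc_compress(/*clevel=*/5, /*doshuffle=*/BLOSC_SHUFFLE,
                               /*typesize=*/sizeof(double), nbytes, src,
                               dest.data(), dest.size());
    dest.resize(csize > 0 ? std::size_t(csize) : 0);
    return dest;
  }

  // 2. Decompress into a buffer the caller already owns, avoiding an
  //    extra allocation and copy.
  void decompress_into(const char* compressed, double* out,
                       std::size_t out_bytes) {
    blosc_decompress(compressed, out, out_bytes);
  }

The second point matters because the decompressed size is typically known in advance from the file metadata, so the caller can allocate the output buffer once and hand it to the decompressor directly.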

I experimented by creating a large 3D float64 array (1000 × 1000 × 250 elements), filling it as shown below, and compressing it with the shuffle filter using a type size of either 8 (the actual element size) or 1:

  // getidx maps (i, j, k) to a linear index into the flattened array.
  for (int64_t i = 0; i < ni; ++i)
    for (int64_t j = 0; j < nj; ++j)
      for (int64_t k = 0; k < nk; ++k) {
        int64_t idx = getidx(i, j, k);
        // A smoothly varying field; byte-shuffled float64 data like
        // this compresses well.
        rho.at(idx) = 1.0 / (1.1 * i + 1.2 * j + 1.3 * k + 1);
      }

The resulting file sizes are:

  -rw-r--r--   1 eschnett staff 1993847361 Nov 16 11:31 large-new-shuffle-typesize-1.asdf
  -rw-r--r--   1 eschnett staff  395927299 Nov 16 11:29 large-new-shuffle-typesize-8.asdf

In this case the compressed file is about five times larger (1993847361 vs. 395927299 bytes) when the wrong type size is used.
