Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide an API for tree hashing #436

Open
josnyder-2 opened this issue Dec 3, 2024 · 3 comments
Open

Provide an API for tree hashing #436

josnyder-2 opened this issue Dec 3, 2024 · 3 comments

Comments

@josnyder-2
Copy link

The blake3 crate provides some methods for incrementally hashing a byte array (viz. update_rayon, update_mmap_rayon), but these lock the user into using a specific framework for parallelism, and basically cannot work in situations where multiple separate computers want to collaborate to produce a single valid BLAKE3 hash tree. The algorithm itself is designed around tree-hashing as a paradigm, but it doesn't seem to expose a public API for interacting with that tree representation, emitting unfinalized hashes, or ingesting unfinalized hashes for inclusion into a larger file.

Of course, users who want these capabilities can simply design their own Merkle tree formats atop blake3, but it would be nice to not have a boundary between spec-compliant blake3 hashes (as emitted by this crate) and bespoke Merkle trees (as emitted by people's homemade formats).

@oconnor663
Copy link
Member

oconnor663 commented Dec 3, 2024

There's an undocumented blake3::guts module that you can currently use for this: https://github.com/BLAKE3-team/BLAKE3/blob/master/src/guts.rs. I've never considered it "officially" stable, but it's been used by the bao tool for years, so at this point it's unlikely that we'll ever break it. The main downside of this API is that it's slow, because it doesn't use any multi-chunk parallelism. The other downside is that it doesn't support the keyed or derive-key modes, though that would be easy to add if someone needed it.

There's also a fork of this repo at https://github.com/n0-computer/iroh-blake3, which exposes a iroh_blake3::guts::hash_subtree function, which is more efficient.

It's probably time we designed a stable, documented API for this into the blake3 crate, maybe behind some scary-sounding feature flag. The functions that would be needed are (not worrying too much about the names):

  • set_chunk_index, letting you create a Hasher that starts at an index other than 0. Doing this through mutation avoids duplicating all three public constructors, though you'd get garbage if you used it after update.
  • finalize_internal, for getting non-root chaining values out of a Hasher
  • something for joining internal hashes together into higher level parent nodes. unclear whether this should be a standalone operation vs a way to feed large subtree hashes into a Hasher (easy to implement but again easy to misuse and get garbage out)
  • Maybe also nice to have: a way to pre-hash a derive-key context string and then a corresponding way to initialize a Hasher from that pre-hash.

@oconnor663
Copy link
Member

crosslink #82

@devinrsmith
Copy link

devinrsmith commented Jan 9, 2025

I'm working on a blake3 tree traversal API to support the logic needed to implement a bao-like incremental producer / consumer. Essentially, a bunch of implementation math so you can arbitrarily walk through a logical blake3 node tree (divorced from any actual data, except for the total content length).

I think it would make sense for it to live in the official blake3 repo; I would be open to contributing it if there was sufficient interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants