Provide an API for tree hashing #436

josnyder-2 · 2024-12-03T00:30:48Z

The blake3 crate provides some methods for incrementally hashing a byte array (viz. update_rayon, update_mmap_rayon), but these lock the user into using a specific framework for parallelism, and basically cannot work in situations where multiple separate computers want to collaborate to produce a single valid BLAKE3 hash tree. The algorithm itself is designed around tree-hashing as a paradigm, but it doesn't seem to expose a public API for interacting with that tree representation, emitting unfinalized hashes, or ingesting unfinalized hashes for inclusion into a larger file.

Of course, users who want these capabilities can simply design their own Merkle tree formats atop blake3, but it would be nice to not have a boundary between spec-compliant blake3 hashes (as emitted by this crate) and bespoke Merkle trees (as emitted by people's homemade formats).

The text was updated successfully, but these errors were encountered:

oconnor663 · 2024-12-03T02:23:15Z

There's an undocumented blake3::guts module that you can currently use for this: https://github.com/BLAKE3-team/BLAKE3/blob/master/src/guts.rs. I've never considered it "officially" stable, but it's been used by the bao tool for years, so at this point it's unlikely that we'll ever break it. The main downside of this API is that it's slow, because it doesn't use any multi-chunk parallelism. The other downside is that it doesn't support the keyed or derive-key modes, though that would be easy to add if someone needed it.

There's also a fork of this repo at https://github.com/n0-computer/iroh-blake3, which exposes a iroh_blake3::guts::hash_subtree function, which is more efficient.

It's probably time we designed a stable, documented API for this into the blake3 crate, maybe behind some scary-sounding feature flag. The functions that would be needed are (not worrying too much about the names):

set_chunk_index, letting you create a Hasher that starts at an index other than 0. Doing this through mutation avoids duplicating all three public constructors, though you'd get garbage if you used it after update.
finalize_internal, for getting non-root chaining values out of a Hasher
something for joining internal hashes together into higher level parent nodes. unclear whether this should be a standalone operation vs a way to feed large subtree hashes into a Hasher (easy to implement but again easy to misuse and get garbage out)
Maybe also nice to have: a way to pre-hash a derive-key context string and then a corresponding way to initialize a Hasher from that pre-hash.

oconnor663 · 2024-12-04T17:54:36Z

crosslink #82

devinrsmith · 2025-01-09T22:15:57Z

I'm working on a blake3 tree traversal API to support the logic needed to implement a bao-like incremental producer / consumer. Essentially, a bunch of implementation math so you can arbitrarily walk through a logical blake3 node tree (divorced from any actual data, except for the total content length).

I think it would make sense for it to live in the official blake3 repo; I would be open to contributing it if there was sufficient interest.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide an API for tree hashing #436

Provide an API for tree hashing #436

josnyder-2 commented Dec 3, 2024

oconnor663 commented Dec 3, 2024 •

edited

Loading

oconnor663 commented Dec 4, 2024

devinrsmith commented Jan 9, 2025 •

edited

Loading

Provide an API for tree hashing #436

Provide an API for tree hashing #436

Comments

josnyder-2 commented Dec 3, 2024

oconnor663 commented Dec 3, 2024 • edited Loading

oconnor663 commented Dec 4, 2024

devinrsmith commented Jan 9, 2025 • edited Loading

oconnor663 commented Dec 3, 2024 •

edited

Loading

devinrsmith commented Jan 9, 2025 •

edited

Loading