FRC: Retrieval Checking Requirements #1089
Conversation
Force-pushed from dd101ee to 950bc84
Tagging @steven004 @LexLuthr @magik6k @masih @willscott @juliangruber @patrickwoodhead for visibility.
#### Link on-chain MinerId and IPNI provider identity

Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
This requirement cannot be fulfilled in Curio. We no longer have a concept of a minerID <> unique peerID binding. IPNI must be extended to support other key types, like the worker key, to sign ads.
I am aware of that; see the note in the text below this paragraph.
> [!NOTE]
> This is open to extensions in the future; we can support more than one form of linking
> index-providers to filecoin-miners. See e.g. [ipni/spec#33](https://github.com/ipni/specs/issues/33).
From my point of view, I prefer not to block progress on this FRC until the Curio team figures out how to extend IPNI to support other key types. Instead, I'd like this FRC to document the solution that works with Boost & Venus now and then enhance it with the new mechanism Curio needs once that new solution is agreed on.
I am probably mistaken here, but Droplet (Venus' Boost) supports multiple minerIDs being associated with a single PeerID (see docs). Does that mean that if I am using Droplet, I need to limit myself to a 1:1 relationship to meet this requirement?
Great call, @lanzafame! I am still learning how Venus Droplet works and what features it offers.
Based on the docs you linked to, I believe you can have multiple minerIDs associated with a single Droplet PeerID and still meet this requirement.
In Spark, we need the PeerID returned by `Filecoin.StateMinerInfo` to match the PeerID used in IPNI advertisements. Spark does not check whether that PeerID is unique or shared by multiple miners.
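For illustration only (not part of the FRC text), here is a minimal TypeScript sketch of that comparison, assuming a Lotus-compatible JSON-RPC endpoint (the Glif URL is just an example) and the public cid.contact `/providers/{peerID}` endpoint; the response field names reflect the current public APIs as I understand them.

```ts
// Sketch: check that the miner's on-chain PeerID matches the IPNI provider identity.
const LOTUS_RPC_URL = 'https://api.node.glif.io/rpc/v1' // any Lotus-compatible endpoint

async function isPeerIdLinked(minerId: string): Promise<boolean> {
  // 1. On-chain identity: Filecoin.StateMinerInfo returns the miner's PeerId (or null).
  const rpcResponse = await fetch(LOTUS_RPC_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'Filecoin.StateMinerInfo',
      params: [minerId, null]
    })
  })
  const { result } = await rpcResponse.json()
  const onChainPeerId: string | null = result?.PeerId ?? null
  if (!onChainPeerId) return false

  // 2. IPNI identity: ask cid.contact whether it has a provider record for that PeerID.
  const ipniResponse = await fetch(`https://cid.contact/providers/${onChainPeerId}`)
  if (!ipniResponse.ok) return false
  const providerInfo = await ipniResponse.json()

  // 3. The requirement is met when both identities use the same libp2p peer ID.
  return providerInfo?.AddrInfo?.ID === onChainPeerId
}
```

The sketch treats a missing `PeerId` or an unknown IPNI provider as a failed check (e.g. `await isPeerIdLinked('f01234')` returning `false`); how to report those cases is up to the checker implementation.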
Force-pushed from 859745d to a91383e
Left a few editorial comments. I do not know enough about the specific topic to be able to opine on a technical level. I also found the explanation somewhat unclear, but that could be a consequence of my lack of knowledge, so not holding that against the draft.
Others will be better suited to provide a full review.
When we set out to build [Spark](https://filspark.com), a protocol for testing whether the _payload_ of Filecoin deals can be retrieved, we designed it based on how [Boost](https://github.com/filecoin-project/boost) worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success rate (Spark RSR) in mid-2024, we learned that [Venus](https://github.com/filecoin-project/venus) [Droplet](https://github.com/ipfs-force-community/droplet), an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
This FRC has the following goals:
1. Document the retrieval process based on IPFS/IPLD.
2. Specify what Spark needs from miner software.
3. Collaborate with the community to tweak the requirements to work well for all parties involved.
4. Let this spec and the building blocks like [IPNI Reverse Index](https://github.com/filecoin-project/devgrants/issues/1781) empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark.
This reads more like motivation than an abstract. It'd be useful for the abstract to summarise the actual requirements/spec.
3. Map `(PieceCID, PieceSize)` to an IPNI `ContextID` value.
4. Query the IPNI reverse index for a sample of payload blocks advertised by `ProviderID` with `ContextID` (see the [proposed API spec](https://github.com/ipni/xedni/blob/526f90f5a6001cb50b52e6376f8877163f8018af/openapi.yaml)).
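For illustration (not part of the quoted draft), a TypeScript sketch of steps 3-4 from a checker's perspective. The actual `ContextID` encoding is whatever the draft's "Construct IPNI ContextID from `(PieceCID, PieceSize)`" section specifies, and the reverse-index URL is a placeholder standing in for the proposed xedni API; both are assumptions here.

```ts
// Sketch: derive a ContextID from (PieceCID, PieceSize) and query a reverse index.
// The encoding and the endpoint path below are placeholders, not the normative spec.
import { CID } from 'multiformats/cid'

// Hypothetical encoding: PieceCID bytes followed by PieceSize as a big-endian uint64.
function contextIdFromPiece(pieceCid: CID, pieceSize: bigint): Uint8Array {
  const sizeBytes = new Uint8Array(8)
  new DataView(sizeBytes.buffer).setBigUint64(0, pieceSize)
  const contextId = new Uint8Array(pieceCid.bytes.length + sizeBytes.length)
  contextId.set(pieceCid.bytes, 0)
  contextId.set(sizeBytes, pieceCid.bytes.length)
  return contextId
}

// Hypothetical reverse-index query returning a sample of payload block CIDs
// advertised by `providerId` under the given ContextID.
async function samplePayloadBlocks(providerId: string, contextId: Uint8Array): Promise<string[]> {
  const encodedContextId = Buffer.from(contextId).toString('base64url')
  const url = `https://reverse-index.example/providers/${providerId}/contexts/${encodedContextId}/sample`
  const response = await fetch(url)
  if (!response.ok) throw new Error(`Reverse index query failed with HTTP ${response.status}`)
  return await response.json()
}
```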
Should this be in the FRC or is it out of scope? The link is fine, but I'm trying to understand whether we see it as central.
#### Link on-chain MinerId and IPNI provider identity

Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
Suggested change ("requires" → "required"):

Storage providers are required to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
IPNI provider status ([query](https://cid.contact/providers/12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys)):
Not a huge fan of these arbitrary links in a document that's intended to be frozen for a long time.
1. It's inefficient.
1. Each retrieval check requires two requests - one to download an ~8 MB chunk of a piece, the second to download the payload block found in that chunk.
1. Spark typically repeats every retrieval check 40-100 times. Scanning a CAR byte range 40-100 times does not bring enough value to justify the network bandwidth & CPU cost.
1. It's not clear how retrieval checkers can discover the address where the SP serves piece retrievals.
The all-`1.` numbered list renders fine, but it is not great for reading in raw md.
[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes:
- Miner software must construct IPNI `ContextID` values in a specific way.
- Because such ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece.
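To illustrate the de-duplication point (a sketch with assumed names, not code from Boost, Curio, or Droplet): two deals storing the same piece map to the same `ContextID`, so the indexing pipeline should publish at most one advertisement per piece. In real miner software the "already advertised" state would live in a database rather than in memory.

```ts
// Sketch: skip publishing a second IPNI advertisement for a piece that was already
// advertised, because both deals would produce the same ContextID.
type PieceKey = string // e.g. `${pieceCid}:${pieceSize}`

const advertisedPieces = new Set<PieceKey>()

function shouldPublishAdvertisement(pieceCid: string, pieceSize: bigint): boolean {
  const key: PieceKey = `${pieceCid}:${pieceSize}`
  if (advertisedPieces.has(key)) {
    // An advertisement with this ContextID already exists; publishing another one
    // would create duplicate entries on the IPNI side.
    return false
  }
  advertisedPieces.add(key)
  return true
}
```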
Just to be clear, and given this is an FRC, what are we breaking exactly?
To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:

1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)).
2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
3. Advertise retrievals to IPNI.
4. In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize)).

Meeting these requirements needs support in software implementations like Boost, Curio & Venus Droplet, but potentially also updates in settings configured by the individual SPs.
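As an illustration of requirement 2 in the list above, a minimal TypeScript sketch of retrieving a single payload block over the IPFS Trustless HTTP Gateway protocol; the provider base URL is a placeholder that a checker would discover elsewhere (e.g. from the IPNI provider record).

```ts
// Sketch: fetch one payload block as raw bytes via the Trustless Gateway protocol.
async function fetchRawBlock(providerBaseUrl: string, blockCid: string): Promise<Uint8Array> {
  const response = await fetch(`${providerBaseUrl}/ipfs/${blockCid}`, {
    headers: { accept: 'application/vnd.ipld.raw' } // request a single raw block
  })
  if (!response.ok) throw new Error(`Retrieval failed with HTTP ${response.status}`)
  const bytes = new Uint8Array(await response.arrayBuffer())
  // A trustless client must verify that these bytes hash back to `blockCid` before
  // trusting them; that verification step is omitted in this sketch.
  return bytes
}
```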
Suggested change (replace the quoted paragraph and requirement list with):

> To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. This FRC outlines requirements that SPs and their software stacks should meet to allow 3rd-party networks to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content.
The goal here is not to go into technical detail. I left a non-binding suggestion; something along these lines would be preferable.
The content in 26-29 would potentially be a good fit for the abstract; see comment below.
> When we set out to build Spark, a protocol for testing whether the payload of Filecoin deals can be retrieved, we designed it based on how Boost worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success rate (Spark RSR) in mid-2024, we learned that Venus Droplet, an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
>
> This FRC has the following goals:
Discussion: #1086