FRC: Retrieval Checking Requirements #1089
Conversation
Force-pushed from dd101ee to 950bc84
Tagging @steven004 @LexLuthr @magik6k @masih @willscott @juliangruber @patrickwoodhead for visibility.
#### Link on-chain MinerId and IPNI provider identity

Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
This requirement cannot be fulfilled in Curio. We no longer have a concept of a minerID <> unique peerID binding. IPNI must be extended to support other key types, like the worker key, to sign ads.
I am aware of that; see the note in the text below this paragraph.
> [!NOTE]
> This is open to extensions in the future; we can support more than one form of linking
> index-providers to filecoin-miners. See e.g. [ipni/spec#33](https://github.com/ipni/specs/issues/33).
From my point of view, I prefer not to block progress on this FRC until the Curio team figures out how to extend IPNI to support other key types. Instead, I'd like this FRC to document the solution that works with Boost & Venus now and then enhance it with the new mechanism Curio needs once that new solution is agreed on.
I am probably mistaken here, but Droplet (Venus' Boost) supports multiple minerIDs being associated with a single PeerID (see docs). Does that mean that if I am using Droplet, I need to limit myself to a 1:1 relationship to meet this requirement?
Great call, @lanzafame! I am still learning how Venus Droplet works and what features it offers.
Based on the docs you linked to, I believe you can have multiple minerIDs associated with a single Droplet PeerID and still meet this requirement.
In Spark, we need the PeerID returned by `Filecoin.StateMinerInfo` to match the PeerID used in IPNI advertisements. Spark does not check whether that PeerID is unique or shared by multiple miners.
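For illustration only (not part of the FRC text), here is a minimal TypeScript sketch of that comparison, assuming a Lotus-compatible JSON-RPC endpoint (the Glif URL is just an example) and the public cid.contact `/providers/{peerID}` endpoint; the response field names reflect the current public APIs as I understand them.

```ts
// Sketch: check that the miner's on-chain PeerID matches the IPNI provider identity.
const LOTUS_RPC_URL = 'https://api.node.glif.io/rpc/v1' // any Lotus-compatible endpoint

async function isPeerIdLinked(minerId: string): Promise<boolean> {
  // 1. On-chain identity: Filecoin.StateMinerInfo returns the miner's PeerId (or null).
  const rpcResponse = await fetch(LOTUS_RPC_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'Filecoin.StateMinerInfo',
      params: [minerId, null]
    })
  })
  const { result } = await rpcResponse.json()
  const onChainPeerId: string | null = result?.PeerId ?? null
  if (!onChainPeerId) return false

  // 2. IPNI identity: ask cid.contact whether it has a provider record for that PeerID.
  const ipniResponse = await fetch(`https://cid.contact/providers/${onChainPeerId}`)
  if (!ipniResponse.ok) return false
  const providerInfo = await ipniResponse.json()

  // 3. The requirement is met when both identities use the same libp2p peer ID.
  return providerInfo?.AddrInfo?.ID === onChainPeerId
}
```

The sketch treats a missing `PeerId` or an unknown IPNI provider as a failed check (e.g. `await isPeerIdLinked('f01234')` returning `false`); how to report those cases is up to the checker implementation.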
Force-pushed from 859745d to a91383e
Left a few editorial comments. I do not know enough about the specific topic to be able to opine on a technical level. I also found the explanation somewhat unclear, but that could be a consequence of my lack of knowledge, so not holding that against the draft.
Others will be better suited to provide a full review.
When we set out to build [Spark](https://filspark.com), a protocol for testing whether the _payload_ of Filecoin deals can be retrieved, we designed it based on how [Boost](https://github.com/filecoin-project/boost) worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success rate (Spark RSR) in mid-2024, we learned that [Venus](https://github.com/filecoin-project/venus) [Droplet](https://github.com/ipfs-force-community/droplet), an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
This FRC has the following goals:
1. Document the retrieval process based on IPFS/IPLD.
2. Specify what Spark needs from miner software.
3. Collaborate with the community to tweak the requirements to work well for all parties involved.
4. Let this spec and the building blocks like [IPNI Reverse Index](https://github.com/filecoin-project/devgrants/issues/1781) empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark.
This reads more like motivation than an abstract. It'd be useful for the abstract to summarise the actual requirements/spec.
3. Map `(PieceCID, PieceSize)` to an IPNI `ContextID` value.
4. Query the IPNI reverse index for a sample of payload blocks advertised by `ProviderID` with `ContextID` (see the [proposed API spec](https://github.com/ipni/xedni/blob/526f90f5a6001cb50b52e6376f8877163f8018af/openapi.yaml)).
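For illustration (not part of the quoted draft), a TypeScript sketch of steps 3-4 from a checker's perspective. The actual `ContextID` encoding is whatever the draft's "Construct IPNI ContextID from `(PieceCID, PieceSize)`" section specifies, and the reverse-index URL is a placeholder standing in for the proposed xedni API; both are assumptions here.

```ts
// Sketch: derive a ContextID from (PieceCID, PieceSize) and query a reverse index.
// The encoding and the endpoint path below are placeholders, not the normative spec.
import { CID } from 'multiformats/cid'

// Hypothetical encoding: PieceCID bytes followed by PieceSize as a big-endian uint64.
function contextIdFromPiece(pieceCid: CID, pieceSize: bigint): Uint8Array {
  const sizeBytes = new Uint8Array(8)
  new DataView(sizeBytes.buffer).setBigUint64(0, pieceSize)
  const contextId = new Uint8Array(pieceCid.bytes.length + sizeBytes.length)
  contextId.set(pieceCid.bytes, 0)
  contextId.set(sizeBytes, pieceCid.bytes.length)
  return contextId
}

// Hypothetical reverse-index query returning a sample of payload block CIDs
// advertised by `providerId` under the given ContextID.
async function samplePayloadBlocks(providerId: string, contextId: Uint8Array): Promise<string[]> {
  const encodedContextId = Buffer.from(contextId).toString('base64url')
  const url = `https://reverse-index.example/providers/${providerId}/contexts/${encodedContextId}/sample`
  const response = await fetch(url)
  if (!response.ok) throw new Error(`Reverse index query failed with HTTP ${response.status}`)
  return await response.json()
}
```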
Should this be in the FRC or is it out of scope? The link is fine, but I'm trying to understand whether we see it as central.
#### Link on-chain MinerId and IPNI provider identity

Storage providers are requires to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
Suggested change ("requires" → "required"):

Storage providers are required to use the same libp2p peer ID for their block-chain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact).
IPNI provider status ([query](https://cid.contact/providers/12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys)):
Not a huge fan of these arbitrary links in a document that's intended to be frozen for a long time.
1. It's inefficient.
1. Each retrieval check requires two requests - one to download an ~8 MB chunk of a piece, the second to download the payload block found in that chunk.
1. Spark typically repeats every retrieval check 40-100 times. Scanning a CAR byte range 40-100 times does not bring enough value to justify the network bandwidth & CPU cost.
1. It's not clear how retrieval checkers can discover the address where the SP serves piece retrievals.
The all-`1.` numbered list renders fine, but it is not great for reading in raw md.
[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes:
- Miner software must construct IPNI `ContextID` values in a specific way.
- Because such ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece.
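To illustrate the de-duplication point (a sketch with assumed names, not code from Boost, Curio, or Droplet): two deals storing the same piece map to the same `ContextID`, so the indexing pipeline should publish at most one advertisement per piece. In real miner software the "already advertised" state would live in a database rather than in memory.

```ts
// Sketch: skip publishing a second IPNI advertisement for a piece that was already
// advertised, because both deals would produce the same ContextID.
type PieceKey = string // e.g. `${pieceCid}:${pieceSize}`

const advertisedPieces = new Set<PieceKey>()

function shouldPublishAdvertisement(pieceCid: string, pieceSize: bigint): boolean {
  const key: PieceKey = `${pieceCid}:${pieceSize}`
  if (advertisedPieces.has(key)) {
    // An advertisement with this ContextID already exists; publishing another one
    // would create duplicate entries on the IPNI side.
    return false
  }
  advertisedPieces.add(key)
  return true
}
```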
Just to be clear, and given this is an FRC, what are we breaking exactly?
To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements:

1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)).
2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/).
3. Advertise retrievals to IPNI.
4. In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize)).

Meeting these requirements needs support in software implementations like Boost, Curio & Venus Droplet, but potentially also updates in settings configured by the individual SPs.
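As an illustration of requirement 2 in the list above, a minimal TypeScript sketch of retrieving a single payload block over the IPFS Trustless HTTP Gateway protocol; the provider base URL is a placeholder that a checker would discover elsewhere (e.g. from the IPNI provider record).

```ts
// Sketch: fetch one payload block as raw bytes via the Trustless Gateway protocol.
async function fetchRawBlock(providerBaseUrl: string, blockCid: string): Promise<Uint8Array> {
  const response = await fetch(`${providerBaseUrl}/ipfs/${blockCid}`, {
    headers: { accept: 'application/vnd.ipld.raw' } // request a single raw block
  })
  if (!response.ok) throw new Error(`Retrieval failed with HTTP ${response.status}`)
  const bytes = new Uint8Array(await response.arrayBuffer())
  // A trustless client must verify that these bytes hash back to `blockCid` before
  // trusting them; that verification step is omitted in this sketch.
  return bytes
}
```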
Suggested change (replace the quoted paragraph and requirement list with):

> To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. This FRC outlines requirements that SPs and their software stacks should meet to allow 3rd-party networks to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content.
The goal here is not to go into technical detail. I left a non-binding suggestion; something along these lines would be preferable.
The content in 26-29 would potentially be a good fit for the abstract; see comment below.
> When we set out to build Spark, a protocol for testing whether the payload of Filecoin deals can be retrieved, we designed it based on how Boost worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use the Spark retrieval success rate (Spark RSR) in mid-2024, we learned that Venus Droplet, an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
>
> This FRC has the following goals:
Discussion: #1086