-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce storage_proof_size
Host Function for Improved Parachain Block Utilization
#43
Introduce storage_proof_size
Host Function for Improved Parachain Block Utilization
#43
Conversation
I also would like to raise the concern that the algorithm used to generate a proof must now be precisely spec'ced (at least on paper). I'm not completely sure of what I'm saying because it's been a long time since I've looked into this, but if I remember correctly Substrate sometimes generates sub-optimal proofs (i.e. some proofs include slightly too much information), for example IIRC when deleting a prefix. Adding this host function means that other host implementers must now reproduce the exact same undocumented optimizations and flaws as Substrate, but also that adding these optimizations later would be difficult as it would break backwards-compatibility. |
Parachains are necessarily free to chose their own storage proof algorithms.
Yeah, there is not necessarily a well defined storage proof size until the collator produces the PoV block, like adjusting the relay parent could change the size even if the block does not otherwise change. Any given PoV block does however have some well defined storage proof size, so this makes sense to the collator and polkadot, but not before. |
Is the new host function only accessible to parachain runtimes when executed by parachain hosts? Or must the relay chain host also implement it? If it's the former, then this change doesn't even need an RFC, as it would be an implementation detail of parachains. If it's the latter, then there's a contradiction here. |
Only parachain hosts need to implement it. The RC isn't involved into this. However, light clients (like smoldot) will may need for re-executing blocks or similar. |
This was my main motivator for opening this RFC. Even though this will not be included on the relay chain, light clients will need to have the host function. |
If you run PoV blocks then you'd want this host function. Could we maybe disable this host function when not running PoV blocks though? If smoldot runs a PoV block then yes sure it's expose this, but then it knows the size. If it runs a lone tx without full context, then it has nothing to expose here. In practice, this means Amusingly collators would not be validating one another's weight reclamation, but depending upon the relay chain for this. lol |
What do you mean by this? If a collator would build a block reclaiming too much weight, the block would fail to import on other collators. And fail |
So yeah, maybe an RFC was not necessary after all. If nothing changes here until next week, I will close here and merge the PR. |
I said that this doesn't need an RFC because parachains are in principle free to do whatever they want here. |
Yes they are, but smoldot would also need to support "whatever they are doing". We can also close this, but I think it is worth to have it here. |
/rfc propose |
Hey @skunert, here is a link you can use to create the referendum aiming to approve this RFC number 0043. Instructions
It is based on commit hash 30fa392f64d9942a5d0e089dfc938de05de34073. The proposed remark text is: |
My concern hasn't really been addressed:
This change would make it very hard (hard enough that it will likely never happen) to later fix the sub-optimal proof generation and to write an alternative implementation of a system parachain collator. |
The same proof generation is already used for the light client proofs. Thus, it is already "fixed" and I don't get your comment here. Or do you want to argue that you can change a networking version and give up on the old version faster? |
In some situations, proof generation can be really tricky. For example, if the runtime calls At the moment, you can generate a proof that contains unnecessary trie nodes and everything works fine. So it's fine to just include too many things in proofs "just in case", and as far as I understand (by looking at proofs while debugging smoldot) that's what Substrate sometimes does. But if we add a host function that returns the current size of a proof, then these unnecessary entries do matter when it comes to determinism. An implementation that includes too many things in proofs might treat some blocks as valid while an implementation that has optimized its proof generation will refuse these blocks. |
Okay, this indeed would be problematic. I was not aware of this behaviour. We should spec the proofs properly then. |
Again maybe I'm wrong, or maybe it's been optimized since then. My concern mostly comes from the fact that this is insanely complicated. Just from the top of my head:
In general trying to figure out what should be included in the case of |
Another approach would opaque newtype(s) vaguely resembling
After collation, In principle, A parachain could break these invariants using unsafe code, but they could do much worse anyways. |
I thought about this some more and do not think that this is actually a blocker for this RFC. Initially, I was under the impression that we were talking about a suboptimal storage-proof algorithm in substrate. But the cases you are referencing are hostfunction implementation related. If a hostfunction reads more than strictly needed, we include extra items in the proof. However, that is separate from the changes proposed here. If you want to implement a custom collator node today, you already need to be super careful. Since the parachain runtime contains the Not a great state to be in, but I believe the proposed hostfunction here does not change much compared to the current state. We should still correctly specify the host functions, but this sounds like a separate effort to me. |
I don't think that's accurate. Yes, implementers need to mimic the externalities and host functions behavior anyway, but all the existing host functions are very clearly defined (maybe not explicitly in a document, but they're all pretty straight forward and their corner cases is easily figured). The one introduced in this RFC is the first host function that has tons of obscure corner cases with what seems to me to be over-the-roof implementation mimicking difficulty. Your counter-argument also doesn't address the backwards compatibility concern. |
At some level, our whole goal here is to "reclaim" weight not really used by tx, as well as to have separate space and cpu weights. If we could do what you say then we could just adjust the weights in roughly the current weight system, but this obviously does not work correctly. Instead, we want the space weights to more accurately reflect the size in the final PoV block. If someone ships an optimization that improves space under niche conditions then the blocks from collators who upgraded should benefit, while the blocks from other collators should not benefit. This in inherently non-determinism is several ways: Some collators do not have the new code. Some collators have more CPU time to spend compressing the proof. Some parachain designed will result in some collators not even being able to build the proof. And all collators must consider the tx before they finish building the block. Importantly, there are different classes of non-determinism: (1) Any post-collation non-determinism is a treat to polkadot consensus, and must never be allowed. (2) Pre-collation non-determinism in parablocks feels really weird, and complicates some things parachains might do, but should basically be fine for typical parachains who simply follow relay chain consensus. (3) Now partial block non-determinism is annoying for UX, but it's also unavoidable for smart contracts, ala MEV. We'll inherently increase our existing non-determinism of type (3) and add new some non-determinism of type (2). Yes (2) requires some discussion, but if performance requires this then we should accept that and separately solve issues it creates for pre-finality grandpa. |
Yes, I know, and this will be the case, if the collators have the economic incentive to fill the block as much as possible, they have an interest in providing the smallest possible proof size to the runtime to be allowed to integrate more transactions and earn more fees.
Maybe I'm missing something, but I don't understand why it wouldn't work. It seems to me that the purpose of this RFC is only to optimize the use of the PoV/proof_size resource, not the cpu_time. With my proposal, if a collator generates more compact storage proofs, it's not a problem, because the relay and other collators and para full nodes won't need to regenerate the storage proof, they just need to read it and check its validity (as it is now). |
Incentives are irrelevant here: Hosts are upgraded at different speeds. Hosts have different hardware. etc. Adversaries can do whatever they can do which whatever non-determinism one permits, ala MEV.
A relay chain validator already sees whatever proof the collator built, so they've no need for extra numbers, unless those numbers wind up being pessimistic over estimations, akin to our current weights system. Parachain nodes cannot necessarily either see the full PoV block or compute the same storage proof, so they cannot produce correct values for the host calls, nor validate the numbers being inserted. We could simply accept the non-determinism at the parachain level since everything remains deterministic after collation at the relay chain level. We'd encapsulate this non-determinism behind opaque types which discourage miss-use. We'd warn parachains with unusual pre-collation consensus phases to either no use this mechanism, or else ensure it canno really break their consensus. |
This is very similar to what was originally proposed in the issue paritytech/polkadot-sdk#209 (comment) before we switched focus to the more "simple" solution proposed here. I guess it is an option to reevaluate the original approach.
The values delivered in the block would be used instead of the host calls. Its basically just the collator saying, "this is how much proof size was consumed at this point during block production.". Not sure how a collator would cheat by manipulating these number while going undetected. In the end, the weight during block import has to be correct and the PoV size has to be small enough for the relay to accept it. |
Alright, basti's comments there make more sense. Yes, this complicates determinism, and fancy parachain consensus, but acceptably I guess. If I understand.. We need this roughly host call in block building so that block producers know how much they've used. It'll often return non-sensical and/or non-deterministic values but that's unavoidable, and acceptable in block production. We do not need this host call after collation, aka in We're kinda stuck on what happens during parachain block imports. As @tomaka says, we do not have deterministic execution if we return meaningful values computed locally. Yet, we could theoretically make this deterministic for some storage types in the distant future, or maybe even in the near future if we handled upgrade regimes more strictly. We should not impose such discipline upon parachain teams though. We've seemingly three choices for import:
We dislike complicating fancy parachain consensus, which rules out 2 probably. Ain't much difference between 1 and 3, but 1 looks simplest, and avoids confusion. |
In your choices, 1 is just 3 without including the numbers in the block right? I don't think 1 can currently work. During import, nodes would need a way to calculate the overall consumed storage weight that goes into storage. If we don't deliver any values with the block, they will only have the benchmarked weights. This would also be a problem for the If we don't want 2 we should go with 3 IMO. |
In principle, there maybe advantages in leaving this up to the parachain team. I suppose 2 works fine for simple parachains. It's also basti's preference, or did he change his mind? If the host call returns If you do embed the values, then you could compare them with the amount read thus far during validation, so they must be correct or else the block becomes invalid. In principle, parachain nodes could simply trust one another, which risks spam, but the relay chain would never approve unless correct. Approach: You embed the values, do the comparison of read thus far in It's kinda annoying to embed a bunch of intermediate values when the relay chain only cares about the final total. |
This is not any valid argument. You could say the same about adding new host function or whatever, that they risk a netsplit. However, we still handle them gracefully. The thing that we need to change is that we make the "proof generation" of each host function a part of the specification of each host function. This means that if you want to roll out any kind of optimization, you would need to roll out a new version of the host function and then the proof generation is versioned in the same way as the host function itself. This means, as long as the runtime uses the old host function, the proof generation doesn't change. I don't see any problem in doing so.
This problem already exists right now and it doesn't change with this proposal. The only thing I'm seeing here is to double down on testing to ensure that this doesn't happen in production (or better, reducing the possibility). And as @skunert already said, the Cumulus based PVF implementation is already sharing the same implementation details as the node implementation. This would mean that if you do some cowboy change, the PVF would stop accepting these blocks on the relay chain as a small detail changed. This also means that we already have this requirement of keeping this stuff in sync/upgradeable.
While we don't need it for this RFC and we could workaround with the proposal mentioned above, there will be a point in the future when the "proof generation" will be part of the consensus. When we bring "runtime weight metering", we will need to use the exact same proof generation everywhere to know if a transaction is running over the announced limits. Thus, I don't see any reason why we should go for some other way here, if we already know that we need this in the future any way. With runtime weight metering we will also need to track the proof size to abort a transaction if it used too much proof size. This needs to be deterministic as otherwise collators will be able to "reject" transactions, but still getting paid for it because it is in the block. We could argue that you don't need this at block import and you use the recorded proof size from the block producer, but you would still need to run the same logic in
To make it clear again, this RFC is not touching any of the logic that is relevant to the relay chain. |
I agree to bastis points, the current proof algorithm is already deeply integrated and needs to stay. I think the original approach is still feasible, and its up to the parachain to decide whether to ever call this function or not. I will open this up for voting tomorrow if no blockers are raised. |
Afaik there is no problem here if doing 3, but I think @tomaka point is our "current proof algorithm" is not sufficiently deterministic for 2. That's not a show stopper, either fix the algorithm by eliminating the redundant copath elements, or do 3. In general, if you do 3 then you can adapt this to more storage schemes more easily. And our current trie should hopefully be completely replaced in the future. |
I'm personally not willing to retro-engineer Substrate in order to spec proof generation. I think it's insanely complicated, and I don't think people who have commented realize how complicated it is. |
I understand that it is complicated. But I don't think it is impossible. We could for example write some compliance tests in WASM that would ensure a host implementation matches what is expected. But that is just an idea on how to make it easier to test implementations. The goal for this RFC was to raise awareness that such a hostfunction might come and gather feedback. We got feedback and the hostfunction proposed has been improved as a result. In the end, we already established that parachain teams are free to do as they like. If they have a fancy custom storage scheme, they might decide not to call the hostfunction or not even include it. If they don't like the approach as described in paritytech/polkadot-sdk#209, they could even implement variant 3 (which would still require the presence of this hf). The solution is not perfect, but for a majority of the parachains I expect this to be a fitting mechanism. Up for vote: |
While I really need this RFC so that proof weight is not completely useless for parachains, I am also unsure if this RFC should be accepted as it is. |
We also clearly don't have enough participation on this RFC (and most other RFCs). I also expect approval from alternative node implementation teams. If they find this RFC contains enough details, well, sure, that means it is enough. |
You would need to specify what kind of alternative node implementations? The relay chain? If yes, the relay chain node is never aware of this host function as already discussed in here.
Yeah and thus we are waiting until the participation magically appears? I mean we can then also just shutdown everything and move on. |
this RFC have already identified stakeholders so we should be able to ask them to take a read but yeah I agree we will need to just continue the work after we tried to get feedbacks and failed. |
Who do you mean by this? I mean you are right, this RFC alone does not contain all the details about the storage proof to implement it in a non substrate client. But IMO the speccing of the proof generation does not need to be part of this RFC, it can be done in a separate document. As you said the proof weight has currently some limitations that should be addressed. Since all collator implementations are currently substrate based anyway we should move forward in my opinion. |
This is a snake biting its tail. This kind of RFC is exactly why it won't be possible to create another implementation. Please look at the low quality of the source code of Substrate. Do you think that Polkadot has a long term future if it keeps using this code? Every little change to the core takes years because there are literally less than 5 people in the world who are capable of working efficiently on Substrate.
I would agree if this separate document had been written before this RFC. We all know that nobody is going to make the effort speccing the proof generation. The idea of "speccing the proof generation in a separate document" is just a way to sweep the problem under the carpet and delegate it to "someone else", while we all know that there won't be a "someone else". |
This view seems overly pessimistic. Substrate has many people contributing and is solving non trivial problems. For a big project I would say it's in pretty normal shape.
To me it's pretty clear that it's on me and @bkchr to deliver this. I agree with you, no one is going to jump out of the bushes to pick this up randomly. |
/rfc process 0x6386c7812bbed74cde165d0a074e69fa1b5d6845bf8aa914276305baf6f3b240 |
/rfc process 0x6386c7812bbed74cde165d0a074e69fa1b5d6845bf8aa914276305baf6f3b240 |
The on-chain referendum has approved the RFC. |
/rfc propose |
Hey @rzadp, here is a link you can use to create the referendum aiming to approve this RFC number 0043. Instructions
It is based on commit hash 30fa392f64d9942a5d0e089dfc938de05de34073. The proposed remark text is: |
paritytech/polkadot-sdk#1462 has been open for a while and is now well-reviewed. Opening an RFC since it introduces a new host function that is relevant for parachains and light-clients.