Failing test zombienet-cumulus-0005-migrate_solo_to_para #7105

Open
skunert opened this issue Jan 9, 2025 · 2 comments

@skunert
Contributor

skunert commented Jan 9, 2025

zombienet-cumulus-0005-migrate_solo_to_para has been failing consistently.

File:

Expectation

This test spawns two parachains, dave and eve, with the same para id but different genesis. Node dave produces blocks and, after a few blocks, outputs custom head_data through the PVF. This head_data contains the header that matches node eve's parachain, which should then start producing blocks.

Reality

Dave starts producing blocks, but after the custom head data is submitted, eve does not start producing blocks. Instead dave continues to produce blocks.

Investigation

The culprit seems to be #4880; before that PR the test was passing consistently. The question is why the custom head data presented by dave is not set as the included head.
In the logs the following messages appear a lot:

2025-01-09 17:56:06.013 DEBUG tokio-runtime-worker parachain::collator-protocol: Collation having parent head data hash 0x9fd4…10d8 is blocked from seconding. Waiting on its parent to be validated. candidate_hash=0xe666001ec78a5546c0ee6ad7723ebcb34ef3e3245fc752c5ed9bfa386c17f214 relay_parent=0x045189b697186a3ea5f0ae67aba091a1ceb7c760590b6036a5ddca6d0f593a5a traceID=306252055748700830037795645451537398963

Apparently some seconding limit was reached and the block in question was not included. Modifying the lookahead collator here to always build only a single block fixes the issue and makes the test pass:

cc @tdimitrov any ideas what is happening here?

@tdimitrov
Contributor

With the collation fairness (the PR you linked) we limit the number of collations a collator can provide to the number of claims for its para id in the claim queue. So the collator is building two blocks per relay parent no matter what and they are built on top of each other (right?). In this case sooner or later a collation will be dropped and the rest of the collations will be unbackable, because of the missing candidate.
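
To make the failure mode described above concrete, here is a minimal toy model, not the actual collator-protocol code. It assumes a fixed number of claims per relay parent and a chain of collations where each one builds on the previous one. All names (`Collation`, `backable`, `example_chain`, `claims_per_relay_parent`) are hypothetical and chosen only for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified collation: an id, the id of the collation it
/// builds on, and the relay parent it was advertised for.
struct Collation {
    id: u32,
    parent: Option<u32>, // None = builds directly on the included head
    relay_parent: u32,
}

/// Toy stand-in for the fairness limit: accept at most
/// `claims_per_relay_parent` collations per relay parent, and only accept a
/// collation whose parent was accepted. Returns the ids that remain backable.
fn backable(collations: &[Collation], claims_per_relay_parent: usize) -> Vec<u32> {
    let mut accepted_per_rp: HashMap<u32, usize> = HashMap::new();
    let mut accepted: HashSet<u32> = HashSet::new();
    let mut result = Vec::new();

    for c in collations {
        let count = accepted_per_rp.entry(c.relay_parent).or_insert(0);
        if *count >= claims_per_relay_parent {
            continue; // dropped: no claim left for this relay parent
        }
        let parent_ok = match c.parent {
            None => true,
            Some(p) => accepted.contains(&p),
        };
        if !parent_ok {
            continue; // blocked: its parent was never validated
        }
        *count += 1;
        accepted.insert(c.id);
        result.push(c.id);
    }
    result
}

/// Two collations per relay parent, each built on top of the previous one.
fn example_chain() -> Vec<Collation> {
    vec![
        Collation { id: 1, parent: None, relay_parent: 100 },
        Collation { id: 2, parent: Some(1), relay_parent: 100 },
        Collation { id: 3, parent: Some(2), relay_parent: 101 },
        Collation { id: 4, parent: Some(3), relay_parent: 101 },
    ]
}
```

In this model, `backable(&example_chain(), 1)` keeps only collation 1: collation 2 is dropped by the limit, and collations 3 and 4 are blocked because their ancestor is missing. With two claims per relay parent the whole chain stays backable.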

DQ but solo chain (Dave) means just a single collator node producing blocks on its own. Is this correct? If yes - there are no validators to throttle it and that's why it is producing blocks.

I haven't run the test locally but I'm a bit puzzled why Eve doesn't produce blocks at all. I think it should produce at least one or two blocks depending on the claim queue and async backing parameter values.

I also believe the test will work fine with the --experimental-use-slot-based flag. Is this acceptable?

@skunert
Contributor Author

skunert commented Jan 10, 2025

So the collator is building two blocks per relay parent no matter what and they are built on top of each other (right?).

The way this works is that the lookahead collator is producing two blocks per slot until the unincluded segment is full. Then it will produce one block per relay.

DQ but solo chain (Dave) means just a single collator node producing blocks on its own. Is this correct? If yes - there are no validators to throttle it and that's why it is producing blocks.

Dave is a normal parachain here. Eve will stay at block 0 and not produce blocks until dave has emitted its custom header from the PVF. This is expected: eve does not find its own included header on the relay chain. Basically dave plants it there, then dave stops production and eve starts.
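
The handover described above can be sketched as a toy model, under the assumption that a collator only builds when the head included on the relay chain belongs to its own chain. This is not the cumulus implementation; `Collator`, `can_build_on`, `active_collators`, `dave_and_eve`, and the header values are hypothetical.

```rust
/// Hypothetical stand-in for a parachain header hash.
type HeaderHash = u64;

struct Collator {
    name: &'static str,
    /// Headers of this collator's own chain, genesis first.
    known_headers: Vec<HeaderHash>,
}

impl Collator {
    /// Assumption: a collator builds only if the relay chain's included head
    /// is a header of its own chain.
    fn can_build_on(&self, included_head: HeaderHash) -> bool {
        self.known_headers.contains(&included_head)
    }
}

/// Names of the collators that would produce blocks for a given included head.
fn active_collators(collators: &[Collator], included_head: HeaderHash) -> Vec<&'static str> {
    collators
        .iter()
        .filter(|c| c.can_build_on(included_head))
        .map(|c| c.name)
        .collect()
}

/// Same para id, different genesis: the two header sets do not overlap.
fn dave_and_eve() -> Vec<Collator> {
    vec![
        Collator { name: "dave", known_headers: vec![0xD0, 0xD1, 0xD2] },
        Collator { name: "eve", known_headers: vec![0xE0] },
    ]
}
```

While the included head is one of dave's headers (for example `0xD2`), only dave is active. Once dave's custom head data makes eve's header `0xE0` the included head, only eve is active, which is the behavior the test expects.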

I also believe the test will work fine with the --experimental-use-slot-based flag. Is this acceptable?

The main problem I am seeing is that this is a relay chain side change. So if this goes live, it will cause this situation for all live parachains using the lookahead collator, right? We don't have control over when parachains will upgrade.

I am not familiar with the changes done in that PR, so the question is, will this cause problems for running collators?
