ZKR_DB_KEY_NOT_FOUND while proving batch 75948 on Bali #1482

Open
ToniRamirezM opened this issue Nov 15, 2024 · 9 comments
Labels
bug (Something isn't working), High Priority

Comments

@hexoscott
Collaborator

Some notes from the investigation so far:
The problem is with batch 75948. Looking at the batchL2Data for this batch, there is no L1 info tree index change. This is important because the missing slot is related to writing the L1 block hash to the GER manager contract, and that slot is only written when the sequencer uses a new index.

The prover logs show this as the missing information from the witness:

A=0:0:0:a40d5f56:745a118d:906a34e:69aec8c0:db1cb8fa B=0:0:0:0:0:0:0:3 C=17f78b8d:30f15ffe:e2432660:db3ab8af:ebdac70d:db400edd:b329e010:4952e40e D=0:0:0:0:0:0:0:20 E=0:0:0:0:0:0:0:58
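To make the log line above easier to read, here is a minimal Python sketch that joins each field's 32-bit words into a single value. It assumes the words are printed most significant first, which the leading zeros in A suggest. Under that assumption, A decodes to 0xa40d5f56745a118d0906a34e69aec8c0db1cb8fa (the same GER contract address krlosMata cites later in the thread) and C decodes to a 32-byte storage slot. Reading B = 3 as the sparse Merkle tree key type for contract storage is an assumption, and D and E are deliberately left uninterpreted.

```python
def words_to_int(field: str) -> int:
    """Join colon-separated hex words (assumed most significant first) into one integer."""
    value = 0
    for word in field.split(":"):
        value = (value << 32) | int(word, 16)  # each word is 32 bits wide
    return value


def parse_missing_key(line: str) -> dict:
    """Split 'A=... B=... C=...' into a dict of named integer fields."""
    fields = {}
    for token in line.split():
        name, _, words = token.partition("=")
        fields[name] = words_to_int(words)
    return fields


log_line = (
    "A=0:0:0:a40d5f56:745a118d:906a34e:69aec8c0:db1cb8fa B=0:0:0:0:0:0:0:3 "
    "C=17f78b8d:30f15ffe:e2432660:db3ab8af:ebdac70d:db400edd:b329e010:4952e40e "
    "D=0:0:0:0:0:0:0:20 E=0:0:0:0:0:0:0:58"
)

key = parse_missing_key(log_line)
address = f"0x{key['A']:040x}"  # 20-byte contract address
slot = f"0x{key['C']:064x}"     # 32-byte storage slot
print("address:", address)
print("key type (assumed):", key["B"])
print("slot:", slot)
```

Note that the short word 906a34e is simply a value without its leading zero; shifting by 32 bits per word restores it as 0906a34e in the joined address.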

I ran a binary search for when this slot was changed on Bali and found that it was updated in block 6958864. This block belongs to batch 74981, which is 967 batches before the one the prover is having trouble with, so it seems strange that the prover would need this slot at all.
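A binary search like the one described above can be sketched as follows. This is an illustrative Python version, not the script actually used: value_at is a hypothetical callback (for example, a wrapper around the eth_getStorageAt JSON-RPC method queried against an archive node at a given block), and the search assumes the slot changed at most once within the block range. The usage lines run it on synthetic data that simply places the change at block 6958864.

```python
from typing import Callable


def first_block_with_value(lo: int, hi: int, value_at: Callable[[int], int]) -> int:
    """Return the first block in [lo, hi] whose slot value equals value_at(hi).

    Assumes the slot changes at most once in the range, so the value is
    constant before the change block and constant from it onwards.
    value_at is a hypothetical callback returning the slot value at a block.
    """
    target = value_at(hi)
    if value_at(lo) == target:
        return lo  # no change inside the range
    # Invariant: value_at(lo) != target and value_at(hi) == target.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if value_at(mid) == target:
            hi = mid
        else:
            lo = mid
    return hi


# Synthetic data: the slot is zero until block 6958864 and non-zero afterwards.
CHANGE = 6958864
fake_value_at = lambda block: 0 if block < CHANGE else 0xABC
print(first_block_with_value(6_000_000, 7_000_000, fake_value_at))
```

Each probe costs one historical storage read, so a range of a million blocks needs only about 20 queries.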

My hunch at the moment is that this issue is somehow related to the info tree index changes. The problem on the prover stems from the changeL2Block routine, which seems to support this theory.

@Sharonbc01 added the bug and High Priority labels on Nov 15, 2024
@Sharonbc01

Agreed to close this issue on 11/19.
Francesc can't reproduce the prover issue; execution worked in prover mode.
Toni also ran it again and it worked.

@ToniRamirezM
Author

ToniRamirezM commented Dec 11, 2024

Reopening the issue, as it happened again on Bali for batch 79374. This is the first batch on forkID 12, but the error also happened previously on forkID 11, so it does not look related to the fork.

Prover logs and witness can be found here: https://gist.github.com/ToniRamirezM/32c38a587cbd2eae9a740caab8d9611b

@Sharonbc01

Moving this out of the current milestone, as it is not required for it.

@Sharonbc01 removed this from the v2.61.x (forkid.13) milestone on Dec 16, 2024
@Sharonbc01

Hi @fractasy, did you gain any more insight into this issue? See the thread here: https://0xpolygon.slack.com/archives/C05SVKVLZ1A/p1733940581915119

cc: @ToniRamirezM

@ToniRamirezM
Author

ToniRamirezM commented Jan 8, 2025

During the OKX upgrade to fork 13, the first batch after the upgrade caused the same error. Again, adding the L2 GER contract address to the zkevm.witness-contract-inclusion configuration parameter in Erigon solved it, but it caused problems because of the size of the resulting witness and the time needed to generate it.

I think knowing that it seems to always happen after a fork upgrade gives us more information and opens a new path for investigation.

@hexoscott
Collaborator

My hunch here is that the problem relates to an info tree index change after the network has been offline for a little while. Whilst the network is offline, more info tree index changes reach a finalised state, so the sequencer immediately jumps to a newer index on the first block it creates. I'm not sure why this creates a problem in the witness, but it's a starting point for investigation.

@ToniRamirezM
Author

Comments from @krlosMata:

I think the issue could be related to the reuse of the l1InfoTree index, as @hexoscott mentioned.
When l1InfoTreeIndex = A is used, the contract at 0xa40D5f56745a118D0906a34E69aeC8C0Db1cB8fA is updated with the mapping GER <--> blockHashL1.

If the same l1InfoTreeIndex = A is used again, the ROM reads the mapping GER <--> blockHashL1 and checks whether blockHashL1 != 0. If it is 0, it is written. If it is not, it has already been written, so it is not written again.
I guess the witness needs that KEY just to read the mapping, but Erigon somehow knows that reusing an l1InfoTreeIndex does not write it, so it does not include the key in the witness.
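The read-before-write behaviour described above can be modelled with a toy store that records reads and writes separately. This is a simplified Python sketch, not the ROM or Erigon code: the GER itself stands in for the storage key, and the key and hash values are placeholders. It shows that reusing an index still reads the mapping key but never writes it, so a witness built only from written keys would omit it.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedStorage:
    """Toy key-value store that records every key it reads and writes."""
    slots: dict = field(default_factory=dict)
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

    def read(self, key):
        self.reads.add(key)
        return self.slots.get(key, 0)  # unset slots read as zero

    def write(self, key, value):
        self.writes.add(key)
        self.slots[key] = value


def apply_info_tree_index(storage, ger, block_hash_l1):
    """Simplified check-then-write: store GER -> blockHashL1 only if it is still zero."""
    if storage.read(ger) == 0:
        storage.write(ger, block_hash_l1)


GER = "ger-for-index-A"  # placeholder key, not a real GER or SMT key
BLOCK_HASH_L1 = 0x1234   # placeholder block hash

first_use = TrackedStorage()
apply_info_tree_index(first_use, GER, BLOCK_HASH_L1)  # reads and writes the key

# A later block reuses index A, starting from the state left by the first use.
reuse = TrackedStorage(slots=dict(first_use.slots))
apply_info_tree_index(reuse, GER, BLOCK_HASH_L1)

print(GER in reuse.reads)   # True: the mapping is still read
print(GER in reuse.writes)  # False: nothing is written the second time
```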

A fix could be to always include that KEY whenever l1InfoTreeIndex != 0, even if the index is reused.
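The proposed fix amounts to a different witness inclusion rule. The Python sketch below contrasts a write-only rule with the suggested one on a block that reuses an index. BlockAccess, both witness_keys_* functions, the index value 42, and the key name are hypothetical illustrations and do not reflect how Erigon actually builds witnesses.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAccess:
    """Hypothetical per-block summary that a witness builder might work from."""
    l1_info_tree_index: int
    ger_mapping_key: str  # storage key of GER -> blockHashL1 for this index
    written_keys: set = field(default_factory=set)


def witness_keys_write_only(block):
    """Rule suspected in the thread: only keys that were written are included."""
    return set(block.written_keys)


def witness_keys_with_fix(block):
    """Proposed rule: include the mapping key whenever l1InfoTreeIndex != 0, even on reuse."""
    keys = set(block.written_keys)
    if block.l1_info_tree_index != 0:
        keys.add(block.ger_mapping_key)
    return keys


# A block that reuses an index: the mapping is read, but nothing is written to it.
reused = BlockAccess(l1_info_tree_index=42, ger_mapping_key="ger-mapping-slot")
print(witness_keys_write_only(reused))  # set()
print(witness_keys_with_fix(reused))    # {'ger-mapping-slot'}
```

Including the key unconditionally costs one extra storage proof per block with a non-zero index, which is far smaller than including the whole contract through zkevm.witness-contract-inclusion.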

@hexoscott
Collaborator

Based on the feedback from krlosMata, I believe this is the issue. Index reuse only happens when migrating an RPC datadir to the sequencer, where the last key is used again so that the sequencer can keep its own record of where it's at. This is not a problem protocol-wise, but because we don't write to storage the second time around, the key is missing from the witness.

The good news is that, after making some code changes, we can fix this in place using historical data to ensure the missing key does show up in the witness.

Projects
None yet
Development

No branches or pull requests

5 participants