-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/tracing: state journal wrapper #30441
base: master
Are you sure you want to change the base?
Conversation
core/tracing/hooks.go
Outdated
NonceReadHook = func(addr common.Address, nonce uint64) | ||
|
||
// CodeReadHook is called when EVM reads the code of an account. | ||
CodeReadHook = func(addr common.Address, code []byte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question: should we add codeHash here to be consistent with OnCodeChange
?
Ah seems like the journal has a crasher:
|
core/tracing/CHANGELOG.md
Outdated
|
||
### New methods | ||
|
||
- `OnReorg(reverted []*types.Block)`: This hook is called when a reorg is detected. The `reverted` slice contains the blocks that are no longer part of the canonical chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here types block is very very heavy. You should at most pass headers and allow chain access to pull the blocks on demand (chain access in someconstructor, ha)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought what is the issue? it is a slice so passed by reference and the memory can be freed as soon as OnReorg
processing is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this is annoying. So reorg in the blockchain at some point in the past used to collect blocks. Turned out that sometimes it became insanely heavy and we've switched so it operates on headers. I guess later someone refactored it back to operate on blocks again. This is an issue when you do setHead or any similar operation; of even if finality fails for a while and you have blocks reorging back and forth. It's very very bad to pull all the block in from disk IMO.
CC @holiman @rjl493456442 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I don't particularly recall switching from headers to blocks....
core/tracing/hooks.go
Outdated
@@ -194,6 +221,30 @@ type Hooks struct { | |||
OnCodeChange CodeChangeHook | |||
OnStorageChange StorageChangeHook | |||
OnLog LogHook | |||
// State reads | |||
OnBalanceRead BalanceReadHook | |||
OnNonceRead NonceReadHook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question from triage: how exactly is OnNonceRead used?
core/tracing/hooks.go
Outdated
} | ||
|
||
// Copy creates a new Hooks instance with all implemented hooks copied from the original. | ||
func (h *Hooks) Copy() *Hooks { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method needed? It's not obvious to me what the side-effects are. Typically, the hooks might be closures, and the closures are still referenced as if they were not copied.
func TestHooks(t *testing.T) {
counter := 0
a := &Hooks{
OnClose: func() {
counter++
},
}
a.OnClose()
t.Logf("counter is %d", counter)
a.Copy().OnClose()
t.Logf("counter is %d", counter)
}
outputs:
hooks_test.go:13: counter is 1
hooks_test.go:15: counter is 2
I'm curious why you'd ever need to use this Copy
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because you want to copy all, but not have to specify all manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, the hooks might be closures, and the closures are still referenced as if they were not copied.
Right you are correct I had not foreseen that. After thinking a bit, it feels like for my use-case that is fine. Essentially the clone will replace some of the methods to add some pre-processing logic. The rest are supposed to execute as in the original tracer.
I have un-exported the Copy method to avoid people to shoot themselves in the foot and added a comment to clarify this point.
Edit: right what I want is to add the journal in front of the tracer and process some of the hooks first before proxying back to tracer. And this without having to iterate the list of all hooks which I find very error-prone. I have already had to fix bugs because of missing some hook in there.
@@ -172,6 +173,9 @@ type ( | |||
|
|||
// LogHook is called when a log is emitted. | |||
LogHook = func(log *types.Log) | |||
|
|||
// BlockHashReadHook is called when EVM reads the blockhash of a block. | |||
BlockHashReadHook = func(blockNumber uint64, hash common.Hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use-case is to have access to the headers of hashes that are accessed by the EVM. Alternative would be if we added a GetHeaderByHash method somewhere. But getting the hash from OnOpcode is also tricky since the hash will be put on the stack after OnOpcode is invoked.
core/tracing/journal.go
Outdated
type journal struct { | ||
entries []entry | ||
hooks *Hooks | ||
lastCreator *common.Address // Account that initiated the last contract creation | ||
|
||
validRevisions []revision | ||
nextRevisionId int | ||
revIds []int | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the linearJournal
in my PR #30660 is IMO a better base to start from. It does away with validrevisions and revIds, instead it just as a list of indexes, revisions
, which point to an entry.
type linearJournal struct {
entries []journalEntry // Current changes tracked by the linearJournal
dirties map[common.Address]int // Dirty accounts and the number of changes
revisions []int // sequence of indexes to points in time designating snapshots
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at #30660 and agree it is a better way to do journaling for tracers. The key point for me there is that there will be only 1 revert hook emitted as opposed to one for each change to a state element.
Given that #30660 seems to be still in flux I like to wait on it to be merged and implement it for tracers in a future PR as an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These points were discussed at standup:
- It looks like we will need to change the model a bit with the set-based journal as it operates on accounts, requiring also a new hook like
OnAccountReverted
and the inconsistencies there around emitting state changes on the field level and the reverts being on the account level. - The point was raised by @holiman that we are exposing a behaviour to tracers that will be hard to revert. The behaviour in question is the reverse of every state change (i.e. if Balance: A->B->C, we emit Balance: C->B->A on revert instead of just Balance: C->A).
On the second point I'd like to add my perspective: This journal is simply a wrapper around the tracers. We are not changing tracing interface semantics at all. Users can copy this file and run it themselves right now. And we are exposing every state modification, and I believe the reverse of it is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we will need to change the model a bit with the set-based journal
I don't get it. The two PRs, the two journals are unrelated. You have opted to copy-paste the legacy linear journal-implementation. I think the new linear journal-implementation is better/simpler.
You could also have chosen to copy-paste the set-based journal-implementation. I don't care which you choose, really, but I don't see any point in picking one now and switching later. If you want another, pick that one from the get-go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can copy this file and run it themselves right now.
They can't perform WrapWithJournal
from "user-space," can they? Isn't that what makes this a big "blessed" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can't perform WrapWithJournal from "user-space," can they? Isn't that what makes this a big "blessed" ?
Right exact copy wouldn't work. They'd have to implement WrapWithJournal locally, but it's totally possible.
I don't care which you choose, really, but I don't see any point in picking one now and switching later.
I think the current journal is good, it is consistent with the existing interface, and it has been running in geth for quite a while.
core/tracing/journal.go
Outdated
validRevisions []revision | ||
nextRevisionId int | ||
revIds []int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why you need to track this. Why not just maintain a list of entries, and you hand out the id
which is the current length of the entries?
core/tracing/journal.go
Outdated
type journal struct { | ||
entries []entry | ||
hooks *Hooks | ||
lastCreator *common.Address // Account that initiated the last contract creation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a hack. I don't see how this can be accurately updated going forward and backward along the entries. I mean, an inner scope will overwrite the outer lastCreator
, and when the inner scope is reverted, the lastCreator
will not be set back correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if we're inside a creation, and inside the constructor we call ripemd to calculate a signature: we lost lastCreator.
core/tracing/hooks.go
Outdated
dstValue.Field(i).Set(field) | ||
} | ||
} | ||
return copied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not equivalent to
copied := *h
return &copied
@@ -163,6 +171,9 @@ type ( | |||
// NonceChangeHook is called when the nonce of an account changes. | |||
NonceChangeHook = func(addr common.Address, prev, new uint64) | |||
|
|||
// NonceChangeHookV2 is called when the nonce of an account changes. | |||
NonceChangeHookV2 = func(addr common.Address, prev, new uint64, reason NonceChangeReason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The V2
isn't helpful, we should use NonceChangeHokWithReason
. Function names should be explict in what they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have chosen this naming scheme explicitly. The intent is communicating which version of the hook is the newest one. When we introduce a new hook version, the old one becomes deprecated and will eventually be unsupported. Also, if we were to introduce another revision of this hook, would it be called NonceChangeHookWithReasonAndBellsAndWhistles
? The *Vx
naming scheme avoids this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the old one is removed, you can remove the WithReason
part, but in the meantime, you know at a glance why the two methods differ. I don't think there's a big risk of that function being rewritten many times and, therefore, having the names getting longer and longer. But sure, just giving my opinion on what could make the code readable, not going to hold the release for that one 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot rename the function because it is a stable, user-exposed API. If we could just change it, we wouldn't go through the trouble of having multiple versions of the hook.
Here we add some more changes for the live tracing API:
OnSystemCallStartV2
is introduced withVMContext
as parameter.GetCodeHash
is added to the state interfaceWrapWithJournal
construction helps with tracking EVM reverts in the tracer.