refactor(tree): safety check for all node types #422

Open: 1 commit to merge into base: master

@weiihann (Contributor) commented Dec 5, 2023

This PR adds a safety check whenever looping through an InternalNode's children.

There are 2 parts to this PR:

1. Nil check
From my understanding, an InternalNode's child should be one of the following: InternalNode, LeafNode, Empty, HashedNode, or UnknownNode. A nil child is therefore invalid and should be treated as abnormal behavior. In the current codebase, the only place that can set an InternalNode's child to nil is the SetChild function. Hence, we should explicitly check for nil children and panic as soon as one is encountered.

2. Node type switch
There are plenty of places that do something like if _, ok := c.(Empty); !ok, which treats every node type other than Empty the same way. That isn't necessarily the intention, because we also wouldn't want to deal with HashedNode and UnknownNode there. A node type switch ensures that we only deal with the intended node types (usually InternalNode and LeafNode only).
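
To illustrate the difference described above, here is a minimal sketch. The node types are simplified stand-ins declared only for this example (they are assumptions, not the library's real definitions), and the helper names notEmpty and classify are hypothetical.

```go
package main

import "fmt"

// VerkleNode and the concrete types below are simplified stand-ins for
// illustration only; they are not the library's real definitions.
type VerkleNode interface{}

type (
	InternalNode struct{}
	LeafNode     struct{}
	Empty        struct{}
	HashedNode   struct{}
	UnknownNode  struct{}
)

// notEmpty mirrors the existing pattern: anything that is not Empty passes,
// including HashedNode, UnknownNode and even a nil interface value.
func notEmpty(c VerkleNode) bool {
	_, ok := c.(Empty)
	return !ok
}

// classify (hypothetical) uses a type switch so that every node type is
// named explicitly and nothing is accepted by accident.
func classify(c VerkleNode) string {
	switch c.(type) {
	case *InternalNode, *LeafNode:
		return "resolved"
	case HashedNode:
		return "hashed"
	case Empty:
		return "empty"
	case UnknownNode:
		return "unknown"
	case nil:
		return "nil"
	default:
		return "unexpected"
	}
}

func main() {
	children := []VerkleNode{&InternalNode{}, &LeafNode{}, Empty{}, HashedNode{}, UnknownNode{}, nil}
	for _, c := range children {
		fmt.Printf("%-10s notEmpty=%v\n", classify(c), notEmpty(c))
	}
}
```

The single type assertion cannot tell a resolved node from a hashed, unknown or nil one, while the switch forces a decision for each of them.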

@jsign (Collaborator) left a comment

Thanks! This goes in a good direction.

I left many comments asking for some changes, and also some tangential comments for @gballet.

@@ -608,8 +613,14 @@ func (n *InternalNode) Delete(key []byte, resolver NodeResolverFn) (bool, error)
 	// signal that this node should be deleted
 	// as well.
 	for _, c := range n.children {
-		if _, ok := c.(Empty); !ok {
+		switch c.(type) {
+		case *InternalNode, *LeafNode:

Collaborator

Shouldn't HashedNode be part of the first case? A HashedNode means there is an (unresolved) internal or leaf node, so it belongs in the first case, not the second.

Comment on lines +620 to +621
case UnknownNode:
panic(errUnknownNodeType)

Collaborator

I wouldn't panic here but return an error. This situation isn't a bug; it happens when someone tries to operate on a state tree rebuilt from a proof.

Member

Yeah that case is a valid tree. If anything, one should try to resolve the node.

Comment on lines +622 to +623
default:
panic(errNilNodeType)

Collaborator

Two points about this:

  • I don't think we should panic; we should return an error. This is debatable, since this should never happen, but lately we usually prefer returning an error so the node can shut down gracefully instead of panicking.
  • I think the error shouldn't be errNilNodeType but (the existing) errUnknownNodeType. We usually add a default: like this to catch future cases where we add (for some reason) a new node type (e.g. OptimizedLeafNode) and forget to handle it properly in these switches. In that case, we'd be returning errNilNodeType, which wouldn't be correct. (The nil type could be interpreted as a special case of errUnknownNodeType.)
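
A minimal sketch of the two points above, under stated assumptions: the node types are stand-ins declared for the example, OptimizedLeafNode is the hypothetical future type from the comment, checkChild is a made-up helper, and the message text of errUnknownNodeType is invented. In Go, a nil interface value reaches default: when the switch has no explicit case nil, so a single default: branch covers both nil and any type added later.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in types for illustration only; not the library's real definitions.
type VerkleNode interface{}

type (
	InternalNode struct{}
	LeafNode     struct{}
	Empty        struct{}
	HashedNode   struct{}
	// OptimizedLeafNode is the hypothetical future type from the comment.
	OptimizedLeafNode struct{}
)

// The name matches the existing error mentioned in the review; the message
// text is an assumption.
var errUnknownNodeType = errors.New("unknown node type")

// checkChild (hypothetical) returns an error instead of panicking.
func checkChild(c VerkleNode) error {
	switch c.(type) {
	case *InternalNode, *LeafNode, HashedNode, Empty:
		return nil
	default:
		// Reached for nil and for any type this switch does not know yet.
		return fmt.Errorf("%w: got %T", errUnknownNodeType, c)
	}
}

func main() {
	for _, c := range []VerkleNode{&LeafNode{}, nil, OptimizedLeafNode{}} {
		if err := checkChild(c); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%T is fine\n", c)
	}
}
```

Callers can still distinguish the failure with errors.Is(err, errUnknownNodeType) because the sentinel is wrapped with %w.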

Comment on lines +663 to +666
case UnknownNode:
panic(errUnknownNodeType)
default:
panic(errNilNodeType)

Collaborator

As before, I'd treat UnknownNode and default as errors, not panics.

Member

nil is also valid, although deprecated. feel free to put a warning in that case, but it's no cause for error or panic.

@@ -329,9 +329,14 @@ func (n *InternalNode) Children() []VerkleNode {

// SetChild *replaces* the child at the given index with the given node.
func (n *InternalNode) SetChild(i int, c VerkleNode) error {

Collaborator

@gballet, we should plan to remove this SetChild(...), since it opens the door for external clients to nuke a lot of invariants on how the tree is constructed.

I've looked at where this is used in geth: the Verkle Tree iterator "resolves" a HashedNode and then calls SetChild(...) to put the resulting Internal/Leaf node in place. I believe we can solve this in a simpler way than requiring this super powerful SetChild(...) thing.

LMK your thoughts. This isn't urgent in any way, but if it makes sense we can track this by creating an issue.

Member

yeah ok, PRs welcome :)

return nil, nil, nil, err
}
n.children[i] = c
case *InternalNode, *LeafNode, Empty, UnknownNode:

Collaborator

Uhm, this UnknownNode case sounds like it should be an error.

@gballet, unfortunately, I think that in L920 (below) the c.Commitment() call would be incorrect. See here. An unknown node shouldn't have an empty commitment; that's not correct, since we don't really know what the commitment is (we can't assume it is empty). I don't think we have any choice other than panicking there. And here, in this switch, we should separate the UnknownNode case and return an error.

Member

yeah, or at least return an error. We don't want all the nodes on the network to panic. The error allows us to quit without destroying the db.

@@ -972,8 +988,14 @@ func (n *InternalNode) Serialize() ([]byte, error) {
 	// Write the <bitlist>.
 	bitlist := ret[internalBitlistOffset:internalCommitmentOffset]
 	for i, c := range n.children {
-		if _, ok := c.(Empty); !ok {
+		switch c.(type) {
+		case *InternalNode, *LeafNode:

Collaborator

I think there's a mistake here: it should also include HashedNode, right? That signals that a leaf or internal node is present. (Note that the previous code handled this case correctly.)

Member

that was previously true, but now the serialization of an internal node is simply its commitment, so it doesn't matter.

Comment on lines +995 to +998
case UnknownNode:
panic(errUnknownNodeType)
default:
panic(errNilNodeType)

Collaborator

Return errors.

Comment on lines +1748 to +1751
case UnknownNode:
panic(errUnknownNodeType)
default:
panic(errNilNodeType)

Collaborator

Can you switch this to returning an error by changing the API to also return an error?
I've checked, and the caller already returns an error, so it can bubble this one up without breaking APIs for clients.
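
A rough sketch of that change, with hypothetical names throughout: countResolved stands in for the internal helper whose signature gains an error, and Describe stands in for a caller that already returns an error. Neither name comes from the codebase, and the node types are stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in types for illustration only; not the library's real definitions.
type VerkleNode interface{}

type (
	InternalNode struct{}
	LeafNode     struct{}
	Empty        struct{}
	HashedNode   struct{}
	UnknownNode  struct{}
)

// Message text is an assumption.
var errUnknownNodeType = errors.New("unknown node type")

// countResolved is a hypothetical internal helper. It used to return only an
// int and panic on bad input; now it reports the problem to its caller.
func countResolved(children []VerkleNode) (int, error) {
	count := 0
	for i, c := range children {
		switch c.(type) {
		case *InternalNode, *LeafNode:
			count++
		case Empty, HashedNode:
			// Nothing to count.
		default:
			return 0, fmt.Errorf("child %d: %w", i, errUnknownNodeType)
		}
	}
	return count, nil
}

// Describe is a hypothetical caller whose signature already includes an
// error, so it can bubble the new error up without an API break.
func Describe(children []VerkleNode) (string, error) {
	n, err := countResolved(children)
	if err != nil {
		return "", fmt.Errorf("describe: %w", err)
	}
	return fmt.Sprintf("%d resolved children", n), nil
}

func main() {
	fmt.Println(Describe([]VerkleNode{&LeafNode{}, Empty{}, HashedNode{}}))
	fmt.Println(Describe([]VerkleNode{&LeafNode{}, UnknownNode{}}))
}
```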

Comment on lines +1766 to +1769
case UnknownNode:
panic(errUnknownNodeType)
default:
panic(errNilNodeType)

Collaborator

Return errors.

@gballet (Member) left a comment

sorry that it took me forever to review this. I don't know if you're still interested in pursuing this PR, the base branch has drifted a bit, but if you are then I'll be quicker to review it.

break
case Empty, HashedNode:

Member

you are trying to figure out if a node is empty here, to see if it can be deleted. So Empty should definitely not panic, it should continue looping. Same thing with HashedNode, except that in this case it means that the node is not empty. So it should be part of the break case like Ignacio says.
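
The emptiness check described above could be sketched roughly as follows. The function name allChildrenEmpty and the stand-in types are assumptions; only the handling of Empty, HashedNode and resolved nodes follows the comment.

```go
package main

import "fmt"

// Stand-in types for illustration only; not the library's real definitions.
type VerkleNode interface{}

type (
	InternalNode struct{}
	LeafNode     struct{}
	Empty        struct{}
	HashedNode   struct{}
)

// allChildrenEmpty (hypothetical) reports whether a node has no remaining
// children and could therefore be deleted itself.
func allChildrenEmpty(children []VerkleNode) bool {
	for _, c := range children {
		switch c.(type) {
		case Empty:
			// An empty slot says nothing yet: keep looping.
			continue
		case *InternalNode, *LeafNode, HashedNode:
			// A HashedNode stands for an unresolved internal or leaf node,
			// so it makes the parent non-empty just like a resolved child.
			return false
		}
		// Any other value (UnknownNode, nil) is skipped in this sketch,
		// which is a simplification; the review discusses those separately.
	}
	return true
}

func main() {
	fmt.Println(allChildrenEmpty([]VerkleNode{Empty{}, Empty{}}))
	fmt.Println(allChildrenEmpty([]VerkleNode{Empty{}, HashedNode{}}))
}
```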

case Empty, HashedNode:
case UnknownNode:
panic(errUnknownNodeType)
default:

Member

a nil is a valid input, it should not panic. You are welcome to print a debug message to point out that this is deprecated, and that the code should be fixed, but panicking is too dangerous.
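
One way to follow this suggestion is to log and carry on, as in the sketch below. It uses the standard library's log package purely for illustration (the project's actual logging facility is not shown in this thread), and warnOnNilChildren is a hypothetical helper.

```go
package main

import "log"

// Stand-in types for illustration only; not the library's real definitions.
type VerkleNode interface{}

type Empty struct{}

// warnOnNilChildren (hypothetical) logs a message for every nil child and
// returns how many it saw, without returning an error or panicking.
func warnOnNilChildren(children []VerkleNode) int {
	nils := 0
	for i, c := range children {
		if c == nil {
			log.Printf("warning: child %d is nil; nil children are deprecated", i)
			nils++
		}
	}
	return nils
}

func main() {
	n := warnOnNilChildren([]VerkleNode{nil, Empty{}, nil})
	log.Printf("found %d nil children", n)
}
```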

n.children[i] = HashedNode{}
case Empty, HashedNode:
case UnknownNode:

Member

don't panic here!

  • if the node is unknown or hashed, it means that it hasn't been resolved, so we don't need to resolve it and we can safely ignore it.
  • if the node is empty, there is nothing to flush. This is fine, the tree is completely valid.
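
A minimal sketch of a flush loop that follows the two points above. The function flushChildren, its callback parameter and the stand-in types are assumptions for the example; replacing a flushed child with HashedNode{} follows the diff line quoted above.

```go
package main

import "fmt"

// Stand-in types for illustration only; not the library's real definitions.
type VerkleNode interface{}

type (
	InternalNode struct{}
	LeafNode     struct{}
	Empty        struct{}
	HashedNode   struct{}
	UnknownNode  struct{}
)

// flushChildren (hypothetical) passes every resolved child to flush and
// replaces it with a HashedNode. It returns the number of flushed children.
func flushChildren(children []VerkleNode, flush func(VerkleNode)) int {
	flushed := 0
	for i, c := range children {
		switch c.(type) {
		case *InternalNode, *LeafNode:
			flush(c)
			children[i] = HashedNode{}
			flushed++
		case Empty, HashedNode, UnknownNode:
			// Nothing to flush: the slot is empty or was never resolved.
		}
	}
	return flushed
}

func main() {
	children := []VerkleNode{&LeafNode{}, Empty{}, HashedNode{}, UnknownNode{}}
	n := flushChildren(children, func(c VerkleNode) {
		fmt.Printf("flushing %T\n", c)
	})
	fmt.Println(n, "flushed")
}
```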

Comment on lines +1020 to +1022
if child == nil {
panic(errNilNodeType)
}

Member

log a warning, don't panic
