Define IBC communication #49
Conversation
docs/ibc/Staking.md
Outdated
For `Unstake`, we should update a local "unstaking" value on the `(user, validator)`
staking info, but not create a claim nor apply a diff to the validator.
We ensure that this "unstaking" amount can never be larger than the properly staked
> can never be larger than the properly staked ... value
I think in an edge case with concurrent slashing, this could be possible. But the operation should fail on the consumer side and play this back to the provider.
Can you expand on this point?
Consumer and provider side can become out of sync on the staked amount when slashing happens.
Slashing originates on the consumer side and reduces the effectively staked amount. It takes some time to propagate the event to the provider side. During this period both sides are out of sync, and the real stake amount is less than the provider-side stake amount.
Oh, that is a very good point. We need a much longer section on slashing. But I will include a space to dig into this point more.
packages/apis/src/ibc.rs
Outdated
```rust
pub enum ProviderMsg {
    /// This should be called on initialization to get current list of validators.
    /// Any changes to the set should be sent as a ConsumerMsg::UpdateValidatorSet
    ListValidators {},
```
I was wondering if we need to fetch and store a copy of the whole valset on the provider chain. With an optimistic approach, we would only store the validators that received delegates from the users (that were acked by the consumer side). That would reduce complexity in the system.
The frontend would need to read the full valset from the consumer chain directly. But this part is not clear to me.
I wrote a bit about validation in: osmosis-labs/mesh-security-sdk#28 (comment)
> With an optimistic approach, we would only store the validators that received delegates from the users (that were acked by the consumer side). That would reduce complexity in the system.
You are right. We don't need the info except for ones with stake, so we can slash.
But after reading your issue on validation, I figured it was best to validate the validator that would receive the delegation before creating the packet. It's not dangerous to delegate to an invalid validator address; it will just return an error in the ack. But the UX is much better showing an error in the transaction (a few seconds after signing, while the user is on the page) than an error in the ack (which may occur a minute or two later, when they may have closed the app already).
This is more for application design than enforcing correctness of the protocol.
Good docs / comments. Just some syntax / style comments.
@@ -89,4 +89,5 @@ Read more about the mechanics of the [Slashing Evidence Handling](./Slashing.md)

## Trust Assumptions

**TODO** list between the consumer and the provider
The [top-level introduction covers many concerns](../README.md#common-concerns)
Move those to this section? Separate common concerns from IBC-specific ones?
Okay, I cleaned up the smaller issues here; maybe a good starting point now. I will make a new PR on top of this to dig deeper into correctness of messages and unordered/CRDT items.
Partial review of `Overview` and `Validators`. Will continue during the day.
Some more comments / syntax suggestions. Will review `Serializability` next.
docs/ibc/Staking.md
Outdated
A staking operation would have the following steps and checks:

* Send a lien from `vault` to `external-staking` contract
* Ensure there is sufficient collateral to cover max(lien)
* Ensure there is sufficient collateral to cover sum(potential slashes)
* Increase the lien for that given user on the `external-staking` contract
* Add a delegation in `external-staking` contract
* Increase stake count for (user, validator)
* Increase total stake on the validator
* Increase the user's shares in the reward distribution
* Send an IBC packet to inform the Consumer
* Guarantee we can commit all of the above on success
* Guarantee we can rollback all of the above on error

An unstaking operation would have the following steps and checks:

* Remove a delegation in `external-staking` contract
* Ensure stake count (user, validator) is set and greater than the desired unstake amount
* Ensure total stake on the validator is set and greater than the desired unstake amount (should always be true if the above is true)
* Decrease stake count for (user, validator)
* Decrease total stake on the validator
* Decrease the user's shares in the reward distribution
* Add an entry to the (user, validator) "pending unbonding" to withdraw said tokens after the unbonding period.
* Send an IBC packet to inform the Consumer
* Guarantee we can commit all of the above on success
* Guarantee we can rollback all of the above on error
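The paired increase/decrease steps above can be sketched as simple bookkeeping. This is an illustrative sketch only, not the actual `external-staking` contract; the struct and field names are assumptions:

```rust
/// Illustrative per-(user, validator) stake bookkeeping.
/// In the real contract the per-validator total would live in shared
/// storage; it is inlined here for brevity.
#[derive(Default, Debug, PartialEq)]
struct StakeInfo {
    staked: u128,
    total_on_validator: u128,
}

impl StakeInfo {
    /// Staking: increase both the (user, validator) count and the validator total.
    fn stake(&mut self, amount: u128) {
        self.staked += amount;
        self.total_on_validator += amount;
    }

    /// Unstaking: check the stake count covers the desired amount
    /// before decreasing both counters.
    fn unstake(&mut self, amount: u128) -> Result<(), String> {
        if amount > self.staked {
            return Err(format!(
                "unstake {} exceeds staked {}",
                amount, self.staked
            ));
        }
        self.staked -= amount;
        self.total_on_validator -= amount;
        Ok(())
    }
}
```

Note that the "guarantee we can rollback" step implies these writes must be held in a provisional state until the IBC ack arrives, which is exactly what the later Serializability discussion addresses.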
Looks good, but we must be really sure we're not missing some scenario / combination of operations here.
Perhaps a simpler initial approach could be to allow only one in-flight operation at any given time, and reject new operations until the pending one completes, either successfully or with an error.
That is the locking approach.
We can't just reject other IBC packets, but must reject any other transaction that touches the same keys.
That means we cannot pay out rewards to that validator or make/remove another lien from that user to anyone.
That is correct, but not efficient. Please read the discussion in Serializability, where we take this up.
More syntax suggestions and comments.
docs/ibc/Serializability.md
Outdated
### Value range

One idea to implement this would be to not just store one value (the balance), but a range of values, covering the possible values if
all in-flight transactions were to succeed or fail. Normal transactions (send 100 tokens) would update both values and error if either
one violated some invariant (too high or too low). When an IBC transaction is initiated, it would execute e.g. "maybe 200", which would
attempt to decrease the min by 200 but leave the max unchanged (requiring this range to remain valid).

This approach would only work for values not used in complex formulas, such as balances of an AMM (we can't calculate prices off ranges),
or paying out staking rewards (the value received by all other users depends on your stake, and we can't propagate this to all those accounts,
as it would be prohibitively expensive to collapse them all when the transaction is committed or rolled back).

But for counters with comparisons, increments, and decrements, it could work well. Even logic like
"total amount of collateral is greater than the max lien amount" could be enforced if collateral and lien amounts
were value ranges. In this case, the maximum of all `lien.max` values would be compared against `collateral.min`.
With some clever reasoning, we could possibly enforce such value ranges without actually storing multiple fields.
This looks like a good idea.
One concern I have is that value ranges will need to be cumulative. That is, a value range (or at least, part of it) of a given transaction would depend on the value range of the other un-finished transaction(s).
And if I'm not mistaken, this introduces the issue of ordering again. For each transaction, we would need a set of value ranges, reflecting each of the possible ways the sequence of current pending transactions (including this one) can finish. I mean, not only if they succeed or fail, but also, all of the different orderings or sequences in which they can succeed / fail.
And then we need to check if any new transaction "passes". That is, if the (updated) set of value ranges of all the pending txs, including the new one, is or will still be valid.
The value range set cardinality is O(N!) in the number of pending txs, so this is only feasible for a small number of pending txs. That is not a big deal, as we can reject txs upfront if a given threshold (5 or so) of pending txs is reached.
The set will also need to be maintained / updated not only when a new tx arrives, but also when a pending one finishes.
Will think more about it.
We don't need a huge set, as it is just (min, max) for the range.
- Range(100, 100) + Maybe(+200) + Maybe(-50) => Range(50, 300)
- Range(50, 300) + Rollback(+200) => Range(50, 100)
- Range(50, 100) + Commit(-50) => Range(50, 50)
As I went further, I think I would use these ranges mostly as theoretical constructs for evaluation, and then use locks if needed.
All queries accessing said value would need to be range-aware, which is more complex than making them lock-aware. A lock is basically "get access or error". A range means all operations (sum, comparison, etc.) would need to check multiple values.
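The range arithmetic in the examples above can be captured in a small sketch. This is illustrative only (an analysis tool, not a proposed implementation); the `Range` type, the `min >= 0` invariant, and the method names are assumptions:

```rust
/// Sketch of the "value range" idea: a balance stored as (min, max),
/// covering all outcomes of the in-flight transactions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    min: i128,
    max: i128,
}

impl Range {
    fn new(value: i128) -> Self {
        Range { min: value, max: value }
    }

    /// Register an in-flight delta: widen the optimistic bound, and
    /// error if the pessimistic bound would break the `min >= 0` invariant.
    fn maybe(self, delta: i128) -> Result<Self, String> {
        let next = if delta >= 0 {
            Range { min: self.min, max: self.max + delta }
        } else {
            Range { min: self.min + delta, max: self.max }
        };
        if next.min < 0 {
            return Err(format!("invariant violated: min {} < 0", next.min));
        }
        Ok(next)
    }

    /// The in-flight delta succeeded: fold it into the other bound.
    fn commit(self, delta: i128) -> Self {
        if delta >= 0 {
            Range { min: self.min + delta, max: self.max }
        } else {
            Range { min: self.min, max: self.max + delta }
        }
    }

    /// The in-flight delta failed: undo the optimistic widening.
    fn rollback(self, delta: i128) -> Self {
        if delta >= 0 {
            Range { min: self.min, max: self.max - delta }
        } else {
            Range { min: self.min - delta, max: self.max }
        }
    }
}
```

With this sketch, Maybe(-150) on Range(100, 300) errors immediately, which is what guarantees that every ordering of the remaining commits and rollbacks stays within the invariant.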
But yeah, no need for O(N!).
But maybe a good idea to have a set of `(caller, Maybe)` for all pending txs, to validate whether `Rollback` or `Commit` can be safely called.
We can keep this a tool for analysis for now and consider actual Rust implementations if needed in the protocols.
What about the following?
- Range(100, 100) + Maybe(+200) + Maybe(-150) => Range(-50, 300)
That would fail in one order, but succeed in the other.
Oh, I see now that your approach makes sense, as that simple approach guarantees that all possible orderings succeed (which is what I was going after in my comment above).
Whereas
- Range(100, 100) + Maybe(-50) + Maybe(-60) => Range(-10, 100)
will fail. Nice.
Balances and similar will also be undefined during pending transactions. They will be value ranges, basically.
Yes, adding the `Maybe(-150)` would fail, as that would enter a non-commutative state (where there exist different orderings that would produce different results).
BTW, I started #52 to implement the read/write lock primitives as actual structs that can be used in `cw-storage-plus`.
If you feel inspired to start modeling value ranges, feel free. (Just let me know before you start coding in case I grab it.)
See #53. Man, it feels good to code again...
validator, which iterates over all liens to find the sum of max slashing, will be blocked until the first transaction completes.
We cannot actually block (or wait) transactions in this way, so we must return an error for the second transaction.

### Idea: Approximating Locks
My main question here is if we can consider this optimization safe?
Or if we should use a full locking operation here as well?
I'd love some feedback on that
Can't review now, but will take a look later for sure.
@alpe Your comments above about optimistic liens are the topic of this section.
Please read it and let me know how it can be improved.
I have not had a chance to read all the docs, but here is some feedback already.
In order to make the protocol documents more compact,
[all theoretical foundations are described separately](./Serializability.md).
Please read through that document and have a decent grasp of those concepts before
digging into the sub-protocols below.
thanks for helping us navigate through the documents 👍
docs/ibc/Serializability.md
Outdated
acks on the sending chain must also be A, B, C). This is exactly the guarantee that ordered channels provide.

On top of this, we have to ensure that no other transactions conflict with any open IBC transactions. Transaction A is "open" from the
time the first logic is run on the sending chain (which will send the IBC message) until the ack is fully process and committed on
Suggested change:
- time the first logic is run on the sending chain (which will send the IBC message) until the ack is fully process and committed on
+ time the first logic is run on the sending chain (which will send the IBC message) until the ack is fully processed and committed on
docs/ibc/Serializability.md
Outdated
2. Process Packet on Receiving Chain: Acquire all read/write locks on all data, process data, release all locks. This goes from the "growing" phase to the "shrinking" phase.
3. Process Ack on the Sending Chain: Process ack, and release all locks. This is the "shrinking" phase.

If we guarantee that we never read any data that is not under a write lock, we can release the read locks at the end of "start tx", as they are not needed.
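Since a contract cannot block and wait, the read/write locks in these phases would have to fail fast rather than queue. A minimal sketch of such a per-key lock (illustrative names and shape; not the actual structs from #52):

```rust
/// Sketch of a non-blocking read/write lock as it might be tracked
/// per storage key. Acquisition errors instead of waiting, because a
/// contract cannot suspend a transaction.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Lock {
    readers: u32,
    writer: bool,
}

impl Lock {
    /// "Growing" phase: take an exclusive write lock, or error.
    fn lock_write(&mut self) -> Result<(), String> {
        if self.writer || self.readers > 0 {
            return Err("key is locked by an in-flight transaction".to_string());
        }
        self.writer = true;
        Ok(())
    }

    /// Shared read lock: allowed alongside other readers, not a writer.
    fn lock_read(&mut self) -> Result<(), String> {
        if self.writer {
            return Err("key is write-locked by an in-flight transaction".to_string());
        }
        self.readers += 1;
        Ok(())
    }

    /// "Shrinking" phase: release on ack (commit and rollback alike).
    fn unlock_write(&mut self) {
        self.writer = false;
    }

    fn unlock_read(&mut self) {
        self.readers = self.readers.saturating_sub(1);
    }
}
```

The early release of read locks at the end of "start tx" corresponds to calling `unlock_read` before the packet is even relayed, keeping only the write locks open until the ack.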
This sentence was a bit hard to understand for me. It helps to read the examples below. I got confused by the motivation for write locks on any data "read only".
docs/ibc/Serializability.md
Outdated
some global invariants before initiating a transaction. However, since all changes to the given data are initiated in the provider
blockchain, and we have a complete view of currently in-flight transactions, this approach could work for IBC protocols.

We can say that ICS20 implementation does something like this. As the only invairant is that the value never goes below zero,
Suggested change:
- We can say that ICS20 implementation does something like this. As the only invairant is that the value never goes below zero,
+ We can say that ICS20 implementation does something like this. As the only invariant is that the value never goes below zero,
- else `D[V] -= X`, return success ack

This is a pretty straightforward counter with a lower bound of 0, along with increment and decrement operations.
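The bounded counter could be sketched roughly as follows (illustrative only, not the actual consumer contract; `Delegations` and its methods are assumed names). `undelegate` returning an error corresponds to the error-ack branch when the 0 lower bound would be violated:

```rust
use std::collections::HashMap;

/// Sketch of the consumer-side per-validator delegation counter D[V]
/// with a hard lower bound of 0.
#[derive(Default)]
struct Delegations {
    d: HashMap<String, u128>,
}

impl Delegations {
    /// Delegate X to validator V: D[V] += X (always succeeds).
    fn delegate(&mut self, v: &str, x: u128) {
        *self.d.entry(v.to_string()).or_insert(0) += x;
    }

    /// Undelegate X from V: error ack if D[V] would go below 0,
    /// otherwise D[V] -= X and success ack.
    fn undelegate(&mut self, v: &str, x: u128) -> Result<(), String> {
        let cur = *self.d.get(v).unwrap_or(&0);
        let next = cur.checked_sub(x).ok_or_else(|| {
            format!("undelegate {} exceeds stake {} on {}", x, cur, v)
        })?;
        self.d.insert(v.to_string(), next);
        Ok(())
    }
}
```

Because the counter is sharded by validator rather than by user, many users' operations fold into one value, which is what makes the simple lower-bound check sufficient here.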
Good use of sharding by validator!
I was thinking that we technically could even relax the counter constraint of 0 to become negative on the consumer side.
If I have `D(a, 100), U(a,50), U(a,50)`, the result would eventually become `0` in any order. A negative value should just be stored and not trigger any action.
This builds upon the assumption that the provider side keeps a balance of in-flight data and never "over-undelegates".
On a different view, this is on a validator and not a user shard, therefore a buffer will be built up quickly. A hard `0` constraint can act as a circuit breaker against draining as well.
I am very hesitant to ever relax invariants and start handling negative numbers when an epoch hits.
The provider should not even allow the `U(a, 50)` action to be initiated on the provider side until it is sure the `D(a, 100)` is committed. What happens if the first one fails for some reason?
With full CRDTs, we assume nothing will ever be rejected, and all operations will be applied.
With this distributed transaction model, we need to ensure consistency. Minimally, no transaction can read data from an in-flight transaction - meaning the delegation doesn't even start on the provider side until it is fully committed (on ack, not on send packet).
Note that a write-lock will prevent reading of the value from any other transaction. That means the same user offering a lien on another
validator, which iterates over all liens to find the sum of max slashing, will be blocked until the first transaction completes.
We cannot actually block (or wait) transactions in this way, so we must return an error for the second transaction.
I think that you can avoid failing the second TX by maintaining state for in-flight data in both contracts.
Like credit-card companies, you could place a lock in the vault for the lien amount, which is released and the change applied on a success Ack, or just dropped on an error Ack. The liquid balance would be reduced by the locked amounts.
Similarly in external-staking, the changes would be applied only on a success Ack, or reverted on an error Ack, depending on the stage at which they are taken into account in the contract.
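The credit-card-style hold described here could look roughly like this. All names (`Vault`, `start_stake`, the packet-id keying) are illustrative assumptions, not the real `vault` contract:

```rust
use std::collections::HashMap;

/// Sketch of a vault that places holds for in-flight IBC liens and
/// resolves them on ack. The liquid balance is reduced by the holds,
/// so concurrent operations can proceed as long as both fit.
struct Vault {
    collateral: u128,
    /// Pending holds, keyed by a hypothetical in-flight packet id.
    holds: HashMap<u64, u128>,
    next_id: u64,
}

impl Vault {
    fn new(collateral: u128) -> Self {
        Vault { collateral, holds: HashMap::new(), next_id: 0 }
    }

    /// Liquid balance = collateral minus all in-flight holds.
    fn liquid(&self) -> u128 {
        self.collateral - self.holds.values().sum::<u128>()
    }

    /// Start a cross-chain stake: place a hold, or error if liquid
    /// funds are insufficient (no blocking, no write lock needed).
    fn start_stake(&mut self, amount: u128) -> Result<u64, String> {
        if amount > self.liquid() {
            return Err("insufficient liquid collateral".to_string());
        }
        let id = self.next_id;
        self.next_id += 1;
        self.holds.insert(id, amount);
        Ok(id)
    }

    /// Success Ack: apply the change and release the hold
    /// (modeled here as simply consuming the collateral).
    fn on_success_ack(&mut self, id: u64) {
        if let Some(amount) = self.holds.remove(&id) {
            self.collateral -= amount;
        }
    }

    /// Error Ack or timeout: drop the hold, restoring the liquid balance.
    fn on_error_ack(&mut self, id: u64) {
        self.holds.remove(&id);
    }
}
```

Under this sketch, a concurrent "withdraw collateral" only fails if it exceeds the liquid balance, rather than being rejected outright by a write lock.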
Maybe I don't explain it well in the document. The concept is:
- I start a staking action over IBC; it is not committed, and we don't actually charge the lien.
- While it is in-flight, I withdraw collateral such that it is valid with the pre-stake lien, but invalid with the post-stake lien.
We could leave the second tx in a pending state over multiple blocks until the ack from the first point comes in, but this starts breaking all kinds of assumptions about non-IBC messages succeeding or failing atomically.
The other approach is just to reject it as it might fail, which is what the write lock would do. Note that a write-lock would reject any possible action that uses said value.
(Actually, now going through that case, I think we store individual liens separate from the combined user info (collateral, max_lien, total_slashable). We would have to lock both of those items, as a commit would write both of them.)
I may not have the full picture yet, but with my current understanding, the "withdraw collateral" should fail instead. The "stake" action was started before, and the vault should reflect that state in the balance. With an optimistic approach, the lien should be taken at the start of the "stake" process already. It can be returned on an error Ack or timeout.
I understand the "write-lock" as "one operation only until finished". That constraint would be very hard and can be avoided IMHO. It should be possible to "withdraw collateral" while having a "stake" operation in flight, as long as the balance in the vault is sufficient for both.
Yes, the locking approach is the easiest one to prove safe.
And from there, I look to reduce constraints, while showing that this can never open up a conflict, assuming the previous version had no conflicts.
I think you are not looking at the latest version, which has a section "Idea: Approximating Locks" discussing just what you are suggesting (and asking for feedback that this is as safe as a lock).
Some syntax suggestions, and a question about max cap enforcement.
Closes #7