Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Express Lane Timeboost #2561

Open
wants to merge 335 commits into
base: master
Choose a base branch
from
Open

Conversation

rauljordan
Copy link
Contributor

@rauljordan rauljordan commented Aug 8, 2024

Background

At the time of writing, the Arbitrum sequencer is centralized and offers a first-come, first-serve transaction ordering policy. Txs have a current delay of approximately 250ms, which is the time the sequencer takes to produce an ordered list of txs to emit in the form of an L2 block. The current policy does not handle MEV that occurs naturally on L2, and leads to latency races offline to get faster access to the sequencer ingress server.

A new policy has been proposed, known as Express Lane Timeboost, which allows participants to bid for the rights of priority sequencing using their funds instead of hardware. In “rounds” that start at each minute mark, participants can submits bids to participate in a sealed, second-price auction for control of the next round’s “express lane”. During a round, all non-express lane txs get their first arrival timestamp delayed by some amount of time (250ms), while the express lane controller does not. The express lane controller can also choose to transfer their rights in a round.

The sequencer itself does not need to manage auctions, but simply needs to know the current round number and the address of the express lane controller for that round. From there, it can delay non-express lane txs by a nominal amount required by the protocol and validate that a tx should go through the express lane.

This PR contains the complete implementation of the system with all its components. The smart contract changes are contained within OffchainLabs/nitro-contracts/tree/express-lane-auction-all-merged.

Basic Readings

To read more about timeboost, see the AIP, the research specification, and design doc although the design doc is not fully updated yet.

Reviewing

Recommend to look at the basic readings, then look at system_tests/timeboost_test.go to understand how it all fits together. Then, look at bid validator and auctioneer. Finally, the sequencer changes.

Features

  • Bidder client that allows participants to join the auction and submit bids to a bid validator
  • Bid validator that receives bids over the internet, validates them, and inserts validated items into Redis stream
  • Auctioneer server that consumes validated bids from Redis stream.
  • Auctioneer at the 45 second mark, submits the top two bids to a privileged sequencer endpoint
  • Ability to persist validated bids to a local DB (sqlite) in the auctioneer server
  • System tests are added that assert express lane txs have an advantage in the emitted sequencer feed

Sequencer Changes

The changes to the sequencer hot path are quite simple. In a nutshell, if a transaction is received, it checks the following:
If timeboost is enabled AND there is an express lane controller set AND it is not coming from the express lane, it delays the tx's first arrival timestamp by some amount (250ms).

To determine if a transaction is a valid express lane tx, the sequencer runs a background thread called the expressLaneService, which is scraping events from the ExpressLaneAuction.sol smart contract. Express lane transactions arrive via a different sequencer endpoint than the normal one, called timeboost_sendExpressLaneTransaction. The message looks as follows:

{
  "type": "object",
  "properties": {
    "chainId": {
      "type": "bigInt",
      "description": "chain id of the target chain"
    },
    "round": {
      "type": "uint64",
      "description": "round number (0-indexed) for the round the bidder wants to become the controller of"
    },
    "auctionContractAddress": {
      "type": "address",
      "description": "hex string of the auction contract address that the bid corresponds to"
    },
    "sequenceNumber": {
      "type": "uint64",
      "description": "the per-round nonce of express lane submissions. Each submission to the express lane during a round increases this sequence number by one, and if submissions are received out of order, the sequencer will queue them for processing in order. This is reset to 0 at each round"
    },
    "transaction": {
      "type": "bytes",
      "description": "hex string of the RLP encoded transaction payload that submitter wishes to be sequenced through the express lane"
    },
    "options": {
      "type": "ArbitrumConditionalOptions",
      "description": "conditional options for Arbitrum transactions, supported by normal sequencer endpoint https://github.com/OffchainLabs/go-ethereum/blob/48de2030c7a6fa8689bc0a0212ebca2a0c73e3ad/arbitrum_types/txoptions.go#L71"
    },
    "signature": {
      "type": "bytes",
      "description": "Ethereum signature over the bytes encoding of (keccak256(TIMEBOOST_BID), padTo32Bytes(chainId), auctionContractAddress, uint64ToBytes(round), uint64ToBytes(sequenceNumber), transaction)"
    }
  },
}

The submission itself contains a tx payload, which MAY not be from the express lane controller. As long as the submission is signed by the controller, that is sufficient. Submissions have a specific nonce, called a sequence, to ensure that submissions are processed in order. This is different from the inner nonce of the payload tx. The sequencer keeps a queue of submissions and ensures it processes them in order. That is, if a submission N is received before N-1, it will get queued for submission once N arrives.

Bid Validator Architecture

Bids are limited to 5 bids per sender, but there are no limits to the number of bidders in a single round. To alleviate potential scaling concerns, we adopt a simple architecture of separating the bid validators from the auctioneer. The bid validators filter out invalid items and publish validated results to a Redis stream. In a simplified diagram, here's what it will look like:

Screenshot 2024-08-08 at 11 45 55

Dependencies Added

  • github.com/golang-jwt/jwt/v4 for the authenticated endpoint from the auctioneer to the sequencer
  • github.com/stretchr/testify for testing utilities (will probably have to remove)
  • github.com/mattn/go-sqlite3 for the bids DB
  • github.com/jmoiron/sqlx for the bids DB
  • github.com/DATA-DOG/go-sqlmock for testing the bids DB

Notes

There are several parts of this implementation that are likely not ideal:

Chicken and the egg problem in sequencer
Cannot start sequencer without express lane, but cannot deploy auction for express lane without starting sequencer. To solve this in tests, we have a separate func called StartExpressLaneService in the sequencer. In prod, we don’t have this issue because we can deploy the contracts before we upgrade the sequencer to timeboost, but what to do about tests? - In nitro-testnode we first start the sequencer without timeboost, deploy the contracts, then enable timeboost in the configuration and restart.

Janky prioritizing of auction resolution txs
The sequencer exposes an authenticated endpoint auctioneer_submitAuctionResolutionTransaction over the JWT Auth RPC for the auctioneer to use. When the auctioneer is ready to resolve an auction, it submits a tx to this endpoint, which the sequencer verifies for integrity. Then, the sequencer does the following:

log.Info("Prioritizing auction resolution transaction from auctioneer", "txHash", tx.Hash().Hex())
s.timeboostAuctionResolutionTx = tx
s.createBlock(ctx)

it immediately tries to put the item in the queue and create block. It also sets the tx as a property of the sequencer struct, and in the createBlock func, if this field is not nil, it gets put at the top of the queue. This is a bit janky in how it works and perhaps inefficient. Is there another way to prioritize a tx in the sequencer?

Sequencer opens an http connection to itself
The sequencer has a thread called expressLaneService which reads events from the auction smart contracts on L2 to determine express lane controllers. Because the sequencer does not have filtersystem API access, we instead open an RPC client against itself so we can create an ethclient to read logs and data from onchain. This doesn't seem ideal Fixed now, using contractAdapter now for direct access.

References

@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Aug 8, 2024
@rauljordan rauljordan marked this pull request as draft August 8, 2024 16:22
@rauljordan rauljordan marked this pull request as ready for review August 8, 2024 17:00
@@ -0,0 +1,221 @@
package main
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is modeled almost exactly after how cmd/val-node works

timeboost/ticker.go Outdated Show resolved Hide resolved
timeboost/ticker.go Outdated Show resolved Hide resolved
timeboost/bid_validator.go Outdated Show resolved Hide resolved
timeboost/bid_validator.go Outdated Show resolved Hide resolved
bv.bidsPerSenderInRound[bidder]++
bv.Unlock()

depositBal, err := balanceCheckerFn(&bind.CallOpts{}, bidder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may have a bit of a race condition here.

  1. At time t=0, r (round)=0 bidder A issues a withdrawal. The withdrawal will finalize at r=2
  2. At time t=120, r=2 bidder A makes a bid on round 3.
  3. The validator will do a balance check here, but if it connects to a node which is 1 second behind the validator then the node will see the r=1 and think the balance is still there
  4. Validator accepts a bid which will fail when being submitted

Some options for getting round this would be:
a. Look for withdrawal events and take those into account when calculating the balance.
or
b. Offer a balanceAtRound(uint64 round) function on the contract, then the controlling round can be supplied to the contract and always return what it will be at that round, regardless of whether the node is behind. (The node will need to be within 60 seconds of the head, but I think that's safe to expect)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

B feels the safest here that would keep the bid validator fast. Log events are expensive to scrape especially if we scale the validators

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will add

execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/express_lane_service.go Outdated Show resolved Hide resolved
timeboost/auctioneer.go Outdated Show resolved Hide resolved
@rauljordan
Copy link
Contributor Author

Update: implemented the functionality of using a logs subscription from the blockchain struct in the sequencer instead of an ethclient to read information about the express lane auction contract here https://github.com/OffchainLabs/nitro/compare/express-lane-timeboost...sub-logs-express-lane-timeboost?expand=1.
However, the logs are not received over the channel for some reason when running the system test under timeboost_test.go

Comment on lines 133 to 135
func (c *AutonomousAuctioneerConfig) Validate() error {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention for this? Is it a todo or should it be removed?

Round uint64
AuctionContractAddress common.Address
Transaction *types.Transaction
Options *arbitrum_types.ConditionalOptions `json:"options"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for having options?

select {
case <-ctx.Done():
return
case <-time.After(time.Millisecond * 250):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if time.After(time.Millisecond * 250) is sufficient and whether it’s better to stream new blocks directly from the tx feed. Otherwise, there could be a race condition where, at 249.9ms, we check and find that the latest block number hasn’t changed, forcing us to wait another 250ms.

Comment on lines 210 to 228
for {
// Get the next message in the sequence.
nextMsg, exists := es.messagesBySequenceNumber[control.sequence]
if !exists {
break
}
if err := publishTxFn(
ctx,
nextMsg.Transaction,
msg.Options,
false, /* no delay, as it should go through express lane */
); err != nil {
// If the tx failed, clear it from the sequence map.
delete(es.messagesBySequenceNumber, msg.Sequence)
return err
}
// Increase the global round sequence number.
control.sequence += 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor because we break, but we might want to check for context timeout here, and the caller of sequenceExpressLaneSubmission should set a deadline for when the round ends.

execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
execution/gethexec/sequencer.go Outdated Show resolved Hide resolved
options: nil,
resultChan: make(chan error, 1),
returnedResult: &atomic.Bool{},
ctx: context.TODO(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be s.GetContext()

// Just to be safe, make sure we don't run over twice the queue timeout
abortCtx, cancel := ctxWithTimeout(parentCtx, queueTimeout*2)
defer cancel()
if s.config().Timeboost.Enable && s.expressLaneService != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving this part to PublishTransaction could be nice.
Then in this function as well as in the TxQueue "isExpressLane" is only used for block metadata

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but is it ok to delay the normal txs before validating them?

execution/gethexec/sequencer.go Show resolved Hide resolved
Comment on lines +352 to +354
// Queued txs cannot use this message's context as it would lead to context canceled error once the result for this message is available and returned
// Hence using context.Background() allows unblocking of queued up txs even if current tx's context has errored out
var queueCtx context.Context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you store each message's context in the queue when sequenceExpressLaneSubmission is called, and just use that?

Copy link
Contributor

@ganeshvanahalli ganeshvanahalli Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue is a controller can submit say a tx with a seq number way into the future which wont be processed until all the ones with lower seq numbers are processed. in that case that ctx would expire either by the caller's timeout or by abortCtx. Hence the need for deriving timeout contexts from non-expiring context.Background() for future seq num txs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that okay for it to expire? They shouldn't submit ones that far in the future then.

Copy link
Contributor

@ganeshvanahalli ganeshvanahalli Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry, edited my previous comment)
the context (queueCtx) is only created when we process this future seq num tx, and this queueCtx does expire though, after queueTimeout duration has passed.

Instead of deriving a timeout context from current ctx, we just derive it from a non-expiring one (to not be dependent on other tx's ctx)

execution/gethexec/express_lane_service.go Outdated Show resolved Hide resolved
@@ -570,29 +498,25 @@ func (s *Sequencer) publishTransactionImpl(parentCtx context.Context, tx *types.
err := abortCtx.Err()
if parentCtx.Err() == nil {
// If we've hit the abort deadline (as opposed to parentCtx being canceled), something went wrong.
log.Warn("Transaction sequencing hit abort deadline", "err", err, "submittedAt", queueItem.firstAppearance, "queueTimeout", queueTimeout, "txHash", tx.Hash())
log.Warn("Transaction sequencing hit abort deadline", "err", err, "submittedAt", now, "queueTimeout", queueTimeout*2, "txHash", tx.Hash())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "submittedAt" changed to now here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont have access to txQueueItem, so now is the next best value to use for when the tx was itself queued

// Check if a duplicate submission exists already, and reject if so.
if _, exists := roundInfo.msgAndResultBySequenceNumber[msg.SequenceNumber]; exists {
return timeboost.ErrDuplicateSequenceNumber
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in these two we're handling an already-existing sequence number. I think we don't want to return an error if it's the same message. Compare the new message to the existing one by comparing signatures. If they are the same - return nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants