

Merge pull request #777 from onflow/jord/dynamic-bootstrap
Improve Dynamic Bootstrapping documentation
jordanschalm authored May 17, 2024
2 parents 59ed6d5 + 2cf7e3b commit 70bf3ad
Showing 3 changed files with 163 additions and 25 deletions.
11 changes: 2 additions & 9 deletions docs/networks/node-ops/node-operation/node-bootstrap.md
@@ -294,28 +294,21 @@ This is the recommended way to start your node for the first time.
4. Start your node (see [guide](./node-setup#start-the-node))

<Callout type="info">
Once the node has bootstrapped, these flags may be removed.
Once the node has bootstrapped, these flags will be ignored and may be removed.
</Callout>

### Manually Provisioned Root Snapshot

You can also provision the root snapshot file manually, then start the node without configuring Dynamic Startup.
A snapshot can be obtained from any Access Node using [Flow CLI](../../../tools/flow-cli/index.md).
```shell RetrieveRootSnapshot
flow snapshot save /path/to/root/snapshot --network mainnet
```
See [here](./protocol-state-bootstrap.md) for the available options to provision a Root Snapshot.

<Callout type="warning">

The snapshot must be within the `Epoch Setup Phase`.

</Callout>

<Callout type="warning">

Since Collection and Consensus Nodes must start up in the first ~30 minutes of the `Epoch Setup Phase` (see [Timing](./node-bootstrap.md#timing)),
the snapshot must be provisioned within this time window.

</Callout>

Once a valid root snapshot file is downloaded to the node's bootstrap folder, the node can be started (see [guide](./node-setup.md#start-the-node)).
147 changes: 147 additions & 0 deletions docs/networks/node-ops/node-operation/protocol-state-bootstrap.md
@@ -0,0 +1,147 @@
---
title: Protocol State Bootstrapping
description: How to bootstrap a new or existing node
---

When a node joins the network, it bootstraps its local database using a trusted initialization file, called a Root Snapshot.
Most node operators will use the `Spork Root Snapshot` file distributed during the [spork process](./spork.md).
This page explains how the bootstrapping process works and how to use it in general.

For guides covering specific bootstrapping workflows, see:
- [Node Bootstrap](./node-bootstrap.md) for bootstrapping a newly joined node.
- [Reclaim Disk](./reclaim-disk.md) for bootstrapping from a recent snapshot to recover disk space.

<Callout type="info">
This page covers only Protocol State bootstrapping and applies to Access, Collection, Consensus, & Verification Nodes.
Execution Nodes also need to bootstrap an Execution State database, which is not covered here.
</Callout>

## Node Startup

When a node starts up, it will first check its database status.
If its local database is already bootstrapped, it will start up and begin operating.
If its local database is not already bootstrapped, it will attempt to bootstrap using a Root Snapshot.

There are two sources for a non-bootstrapped node to obtain a Root Snapshot:
- Root Snapshot file in the `bootstrap` folder, which is used first if it exists.
- Dynamic Startup flags, which are only used if no Root Snapshot file exists.
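The precedence order above can be summarized as a small decision function. This is a hypothetical model for illustration only, not the node's startup code. The function `bootstrap_source` is invented here, and whether the database is already bootstrapped is passed in as an argument rather than detected.

```shell
# Hypothetical model of the startup order described above (not flow-go code).
# Usage: bootstrap_source <db_bootstrapped: yes|no> <bootdir>
bootstrap_source() {
  db_bootstrapped=$1
  bootdir=$2
  snapshot_file="$bootdir/public-root-information/root-protocol-state-snapshot.json"

  if [ "$db_bootstrapped" = "yes" ]; then
    # An existing database always wins; snapshot file and flags are ignored.
    echo "existing-database"
  elif [ -f "$snapshot_file" ]; then
    # A Root Snapshot file takes precedence over Dynamic Startup flags.
    echo "root-snapshot-file"
  else
    # Only now are the Dynamic Startup flags consulted.
    echo "dynamic-startup"
  fi
}
```

For example, `bootstrap_source no "$BOOTDIR"` prints `root-snapshot-file` once a snapshot has been placed in the bootstrap folder.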

## Using a Root Snapshot File

Using a Root Snapshot file is more flexible but more involved for operators compared to Dynamic Startup.

A file in `$BOOTDIR/public-root-information` named `root-protocol-state-snapshot.json` will be read and used as the Root Snapshot for bootstrapping the database.

### Instructions

1. Obtain a Root Snapshot file (see below for options)
2. Ensure your node is stopped and does not already have a bootstrapped database.
3. Move the Root Snapshot file to `$BOOTDIR/public-root-information/root-protocol-state-snapshot.json`, where `$BOOTDIR` is the value passed to the `--bootstrapdir` flag.
4. Start your node.
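Step 3 can be scripted. This is a minimal sketch. The function name `install_root_snapshot` is hypothetical, and it assumes you pass the same directory you give to `--bootstrapdir`. It refuses to install a missing or empty file.

```shell
# Hypothetical helper: copy a Root Snapshot to the path the node reads at startup.
# Usage: install_root_snapshot <snapshot-file> <bootdir>
install_root_snapshot() {
  src=$1
  bootdir=$2
  dest_dir="$bootdir/public-root-information"

  # Refuse missing or empty files rather than installing a broken snapshot.
  if [ ! -s "$src" ]; then
    echo "snapshot file '$src' is missing or empty" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/root-protocol-state-snapshot.json"
}
```

For example, run `install_root_snapshot ./latest-finalized-snapshot.json "$BOOTDIR"` while the node is stopped, then start the node.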

### Obtain Root Snapshot File using Flow CLI

[Flow CLI](../../../tools/flow-cli/index.md) supports downloading the most recently sealed Root Snapshot from an Access Node using the [`flow snapshot save`](../../../tools/flow-cli/utils/snapshot-save.md) command.

When using this method:
- ensure you connect to an Access Node you operate or trust
- ensure you use the [`--network-key`](../../../tools/flow-cli/utils/snapshot-save#network-key) flag so the connection is encrypted
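The following is a minimal sketch of a download that always uses an encrypted connection. It assumes that the Flow CLI's global `--host` flag selects the Access Node and that `--network-key` takes that node's networking public key in hex. The wrapper name `save_snapshot_securely` is hypothetical. Its only job is to fail, rather than connect without encryption, when the key is missing.

```shell
# Hypothetical wrapper around `flow snapshot save` that refuses to run without
# a network key, so the download never falls back to an unencrypted connection.
# Usage: save_snapshot_securely <access-address> <network-public-key> <output-file>
save_snapshot_securely() {
  host=$1
  network_key=$2
  out=$3

  if [ -z "$network_key" ]; then
    echo "refusing to download a snapshot without --network-key" >&2
    return 1
  fi

  flow snapshot save "$out" --host "$host" --network-key "$network_key"
}
```

For example, pass `secure.mainnet.nodes.onflow.org:9001` and the Flow Foundation public key listed in the Dynamic Startup section. Then move the resulting file into place as described in the instructions above.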

### Obtain Root Snapshot File from Protocol database

If you have an existing node actively participating in the network, you can obtain a Root Snapshot using its database.

1. Obtain a copy of the Flow `util` tool and ensure it is in your `$PATH`. This tool is distributed during sporks, or you can build a copy from [here](https://github.com/onflow/flow-go/tree/master/cmd/util).
2. Stop the existing node.
3. Construct a Root Snapshot using the `util` tool. The tool will print the JSON representation to STDOUT, so you can redirect the output to a file.

Replace `$DATADIR` with the value passed to the `--datadir` flag. You can specify the desired reference block for the snapshot.

Retrieve the snapshot for the latest finalized block:
```sh
util read-protocol-state snapshot -d $DATADIR --final > latest-finalized-snapshot.json
```

Retrieve the snapshot for a specific finalized block height:
```sh
util read-protocol-state snapshot -d $DATADIR --height 12345 > specific-height-snapshot.json
```
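The redirects above have one caveat. The shell creates the output file before `util` runs, so a failed run can leave an empty or truncated JSON file that might later be mistaken for a valid snapshot. The sketch below keeps the output only if `util` exits successfully. The function name `export_snapshot` is hypothetical; the `util` arguments are the ones shown above.

```shell
# Hypothetical helper: write the snapshot to a temp file and move it into
# place only if `util read-protocol-state snapshot` succeeds with output.
# Usage: export_snapshot <datadir> <output-file> [util flags, e.g. --final]
export_snapshot() {
  datadir=$1
  out=$2
  shift 2

  tmp=$(mktemp) || return 1
  if util read-protocol-state snapshot -d "$datadir" "$@" > "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$out"
  else
    echo "snapshot export failed; '$out' left untouched" >&2
    rm -f "$tmp"
    return 1
  fi
}
```

For example, run `export_snapshot "$DATADIR" latest-finalized-snapshot.json --final` or `export_snapshot "$DATADIR" specific-height-snapshot.json --height 12345`.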

## Using Dynamic Startup

Dynamic Startup is a startup configuration where your node will download a Root Snapshot and use it to bootstrap its local database.
Dynamic Startup is designed for nodes that are newly joining the network and need to [bootstrap from within a specific epoch phase](./node-bootstrap#timing), but it can also be used in other cases.

<Callout type="info">
If your node already has a bootstrapped database, or has a Root Snapshot file in `$BOOTDIR/public-root-information`,
these take precedence and the Dynamic Startup flags are ignored.
</Callout>

When using Dynamic Startup, you specify:
1. An Access Node to retrieve the snapshot from.
2. A target epoch counter and phase to wait for.

After startup, your node will periodically download a candidate Root Snapshot from the specified Access Node.
If the Root Snapshot's reference block is either **within or after** the specified epoch phase, the node will bootstrap using that snapshot.
Otherwise the node will continue polling until it receives a valid Root Snapshot.

See the [Epochs Schedule](./../../staking/03-schedule.md) for additional context on epoch phases.
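The acceptance rule compares where the snapshot falls in (epoch counter, phase) order against the target. The function below is a hypothetical model of that rule, not flow-go code. It assumes that the phases within an epoch are ordered Staking, Setup, Committed, and it encodes them as 1, 2, 3. That encoding is chosen for this sketch only.

```shell
# Hypothetical model of Dynamic Startup's acceptance rule (not flow-go code).
# Phases are encoded as 1=Staking, 2=Setup, 3=Committed for comparison only.
# Usage: should_bootstrap <target-epoch> <target-phase> <snapshot-epoch> <snapshot-phase>
should_bootstrap() {
  target_epoch=$1 target_phase=$2 snap_epoch=$3 snap_phase=$4

  if [ "$snap_epoch" -gt "$target_epoch" ]; then
    echo "bootstrap"        # snapshot is from a later epoch
  elif [ "$snap_epoch" -eq "$target_epoch" ] && [ "$snap_phase" -ge "$target_phase" ]; then
    echo "bootstrap"        # same epoch, target phase reached or passed
  else
    echo "keep-polling"     # too early; fetch another candidate later
  fi
}
```

The model also explains bootstrapping immediately. With a past target epoch such as `1`, a snapshot from any later epoch is accepted on the first poll, for example `should_bootstrap 1 2 500 1`.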

### Specifying an Access Node

Two flags are used to specify which Access Node to connect to:
- `--dynamic-startup-access-address` - the Access Node's secure gRPC server address
- `--dynamic-startup-access-publickey` - the Access Node's networking public key

Select an Access Node you operate or trust to provide the Root Snapshot, and populate these two flags.

For example, to use the Access Node maintained by the Flow Foundation for Dynamic Startup, specify the following flags:
```shell ExampleDynamicStartupFlags
... \
--dynamic-startup-access-address=secure.mainnet.nodes.onflow.org:9001 \
--dynamic-startup-access-publickey=28a0d9edd0de3f15866dfe4aea1560c4504fe313fc6ca3f63a63e4f98d0e295144692a58ebe7f7894349198613f65b2d960abf99ec2625e247b1c78ba5bf2eae
```
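Before starting the node, you may want to sanity-check these two values. The sketch below uses a hypothetical function name, `check_access_flags`. It assumes the address has the form `host:port` and that the key is a 128-character hex string, matching the Flow Foundation example above. If the checks pass, it prints the two flags for your startup command.

```shell
# Hypothetical pre-flight check for the Dynamic Startup Access Node flags.
# Prints the two flags if the values look well-formed, otherwise fails.
# Usage: check_access_flags <host:port> <hex-public-key>
check_access_flags() {
  addr=$1
  key=$2

  port=${addr##*:}
  if [ "$port" = "$addr" ]; then
    echo "address '$addr' is missing a :port suffix" >&2
    return 1
  fi
  case "$port" in
    '' | *[!0-9]*) echo "port '$port' is not numeric" >&2; return 1 ;;
  esac

  # Assumption: the key is 128 hex characters, like the example key above.
  if [ "${#key}" -ne 128 ]; then
    echo "public key has ${#key} characters, expected 128" >&2
    return 1
  fi
  case "$key" in
    *[!0-9a-fA-F]*) echo "public key is not hex" >&2; return 1 ;;
  esac

  echo "--dynamic-startup-access-address=$addr"
  echo "--dynamic-startup-access-publickey=$key"
}
```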

### Specifying an Epoch Phase

Two flags are used to specify when to bootstrap:
- `--dynamic-startup-epoch-phase` - the epoch phase to start up in (default `EpochPhaseSetup`)
- `--dynamic-startup-epoch` - the epoch counter to start up in (default `current`)

> You can check the current epoch phase of the network by running [this](https://github.com/onflow/flow-core-contracts/blob/master/transactions/epoch/scripts/get_epoch_phase.cdc) script. Alternatively, you can check the current epoch phase [here](https://app.metrika.co/flow/dashboard/network-overview) under Epoch Phase.

#### Bootstrapping Immediately

If you would like to bootstrap immediately, using the first Root Snapshot you receive, then specify a past epoch counter:
```shell ExampleDynamicStartupFlags
... \
--dynamic-startup-epoch=1
```
You may omit the `--dynamic-startup-epoch-phase` flag.

### Instructions

#### Example 1
Use Dynamic Startup to bootstrap your node at the `Epoch Setup Phase` of the current epoch (desired behaviour for newly joining nodes):
1. Ensure your database is not already bootstrapped, and no Root Snapshot file is present in `$BOOTDIR/public-root-information`.
2. Add the necessary flags to the node startup command.
   For example, using the Flow Foundation Access Node:
   ```sh
   ... \
   --dynamic-startup-access-address=secure.mainnet.nodes.onflow.org:9001 \
   --dynamic-startup-access-publickey=28a0d9edd0de3f15866dfe4aea1560c4504fe313fc6ca3f63a63e4f98d0e295144692a58ebe7f7894349198613f65b2d960abf99ec2625e247b1c78ba5bf2eae
   ```
3. Start your node.

#### Example 2
Use Dynamic Startup to bootstrap your node immediately, using the most recent Root Snapshot:
1. Ensure your database is not already bootstrapped, and no Root Snapshot file is present in `$BOOTDIR/public-root-information`.
2. Add the necessary flags to the node startup command.
   For example, using the Flow Foundation Access Node:
   ```sh
   ... \
   --dynamic-startup-access-address=secure.mainnet.nodes.onflow.org:9001 \
   --dynamic-startup-access-publickey=28a0d9edd0de3f15866dfe4aea1560c4504fe313fc6ca3f63a63e4f98d0e295144692a58ebe7f7894349198613f65b2d960abf99ec2625e247b1c78ba5bf2eae \
   --dynamic-startup-epoch=1
   ```
3. Start your node.
30 changes: 14 additions & 16 deletions docs/networks/node-ops/node-operation/reclaim-disk.md
@@ -28,31 +28,29 @@ Hence, as a node operator, please make sure to do the following:

### Access, Collection, Consensus and Verification node

If you are running any node other than an execution node and the node is close to running out of disk space or has already exhausted all of its disk, you can do the following to reclaim disk space:
If you are running any node other than an execution node and the node is close to running out of disk space or has already exhausted all of its disk, you can re-bootstrap the node's database. This frees up disk space by discarding all historical data from before the block the node re-bootstraps from.

1. Stop the node.

2. Move the data folder aside as a backup in case you need to revert this change. The default location of the data folder is `/var/flow/data` unless overridden by the `--datadir` flag.
```sh
mv /var/flow/data /var/flow/data_backup
```

2. Set up the node to use **dynamic bootstrapping** by specifying the dynamic startup flags described [here](./node-bootstrap.md#dynamic-startup).
3. Configure the node to bootstrap from a new, more recent Root Snapshot. You may use either of the two methods described [here](./protocol-state-bootstrap.md) to configure your node.

4. Start the node. The node should now recreate the data folder and start fetching blocks.

4. Move the data folder to a tmp folder in case it is required to revert this change. The default location of the data folder is `/var/flow/data` unless overridden by the `--datadir` flag.
5. If the node is up and running OK, delete the `data_backup` folder created in step 2.
```sh
rm -rf /var/flow/data_backup
```

```sh
mv /var/flow/data /var/flow/data_backup
```
#### Limitation for Access Node

Re-bootstrapping allows the node to be restarted at a particular block height by deleting all the previous state.

5. Start the node. The node should now recreate the data folder and start fetching blocks.


6. If the node is up and running OK, delete the `data_backup` folder created in step 4.

```sh
rm -rf /var/flow/data_backup
```

#### Limitation for Access node

Dynamic bootstrap allows the node to be restarted at a particular block height by deleting all the previous state.

For an **access** node, this results in the node not being able to serve any API request before the height at which the node was dynamically bootstrapped.
For an **Access Node**, this results in the node not being able to serve any API request before the height at which the node was re-bootstrapped.

_Hence, if you require the Access Node to serve data from the start of the last network upgrade (spork), do not use this method of reclaiming disk space. Instead, provision more disk for the node._
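The backup, revert, and cleanup steps for non-execution nodes described above can be wrapped in three small functions. This is a sketch with hypothetical names. It assumes the node is stopped whenever these run, that the argument is the value of `--datadir` (default `/var/flow/data`), and that the new Root Snapshot is configured separately.

```shell
# Hypothetical helpers for the reclaim-disk procedure. Run only while the node is stopped.

# Move the current database aside so the node starts from an empty data folder.
backup_datadir() {
  [ -n "$1" ] || return 1
  if [ -e "${1}_backup" ]; then
    echo "${1}_backup already exists; refusing to overwrite it" >&2
    return 1
  fi
  mv "$1" "${1}_backup"
}

# Revert: discard the re-bootstrapped data folder and restore the backup.
restore_datadir() {
  [ -n "$1" ] || return 1
  if [ ! -d "${1}_backup" ]; then
    echo "no backup found at ${1}_backup" >&2
    return 1
  fi
  rm -rf "$1"
  mv "${1}_backup" "$1"
}

# Once the re-bootstrapped node is healthy, delete the backup to free the space.
discard_backup() {
  [ -n "$1" ] || return 1
  rm -rf "${1}_backup"
}
```

For example, run `backup_datadir /var/flow/data` before starting the node with the new Root Snapshot. Once the node is healthy, run `discard_backup /var/flow/data`. To revert, stop the node and run `restore_datadir /var/flow/data`.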

