-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Applies multinodessuite in twonodessuite #1031
Conversation
I've gone through several steps to isolate the issue this PR addresses. Read the story: https://hackmd.io/@ThatBen/S15F3DY4Jl TLDR: |
…m/codex-storage/nim-codex into feature/transfer-reliability-test
# Conflicts: # tests/integration/multinodes.nim # tests/integration/nodeprocess.nim
This PR addresses point 2 of Mark's: #746 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! The main question is: do all these tests pass reliably under the conditions in https://hackmd.io/@ThatBen/S15F3DY4Jl? If so, awesome, let's get this in, so you can be unblocked on the blockexchange 👍
@@ -311,7 +311,7 @@ template multinodesuite*(name: string, body: untyped) = | |||
role: Role.Client, | |||
node: node | |||
) | |||
if clients().len == 1: | |||
if running.len == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm sure there was a good reason for this, but could you please explain why? We don't want the running node to be a HardhatNode, which wouldn't give us what we want (bootstrap info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is behind if var clients =? nodeConfigs.clients:
and nodeConfigs.clients is of type ?CodexConfigs
. So these should always be Codexes. I changed this because using clients()
assumes that the config for this test has at least 1 client configured and that the client will always be started before any providers/validators (true because of the ordering of the code). If a user creates a config with only providers or validators, none of them would be used for bootstrap. With this change, the first node regardless of role will be the bootstrap node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user creates a configuration that runs a single hardhat node and at least one client node, running.len == 1
will never be true when processing the first client node. We don't want that.
Perhaps a better option would be to check that clients().len == 1 or providers.len() == 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user creates a config with only providers or validators, none of them would be used for bootstrap. With this change, the first node regardless of role will be the bootstrap node.
Without a client config, there will be no bootstrap nodes, since this is nested under the iteration of clients.configs
.
I've made some changes that add each started client and validator to a seq of bootstrap nodes, that are used to start the next node. That means each subsequently started node should be more connected to peers than the last. This also fixes the case when no clients were started, that bootstrap SPRs are still used. PR #1048
multinodesuite "Marketplace": | ||
let marketplaceConfig = NodeConfigs( | ||
clients: CodexConfigs.init(nodes=1).some, | ||
providers: CodexConfigs.init(nodes=1).some, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use marketplacesuite
here (and in others below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it. It makes very little difference technically, but makes slightly more sense. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have approved the last one. Apart from the running.len == 1
fix, this is mostly good to go, and so I'll approve to get you unblocked
I can't say for certain that the tests are reliable after this PR. I can only say they are more reliable than before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you @benbierens!
# Conflicts: # tests/integration/nodeprocess.nim
After a change in PR #1031, bootstrap node sprs may not work when Hardhat nodes are started with the tests. This fixes it by appending all started client's and provider's SPR to a sequence, and using that sequence of SPRs to start the next node. This means all subsequently started nodes will be connected to its previously started peers. This also fixes the case when bootstrap SPRs would not be present if no clients were started.
After a change in PR #1031, bootstrap node sprs may not work when Hardhat nodes are started with the tests. This fixes it by appending all started client's and provider's SPR to a sequence, and using that sequence of SPRs to start the next node. This means all subsequently started nodes will be connected to its previously started peers. This also fixes the case when bootstrap SPRs would not be present if no clients were started.
During the work on #1019 it was found that the integration tests for block exchange are not covering an aspect of reliability that other (marketplace) integration tests are relying on. In order to help focus debugging efforts, this adds a transfer test that checks this aspect.
The instability caused by the old process-handling code of the twonodessuite has been updated to use the new multinodessuite fixture.
Additionally: sets http timeout for the codex client. (Was infinite. Now 1 minute.)