Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: fix flaky test when leader peer terminates #981

Merged
merged 2 commits into from
Oct 31, 2023

Conversation

milmazz
Copy link
Contributor

@milmazz milmazz commented Oct 31, 2023

This commit should fix a flaky test that happens when the leadership changes because the previous leader terminates for a given reason.

The issue happens because the signal emitted in the unit test is a GenServer.stop/1, but the peer process is under the test supervisor, so, in certain circumstances, the supervisor is fast enough to spawn a new process. That process ends as the new leader.

A way to reproduce the flaky test locally could be like this:

for i in (seq 1 10)
  mix test --seed 856757

  if test $status -eq 2
    break
  end
end

I'm using the fish shell here; the previous script might not work on your favorite shell.

The previous command should produce an output like:

Excluding tags: [:skip]

...................................................................................................................................................................................................................................................................................................................
Finished in 4.2 seconds (1.4s async, 2.7s sync)
24 doctests, 7 properties, 278 tests, 0 failures, 2 excluded

Randomized with seed 856757
Excluding tags: [:skip]

................................................................................................................................................................................................................................................................................................

  1) test leadership changes when a peer terminates (Oban.Peers.GlobalTest)
     test/oban/peers/global_test.exs:24
     Expected truthy, got nil
     code: assert Enum.find([peer_1, peer_2] -- [leader], &Global.leader?/1)
     arguments:

         # 1
         [#PID<0.3478.0>]

         # 2
         &Oban.Peers.Global.leader?/1

     stacktrace:
       test/oban/peers/global_test.exs:44: anonymous fn/3 in Oban.Peers.GlobalTest."test leadership changes when a peer terminates"/1
       (oban 2.16.3) test/support/case.ex:73: Oban.Case.with_backoff/4
       test/oban/peers/global_test.exs:43: (test)

..................
Finished in 5.3 seconds (1.4s async, 3.8s sync)
24 doctests, 7 properties, 278 tests, 1 failure, 2 excluded

Randomized with seed 856757

Here, you can see that the PID of the new leader is not in the target list (Enum.find), so, as I mentioned before, one theory is that the test supervisor spawned a new process, and that process ended being the leader.

This commit should fix a flaky test that happens when the leadership
changes because the previous leader terminates for a given reason.

The issue happens because the signal emitted in the unit test is a
`GenServer.stop/1`, but the peer process is under the test supervisor,
so, in certain circumstances, the supervisor is fast enough to spawn a
new process. That process ends as the new leader.

A way to reproduce the flaky test locally could be like this:

```console
for i in (seq 1 10)
  mix test --seed 856757

  if test $status -eq 2
    break
  end
end
```

I'm using the fish shell here; the previous script might not work on
your favorite shell.

The previous command should produce an output like:

```console
Excluding tags: [:skip]

...................................................................................................................................................................................................................................................................................................................
Finished in 4.2 seconds (1.4s async, 2.7s sync)
24 doctests, 7 properties, 278 tests, 0 failures, 2 excluded

Randomized with seed 856757
Excluding tags: [:skip]

................................................................................................................................................................................................................................................................................................

  1) test leadership changes when a peer terminates (Oban.Peers.GlobalTest)
     test/oban/peers/global_test.exs:24
     Expected truthy, got nil
     code: assert Enum.find([peer_1, peer_2] -- [leader], &Global.leader?/1)
     arguments:

         # 1
         [#PID<0.3478.0>]

         # 2
         &Oban.Peers.Global.leader?/1

     stacktrace:
       test/oban/peers/global_test.exs:44: anonymous fn/3 in Oban.Peers.GlobalTest."test leadership changes when a peer terminates"/1
       (oban 2.16.3) test/support/case.ex:73: Oban.Case.with_backoff/4
       test/oban/peers/global_test.exs:43: (test)

..................
Finished in 5.3 seconds (1.4s async, 3.8s sync)
24 doctests, 7 properties, 278 tests, 1 failure, 2 excluded

Randomized with seed 856757
```

Here, you can see that the PID of the new leader is not in the target
list (`Enum.find`), so, as I mentioned before, one theory is that the
test supervisor spawned a new process, and that process ended being the
leader.
@milmazz milmazz force-pushed the chore/fix-flaky-test branch from 10feb3f to 9bdf8e2 Compare October 31, 2023 14:08
Copy link
Member

@sorentwo sorentwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Thanks for tracking this down 👏

test/oban/peers/global_test.exs Outdated Show resolved Hide resolved
@sorentwo sorentwo merged commit 38a93d2 into oban-bg:main Oct 31, 2023
2 checks passed
@sorentwo sorentwo added area:oss Related to Oban OSS kind:chore Internal improvements labels Oct 31, 2023
@milmazz milmazz deleted the chore/fix-flaky-test branch October 31, 2023 15:35
sorentwo pushed a commit that referenced this pull request Nov 3, 2023
This commit should fix a flaky test that happens when the leadership
changes because the previous leader terminates for a given reason.

The issue happens because the signal emitted in the unit test is a
`GenServer.stop/1`, but the peer process is under the test supervisor,
so, in certain circumstances, the supervisor is fast enough to spawn a
new process. That process ends as the new leader.

A way to reproduce the flaky test locally could be like this:

```console
for i in (seq 1 10)
  mix test --seed 856757

  if test $status -eq 2
    break
  end
end
```

I'm using the fish shell here; the previous script might not work on
your favorite shell.

The previous command should produce an output like:

```console
Excluding tags: [:skip]

...................................................................................................................................................................................................................................................................................................................
Finished in 4.2 seconds (1.4s async, 2.7s sync)
24 doctests, 7 properties, 278 tests, 0 failures, 2 excluded

Randomized with seed 856757
Excluding tags: [:skip]

................................................................................................................................................................................................................................................................................................

  1) test leadership changes when a peer terminates (Oban.Peers.GlobalTest)
     test/oban/peers/global_test.exs:24
     Expected truthy, got nil
     code: assert Enum.find([peer_1, peer_2] -- [leader], &Global.leader?/1)
     arguments:

         # 1
         [#PID<0.3478.0>]

         # 2
         &Oban.Peers.Global.leader?/1

     stacktrace:
       test/oban/peers/global_test.exs:44: anonymous fn/3 in Oban.Peers.GlobalTest."test leadership changes when a peer terminates"/1
       (oban 2.16.3) test/support/case.ex:73: Oban.Case.with_backoff/4
       test/oban/peers/global_test.exs:43: (test)

..................
Finished in 5.3 seconds (1.4s async, 3.8s sync)
24 doctests, 7 properties, 278 tests, 1 failure, 2 excluded

Randomized with seed 856757
```

Here, you can see that the PID of the new leader is not in the target
list (`Enum.find`), so, as I mentioned before, one theory is that the
test supervisor spawned a new process, and that process ended being the
leader.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:oss Related to Oban OSS kind:chore Internal improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants