[Release] Introduce --batch-size-to-kill to release chaos test framework #48765

Merged
jjyao merged 17 commits into ray-project:master from jjyao/chaoooos
Dec 12, 2024

Conversation

jjyao
Collaborator

@jjyao jjyao commented Nov 16, 2024

Why are these changes needed?

Currently the chaos test framework kills only one node at a time, which is not stressful enough to uncover some bugs. This PR introduces --batch-size-to-kill so that multiple nodes can be killed at the same time.
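For readers unfamiliar with the idea, here is a minimal sketch of what batched killing could look like. It is not the code in this PR: `pick_batch_to_kill`, `run_chaos`, `max_to_kill`, and `kill_fn` are hypothetical names invented for illustration, and the sketch kills the nodes of a batch back to back rather than truly concurrently.

```python
import random
from typing import Callable, List, Optional, Sequence, Set


def pick_batch_to_kill(
    alive_node_ids: Sequence[str],
    batch_size: int,
    already_killed: Set[str],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `batch_size` distinct nodes that have not been killed yet.

    Hypothetical helper; the real framework may select nodes differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = rng or random.Random()
    # Sort so that a seeded RNG gives a reproducible choice.
    candidates = sorted(set(alive_node_ids) - already_killed)
    # Never ask for more nodes than are actually available.
    return rng.sample(candidates, min(batch_size, len(candidates)))


def run_chaos(
    alive_node_ids: Sequence[str],
    batch_size: int,
    max_to_kill: int,
    kill_fn: Callable[[str], None],
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Kill nodes in batches until `max_to_kill` nodes are gone or none remain."""
    killed: Set[str] = set()
    while len(killed) < max_to_kill:
        remaining = max_to_kill - len(killed)
        batch = pick_batch_to_kill(
            alive_node_ids, min(batch_size, remaining), killed, rng
        )
        if not batch:
            break  # No killable nodes left.
        # Assumption: a real killer would issue these kills concurrently;
        # here they run back to back for simplicity.
        for node_id in batch:
            kill_fn(node_id)
            killed.add(node_id)
    return killed
```

With `batch_size=1` this reduces to the one-node-at-a-time behavior described above. Larger values remove several nodes per round, which is the extra stress the flag is meant to add.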

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Nov 16, 2024
jjyao added 11 commits December 10, 2024 14:21
Signed-off-by: Jiajun Yao <[email protected]>
@jjyao
Collaborator Author

jjyao commented Dec 11, 2024

batch_inference_chaos release test succeeded: https://buildkite.com/ray-project/release/builds/27890#0193b7fe-1114-4f40-ad8a-2f0f32af2147

Total chaos killed: {'79e3395993aa90570e383b9169e7d88cd2f5f020e90f6dae3c125e02', '56e5b40eee84b3a7b894d5817fcc0387bb77640deff5149ea5565627', 'd3f85384b2f036dee72798baf31ac894abaa810707cc56675e03f611', '66804584692eccb2e6b9ac9762c5f8e3735e363fca4f38bf08777fb9', '1018025a3232d3b7794ffd96a8566ad734538bbf15c9d2dfa440ca48', 'bacb1b33e6a3294838436ef9bfe89ad0710862d7457adda896ef6381'}

Signed-off-by: Jiajun Yao <[email protected]>
@jjyao jjyao merged commit e38a43d into ray-project:master Dec 12, 2024
4 of 5 checks passed
@jjyao jjyao deleted the jjyao/chaoooos branch December 12, 2024 22:10
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Dec 17, 2024
Labels
go add ONLY when ready to merge, run all tests

3 participants