prov/verbs: device max_cqe is not used to limit CQ size, leading to CQ creation failures #10692

Open
jakemoroni opened this issue Jan 9, 2025 · 0 comments

The verbs provider does not take the device's max_cqe into account, so under some workloads it tries to create a CQ larger than the device allows, and CQ creation fails with an error.

To Reproduce

  1. Configure RXE.
  2. Run an Intel MPI test (IMB-MPI1) with an increasing number of processes per node (PPN) until a CQ creation failure is observed.

Expected behavior
The workload will run without CQ creation failures.

As per this comment, it seems like there was some consideration of this in the past.
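
A minimal sketch of the kind of bound I have in mind, assuming the limit comes from the `max_cqe` field of `struct ibv_device_attr` (as filled in by `ibv_query_device()`). The helper name `clamp_cq_size` is hypothetical and not an existing libfabric function, and whether the provider should clamp silently or report the reduced size back to the caller is a decision for the maintainers:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libfabric): bound a requested CQ depth
 * by the device limit before it is passed on to CQ creation.
 *
 * requested: CQ size the provider computed for the workload
 * max_cqe:   device limit, assumed to come from ibv_device_attr.max_cqe
 */
int clamp_cq_size(size_t requested, int max_cqe)
{
    /* This sketch assumes the device reports a positive limit. */
    assert(max_cqe > 0);

    if (requested > (size_t)max_cqe)
        return max_cqe;      /* request exceeds the device limit */

    return (int)requested;   /* request already fits */
}
```

With a clamp like this, a request for 100000 entries on a device whose limit is 32767 would be reduced to 32767 instead of failing. The caller would still need to make sure the smaller CQ cannot overrun.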

For context: While RXE probably won't offer the best performance, it is very useful for performing A/B comparisons when debugging real RDMA hardware.

jakemoroni added the bug label on Jan 9, 2025