Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/verbs: Fix data race vrb_open_ep function #10571

Merged
merged 1 commit into from
Dec 2, 2024
Merged

Conversation

piotrchmiel
Copy link
Contributor

@piotrchmiel piotrchmiel commented Nov 21, 2024

Race Fix: Resolved a data race in the vrb_open_ep function of the verbs provider caused by concurrent modifications to the global variable vrb_ep_ops. This issue violated the FI_THREAD_SAFE threading model, leading to unpredictable behavior when creating endpoints from multiple threads.

Detailed race description: #10569

Signed-off-by: Piotr Chmiel [email protected]

@piotrchmiel piotrchmiel force-pushed the 10569 branch 4 times, most recently from e1b235a to 06eeddb Compare November 21, 2024 14:28
@shefty
Copy link
Member

shefty commented Nov 22, 2024

Please change the commit message header to: prov/verbs: Fix race in ..., then describe the actual race in the message.

@piotrchmiel piotrchmiel changed the title Fix data race vrb_open_ep function prov/verbs: Fix data race vrb_open_ep function Nov 27, 2024
Resolved a data race in the vrb_open_ep function of the verbs provider caused
by concurrent modifications to the global variable vrb_ep_ops. This issue
violated the FI_THREAD_SAFE threading model, leading to unpredictable behavior
when creating endpoints from multiple threads.

Signed-off-by: Piotr Chmiel <[email protected]>
@piotrchmiel
Copy link
Contributor Author

piotrchmiel commented Nov 27, 2024

Please change the commit message header to: prov/verbs: Fix race in ..., then describe the actual race in the message.

Done. @shefty Could one of the maintainers kindly review this?

Copy link
Member

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - The current code is incorrect, and this looks like the right fix.

@piotrchmiel
Copy link
Contributor Author

@shefty Thanks for the review! I appreciate your feedback, and I’m glad the change looks correct now.
Could you clarify what the next steps are in the merging process? I noticed that the AWS CI check is failing with the message “This commit cannot be built.” However, when I click on the workflow details, the page doesn’t load, and I’m unsure if it's a permissions issue on my side.

@shefty
Copy link
Member

shefty commented Dec 2, 2024

bot:aws:retest

@shefty
Copy link
Member

shefty commented Dec 2, 2024

@piotrchmiel - This PR should eventually be merged. I restarted the AWS CI, but I don't believe the previous failure was related.

@shijin-aws
Copy link
Contributor

we disabled verbs in aws ci, please ignore then

@shefty
Copy link
Member

shefty commented Dec 2, 2024

@shijin-aws - Is there a way for it to ignore or skip the testing, but report success? I believe other CI's do this.

@shefty shefty merged commit d2f7028 into ofiwg:main Dec 2, 2024
12 of 13 checks passed
@shijin-aws
Copy link
Contributor

@shefty That is a good call, we will evaluate it.

@shijin-aws
Copy link
Contributor

How does other CI handle it as scripts? is there an example?

@shefty
Copy link
Member

shefty commented Dec 2, 2024

See:

@piotrchmiel piotrchmiel deleted the 10569 branch December 17, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants