Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/efa: set ep_attr->max_msg_size according to hints #9388

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

wenduwan
Copy link
Contributor

@wenduwan wenduwan commented Oct 1, 2023

ep_attr->max_msg_size is currently set to UINT64_MAX. As a result the application cannot enable zero-copy recv.

@wenduwan wenduwan marked this pull request as ready for review October 2, 2023 00:52
@wenduwan wenduwan requested a review from a team October 2, 2023 00:52
@wenduwan wenduwan marked this pull request as draft October 2, 2023 14:17
@wenduwan
Copy link
Contributor Author

wenduwan commented Oct 2, 2023

bot:aws:retest

@shijin-aws
Copy link
Contributor

is it a regression? I don't know if we ever set it correctly, does that mean we never support zero-copy receive?

@wenduwan wenduwan marked this pull request as ready for review October 2, 2023 21:58
@wenduwan
Copy link
Contributor Author

wenduwan commented Oct 2, 2023

It is a regression. I haven't bisected git history yet but zcpy recv worked on 1.18.x last time I checked.

@shijin-aws
Copy link
Contributor

shijin-aws commented Oct 2, 2023

how was max_msg_size parsed in v1.18.x?

@shijin-aws
Copy link
Contributor

and also remind me how this impact zero copy receive?

@wenduwan
Copy link
Contributor Author

wenduwan commented Oct 3, 2023

It is a regression.

I take that back. It has always been there, but it didn't cause problem because the only zcpy use case is doing odd stuff around fi_getinfo https://github.com/aws/aws-cdi-sdk/blob/47a3063218c4af43d909a34e703f18b922bc853a/src/cdi/adapter_efa.c#L365-L378 (maybe that's because they know it's a libfabric bug).

The application modifies fi_info object after calling fi_getinfo, which I think is illegal.

max_msg_size is used to determine if zero-copy receive can be safely used in efa_rdm_ep_set_use_zcpy_rx

I found this bug while trying to add a fabtest to cover the code path.

@shijin-aws
Copy link
Contributor

I am not sure max_msg_size is used in this way. Look at fi_alter_ep_attr(), max_msg_size is not updated from the one in hints.

I think it is an attribute that provider returns to the application in info that indicates the max msg size and the application must comply with that. It's not something that application should customize.

@shijin-aws
Copy link
Contributor

The application modifies fi_info object after calling fi_getinfo, which I think is illegal.

Right, modifying info after fi_getinfo is illegal.

@wenduwan
Copy link
Contributor Author

wenduwan commented Oct 3, 2023

I think it is an attribute that provider returns to the application in info that indicates the max msg size and the application must comply with that. It's not something that application should customize.

Hmmmmmmm let me poke the awesome @shefty :

It's indeed ambiguous from the doc whether fi_info->ep_attr->max_msg_size is read-only or can be set by the application. Could you help clarify?

@shefty
Copy link
Member

shefty commented Oct 3, 2023

It's not something the app can change. The app can request a specific value when calling fi_getinfo, and if the provider can't support that size, it can fail the request. But the expectation is that this value maps directly to the transport limitation.

@wenduwan
Copy link
Contributor Author

wenduwan commented Oct 3, 2023

@shefty Thanks for chiming in. Cool in this case I think the application in question is abusing/misusing the attribute for EFA provider.

@wenduwan wenduwan closed this Oct 3, 2023
@shijin-aws shijin-aws reopened this Apr 2, 2024
@shijin-aws
Copy link
Contributor

shijin-aws commented Apr 2, 2024

Reopen this PR, as further talking with @shefty shows that provider can choose to use a value of max_msg_size that is equal to what application requests in hints.

ep_attr->max_msg_size is currently set to UINT64_MAX. As a result the
application cannot enable zero-copy recv.

Signed-off-by: Wenduo Wang <[email protected]>
@wenduwan
Copy link
Contributor Author

wenduwan commented Apr 2, 2024

Rebased branch and running AWS CI

@shijin-aws shijin-aws merged commit 642fc3a into ofiwg:main Apr 3, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants