-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to untagged send/recv and remove SAS ordering requirement in the RDMA protocol #361
Conversation
Thinking about this some more, I don't think the early ctrl special case is needed. In function So I think we should just handle the early ctrl the same way we handle all cases where the receiver starts before sender, and not have this special case. |
12c0be4
to
1e1ffc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level nit: there are still a few places in rdma.c that refer to "tag" (besides the arguments to send
and recv
, which refers to the multi-recv tag rather than the Libfabric tag).
Removed unused lcomm parameter from prepare_recv_comm, and fixed a typo. Signed-off-by: Amedeo Sapio <[email protected]>
1e1ffc9
to
e3f9fd4
Compare
e3f9fd4
to
15d0adf
Compare
@rauteric I tested it and confirmed that you are right, if the control message is received before connect response, the control message is added to the msgbuf and retrieved when the |
b4e1ec6
to
a6ce048
Compare
Reduced size of communicator ID (both for SENDRECV and RDMA) to 32 bits, since the communicator ID is transmitted on the wire and 32 bits are more than enough. Signed-off-by: Amedeo Sapio <[email protected]>
Switching to using untagged send/recv operations. Tags were used for connection request and response, while control messages and eager messages used tag 0. The data previously present in the tag has been moved to the message itself. In addition, the control message is now using send() instead of senddata(), and the data is carried directly in the message. Switching to untagged ops means that libfabric won't pre-select the message provided to the recv, so all SENDs are received into bounce buffers. Signed-off-by: Amedeo Sapio <[email protected]>
In case of error, accept() tries to close the communicator, and that operation was overwriting the return value (i.e., the original error code). This means that in case of error, if the close is successful, accept didn't return an error code, leading to infinite retries. This also fixes a typo. Signed-off-by: Amedeo Sapio <[email protected]>
Added static asserts to check packing of messages sent on the wire. For SENDRECV protocol the only message is connect. For RDMA protocol the messages are connect, connect response and control. Signed-off-by: Amedeo Sapio <[email protected]>
There is no case in which not having Send-After-Send ordering causes issues, so it is safe to remove it. The only possible reordering that can happen is if a ctrl message is received before a connect response message, but we already deal with this case correctly, since the control message is added to the msgbuf and retrieved when isend is called. Signed-off-by: Amedeo Sapio <[email protected]>
a6ce048
to
1330ae5
Compare
Switching to using untagged send/recv operations in the RDMA protocol. Tags were used for connection request and response, while control messages and eager messages used tag 0. The data previously present in the tag has been moved to the message itself.
In addition, the control message is now using send() instead of senddata(), and the data is carried directly in the message.
Switching to untagged ops means that libfabric won't pre-select the message provided to the recv, so all SENDs are received into bounce buffers now.
This PR also removes the Send-After-Send ordering requirement from libfabric.
The only case where out-of-order sends can be a problem is if a receiver sends a connect response followed by a control message, and they arrive in the opposite order on the sender side. To deal with this rare case, we add a list of control messages in the send communicator to store those early control messages, and process them after the connect response is received.This also fixes a bug in the accept() error path, that caused accept() to return "success" even in case of errors.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.