ch3: use group to build communicator vc tables #7242
test:mpich/ch3/most
Miscellaneous typo fixes to appease the spellchecker.
This test requires access to MPICH internals, so it can't be used with the current design.
We no longer use this file.
Hide the internal fields of MPIR_Group from unnecessary access. Outside group_util.c and group_impl.c, code only needs to assume the MPIR_Lpid integer type, the creation routines based on an lpid map or an lpid stride description, and the access routine that looks up the lpid for a group rank.
For most external usages, we only need MPIR_Group_rank_to_lpid.
Avoid accessing group internal fields.
Group similar functions together to facilitate refactoring. There are no changes in this commit other than moving functions around. The 4 incl/excl functions are very similar. The 3 difference/intersection/union functions are very similar.
Use MPIR_Group_{rank_to_lpid,lpid_to_rank} to avoid directly accessing MPIR_Group internal fields. For most group creation routines, just populate an lpid lookup map and call MPIR_Group_create_map to create the group.
* Add an option to use a stride to describe group composition.
* Remove the linked-list design.
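The two group descriptions mentioned above (an explicit lpid map and a stride) can be illustrated with a minimal sketch. The `ex_` names and the struct layout below are hypothetical and are not the actual MPIR_Group definition; the sketch only shows how a rank-to-lpid lookup and its inverse might work under either description.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ex_lpid_t;     /* stand-in for MPIR_Lpid (assumption) */

/* Hypothetical group layout: either an explicit map or a strided range. */
struct ex_group {
    int size;
    int use_map;                /* nonzero: look up in map[] */
    const ex_lpid_t *map;       /* explicit rank -> lpid table */
    ex_lpid_t offset;           /* stride form: lpid of rank 0 */
    ex_lpid_t stride;           /* stride form: step between ranks */
};

/* Return the lpid of a group rank; O(1) in both representations. */
static ex_lpid_t ex_group_rank_to_lpid(const struct ex_group *g, int rank)
{
    assert(rank >= 0 && rank < g->size);
    if (g->use_map)
        return g->map[rank];
    return g->offset + (ex_lpid_t) rank * g->stride;
}

/* Reverse lookup; returns -1 when the lpid is not in the group. */
static int ex_group_lpid_to_rank(const struct ex_group *g, ex_lpid_t lpid)
{
    if (g->use_map) {
        for (int i = 0; i < g->size; i++) {
            if (g->map[i] == lpid)
                return i;
        }
        return -1;
    }
    if (g->size == 0 || lpid < g->offset)
        return -1;
    ex_lpid_t diff = lpid - g->offset;
    if (g->stride == 0)
        return diff == 0 ? 0 : -1;
    if (diff % g->stride != 0)
        return -1;
    ex_lpid_t rank = diff / g->stride;
    return rank < (ex_lpid_t) g->size ? (int) rank : -1;
}
```

The stride form needs no per-rank storage, which is presumably the appeal for large regular groups; the map form covers arbitrary compositions.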
This is the same as MPID_Comm_get_lpid. NOTE: we will remove MPID_Comm_get_lpid as well once we move the ownership of lpid to the MPIR layer.
There is no real difference between lpid and gpid. Thus, rename gpid in the device layer to lpid for clarity. Replace uint64_t with MPIR_Lpid as the type of lpid. This improves consistency.
We need a device-independent way of identifying processes. One way is to use the combination (world_idx, world_rank). Thus, we need to maintain a list of worlds so that world_idx points to the world record. This may not fit the concept of an MPI group, but since a group needs a way to identify processes, the group code seems the most closely related place. The first world, world_idx 0, is always initialized at init. Because of session re-init, we need to make sure to reset num_worlds to 0 at finalize. New worlds will be added upon spawning or connecting dynamic processes (to be implemented).
We need to reset num_worlds so that Session re-init will work.
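As a rough illustration of the world list described above, the following sketch keeps a fixed-size table of world records indexed by world_idx and resets the count at finalize. The `ex_` names, the record fields, and the fixed capacity are assumptions made for the example; only `num_worlds` and the idea of looking up a world by namespace come from the PR.

```c
#include <assert.h>
#include <string.h>

#define EX_MAX_WORLDS 8         /* illustrative capacity (assumption) */
#define EX_NSPACE_MAX 64

/* Hypothetical world record: a namespace string and a process count. */
struct ex_world {
    char nspace[EX_NSPACE_MAX];
    int num_procs;
};

static struct ex_world ex_worlds[EX_MAX_WORLDS];
static int ex_num_worlds = 0;

/* Append a world and return its world_idx. */
static int ex_add_world(const char *nspace, int num_procs)
{
    assert(ex_num_worlds < EX_MAX_WORLDS);
    assert(strlen(nspace) < EX_NSPACE_MAX);
    int idx = ex_num_worlds++;
    strcpy(ex_worlds[idx].nspace, nspace);
    ex_worlds[idx].num_procs = num_procs;
    return idx;
}

/* Return the world_idx for a namespace, or -1 if it is unknown. */
static int ex_find_world(const char *nspace)
{
    for (int i = 0; i < ex_num_worlds; i++) {
        if (strcmp(ex_worlds[i].nspace, nspace) == 0)
            return i;
    }
    return -1;
}

/* Reset at finalize so that a later re-init starts again from world 0. */
static void ex_worlds_finalize(void)
{
    ex_num_worlds = 0;
}
```

Without the reset, a second init would register its first world at a nonzero index, which is the session re-init problem the commit message refers to.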
Add builtin MPIR_GROUP_WORLD and MPIR_GROUP_SELF, so we can create builtin communicators from builtin groups.
Internally the only reason to duplicate a group is to copy from NULL session to a new session. Otherwise, we can just use the same group and increment the reference count.
Since builtin groups can be returned to users, users should be allowed to free them. They are reference counted anyway.
To make the MPI group a first-class citizen, we will always have a group before creating a communicator, so that when the device layer activates communicators, e.g. in MPID_Comm_commit_pre_hook, it can rely on the group to look up the involved processes. This also removes the need to maintain any other process addressing scheme.
In many places we just return MPIR_Group_empty without incrementing the ref_count. This is fixable, but for now, let's avoid freeing it.
The init_comm does the release manually.
Add assertions to make sure the local_group and remote_group (for inter communicators) are always set before MPID_Comm_commit_pre_hook.
Otherwise, the MPI_T functions may not be able to convert builtin datatypes.
When we run tests as functions, stray output in MPI_Finalize, such as the debug messages in debug builds, was not captured previously. This patch makes sure we report such stray output as failures.
Now that we always have a group inside a communicator, we can simply return the lpid from the group. Because this will be used in the hot path, make it inline.
Add the following macros: MPIR_LPID_WORLD_INDEX, MPIR_LPID_WORLD_RANK, MPIR_LPID_FROM.
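The diff in this PR shows MPIR_LPID_WORLD_INDEX as a right shift by 32 on a 64-bit lpid. The sketch below illustrates one encoding consistent with that: world index in the upper 32 bits, world rank in the lower 32 bits. The `EX_` macros are hypothetical stand-ins; the definitions of the rank and compose macros in particular are assumptions, not the PR's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ex_lpid_t;     /* stand-in for MPIR_Lpid */

/* Upper 32 bits: world index (matches the shift shown in the PR diff). */
#define EX_LPID_WORLD_INDEX(lpid) ((int) ((lpid) >> 32))
/* Lower 32 bits: rank within that world (assumed layout). */
#define EX_LPID_WORLD_RANK(lpid)  ((int) ((lpid) & 0xffffffffu))
/* Compose an lpid from its two parts (assumed layout). */
#define EX_LPID_FROM(world_idx, world_rank) \
    ((((ex_lpid_t) (world_idx)) << 32) | (ex_lpid_t) (uint32_t) (world_rank))

/* Checked wrapper: both parts must be non-negative. */
static ex_lpid_t ex_make_lpid(int world_idx, int world_rank)
{
    assert(world_idx >= 0 && world_rank >= 0);
    return EX_LPID_FROM(world_idx, world_rank);
}
```

For example, `ex_make_lpid(2, 5)` packs to a value whose index and rank macros return 2 and 5 again.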
Fix a typo in setting the size of MPIR_GROUP_SELF. Add to the ref_count when we return MPIR_GROUP_EMPTY to prevent freeing the builtin when it is released internally. Unfortunately, since users can directly use MPI_GROUP_EMPTY, we can't keep ref_count accurate. But at least we can keep it positive to prevent an actual free.
The builtin groups are in session NULL. We need to duplicate the groups in MPIR_Group_from_session_pset_impl to return a group in the correct session.
Groups are a natural place to host the vcrt (virtual connection reference table). When communicators are duplicated, groups are simply inherited and reference counted. Thus we won't end up duplicating the vcrt.
Because the tmp_comm uses a temporary vc that doesn't belong to any pg, it is incompatible with the new comm init process (which relies on lpid lookup to construct vcrt tables). It turns out we only need tmp_comm to perform basic send/recv (MPIC_Sendrecv), and we don't need most of the facilities of a normal communicator. Shortcutting the tmp_comm construction and destruction greatly simplifies the code.
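A minimal sketch of the reference-counting idea described above: take a reference whenever the builtin empty group is handed out, and never let a release actually free a builtin. All `ex_` names are hypothetical, and clamping the count back to 1 is only one possible way to "keep it positive"; the PR does not say how MPICH does it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical group object with just the fields the example needs. */
struct ex_group {
    int ref_count;
    int is_builtin;             /* statically allocated, must not be freed */
    int size;
};

static struct ex_group ex_group_empty = { 1, 1, 0 };

/* Hand out the builtin empty group with an extra reference. */
static struct ex_group *ex_get_empty_group(void)
{
    ex_group_empty.ref_count++;
    return &ex_group_empty;
}

/* Drop one reference. Returns 1 if the object was actually freed. */
static int ex_group_release(struct ex_group *g)
{
    assert(g->ref_count > 0);
    if (--g->ref_count > 0)
        return 0;
    if (g->is_builtin) {
        /* Unbalanced release of a builtin: keep the count positive
         * (illustrative policy, an assumption of this sketch). */
        g->ref_count = 1;
        return 0;
    }
    free(g);
    return 1;
}
```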
Replace the usage of mapper with comm->local_group and comm->remote_group in MPIDI_CH3I_Comm_commit_pre_hook.
The only criterion for whether to release a vc is whether the vc is for a dynamic process. It has nothing to do with whether MPI_Comm_disconnect is called. The semantics of MPI_Comm_disconnect is just to wait for all communication to complete. It is orthogonal to how the comm is destroyed.
In MPIR_Comm_create_inter, we know whether the remote group is empty after the exchange, so it is unnecessary to create and commit the intercomm and then delete it later. Simply don't create it in the first place. The device layer is not necessarily equipped to handle an intercomm commit with empty groups.
@@ -81,6 +81,11 @@ int MPIR_find_world(const char *namespace);
 */
typedef uint64_t MPIR_Lpid;

#define MPIR_LPID_WORLD_INDEX(lpid) ((lpid) >> 32)
Should the bit for DYNAMIC be cleared before converting to world index?
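To make the concern in this comment concrete, here is a small sketch of what happens if a flag bit lives in the upper half of the lpid. The position of the flag (bit 63) and all `EX_` names are purely hypothetical, since the PR excerpt does not show where the DYNAMIC bit is stored; the point is only that an unmasked shift would carry such a bit into the world index.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ex_lpid_t;     /* stand-in for MPIR_Lpid */

/* Hypothetical flag position; the real DYNAMIC bit may differ. */
#define EX_LPID_DYNAMIC_FLAG (UINT64_C(1) << 63)

/* Plain shift, as in the macro under review. */
#define EX_LPID_WORLD_INDEX_RAW(lpid) ((lpid) >> 32)
/* Variant that clears the flag before extracting the index. */
#define EX_LPID_WORLD_INDEX_MASKED(lpid) \
    (((lpid) & ~EX_LPID_DYNAMIC_FLAG) >> 32)

/* Tag an lpid as belonging to a dynamic process. */
static ex_lpid_t ex_mark_dynamic(ex_lpid_t lpid)
{
    assert((lpid & EX_LPID_DYNAMIC_FLAG) == 0);
    return lpid | EX_LPID_DYNAMIC_FLAG;
}
```

Under this assumed layout, a flagged lpid for world 3 yields a raw index with bit 31 set, while the masked variant still returns 3.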
(*new_group_ptr)->size = old_group->size;
(*new_group_ptr)->rank = old_group->rank;
MPIR_Group_set_session_ptr(*new_group_ptr, session_ptr);
memcpy(&(*new_group_ptr)->pmap, &old_group->pmap, sizeof(struct MPIR_Pmap));
Related to the comment on map reuse in PR #7235: should we add a refcount to MPIR_Pmap so that it can be easily shared in most scenarios? This would reduce the memory used by these linear data structures.
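The suggestion above could look roughly like the following sketch, in which a heap-allocated map carries its own reference count so that a duplicated group shares the table instead of copying it. The `ex_pmap` type and its functions are hypothetical; the actual struct MPIR_Pmap fields are not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t ex_lpid_t;     /* stand-in for MPIR_Lpid */

/* Hypothetical shared map: one table, many owners. */
struct ex_pmap {
    int ref_count;
    int size;
    ex_lpid_t *map;
};

/* Create a map holding a private copy of lpids[0..size-1]. */
static struct ex_pmap *ex_pmap_create(const ex_lpid_t *lpids, int size)
{
    assert(size > 0);
    struct ex_pmap *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->map = malloc((size_t) size * sizeof(ex_lpid_t));
    if (p->map == NULL) {
        free(p);
        return NULL;
    }
    memcpy(p->map, lpids, (size_t) size * sizeof(ex_lpid_t));
    p->ref_count = 1;
    p->size = size;
    return p;
}

/* Share instead of copy: O(1) and no extra memory. */
static struct ex_pmap *ex_pmap_share(struct ex_pmap *p)
{
    p->ref_count++;
    return p;
}

/* Drop one reference. Returns 1 if the table was freed. */
static int ex_pmap_release(struct ex_pmap *p)
{
    assert(p->ref_count > 0);
    if (--p->ref_count > 0)
        return 0;
    free(p->map);
    free(p);
    return 1;
}
```

Compared with the memcpy in the hunk above, sharing turns a group duplication into a pointer copy plus an increment, at the cost of one more object whose lifetime has to be tracked.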
@@ -89,6 +89,7 @@ int MPIR_Group_init(void)
int MPIR_Group_finalize(void)
{
    num_worlds = 0;

Remove the blank line. Only update code formatting when the change also updates code logic. This reduces potential merge conflicts.
Pull Request Description
Based on #7235, #7237
Now that the local_group and remote_group in MPIR_Comm can fully replace the functions of the mapper, refactor ch3 to use groups instead of the mapper in MPIDI_CH3I_Comm_commit_pre_hook.
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.