When initializing the RDMA data structures and the process/NIC table at scale (256+ nodes), initialization takes about 80 s. This process is mostly serialized at node 0 and should be reworked to use better collectives.
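To see why serialization at node 0 scales poorly, here is a minimal sketch. It assumes that every handshake costs about the same, and it compares two approaches by counting communication rounds. In the first, node 0 handshakes with each peer in turn, which takes N − 1 sequential rounds. The second is a recursive-doubling allgather, a standard collective chosen here for illustration; the issue does not say which collective Rofi should use. It needs only log2(N) rounds. The helper names and the `addr-{rank}` payloads are hypothetical, and the sketch requires a power-of-two rank count.

```python
def serial_rounds(nranks):
    # Node 0 exchanges addresses with every other node, one after another.
    return nranks - 1


def recursive_doubling_allgather(blocks):
    """Simulate a recursive-doubling allgather over len(blocks) ranks.

    blocks[r] is the local data of rank r (e.g. its NIC address).
    Returns (per-rank tables, number of rounds). Assumes a power-of-two rank count.
    """
    n = len(blocks)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two rank count"
    tables = [{r: blocks[r]} for r in range(n)]  # each rank starts with only its own entry
    rounds, dist = 0, 1
    while dist < n:
        # In each round, rank r swaps everything it knows with partner r XOR dist.
        merged = [dict(t) for t in tables]
        for r in range(n):
            merged[r].update(tables[r ^ dist])
        tables = merged
        dist <<= 1
        rounds += 1
    return tables, rounds


tables, rounds = recursive_doubling_allgather([f"addr-{r}" for r in range(256)])
print(serial_rounds(256), rounds)  # 255 sequential rounds vs. 8 collective rounds
```

At 256 nodes, the serial scheme needs 255 dependent rounds, all funneled through one process. The collective finishes in 8 rounds, and every node works in parallel during each round.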
While this is likely not the complete solution we want for version 0.3, I replaced the serial handshaking with a libPMI-based exchange.
This should help somewhat, both during the init process and when creating new RMA regions.
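Below is a minimal sketch of what a key-value-store exchange looks like, assuming the put, commit, barrier, get pattern of PMI-1 (`PMI_KVS_Put`, `PMI_KVS_Commit`, `PMI_Barrier`, `PMI_KVS_Get`). In this pattern, each rank publishes its own entry, everyone synchronizes once, and then each rank reads its peers' entries directly, so nothing has to pass through node 0. The `InProcessKVS` class is a hypothetical in-process stand-in built on threads. It is not libPMI and not Rofi's actual code, and the key names are illustrative only.

```python
import threading


class InProcessKVS:
    """Hypothetical stand-in for a PMI key-value space (not the real libPMI API)."""

    def __init__(self, nranks):
        self._store = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(nranks)

    def put(self, key, value):
        with self._lock:
            self._store[key] = value

    def barrier(self):
        # Plays the role of commit + barrier: after this, every put is visible.
        self._barrier.wait()

    def get(self, key):
        with self._lock:
            return self._store[key]


def exchange_addresses(kvs, rank, nranks, local_addr):
    kvs.put(f"addr-{rank}", local_addr)  # publish this rank's entry
    kvs.barrier()                        # single synchronization point for all ranks
    return [kvs.get(f"addr-{peer}") for peer in range(nranks)]  # read every peer's entry


def run(nranks):
    kvs = InProcessKVS(nranks)
    results = [None] * nranks

    def worker(rank):
        results[rank] = exchange_addresses(kvs, rank, nranks, f"nic-addr-{rank}")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


tables = run(8)
```

Every rank ends up with the same full address table after one barrier. The same pattern could in principle be reused when registering a new RMA region, with each rank publishing its region key instead of a NIC address.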
If you are using Rofi with Lamellar, the impact may be minimal for Lamellar versions <= 0.5, because after changing Rofi I found another inefficient part of the init process. My recommendation is to use Lamellar >= 0.6.