Measured improvement (on branch): flatten the try-advance switch-case statements; merge the advance and completion checks (big savings of ~150-170ns per outer loop iteration, which runs once for every command or packet, whichever is more frequent; this can represent a 15-20% improvement)
(on branch) User "one_packet" data flow APIs
Estimated improvements: save 50-100ns per CB packet
Allow multiple open packets in flight to avoid blocking read/write barriers for commands
Optimize packet header initialization
a surprising amount of time is spent on this
enable caching
simplify packet structure (see Fabric EDM Optimizations)
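As a rough illustration of the "merge the advance and completion checks" item above, here is a minimal sketch, not the actual kernel code. `CommandState`, `advance_and_check_done`, and `run_command` are hypothetical names invented for this example. The idea is that one call both advances the command and reports completion, so the outer loop pays for a single branch per iteration instead of a separate advance step and done check.

```cpp
#include <cstdint>

// Hypothetical command state; the real kernels track far more than this.
struct CommandState {
    uint32_t packets_remaining;
};

// Advances the command by one packet when sending is possible and reports
// completion in the same call. Returns true once no packets remain.
inline bool advance_and_check_done(CommandState& cmd, bool can_send) {
    if (can_send && cmd.packets_remaining > 0) {
        --cmd.packets_remaining;
    }
    return cmd.packets_remaining == 0;
}

// Drives one command to completion and returns the number of outer-loop
// iterations taken. The send path is assumed to always be ready here.
inline uint32_t run_command(CommandState cmd) {
    uint32_t iterations = 0;
    bool done = cmd.packets_remaining == 0;
    while (!done) {
        done = advance_and_check_done(cmd, /*can_send=*/true);
        ++iterations;
    }
    return iterations;
}
```

In this sketch the loop body contains one call and one condition. Whether that maps onto the measured ~150-170ns saving depends on the real state machine, which is not shown here.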
Fabric EDM Optimization:
Merge command type and noc command type in packet header.
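One possible shape for the merged field, as a sketch only: the enum names and values below are assumptions made for illustration and are not taken from the actual packet header definition. The two separate fields collapse into a single one-byte enum, so the receiver can dispatch with one switch or one bit test.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two existing header fields.
enum class CommandType : uint8_t { Write = 0, AtomicInc = 1 };
enum class NocCommandType : uint8_t { Unicast = 0, Multicast = 1 };

// Hypothetical merged field: bit 1 holds the command type, bit 0 the NOC
// command type, so every combination has a distinct value in one byte.
enum class MergedCommand : uint8_t {
    UnicastWrite = 0,
    MulticastWrite = 1,
    UnicastAtomicInc = 2,
    MulticastAtomicInc = 3,
};

// Builds the merged value from the two legacy fields.
constexpr MergedCommand merge(CommandType c, NocCommandType n) {
    return static_cast<MergedCommand>(
        (static_cast<uint8_t>(c) << 1) | static_cast<uint8_t>(n));
}

// Single-bit tests replace the two separate field comparisons.
constexpr bool is_multicast(MergedCommand m) {
    return (static_cast<uint8_t>(m) & 1u) != 0;
}

constexpr bool is_atomic_inc(MergedCommand m) {
    return (static_cast<uint8_t>(m) & 2u) != 0;
}
```

A flat enum like this saves one header byte and one branch in the decode path, at the cost of having to enumerate every valid combination explicitly.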
CCL Add New Commands for Perf
Reduce Scatter
All Reduce
March to 12GBps per link
CCL Backend Kernels Optimization:
Optimize main loop and noc burst commands:
Fabric EDM Optimization:
Perf Reporting