Phase 1: Initial Async CCLs (TG Llama configs only in initial support list, full tensor granularity for synchronization)
EDM Fabric
CCL Command Support
Bringup CCL command "CB_to_Tensor" (name to be workshopped :) ) in local mode (see the sketch after this command list)
fabric writes (unicast)
Implement mcast on sender side to start after EDM fabric bringup
Bringup "Tensor_to_CB"
Bringup semaphore inc
Bringup semaphore wait
Enable offset reads/writes into pages for Tensor stream commands #16403
(deprioritized - nice to have) Bringup CCL command "ccl shift tensor" for local writes
fabric writes (after EDM fabric availability)
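For reference, a minimal sketch of what the local-mode "CB_to_Tensor" path plus the semaphore-inc command above could look like as a tt-metal dataflow kernel. The dataflow calls (`cb_wait_front`, `noc_async_write`, `noc_semaphore_inc`, ...) are the standard kernel APIs; the runtime-arg layout, CB index, and contiguous destination addressing are assumptions for illustration, not the actual command interpreter.

```cpp
// Minimal sketch, not the real command interpreter: drain pages from a CB and
// write them into a destination tensor on the local chip, then signal done.
// Runtime-arg layout, CB index, and contiguous page layout are invented here.
#include "dataflow_api.h"

void kernel_main() {
    constexpr uint32_t cb_id = 0;                       // assumed CB carrying produced pages
    uint32_t dst_base_addr = get_arg_val<uint32_t>(0);  // destination tensor base address
    uint32_t dst_noc_x     = get_arg_val<uint32_t>(1);  // destination core (same chip)
    uint32_t dst_noc_y     = get_arg_val<uint32_t>(2);
    uint32_t num_pages     = get_arg_val<uint32_t>(3);
    uint32_t page_size     = get_arg_val<uint32_t>(4);
    uint32_t sem_id        = get_arg_val<uint32_t>(5);  // completion semaphore

    for (uint32_t p = 0; p < num_pages; ++p) {
        cb_wait_front(cb_id, 1);  // wait for one produced page
        uint64_t dst_noc_addr =
            get_noc_addr(dst_noc_x, dst_noc_y, dst_base_addr + p * page_size);
        noc_async_write(get_read_ptr(cb_id), dst_noc_addr, page_size);  // local NOC write
        noc_async_write_barrier();
        cb_pop_front(cb_id, 1);
    }

    // "semaphore inc": full-tensor-granularity completion signal (Phase 1).
    // A matching consumer blocks in noc_semaphore_wait until this lands.
    noc_semaphore_inc(get_noc_addr(dst_noc_x, dst_noc_y, get_semaphore(sem_id)), 1);
}
```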
Enable New All Gather #15060
Enable new reduce-scatter (for Llama demo-level generality) #15006
Enable CCL V2 (composite) all-reduce #16400
Persistent Fabric Support
Integrate CCL V2 into TG Llama #16607
ttnn/cpp/ttnn/operations/ccl/common/kernels/ccl_send_reader_two_input.cpp #16608
Construct reduce-scatter global semaphores outside of op invocation and pass in to reduce-scatter #16398 (see the sketch below)
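A hypothetical host-side sketch of the semaphore-hoisting item above (#16398). All types and function names below are stand-ins rather than the exact ttnn signatures; the point is the lifetime change: the global semaphores are constructed once, outside the op, so repeated invocations (e.g. a decode loop on persistent fabric) reuse stable semaphore addresses instead of re-allocating per call.

```cpp
// Stand-in declarations so the sketch is self-contained; real code would use
// the tt-metal / ttnn host headers, where names and signatures differ.
#include <cstdint>

struct Device {};
struct CoreRangeSet {};
struct Tensor {};
struct GlobalSemaphore {};

GlobalSemaphore create_global_semaphore(Device& device, const CoreRangeSet& cores,
                                        uint32_t initial_value);   // assumed host API
Tensor reduce_scatter_async(const Tensor& input, int dim,
                            const GlobalSemaphore& from_remote_sem,
                            const GlobalSemaphore& to_remote_sem);  // assumed op API

Tensor run_decode_loop(Device& device, const CoreRangeSet& ccl_cores,
                       Tensor input, int num_iterations) {
    // Created once, outside the op invocation (previously re-created per call).
    GlobalSemaphore from_remote_sem = create_global_semaphore(device, ccl_cores, /*initial_value=*/0);
    GlobalSemaphore to_remote_sem   = create_global_semaphore(device, ccl_cores, /*initial_value=*/0);

    for (int i = 0; i < num_iterations; ++i) {
        // The op consumes pre-constructed semaphores instead of making its own,
        // keeping addresses stable for workers on the persistent fabric.
        input = reduce_scatter_async(input, /*dim=*/3, from_remote_sem, to_remote_sem);
    }
    return input;
}
```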
Bugs
Perf: in parallel with end of Phase 1 and Phase 2
EDM Fabric tuning (optimize EDM)
Worker count tuning (may need to add more workers)
Convert tensor-level commands to resolved addresses/page IDs (improved perf due to lower SW overheads) - all gather #16395 (see the sketch after this list)
Enable CCL V2 (Fused) all-reduce
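A sketch of the address/page-ID resolution item above (#16395), with the table format and runtime args invented for illustration: instead of a tensor-level command that the kernel resolves page by page (tensor shape -> page ID -> bank -> NOC address) every iteration, the host resolves the pages once and hands the kernel a flat table of destination NOC addresses, removing the per-page address-generation math from the device inner loop.

```cpp
// Sketch only: the kernel walks a host-precomputed table of NOC addresses
// rather than resolving tensor-level commands on the fly. Table layout,
// CB index, and arg order are illustrative.
#include "dataflow_api.h"

void kernel_main() {
    uint32_t table_l1_addr = get_arg_val<uint32_t>(0);  // table written by host
    uint32_t num_pages     = get_arg_val<uint32_t>(1);
    uint32_t page_size     = get_arg_val<uint32_t>(2);
    constexpr uint32_t cb_id = 0;

    // Assumed: host packed one 64-bit destination NOC address per page into L1.
    volatile tt_l1_ptr uint64_t* dst_addrs =
        reinterpret_cast<volatile tt_l1_ptr uint64_t*>(table_l1_addr);

    for (uint32_t p = 0; p < num_pages; ++p) {
        cb_wait_front(cb_id, 1);
        // No per-page shape/bank math on the device: just index the table.
        noc_async_write(get_read_ptr(cb_id), dst_addrs[p], page_size);
        noc_async_write_barrier();
        cb_pop_front(cb_id, 1);
    }
}
```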
Phase 2: Add finer granularity tensor synchronization
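Phase 2's finer granularity could look like the following consumer-side sketch (the chunking scheme and runtime args are invented for illustration): the producer increments the semaphore once per chunk of pages rather than once per tensor, so the consumer can start on chunk k as soon as the count reaches k + 1 instead of waiting for the full tensor.

```cpp
// Consumer-side sketch: per-chunk semaphore waits instead of one whole-tensor
// wait. Chunk layout and runtime args are illustrative.
#include "dataflow_api.h"

void kernel_main() {
    uint32_t sem_l1_addr     = get_semaphore(get_arg_val<uint32_t>(0));
    uint32_t tensor_base     = get_arg_val<uint32_t>(1);  // L1 region the producer fills
    uint32_t num_chunks      = get_arg_val<uint32_t>(2);
    uint32_t pages_per_chunk = get_arg_val<uint32_t>(3);
    uint32_t page_size       = get_arg_val<uint32_t>(4);

    volatile tt_l1_ptr uint32_t* sem =
        reinterpret_cast<volatile tt_l1_ptr uint32_t*>(sem_l1_addr);

    for (uint32_t chunk = 0; chunk < num_chunks; ++chunk) {
        // Phase 1 behaviour: a single noc_semaphore_wait for the whole tensor.
        // Phase 2 behaviour: the producer inc's once per chunk, so consumption
        // overlaps with production.
        noc_semaphore_wait_min(sem, chunk + 1);  // this chunk's pages have landed
        uint32_t chunk_addr = tensor_base + chunk * pages_per_chunk * page_size;
        (void)chunk_addr;  // ... consume this chunk's pages starting at chunk_addr ...
    }
}
```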