Phase 1: Initial Async CCLs (TG Llama configs only in initial support list, full tensor granularity for synchronization)
EDM Fabric
CCL Command Support
Bringup CCL command "CB_to_Tensor" (name to be workshopped :) ) in local mode (see the sketch after this command list)
fabric writes (unicast)
Implement mcast on sender side to start after EDM fabric bringup
Bringup "Tensor_to_CB"
Bringup semaphore inc
Bringup semaphore wait
Enable offset reads/writes into pages for Tensor stream commands #16403
(deprioritized - nice to have) Bringup CCL command "ccl shift tensor" for local writes
fabric writes (after EDM fabric availability)
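For reference, a minimal sketch of what the local-mode "CB_to_Tensor" path plus the semaphore-inc command above could look like as a tt-metal dataflow kernel. The dataflow calls (`cb_wait_front`, `noc_async_write`, `noc_semaphore_inc`, ...) are the standard kernel APIs; the runtime-arg layout, CB index, and contiguous destination addressing are assumptions for illustration, not the actual command interpreter.

```cpp
// Minimal sketch, not the real command interpreter: drain pages from a CB and
// write them into a destination tensor on the local chip, then signal done.
// Runtime-arg layout, CB index, and contiguous page layout are invented here.
#include "dataflow_api.h"

void kernel_main() {
    constexpr uint32_t cb_id = 0;                       // assumed CB carrying produced pages
    uint32_t dst_base_addr = get_arg_val<uint32_t>(0);  // destination tensor base address
    uint32_t dst_noc_x     = get_arg_val<uint32_t>(1);  // destination core (same chip)
    uint32_t dst_noc_y     = get_arg_val<uint32_t>(2);
    uint32_t num_pages     = get_arg_val<uint32_t>(3);
    uint32_t page_size     = get_arg_val<uint32_t>(4);
    uint32_t sem_id        = get_arg_val<uint32_t>(5);  // completion semaphore

    for (uint32_t p = 0; p < num_pages; ++p) {
        cb_wait_front(cb_id, 1);  // wait for one produced page
        uint64_t dst_noc_addr =
            get_noc_addr(dst_noc_x, dst_noc_y, dst_base_addr + p * page_size);
        noc_async_write(get_read_ptr(cb_id), dst_noc_addr, page_size);  // local NOC write
        noc_async_write_barrier();
        cb_pop_front(cb_id, 1);
    }

    // "semaphore inc": full-tensor-granularity completion signal (Phase 1).
    // A matching consumer blocks in noc_semaphore_wait until this lands.
    noc_semaphore_inc(get_noc_addr(dst_noc_x, dst_noc_y, get_semaphore(sem_id)), 1);
}
```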
Enable New All Gather #15060
Enable new reduce-scatter (for Llama demo-level generality) #15006
Enable CCL V2 (composite) all-reduce #16400
Persistent Fabric Support
Integrate CCL V2 into TG Llama #16607
ttnn/cpp/ttnn/operations/ccl/common/kernels/ccl_send_reader_two_input.cpp #16608
Construct reduce-scatter global semaphores outside of op invocation and pass in to reduce-scatter #16398 (see the sketch below)
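A hypothetical host-side sketch of the semaphore-hoisting item above (#16398). All types and function names below are stand-ins rather than the exact ttnn signatures; the point is the lifetime change: the global semaphores are constructed once, outside the op, so repeated invocations (e.g. a decode loop on persistent fabric) reuse stable semaphore addresses instead of re-allocating per call.

```cpp
// Stand-in declarations so the sketch is self-contained; real code would use
// the tt-metal / ttnn host headers, where names and signatures differ.
#include <cstdint>

struct Device {};
struct CoreRangeSet {};
struct Tensor {};
struct GlobalSemaphore {};

GlobalSemaphore create_global_semaphore(Device& device, const CoreRangeSet& cores,
                                        uint32_t initial_value);   // assumed host API
Tensor reduce_scatter_async(const Tensor& input, int dim,
                            const GlobalSemaphore& from_remote_sem,
                            const GlobalSemaphore& to_remote_sem);  // assumed op API

Tensor run_decode_loop(Device& device, const CoreRangeSet& ccl_cores,
                       Tensor input, int num_iterations) {
    // Created once, outside the op invocation (previously re-created per call).
    GlobalSemaphore from_remote_sem = create_global_semaphore(device, ccl_cores, /*initial_value=*/0);
    GlobalSemaphore to_remote_sem   = create_global_semaphore(device, ccl_cores, /*initial_value=*/0);

    for (int i = 0; i < num_iterations; ++i) {
        // The op consumes pre-constructed semaphores instead of making its own,
        // keeping addresses stable for workers on the persistent fabric.
        input = reduce_scatter_async(input, /*dim=*/3, from_remote_sem, to_remote_sem);
    }
    return input;
}
```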
Bugs
Perf: in parallel with end of Phase 1 and Phase 2
EDM Fabric tuning (optimize EDM)
Worker count tuning (may need to add more workers)
Convert tensor-level commands to resolved addresses/page IDs (improved perf due to lower SW overheads) - all gather #16395 (see the sketch after this list)
Enable CCL V2 (Fused) all-reduce
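A sketch of the address/page-ID resolution item above (#16395), with the table format and runtime args invented for illustration: instead of a tensor-level command that the kernel resolves page by page (tensor shape -> page ID -> bank -> NOC address) every iteration, the host resolves the pages once and hands the kernel a flat table of destination NOC addresses, removing the per-page address-generation math from the device inner loop.

```cpp
// Sketch only: the kernel walks a host-precomputed table of NOC addresses
// rather than resolving tensor-level commands on the fly. Table layout,
// CB index, and arg order are illustrative.
#include "dataflow_api.h"

void kernel_main() {
    uint32_t table_l1_addr = get_arg_val<uint32_t>(0);  // table written by host
    uint32_t num_pages     = get_arg_val<uint32_t>(1);
    uint32_t page_size     = get_arg_val<uint32_t>(2);
    constexpr uint32_t cb_id = 0;

    // Assumed: host packed one 64-bit destination NOC address per page into L1.
    volatile tt_l1_ptr uint64_t* dst_addrs =
        reinterpret_cast<volatile tt_l1_ptr uint64_t*>(table_l1_addr);

    for (uint32_t p = 0; p < num_pages; ++p) {
        cb_wait_front(cb_id, 1);
        // No per-page shape/bank math on the device: just index the table.
        noc_async_write(get_read_ptr(cb_id), dst_addrs[p], page_size);
        noc_async_write_barrier();
        cb_pop_front(cb_id, 1);
    }
}
```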
Phase 2: Add finer granularity tensor synchronization
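Phase 2's finer granularity could look like the following consumer-side sketch (the chunking scheme and runtime args are invented for illustration): the producer increments the semaphore once per chunk of pages rather than once per tensor, so the consumer can start on chunk k as soon as the count reaches k + 1 instead of waiting for the full tensor.

```cpp
// Consumer-side sketch: per-chunk semaphore waits instead of one whole-tensor
// wait. Chunk layout and runtime args are illustrative.
#include "dataflow_api.h"

void kernel_main() {
    uint32_t sem_l1_addr     = get_semaphore(get_arg_val<uint32_t>(0));
    uint32_t tensor_base     = get_arg_val<uint32_t>(1);  // L1 region the producer fills
    uint32_t num_chunks      = get_arg_val<uint32_t>(2);
    uint32_t pages_per_chunk = get_arg_val<uint32_t>(3);
    uint32_t page_size       = get_arg_val<uint32_t>(4);

    volatile tt_l1_ptr uint32_t* sem =
        reinterpret_cast<volatile tt_l1_ptr uint32_t*>(sem_l1_addr);

    for (uint32_t chunk = 0; chunk < num_chunks; ++chunk) {
        // Phase 1 behaviour: a single noc_semaphore_wait for the whole tensor.
        // Phase 2 behaviour: the producer inc's once per chunk, so consumption
        // overlaps with production.
        noc_semaphore_wait_min(sem, chunk + 1);  // this chunk's pages have landed
        uint32_t chunk_addr = tensor_base + chunk * pages_per_chunk * page_size;
        (void)chunk_addr;  // ... consume this chunk's pages starting at chunk_addr ...
    }
}
```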