VITIS-11806 Support runlist submission as chained command objects (#8194)

* VITIS-11806 Support runlist submission as chained command objects

Add new ERT_CMD_CHAIN opcode for chained command submission of
a runlist.  A runlist is broken into multiple chained commands if
necessary.
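
As a rough illustration of how a runlist might be broken into several chains, the sketch below splits a flat list of handles into consecutive groups of bounded size. The names `handle_t`, `split_into_chains` and `max_per_chain` are hypothetical and the capacity is an assumed parameter; none of this is taken from the PR's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical handle type; the real runlist works with buffer handles.
using handle_t = std::uint64_t;

// Split `handles` into consecutive chains of at most `max_per_chain`
// entries.  `max_per_chain` is an assumed capacity, not a value from the PR.
std::vector<std::vector<handle_t>>
split_into_chains(const std::vector<handle_t>& handles, std::size_t max_per_chain)
{
  std::vector<std::vector<handle_t>> chains;
  if (max_per_chain == 0)
    return chains;  // nothing can be chained with zero capacity

  for (std::size_t pos = 0; pos < handles.size(); pos += max_per_chain) {
    const auto count = std::min(max_per_chain, handles.size() - pos);
    const auto first = handles.begin() + static_cast<std::ptrdiff_t>(pos);
    chains.emplace_back(first, first + static_cast<std::ptrdiff_t>(count));
  }
  return chains;
}
```

Order is preserved within and across chains, so submitting the chains in sequence keeps the original runlist order.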

This PR reworks #8171 and replaces disjoint set submission of individual
command buffers with submission of a single ert_packet command buffer
that chains individual commands.

The chained command is represented as an `ert_packet` where the payload is
interpreted as `ert_cmd_chain_data` per ert.h.  The payload holds an array
of buffer_handles that are to be submitted atomically for execution.
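
To make the payload idea concrete, here is a minimal sketch of a count-prefixed array of 64-bit handles packed into a byte buffer. The layout (`chain_header_sketch` followed by the entries) is a simplified assumption for illustration only; the authoritative definition of `ert_cmd_chain_data` is the one in ert.h, and the helper names below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Assumed, simplified header; not the real ert_cmd_chain_data layout.
struct chain_header_sketch {
  std::uint32_t command_count;  // number of entries that follow
  std::uint32_t reserved;       // keeps the 64-bit entries 8-byte aligned
};

// Pack the header and the handles into one contiguous payload.
std::vector<unsigned char>
make_chain_payload(const std::vector<std::uint64_t>& handles)
{
  const chain_header_sketch hdr{static_cast<std::uint32_t>(handles.size()), 0};
  const auto bytes = handles.size() * sizeof(std::uint64_t);
  std::vector<unsigned char> payload(sizeof(hdr) + bytes);
  std::memcpy(payload.data(), &hdr, sizeof(hdr));
  if (bytes != 0)
    std::memcpy(payload.data() + sizeof(hdr), handles.data(), bytes);
  return payload;
}

// Read back the number of chained commands.
std::uint32_t
chain_command_count(const std::vector<unsigned char>& payload)
{
  chain_header_sketch hdr{};
  if (payload.size() < sizeof(hdr))
    throw std::invalid_argument("payload too small for chain header");
  std::memcpy(&hdr, payload.data(), sizeof(hdr));
  return hdr.command_count;
}

// Read back entry `idx`, checking it against the stored count.
std::uint64_t
chain_entry(const std::vector<unsigned char>& payload, std::size_t idx)
{
  if (idx >= chain_command_count(payload))
    throw std::out_of_range("chain entry index out of range");
  std::uint64_t value = 0;
  const auto offset = sizeof(chain_header_sketch) + idx * sizeof(value);
  std::memcpy(&value, payload.data() + offset, sizeof(value));
  return value;
}
```

Copying through `std::memcpy` avoids aliasing and alignment concerns when the payload lives in a raw byte buffer.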

This chained command submission is somewhat easier to manage and is a
step towards supporting command fusing where multiple run objects are
stitched together into a single command without managing individual
run objects.

This PR also parameterizes bo_cache with the size of bos needed. This
is done because runlist will not currently need a 4K buffer for the
chained ert packet.  This may change in future when runlist fuses the
individual commands.  It is understood that 4K may in fact be the
minimum underlying allocation size, but that is unbeknownst to the code
that uses the bo cache.
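
A cache parameterized on buffer size could look roughly like the sketch below. `sized_buffer_cache` and `buffer` are hypothetical stand-ins, with a plain byte vector in place of a real buffer object; this is not the actual bo_cache code.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device buffer object.
using buffer = std::vector<unsigned char>;

// Cache of equally sized buffers; the size is fixed at construction
// instead of being hard-coded to 4K.
class sized_buffer_cache
{
  std::size_t m_size;
  std::vector<std::unique_ptr<buffer>> m_free;

public:
  explicit sized_buffer_cache(std::size_t size)
    : m_size(size)
  {}

  // Reuse a cached buffer if one is available, otherwise allocate.
  std::unique_ptr<buffer>
  alloc()
  {
    if (m_free.empty())
      return std::make_unique<buffer>(m_size);

    auto bo = std::move(m_free.back());
    m_free.pop_back();
    return bo;
  }

  // Hand a buffer back to the cache for later reuse.
  void
  release(std::unique_ptr<buffer> bo)
  {
    m_free.push_back(std::move(bo));
  }

  std::size_t
  cached() const
  {
    return m_free.size();
  }
};
```

The caller only states the size it needs; whether the underlying allocator rounds that up is hidden behind the cache.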

Signed-off-by: Soren Soe <[email protected]>

* Move chained command creation to runlist_impl::add(run)

This will keep some work out of the time-critical section.

Also populate chain_data payload from new run_bo property `kmhdl`
which is controlled by shim to be the kernel mode handle or address
of run_bo depending on how command chaining is implemented by shim.

Signed-off-by: Soren Soe <[email protected]>

* Redo error handling if bind_at throws

There is one caveat here and that is the chain_data command_count is
one short of what is added to the array.  This is to avoid state
change if bind_at throws, but is of course no good if bind_at expects
argument `idx` to match up with `command_count`.
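
One way to read the ordering described above is sketched here: the entry is written into the array first, the binding step runs while `command_count` still excludes the new slot, and the count is bumped only once binding has succeeded. `chain_sketch`, `add_to_chain` and the `bind` callback are hypothetical names standing in for the real chain data and `bind_at`; the fixed capacity of 8 is an arbitrary assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical fixed-capacity chain; capacity 8 is arbitrary.
struct chain_sketch {
  std::uint32_t command_count = 0;
  std::array<std::uint64_t, 8> data{};
};

// Append `handle`, then invoke `bind` with the slot index.  If `bind`
// throws, command_count is unchanged, so the chain looks as it did before.
void
add_to_chain(chain_sketch& chain, std::uint64_t handle,
             const std::function<void(std::size_t)>& bind)
{
  const std::size_t idx = chain.command_count;
  if (idx >= chain.data.size())
    throw std::length_error("chain is full");

  chain.data[idx] = handle;  // slot written, but not yet counted
  bind(idx);                 // may throw; count still excludes this slot
  ++chain.command_count;     // commit only after bind succeeded
}
```

A failed bind leaves a stale value in the uncounted slot, which the next successful add overwrites.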

Signed-off-by: Soren Soe <[email protected]>

---------

Signed-off-by: Soren Soe <[email protected]>
stsoe authored May 23, 2024
1 parent 87c0bfc commit cd2f2f7
Showing 9 changed files with 280 additions and 139 deletions.
11 changes: 11 additions & 0 deletions src/runtime_src/core/common/api/CMakeLists.txt
@@ -1,5 +1,16 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright (C) 2022 Advanced Micro Devices, Inc. All rights reserved.

# if (CMAKE_BUILD_TYPE STREQUAL "Debug")
# find_program(CLANG_TIDY "clang-tidy" HINT /home/stsoe/git-nobkup/llvm-project/build)
# if(NOT CLANG_TIDY)
# message(WARNING "-- clang-tidy not found, cannot enable static analysis")
# else()
# message("-- Enabling clang-tidy")
# set(CMAKE_CXX_CLANG_TIDY "/home/stsoe/git-nobkup/llvm-project/build/bin/clang-tidy")
# endif()
# endif()

add_library(core_common_api_library_objects OBJECT
context_mgr.cpp
hw_queue.cpp
41 changes: 33 additions & 8 deletions src/runtime_src/core/common/api/hw_queue.cpp
@@ -360,9 +360,13 @@ class hw_queue_impl : public command_manager::executor
hw_queue_impl& operator=(const hw_queue_impl&) = delete;
hw_queue_impl& operator=(hw_queue_impl&&) = delete;

// Submit list of commands for execution as atomic unit
// Submit single raw command for execution
virtual void
submit(const xrt_core::span<xrt_core::buffer_handle*>& runlist) = 0;
submit(xrt_core::buffer_handle* cmd) = 0;

// Wait for single raw command to complete
virtual std::cv_status
wait(xrt_core::buffer_handle* cmd, size_t timeout_ms) const = 0;

// Submit command for execution
virtual void
@@ -455,9 +459,17 @@ class qds_device : public hw_queue_impl
}

void
submit(const xrt_core::span<xrt_core::buffer_handle*>& runlist) override
submit(xrt_core::buffer_handle* cmd) override
{
m_qhdl->submit_command(cmd);
}

std::cv_status
wait(xrt_core::buffer_handle* cmd, size_t timeout_ms) const override
{
m_qhdl->submit_command(runlist);
return m_qhdl->wait_command(cmd, static_cast<uint32_t>(timeout_ms))
? std::cv_status::no_timeout
: std::cv_status::timeout;
}

void
@@ -639,9 +651,15 @@ class kds_device : public hw_queue_impl
}

void
submit(const xrt_core::span<xrt_core::buffer_handle*>&) override
submit(xrt_core::buffer_handle*) override
{
throw std::runtime_error("kds_device::submit(runlist) not implemented");
throw std::runtime_error("kds_device::submit(buffer_handle) not implemented");
}

std::cv_status
wait(xrt_core::buffer_handle*, size_t) const override
{
throw std::runtime_error("kds_device::wait(buffer_handle) not implemented");
}

void
@@ -808,9 +826,16 @@ unmanaged_start(xrt_core::command* cmd)

void
hw_queue::
submit(const xrt_core::span<xrt_core::buffer_handle*>& runlist)
submit(xrt_core::buffer_handle* cmd)
{
get_handle()->submit(cmd);
}

std::cv_status
hw_queue::
wait(xrt_core::buffer_handle* cmd, const std::chrono::milliseconds& timeout) const
{
get_handle()->submit(runlist);
return get_handle()->wait(cmd, timeout.count());
}

// Wait for command completion for unmanaged command execution
13 changes: 8 additions & 5 deletions src/runtime_src/core/common/api/hw_queue.h
@@ -9,7 +9,6 @@
#include "experimental/xrt_fence.h"
#include "experimental/xrt_kernel.h"

#include "core/common/span.h"
#include "core/common/shim/buffer_handle.h"

#include <chrono>
@@ -57,9 +56,13 @@ class hw_queue : public xrt::detail::pimpl<hw_queue_impl>
void
unmanaged_start(xrt_core::command* cmd);

// Submit a runlist for execution
// Submit a raw cmd for execution
void
submit(const xrt_core::span<xrt_core::buffer_handle*>& runlist);
submit(xrt_core::buffer_handle* cmd);

// Wait for raw cmd to complete
std::cv_status
wait(xrt_core::buffer_handle* cmd, const std::chrono::milliseconds& timeout) const;

// Wait for command completion. Supports both managed and unmanaged
// commands.
@@ -70,7 +73,7 @@ class hw_queue : public xrt::detail::pimpl<hw_queue_impl>
// Wait for command completion with timeout. Supports both managed
// and unmanaged commands.
std::cv_status
wait(const xrt_core::command* cmd, const std::chrono::milliseconds& timeout_ms) const;
wait(const xrt_core::command* cmd, const std::chrono::milliseconds& timeout) const;

// Enqueue a command dependency
void
@@ -84,7 +87,7 @@ class hw_queue : public xrt::detail::pimpl<hw_queue_impl>
// some command completing or from a timeout.
XRT_CORE_COMMON_EXPORT
static std::cv_status
exec_wait(const xrt_core::device* device, const std::chrono::milliseconds& timeout_ms);
exec_wait(const xrt_core::device* device, const std::chrono::milliseconds& timeout);

// Cleanup after device object is no longer valid
// Static data is cached per xrt_core::device object, this function