Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream #113

Open
wants to merge 11 commits into
base: main
from
17 changes: 14 additions & 3 deletions nvbench/cuda_stream.cuh
@@ -19,10 +19,13 @@
#pragma once

#include <nvbench/cuda_call.cuh>
#include <nvbench/detail/device_scope.cuh>
#include <nvbench/device_info.cuh>

#include <cuda_runtime_api.h>

#include <memory>
#include <optional>

namespace nvbench
{
@@ -42,10 +45,18 @@ struct cuda_stream
* Constructs a cuda_stream that owns a new stream, created with
* `cudaStreamCreate`.
*/
- cuda_stream()
- : m_stream{[]() {
+ cuda_stream(std::optional<nvbench::device_info> device)
elstehle marked this conversation as resolved.
Collaborator

Docs should be updated to explain the semantics of the new device parameter.

Author

Thanks. Updated docs. Could you please check if it's understandable?

+ : m_stream{[device]() {
cudaStream_t s;
- NVBENCH_CUDA_CALL(cudaStreamCreate(&s));
+ if (device.has_value())
+ {
+ nvbench::detail::device_scope scope_guard{device.value().get_id()};
+ NVBENCH_CUDA_CALL(cudaStreamCreate(&s));
+ }
+ else
+ {
+ NVBENCH_CUDA_CALL(cudaStreamCreate(&s));
+ }
return s;
}(),
stream_deleter{true}}
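
To illustrate the semantics of the new `device` parameter: the sketch below assumes that `nvbench::detail::device_scope` is an RAII guard that makes the given device current and restores the previous one when it goes out of scope (its definition is not part of this diff). Everything in the `sketch` namespace is a hypothetical stand-in so that the example compiles without the CUDA toolkit; `get_device` and `set_device` play the role of `cudaGetDevice` and `cudaSetDevice`, and the return value of `create_stream_on` stands in for the device on which `cudaStreamCreate` would run.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins, not nvbench or CUDA APIs.
namespace sketch
{
inline int g_current_device = 0; // mimics the runtime's "current device"

inline int get_device() { return g_current_device; }
inline void set_device(int id) { g_current_device = id; }

// RAII guard: switch to `id` on construction, restore the old device on
// destruction. Assumed to mirror what device_scope does.
struct device_scope
{
  explicit device_scope(int id)
      : m_old{get_device()}
  {
    set_device(id);
  }
  ~device_scope() { set_device(m_old); }

  device_scope(const device_scope &)            = delete;
  device_scope &operator=(const device_scope &) = delete;

private:
  int m_old;
};

// Returns the device that is current at the point where the stream would be
// created. With a device given, creation happens inside the guard's scope;
// with std::nullopt, the caller's current device is used unchanged.
inline int create_stream_on(std::optional<int> device)
{
  if (device.has_value())
  {
    device_scope guard{*device};
    assert(get_device() == *device);
    return get_device(); // stand-in for cudaStreamCreate(&s)
  }
  return get_device(); // stand-in for cudaStreamCreate(&s)
}
} // namespace sketch
```

Under these assumptions, `sketch::create_stream_on(2)` creates the stream while device 2 is current and leaves the caller's current device untouched afterwards, whereas `sketch::create_stream_on(std::nullopt)` uses whatever device is already current.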
3 changes: 2 additions & 1 deletion nvbench/state.cuh
@@ -261,7 +261,6 @@ private:
std::optional<nvbench::device_info> device,
std::size_t type_config_index);

- nvbench::cuda_stream m_cuda_stream;
std::reference_wrapper<const nvbench::benchmark_base> m_benchmark;
nvbench::named_values m_axis_values;
std::optional<nvbench::device_info> m_device;
@@ -277,6 +276,8 @@ private:
nvbench::float64_t m_skip_time;
nvbench::float64_t m_timeout;

+ nvbench::cuda_stream m_cuda_stream;
+
// Deadlock protection. See blocking_kernel's class doc for details.
nvbench::float64_t m_blocking_kernel_timeout{30.0};

2 changes: 2 additions & 0 deletions nvbench/state.cxx
@@ -41,6 +41,7 @@ state::state(const benchmark_base &bench)
, m_max_noise{bench.get_max_noise()}
, m_skip_time{bench.get_skip_time()}
, m_timeout{bench.get_timeout()}
+ , m_cuda_stream{std::nullopt}
{}

state::state(const benchmark_base &bench,
@@ -58,6 +59,7 @@ state::state(const benchmark_base &bench,
, m_max_noise{bench.get_max_noise()}
, m_skip_time{bench.get_skip_time()}
, m_timeout{bench.get_timeout()}
+ , m_cuda_stream{m_device}
{}

nvbench::int64_t state::get_int64(const std::string &axis_name) const
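
A note on why `m_cuda_stream` moves below `m_device` in `state.cuh`: the PR does not state the reason, but it is presumably because C++ initializes non-static data members in declaration order, regardless of the order in the constructor's initializer list. Since the second constructor now passes `m_device` to `m_cuda_stream`, `m_device` has to be declared first. The sketch below shows the pattern with hypothetical stand-in types (`stream`, `state`, and an `int` device id in place of `nvbench::device_info`); it is not the nvbench code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins, not nvbench types.
namespace sketch_order
{
// Records which device (if any) it was constructed with.
struct stream
{
  explicit stream(std::optional<int> device)
      : device_seen{device}
  {}
  std::optional<int> device_seen;
};

struct state
{
  explicit state(std::optional<int> device)
      : m_device{device}
      , m_stream{m_device} // reads m_device, so m_device must already exist
  {}

  // Declaration order is initialization order: m_device first, then m_stream.
  std::optional<int> m_device;
  stream m_stream;
};

// Small self-check: the stream sees the same device the state was given.
inline void self_check()
{
  state s{3};
  assert(s.m_stream.device_seen == s.m_device);
}
} // namespace sketch_order
```

If the two members were declared in the opposite order, `m_stream` would be constructed from a not-yet-initialized `m_device`, which is undefined behavior; compilers typically flag the mismatch with `-Wreorder`.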