Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tracing): add nvtx provider #363

Closed
wants to merge 6 commits into from
Closed

Conversation

aws-nslick
Copy link
Contributor

Description of changes:

6dcdfc8 !wip feat(tracing): add nvtx provider
87d76fe !wip feat(tracing): support scope pops
45b90d0 refactor(trace/lttng): move into tracing_impl
52c864d build(tracing/nvtx): m4: add CHECK_PKG_NVTX
d35178b build(lttng): m4: stop setting HAVE_CUDA

8 files changed, 449 insertions(+), 296 deletions(-)
configure.ac                 |   1 +
include/tracepoint.h         | 341 +++++++++----------------------------------
include/tracing_impl/lttng.h | 266 +++++++++++++++++++++++++++++++++
include/tracing_impl/nvtx.h  |  25 ++++
m4/check_pkg_lttng.m4        |   2 +-
m4/check_pkg_nvtx.m4         |  46 ++++++
src/nccl_ofi_rdma.c          |  52 ++++---
src/nccl_ofi_sendrecv.c      |  12 +-

Please see notes on !wip commits.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This doesn't belong here and is likely a copy/paste mistake. It's
unclear why this didn't break anything: this could potentially cause
very bizarre header inclusions under neuron when tracing is enabled.
Resolve nvToolsExt and hook into configuration. Implementation follows
in future commits.
Separate each tracepoint/profiling provider (currently, just the one)
into its own header.
include/tracing_impl/lttng.h Show resolved Hide resolved
m4/check_pkg_nvtx.m4 Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
Add an api `NCCL_OFI_TRACE_POP()' for closing a tracing scope.

WIP: needs to accept a scope name to close.
Hook nvtx_push()/nvtx_pop() on existing lttng macros.

WIP: Currently, this builds correctly but is untested.

Will remain WIP until we figure out how to structure this in a way that
aligns the required usages of NCCL_OFI_TRACE_POP() for nvtx with cases
like NCCL_OFI_TRACE_SEND_WRITE_SEG COMPLETE/START. Probably, we should
just redo all the lttng macros so that they all wrap a workload, rather
than today where the majority just signal that an event took place.

I would also like to support a separate course-grained type of probe
definition within this module. lttng and nvtx are best-suited for
fine-grained/range-based eventing around program behavior (not quite
what we have today, but where we want to get: things like wrapping an
entire event and/or supplying rich metadata around that event.)

For this, we need to support:

    1. NVTX because of the ecosystem this plugin exists in.

    2. Something that's cheaper than userspace uprobe (see bpftime
       below) and in-process or nearly so. Some candidates: perfetto,
       redoing the existing lttng support, etc.

Separate from this, we should also support builds with course entry/exit
USDT probes for basically all nontrivial functions. This can be a lot
more useful for development and for building debug tools. Some tooling
that this would enable:
 + very generic and allows for cross-dependency analysis
 + https://github.com/eunomia-bpf/bpftime + bpftrace or bcc makes this cheap
 + certain `linux perf` calls can benefit from this.
 + potential to profile kernel via kprobes in the same script.
 + offcpu analysis

These are just nop sleds and have zero runtime overhead; so they can be
enabled on default/release builds. (See: [1] for how others use this)

It's surprisingly difficult to do this in a way that does not require
code changes. Can potentially do this with a small out-of-tree llvm
pass (and/or a gcc equivilent, see "gcc python plugin" on github) that
piggy-backs on -finstrument-functions's __cyg_profile_func_exit and
__cyg_profile_func_entry calls. Putting the USDT probe in the
__cyg_profile_func_exit impl itself is not viable. Need to dig more.

[1]: https://www.brendangregg.com/Slides/reInvent2019_BPF_Performance_Analysis/
@rauteric
Copy link
Contributor

rauteric commented Apr 4, 2024

Closing this one in favor of #377.

@rauteric rauteric closed this Apr 4, 2024
@aws-nslick aws-nslick deleted the nvtx-poc branch May 15, 2024 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants