Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[browser] single threaded diagnostic server #111910

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

pavelsavara
Copy link
Member

@pavelsavara pavelsavara commented Jan 28, 2025

This is split from #110818 to make code review easier

Enabled by <FeaturePerfTracing>true</FeaturePerfTracing>

Single thread

  • ep_rt_thread_create and ep_rt_thread_mono_start_func
    • it will run short, non-blocking jobs on the browser event loop instead of on dedicated threads
    • streaming_loop_tick refactored from streaming_thread
    • server_loop_tick from refactored server_thread
  • mono_wasm_ds_exec to process on tick on event loop
  • mono_main_thread_schedule_ds_job will register callback into queue for above
  • NOOP implementation of ep_rt_wait_event_wait
  • new FEATURE_PERFTRACING_SINGLE_THREADED -> PERFTRACING_MULTI_THREADED, PERFTRACING_SINGLE_THREADED
  • ds_ipc_stream_factory_get_next_available_stream only polls once and never blocks

Sampling

  • disabled sampling_thread because we can't do "stop-the-world and scan other threads"
  • new events ep_rt_sample_profiler_enabled and ep_rt_sample_profiler_disabled
  • new ep_rt_mono_sampling_provider_component_init
    • install mono profiler instrumentation MONO_PROFILER_CALL_INSTRUMENTATION_ENTER
    • this needs to happen before any managed code is executed
  • it has negative performance impact when workload & re-linked with FeaturePerfTracing
    • this is just single condition per method enter (and later per safe-point)
    • we could introduce separate FeaturePerfTracingSampling to enable/disable this if necessary

Web socket

  • FEATURE_PERFTRACING_PAL_WS
  • ds-ipc-pal-websocket.c and ds-ipc-pal-websocket.h they just forward to JavaScript
  • empty for now ds_rt_websocket_create, ds_rt_websocket_send, ds_rt_websocket_poll, ds_rt_websocket_recv, ds_rt_websocket_close

Other

  • include diagnostics_tracing-static.a in in-tree builds
  • Microsoft.NETCore.App.Runtime.Mono.perftrace.**RID** cleanup
  • fix SystemNative_GetCpuUtilization
  • fix crash dump file permissions for CI/helix

Next PRs

  • the rest of the implementation in JavaScript, for WebSocket and JS client
  • better sampling instrumentation via MINT_SDB_SEQ_POINT or similar
  • tests

Out of scope

  • pause runtime during startup
  • multi-threaded EP on browser - this will need some cross thread marshaling

@pavelsavara pavelsavara added arch-wasm WebAssembly architecture os-browser Browser variant of arch-wasm labels Jan 28, 2025
@pavelsavara pavelsavara added this to the 10.0.0 milestone Jan 28, 2025
@pavelsavara pavelsavara self-assigned this Jan 28, 2025
Copy link
Contributor

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

@pavelsavara pavelsavara changed the title [browser] single threaded diagnosti server [browser] single threaded diagnostic server Jan 28, 2025
@pavelsavara
Copy link
Member Author

@lateralusX please review this first step

@@ -135,7 +135,7 @@
<ILLinkDescriptorsXmls Include="$(ILLinkDirectory)ILLink.Descriptors.xml" />
<ILLinkDescriptorsXmls Include="$(ILLinkDirectory)ILLink.Descriptors.OSX.xml" Condition="'$(TargetsOSX)' == 'true' or '$(TargetsMacCatalyst)' == 'true' or '$(TargetsiOS)' == 'true' or '$(TargetstvOS)' == 'true'" />
<ILLinkDescriptorsXmls Include="$(CoreLibSharedDir)ILLink\ILLink.Descriptors.Shared.xml" />
<ILLinkDescriptorsXmls Condition="'$(FeaturePerfTracing)' == 'true'" Include="$(CoreLibSharedDir)ILLink\ILLink.Descriptors.EventSource.xml" />
<ILLinkDescriptorsLibraryBuildXml Condition="'$(FeaturePerfTracing)' == 'true'" Include="$(CoreLibSharedDir)ILLink\ILLink.Descriptors.EventSource.xml" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved to https://github.com/dotnet/runtime/blob/bfa02766bdde4657ddd1f734e274d43e6ba3c036/src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems#L67 where others are?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what is rooting it for x64 NAOT, when trimmed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make sure we don't link out the InitializeDefaultEventSources method called during Mono startup to setup runtime specific event sources. Mono root this function through above file and CoreClr does a similar thing, but CoreClr uses additional tasks to root methods called by runtime by scanning different source files, like corlib.h and generate a ILLink.Descriptors.xml as part of build. NativeAOT uses ModuleInitializer preventing this function from being linked out when running ilc on the published app but not exactly sure how its rooting it as part of S.P.C build, but I assume it ends up with similar mechanism.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, why do we need to change fro ILLinkDescriptorsXmls to ILLinkDescriptorsLibraryBuildXml for just his specific item?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run ILLink again during publish build step on customer dev machine. This file needs to be there to protect it from trimming. With ILLinkDescriptorsXmls it didn't work for me. Is this the right way ?

Copy link
Member

@lateralusX lateralusX Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For publish on dev machines, EventSource related logic is kept using linker msbuild property, EventSourceSupport. That is how it works on all other platforms so I assume it should work the same on WASM. It should keep all EventSource related managed code.

@@ -362,6 +363,8 @@
<EmscriptenEnvVars Include="ENABLE_JS_INTEROP_BY_VALUE=0" Condition="'$(WasmEnableJsInteropByValue)' != 'true'" />
<EmscriptenEnvVars Include="WASM_ENABLE_SIMD=1" Condition="'$(WasmEnableSIMD)' != 'false'" />
<EmscriptenEnvVars Include="WASM_ENABLE_SIMD=0" Condition="'$(WasmEnableSIMD)' == 'false'" />
<EmscriptenEnvVars Include="FEATURE_PERFTRACING=1" Condition="'$(FeaturePerfTracing)' != 'false'" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in contradiction with default value set on L6?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this is just passing the value (which doesn't have to be same as default) to emscripten as env variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration=Debug => FeaturePerfTracing=true => FEATURE_PERFTRACING=1
Configuration=Release => FeaturePerfTracing='' => FEATURE_PERFTRACING=1

What is the reason to set FeaturePerfTracing based on Configuration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoreCLR includes EP always. In Browser, I would like to include it when workload+Debug. Unless user specified FeaturePerfTracing explicitly. I missed the Release -> empty branch. Fixed now.

Copy link
Member

@lateralusX lateralusX Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Mono we have components meaning that an app can either statically or dynamically enable parts of the runtime when linking the app, using the same runtime build. EventPipe uses this (hot reload, debugger as well) and our mobile platforms uses it to enable/disable EventPipe using msbuild flags when building the app. Recommendation is to exclude it for release builds that should be pushed to app stores, but include it for debug and profiling builds of the apps used for debugging or testing.

Copy link
Member

@lateralusX lateralusX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly looked at EventPipe and Mono native code in this first review pass.

#ifdef FEATURE_PERFTRACING_SINGLE_THREADED
#define PERFTRACING_SINGLE_THREADED
#else
#define PERFTRACING_MULTI_THREADED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we now will have multiple different defines controlling this, in Mono it seems that we have an older DISABLE_THREADS and here we get two more. Do we really need to define PERFTRACING_MULTI_THREADED ? That should be the default for event pipe and its PERFTRACING_SINGLE_THREADED that should be the special scenarios. Since we already have the concept of DISABLE_THREADS in Mono, maybe we should use similar naming for eventpipe, like FEATURE_PERFTRACING_DISABLE_THREADS and PERFTRACING_DISABLE_THREADS?


if (DISABLE_THREADS)
set_target_properties(
eventpipe-mono-objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the wider Mono specific DISABLE_THREADS here or is this just needed since we have some checks in diagnostics_server.c that currently uses it? If so, then maybe we should change that code to use the eventpipe define for single threaded/disabled threads. Alternative could be to define this based on the more generic event pipe as part of mono specific ds or ep config's.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are passing it to src\native\eventpipe\CMakeLists.txt so that it could set FEATURE_PERFTRACING_SINGLE_THREADED

@@ -25,8 +25,14 @@ extern EVENTPIPE_TRACE_CONTEXT MICROSOFT_WINDOWS_DOTNETRUNTIME_RUNDOWN_PROVIDER_
#define NUM_NANOSECONDS_IN_1_MS 1000000

// Sample profiler.
#if defined(PERFTRACING_MULTI_THREADED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will just make this comment once, but it applies through all changes and is basically an effect of the last comment in this review. Multithreaded is the default, so we shouldn't need an explicit define for multithreaded support, instead we should use one define disabling threading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you prefer #if !defined(PERFTRACING_SINGLE_THREADED) everywhere, right ?

#else // !PERFTRACING_MULTI_THREADED

bool
ep_rt_mono_sample_profiler_write_sampling_event_for_threads (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How frequent will this be called on WASM? When we run regular sample profiler this is called on the timer interval defined in __sampling_rate_in_ns. Looking further down it appears that this method will only be called once, is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I just want to capture the sampling_event and sampling_thread to be able to use it later.

}

static void
method_enter (MonoProfiler *prof, MonoMethod *method, MonoProfilerCallContext *ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we sample on every call? Sample profiler only samples at a specific time interval, like very ms, setup in __sampling_rate_in_ns. Maybe we should only sample based on the sample rate instead on every call to be more inline with how sampling works on all other platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, it needs to consider sample frequency. But also I will bring another instrumentation for like safepoint.
I will make those improvements in next PR



DiagnosticsIpcStream *stream = NULL;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EP_ASSERT (ipc->mode == DS_IPC_CONNECTION_MODE_CONNECT); ?

@@ -455,6 +455,13 @@ ds_ipc_stream_factory_get_next_available_stream (ds_ipc_error_callback_func call
// clear the view.
dn_vector_clear (&ipc_poll_handles);
}
#if defined(PERFTRACING_MULTI_THREADED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to keep while loop as is and then just doing a break in single threaded mode at end.

@@ -192,6 +194,9 @@ sample_profiler_enable (void)
if (!sample_profiler_load_profiling_enabled ()) {
sample_profiler_store_profiling_enabled (true);

ep_rt_sample_profiler_enabled ();

#if defined(PERFTRACING_MULTI_THREADED)
EP_ASSERT (!ep_rt_wait_event_is_valid (&_thread_shutdown_event));
ep_rt_wait_event_alloc (&_thread_shutdown_event, true, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little curious if ep_rt_wait functions are all noop on browser or if they are actually implemented? Regardless we could just make the code work on both single and multithreaded and only do a sampling_thread function implementation that is different, would remove the need to ifdefing inside this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only busy spin wait on browser, you can't really wait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless we could just make the code work on both single and multithreaded

Probably not

  • we need to enter from interp loop, rather than from here.
  • we capture only one stack trace
  • the code for that is different on current thread than scanning another thread

}
if (!ep_session_write_all_buffers_to_file (session, events_written)) {
session->streaming_thread = NULL;
ep_disable ((EventPipeSessionID)session);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to preserve this call to happen outside the EP_GCX_PREEMP_ENTER/EXIT as it did before.

Copy link
Member Author

@pavelsavara pavelsavara Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Could you please explain why (for my education) ?

}
return 0; // try again
}

EP_RT_DEFINE_THREAD_FUNC (streaming_thread)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is simpler and more clear to just create two functions keeping the first error/state checking code and then call into another function implemented differently for single and multi threaded support, you can still call the loop tick, but it might just be as simple to inline that part into each implementation once you have two methods.

@@ -424,6 +424,8 @@
<_CmakeEnvironmentVariable Include="ENABLE_JS_INTEROP_BY_VALUE=0" Condition="'$(WasmEnableJsInteropByValue)' == 'false'"/>
<_CmakeEnvironmentVariable Include="WASM_ENABLE_SIMD=1" Condition="'$(WasmEnableSIMD)' != 'false'" />
<_CmakeEnvironmentVariable Include="WASM_ENABLE_SIMD=0" Condition="'$(WasmEnableSIMD)' == 'false'" />
<_CmakeEnvironmentVariable Include="FEATURE_PERFTRACING=1" Condition="'$(FeaturePerfTracing)' != 'false'" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is same problem that Maraf reported

@@ -568,6 +570,8 @@
<_MonoRollupEnvironmentVariable Include="WasmEnableThreads:$(WasmEnableThreads)" />
<_MonoRollupEnvironmentVariable Include="WASM_ENABLE_SIMD:1" Condition="'$(WasmEnableSIMD)' != 'false'" />
<_MonoRollupEnvironmentVariable Include="WASM_ENABLE_SIMD:0" Condition="'$(WasmEnableSIMD)' == 'false'" />
<_MonoRollupEnvironmentVariable Include="FEATURE_PERFTRACING:1" Condition="'$(FeaturePerfTracing)' != 'false'" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is same problem that Maraf reported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-wasm WebAssembly architecture area-Diagnostics-mono os-browser Browser variant of arch-wasm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants