We recently ran into a situation where the default Argobots stack allocator happens to trigger a performance regression in transparent huge page (THP) handling on some systems. See pmodels/argobots#369 and https://lists.mcs.anl.gov/pipermail/mochi-devel/2021-November/000127.html. The problem was reported and identified by @philip-davis.
There are Argobots (and Linux) tuning options that can influence this behavior, but more fundamentally, the Mochi use case just isn't a natural fit for the default stack allocator in Argobots. We use relatively large (2 MiB by default) stack sizes and frequently issue and retire ULTs in a producer/consumer model that crosses OS thread (ES) boundaries.
The purpose of this issue is to track the need to implement a custom stack allocator that is better suited to the Margo use case and (along the way) avoids the THP memory latency performance bug on systems that use THP.
A technical description of how to implement a custom stack allocator for use cases similar to ours can be found on the Argobots mailing list: https://lists.argobots.org/pipermail/discuss/2021-November/000162.html
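For discussion purposes, here is a rough sketch of the general shape such an allocator could take. It is not taken from the mailing list post and is not an agreed design. It assumes a Linux system, a fixed 2 MiB stack size, and a single mutex-protected free list so that a stack acquired on one ES can be released on another. All names (`stack_pool`, `stack_pool_acquire`, `stack_pool_release`) are hypothetical, and the `madvise(MADV_NOHUGEPAGE)` call is only one possible way to keep THP away from the stacks, not a confirmed fix for the bug above.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical names throughout; none of this is Margo or Argobots API. */
#define STACK_SIZE ((size_t)2 * 1024 * 1024) /* 2 MiB, the default cited above */

/* An idle stack stores the link to the next idle stack in its own first bytes. */
struct free_stack {
    struct free_stack *next;
};

struct stack_pool {
    pthread_mutex_t lock;    /* stacks may be acquired and released on different ESs */
    struct free_stack *head; /* LIFO list of idle stacks */
    size_t cached;           /* number of idle stacks currently held */
};

static void stack_pool_init(struct stack_pool *pool)
{
    assert(pool != NULL);
    pthread_mutex_init(&pool->lock, NULL);
    pool->head = NULL;
    pool->cached = 0;
}

/* Return a stack of STACK_SIZE bytes, or NULL if a new mapping fails. */
static void *stack_pool_acquire(struct stack_pool *pool)
{
    assert(pool != NULL);

    /* Fast path: reuse a previously released stack. */
    pthread_mutex_lock(&pool->lock);
    struct free_stack *s = pool->head;
    if (s != NULL) {
        pool->head = s->next;
        pool->cached--;
    }
    pthread_mutex_unlock(&pool->lock);
    if (s != NULL)
        return s;

    /* Slow path: map a fresh region. */
    void *mem = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
#ifdef MADV_NOHUGEPAGE
    /* Best effort: ask the kernel not to back this range with huge pages. */
    (void)madvise(mem, STACK_SIZE, MADV_NOHUGEPAGE);
#endif
    return mem;
}

/* Return a stack to the pool; the memory stays mapped for reuse. */
static void stack_pool_release(struct stack_pool *pool, void *stack)
{
    assert(pool != NULL && stack != NULL);
    struct free_stack *s = stack;
    pthread_mutex_lock(&pool->lock);
    s->next = pool->head;
    pool->head = s;
    pool->cached++;
    pthread_mutex_unlock(&pool->lock);
}
```

In Argobots, a stack obtained this way would presumably be attached to a ULT with `ABT_thread_attr_set_stack()` before `ABT_thread_create()`, and handed back to the pool once the ULT has completed. A real implementation would likely also want a cap on the number of cached stacks and possibly per-ES caches to reduce lock contention.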
Margo transparently creates ULTs to service incoming RPCs, so that is the first place to apply the new allocator. If the mechanism works well there, we can consider exposing a Margo wrapper for thread creation so that ULTs created for other purposes in Mochi can leverage the same mechanism.