Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define monitor unsubscribe_on_delete #10687

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

iziemba
Copy link
Contributor

@iziemba iziemba commented Jan 8, 2025

Not all memory montiors have a 1-N relationship between subscribe and the MR cache entries. Some memory monitors, such as kdreg2, have a 1-1 relationship and require unsubscribe to be called when the corresponding MR cache entry is deleted.

To meet this requirement, unsubscribe_on_delete is defined. When true for a memory monitor, ofi_monitor_unsubscribe() will be called on MR cache being freed.

@iziemba iziemba requested a review from j-xiong January 8, 2025 18:15
@iziemba iziemba force-pushed the mem_monitor_updates branch from 4ff05b7 to 20a72a4 Compare January 8, 2025 18:16
@j-xiong
Copy link
Contributor

j-xiong commented Jan 8, 2025

Please add the new files to Windows build file list.

ofi_monitor_unsubscribe(monitor, entry->info.iov.iov_base,
entry->info.iov.iov_len,
&entry->hmem_info);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done within cache->delete_region()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't. kdreg2 is designed to work with any provider. Putting this in delete_region() would result in the ofi_monitor_unsubscribe() being called for a specific provider delete_region() implementation.

In addition, ofi_monitor_subscribe() is called outside of cache->add_region(). Because of this, ofi_monitor_unsubscribe() should be called outside of cache->delete_region().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

But the general architecture we have for the cache and monitors is to use callbacks, some of which may be set to a no-op. This is changing that to using a variable to adjust the flow. Maybe that's okay...

But in other places, ofi_monitor_subscribe/unsubscribe are called holding the mm_lock. Here, there are no locks being held.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, each call to ofi_monitor_subscribe() during MR cache insert would have a matching ofi_monitor_unsubscribe() when MR cache entry is freed. Unfortunately, this is the the case for all memory monitors. For example, with uffd, ofi_monitor_subscribe() may be called multiple times against the same VMA. A single ofi_monitor_unsubscribe() will unregister the VMA.

Unless we want to define new memory monitor semantics or fix memory monitors to support ofi_monitor_unsubscribe() on MR cache free, unsubscribe_on_delete may be the best option.

The locking is a bit complicated. Ideally, each memory monitor would have independent locking from the MR cache. But, to align with what uffd and rocr do, I can add the locking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this locking is 'correct' architecturally, but I think it works because this path is only taken by 1 monitor which has per entry subscription. (Compare this to the locking where entries are added to the cache. The mm_lock is held around both inserting the entry into the rb tree and subscribing. Dropping the lock between removing the entry from the rb tree and unsubscribing looks racy, as a new entry could be added in between.)

The subscribe/unsubscribe model may need to change. Checking the code, most monitors set these to no-ops. I can't even remember when unsubscribe is called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out how ofi_monitor_unsubscribe() is currently used to see if there's a bug. I don't see that it's needed at all looking at the code, and that seems wrong.

It's called in 2 places: rocr_mm_dealloc_cb() and ofi_uffd_handler(). In both, we're dealing with a specific monitor, so the unsubscribe logic can be inserted directly into those places. I.e. rocr_mm_dealloc_cb -> rocr_mm_unsubscribe, and ofi_uffd_handler -> ofi_uffd_unsubscribe. A generic abstraction isn't needed.

kdreg2_monitor_unsubscribe() is the only other implementation of unsubscribe, which isn't called (hence this patch).

I agree subscribe should be paired with unsubscribe. The question is why is it not? Why is unsubscribe not called unconditionally in util_mr_free_entry()?

I can produce the same functionality by setting rocr and uffd unsubscribe callbacks to no-ops, calling the internal functions listed above directly, and always calling unsubscribe (which would now only be implemented for kdreg2). Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you described should work. By doing this, from the the MR cache point-of-view, each MR cache entry will have a 1-1 mapping to subscribe/unsubscribe. This opens up the door to simplifying util_mr_cache_create(), by moving ofi_monitor_subscribe() outside of the mm_lock. Something like the following.

cache->add_region(cache, *entry);
ofi_monitor_subscribe(monitor, info->iov.iov_base, info->iov.iov_len, &(*entry)->hmem_info);

pthread_mutex_lock(&mm_lock);
ofi_rbmap_insert(&cache->tree, (void *) &(*entry)->info, (void *) *entry, &(*entry)->node));
pthread_mutex_unlock(&mm_lock);

But, to support this, each memory monitor is may need its own lock. Looking at the memory monitors, ROCR would need a lock to protect its tree. Unclear what to do with import memory monitor. Should libfabric provide locking or the user? Requiring the user to do this may be a new requirement which changes existing behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: is the existing code wrong? Why isn't ofi_monitor_unsubscribe() called from some generic location?

I think the answer is... If the monitor works on a page boundary, then trying to unsubscribe when an entry is removed from the cache can unsubscribe pages which may overlap with ones still in the cache. I think this is the case for uffd.

That is, subscribe/unsubscribe aren't necessarily directly related to a cached entry, but the monitored region behind the entry. That monitored region may apply to more than 1 entry, since entries have byte granularity, but regions may not.

Maybe 'subscribe/unsubscribe' should have been called 'entry_added/entry_removed', but we're kind of stuck with the name (via fi_ext.h). We could add a comment to the code indicating when the callback is actually invoked, so an implementation doesn't assume unsubscribe means that the region isn't still in the cache under another registration.

As for the locking, I think we want to keep it as-is. I can envision the possibility of (un)subscribe checking the cache to see if a monitored region has overlap with another entry in the cache prior to deciding what it should do. E.g. uffd might benefit from this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer is... If the monitor works on a page boundary, then trying to unsubscribe when an entry is removed from the cache can unsubscribe pages which may overlap with ones still in the cache. I think this is the case for uffd.

This is case with UFFD (i.e., 1 UFFD context per VMA).

Are the following the next steps?

  1. Revert to old locking (i.e., no locking around ofi_monitor_unsubscribe)
  2. Update uffd and rocr monitor_unsubscribe to NULL
  3. Unconditionally call ofi_monitor_unsubscribe with comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the desired changes would be:

  1. Document that subscribe is called when a new memory registration entry is inserted into the cache.
  2. Document that unsubscribe is called when an existing registration entry is removed from the cache.
  3. Document that there may be registrations in the cache which overlap with the registered region at the time subscribe or unsubscribe are called.
  4. Update uffd and rocr unsubscribe calls to be no-ops.
  5. Call ofi_monitor_unsubscribe() when an entry is removed from the cache. That is, when it is removed from the rbtree, not when the entry itself is freed. This should be done while holding the mm_lock.

This pairs subscribe with unsubscribe together with rbtree insert / delete.

@iziemba iziemba force-pushed the mem_monitor_updates branch from 20a72a4 to 8a4a4f7 Compare January 10, 2025 04:02
@iziemba iziemba requested a review from shefty January 10, 2025 04:02
swelch
swelch previously approved these changes Jan 10, 2025
swelch
swelch previously approved these changes Jan 14, 2025
muttormark and others added 3 commits January 15, 2025 21:53
Other memory monitors, such as CUDA, ROCR, and ZE, have a .c file for
the implementation. This change cleans up the util_mem_monitor.c code by
defining a uffd and import .c file, thus aligning to other memory
monitor implementations.

Signed-off-by: Mike Uttormark <[email protected]>
Signed-off-by: Ian Ziemba <[email protected]>
Some memory monitors, such as kdreg2, have a subscription context per MR
cache entry. These memory monitors require unsubscribe to be called for
each freed MR cache entry.

To support this, call unsubscribe when an entry is remove from the MR
cache RB tree.

If a memory monitor does not support a subscription context per MR,
unsubscribe must be implemented as a noop. Update uffd and rocr memory
monitors accordingly.

Signed-off-by: Ian Ziemba <[email protected]>
ROCR deallocation CB will call rocr_unsubscribe with mm_lock held. If
memhooks is used, since rocr_unsubscribe may call free, this can result
in memhooks intercepting the free and leading to deadlock.

To avoid this, freeing is deferred until locks are released.

Signed-off-by: Ian Ziemba <[email protected]>
@iziemba
Copy link
Contributor Author

iziemba commented Jan 16, 2025

I think I got this issue addressed now. Also, during testing of the ROCR changes, I found a memhooks + rocr deadlock issue which needed fixing.

Copy link
Member

@shefty shefty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment -- overall changes look good. Thanks for working through this!


uffd.monitor.subscribe = ofi_uffd_subscribe;
uffd.monitor.unsubscribe = ofi_uffd_unsubscribe;
uffd.monitor.valid = ofi_uffd_valid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're just moving code, but do we know why these are set here rather than statically? ofi_uffd_handler could conceivably access null pointers. (It won't in practice.)

*/
static void rocr_mm_unsubscribe(struct ofi_mem_monitor *monitor,
const void *addr, size_t len,
union ofi_mr_hmem_info *hmem_info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: we can use ofi_monitor_unsubscribe_no_op() as the monitor callback and leave rocr_mm_unsubscribe() named as-is.

*/
static void ofi_uffd_unsubscribe(struct ofi_mem_monitor *monitor,
const void *addr, size_t len,
union ofi_mr_hmem_info *hmem_info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to rocr, we can use the common no-op.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants