-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define monitor unsubscribe_on_delete #10687
base: main
Are you sure you want to change the base?
Conversation
4ff05b7
to
20a72a4
Compare
Please add the new files to Windows build file list. |
prov/util/src/util_mr_cache.c
Outdated
ofi_monitor_unsubscribe(monitor, entry->info.iov.iov_base, | ||
entry->info.iov.iov_len, | ||
&entry->hmem_info); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done within cache->delete_region()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't. kdreg2 is designed to work with any provider. Putting this in delete_region() would result in the ofi_monitor_unsubscribe() being called for a specific provider delete_region() implementation.
In addition, ofi_monitor_subscribe() is called outside of cache->add_region(). Because of this, ofi_monitor_unsubscribe() should be called outside of cache->delete_region().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense.
But the general architecture we have for the cache and monitors is to use callbacks, some of which may be set to a no-op. This is changing that to using a variable to adjust the flow. Maybe that's okay...
But in other places, ofi_monitor_subscribe/unsubscribe are called holding the mm_lock. Here, there are no locks being held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, each call to ofi_monitor_subscribe() during MR cache insert would have a matching ofi_monitor_unsubscribe() when MR cache entry is freed. Unfortunately, this is the the case for all memory monitors. For example, with uffd, ofi_monitor_subscribe() may be called multiple times against the same VMA. A single ofi_monitor_unsubscribe() will unregister the VMA.
Unless we want to define new memory monitor semantics or fix memory monitors to support ofi_monitor_unsubscribe() on MR cache free, unsubscribe_on_delete may be the best option.
The locking is a bit complicated. Ideally, each memory monitor would have independent locking from the MR cache. But, to align with what uffd and rocr do, I can add the locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this locking is 'correct' architecturally, but I think it works because this path is only taken by 1 monitor which has per entry subscription. (Compare this to the locking where entries are added to the cache. The mm_lock is held around both inserting the entry into the rb tree and subscribing. Dropping the lock between removing the entry from the rb tree and unsubscribing looks racy, as a new entry could be added in between.)
The subscribe/unsubscribe model may need to change. Checking the code, most monitors set these to no-ops. I can't even remember when unsubscribe is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out how ofi_monitor_unsubscribe() is currently used to see if there's a bug. I don't see that it's needed at all looking at the code, and that seems wrong.
It's called in 2 places: rocr_mm_dealloc_cb() and ofi_uffd_handler(). In both, we're dealing with a specific monitor, so the unsubscribe logic can be inserted directly into those places. I.e. rocr_mm_dealloc_cb -> rocr_mm_unsubscribe, and ofi_uffd_handler -> ofi_uffd_unsubscribe. A generic abstraction isn't needed.
kdreg2_monitor_unsubscribe() is the only other implementation of unsubscribe, which isn't called (hence this patch).
I agree subscribe should be paired with unsubscribe. The question is why is it not? Why is unsubscribe not called unconditionally in util_mr_free_entry()?
I can produce the same functionality by setting rocr and uffd unsubscribe callbacks to no-ops, calling the internal functions listed above directly, and always calling unsubscribe (which would now only be implemented for kdreg2). Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you described should work. By doing this, from the the MR cache point-of-view, each MR cache entry will have a 1-1 mapping to subscribe/unsubscribe. This opens up the door to simplifying util_mr_cache_create(), by moving ofi_monitor_subscribe() outside of the mm_lock. Something like the following.
cache->add_region(cache, *entry);
ofi_monitor_subscribe(monitor, info->iov.iov_base, info->iov.iov_len, &(*entry)->hmem_info);
pthread_mutex_lock(&mm_lock);
ofi_rbmap_insert(&cache->tree, (void *) &(*entry)->info, (void *) *entry, &(*entry)->node));
pthread_mutex_unlock(&mm_lock);
But, to support this, each memory monitor is may need its own lock. Looking at the memory monitors, ROCR would need a lock to protect its tree. Unclear what to do with import memory monitor. Should libfabric provide locking or the user? Requiring the user to do this may be a new requirement which changes existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is: is the existing code wrong? Why isn't ofi_monitor_unsubscribe() called from some generic location?
I think the answer is... If the monitor works on a page boundary, then trying to unsubscribe when an entry is removed from the cache can unsubscribe pages which may overlap with ones still in the cache. I think this is the case for uffd.
That is, subscribe/unsubscribe aren't necessarily directly related to a cached entry, but the monitored region behind the entry. That monitored region may apply to more than 1 entry, since entries have byte granularity, but regions may not.
Maybe 'subscribe/unsubscribe' should have been called 'entry_added/entry_removed', but we're kind of stuck with the name (via fi_ext.h). We could add a comment to the code indicating when the callback is actually invoked, so an implementation doesn't assume unsubscribe means that the region isn't still in the cache under another registration.
As for the locking, I think we want to keep it as-is. I can envision the possibility of (un)subscribe checking the cache to see if a monitored region has overlap with another entry in the cache prior to deciding what it should do. E.g. uffd might benefit from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer is... If the monitor works on a page boundary, then trying to unsubscribe when an entry is removed from the cache can unsubscribe pages which may overlap with ones still in the cache. I think this is the case for uffd.
This is case with UFFD (i.e., 1 UFFD context per VMA).
Are the following the next steps?
- Revert to old locking (i.e., no locking around ofi_monitor_unsubscribe)
- Update uffd and rocr monitor_unsubscribe to NULL
- Unconditionally call ofi_monitor_unsubscribe with comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the desired changes would be:
- Document that subscribe is called when a new memory registration entry is inserted into the cache.
- Document that unsubscribe is called when an existing registration entry is removed from the cache.
- Document that there may be registrations in the cache which overlap with the registered region at the time subscribe or unsubscribe are called.
- Update uffd and rocr unsubscribe calls to be no-ops.
- Call ofi_monitor_unsubscribe() when an entry is removed from the cache. That is, when it is removed from the rbtree, not when the entry itself is freed. This should be done while holding the mm_lock.
This pairs subscribe with unsubscribe together with rbtree insert / delete.
20a72a4
to
8a4a4f7
Compare
8a4a4f7
to
5477141
Compare
Other memory monitors, such as CUDA, ROCR, and ZE, have a .c file for the implementation. This change cleans up the util_mem_monitor.c code by defining a uffd and import .c file, thus aligning to other memory monitor implementations. Signed-off-by: Mike Uttormark <[email protected]> Signed-off-by: Ian Ziemba <[email protected]>
Some memory monitors, such as kdreg2, have a subscription context per MR cache entry. These memory monitors require unsubscribe to be called for each freed MR cache entry. To support this, call unsubscribe when an entry is remove from the MR cache RB tree. If a memory monitor does not support a subscription context per MR, unsubscribe must be implemented as a noop. Update uffd and rocr memory monitors accordingly. Signed-off-by: Ian Ziemba <[email protected]>
ROCR deallocation CB will call rocr_unsubscribe with mm_lock held. If memhooks is used, since rocr_unsubscribe may call free, this can result in memhooks intercepting the free and leading to deadlock. To avoid this, freeing is deferred until locks are released. Signed-off-by: Ian Ziemba <[email protected]>
5477141
to
7f0732f
Compare
I think I got this issue addressed now. Also, during testing of the ROCR changes, I found a memhooks + rocr deadlock issue which needed fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment -- overall changes look good. Thanks for working through this!
|
||
uffd.monitor.subscribe = ofi_uffd_subscribe; | ||
uffd.monitor.unsubscribe = ofi_uffd_unsubscribe; | ||
uffd.monitor.valid = ofi_uffd_valid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you're just moving code, but do we know why these are set here rather than statically? ofi_uffd_handler could conceivably access null pointers. (It won't in practice.)
*/ | ||
static void rocr_mm_unsubscribe(struct ofi_mem_monitor *monitor, | ||
const void *addr, size_t len, | ||
union ofi_mr_hmem_info *hmem_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: we can use ofi_monitor_unsubscribe_no_op() as the monitor callback and leave rocr_mm_unsubscribe() named as-is.
*/ | ||
static void ofi_uffd_unsubscribe(struct ofi_mem_monitor *monitor, | ||
const void *addr, size_t len, | ||
union ofi_mr_hmem_info *hmem_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to rocr, we can use the common no-op.
Not all memory montiors have a 1-N relationship between subscribe and the MR cache entries. Some memory monitors, such as kdreg2, have a 1-1 relationship and require unsubscribe to be called when the corresponding MR cache entry is deleted.
To meet this requirement, unsubscribe_on_delete is defined. When true for a memory monitor, ofi_monitor_unsubscribe() will be called on MR cache being freed.