-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpmsg: add release cb and refcnt in endpoint to fix ept used-after-free #508
Conversation
Hello @yintao707 Your issue is not clear for me. Seems to me that you can call |
hi @arnopo ,
} static void rpmsg_unregister_endpoint(struct rpmsg_endpoint *ept)
} |
Something doesn't seem correct to me (if I understood your use case correctly). In your endpoint callback, you call rpmsg_destroy_ept(ept) to free the endpoint and then continue to use the ept structure. This is equivalent to using a pointer after freeing it. Could you share your code with me so that I can better understand the issue you are trying to solve with this PR?" |
hi, @arnopo , I have updated the description. Could you please take a look again |
Thank you for clarifying the race condition! For instance, the rpmsg callback can be called under an interrupt context. In such a case, a work can be created in the callback to treat the message in normal context. In such a scenario, your PR would not prevent the ept pointer from being freed between the callback call and the work execution. Is there any particular reason why you are not using counters or mutexes in your application to prevent race conditions?" |
@arnopo In our case, the Actually, Linux did the same thing to avoid the endpoint used-after-free issue, this is the commit: |
That's my point. Half of the solution is in the library and the other part is in the application. It seems to me that you are creating a open-amp interface to solve a problem in a complex way, which could be addressed with a mutex or acounter in the application in a safer way ( for instance you can face same issue if you free the ept->priv with another thread).
Yes, in Linux, this has been implemented because the endpoint allocation and free is also managed by the virtio rpmsg bus.. However, this is not the case for OpenAMP, as a requirement is to support static allocation. The ept->release_cb() only makes sense with a counter mechanism. Therefore, I don't see a strong reason to complexify the rpmsg API, to partially answer to some memory allocations/free that should be managed by application . @edmooring, @tnmysh : any opinion on this? |
@CV-Bowen has tried to add refcount inside application, but it can't fix this problem without any race condition, for example:
Let's assume this sequence:
You can see the reference count inside my_cb can just reduce the race condition(safe after 2*), but the gap between item 1 and 2* can just be handled by OpenAMP self. The root cause is that OpenAMP call cb after release lock, the fix could be:
|
Thanks! much more simple to understand the use case with code and associated explanations.
Here I suppose that you can not test g_priv ponter, else would be simple to exit with a mutex protection
Ok I have understood your issue now. Thanks for pointing it out. As you mention current patch does no protect between the release of the mutex and the counter increment : Same issue for the unbind: https://github.com/OpenAMP/open-amp/pull/508/files#diff-569d5f59c7dc7ac2fee7252d74df8af1ba6e366a30fae7e1b97de627bc0ce472R642
One alternative was to have a rpmsg_sync_destroy_ept( cb). But regarding the rpmsg_deinit_vdev function, the release callback you propose seems more flexible . I will review more closely the patch at the beginning of the week.
|
Yes, if it saves in a global variable and protected by a mutex in this simple demo, but the real case is that the endpoints are created and destroyed with dynamical number instances. Even this simple demo applies the change you suggest, the race condition still exists since OpenAMP access g_priv->ept's fields too before entering my_cb. |
hi @arnopo , Updated, please help review again.thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments., else LGTM
Please, also update the commit message to better explain the race condition issue, and fix typo errors.
hi, @arnopo ,Thank you very much for your patient review, I have updated the PR based on your modification suggestions. Could you please help me review again, thanks |
LGTM |
dc8b74e
to
f8d7246
Compare
hi, @arnopo , I updated this PR because I found that if the reference count is only handled in the openamp layer, there may be the following issues.
But if there is another thread that actively calls rpmsg_destroy_ept through my_stop_remote, and the previous reference count is 1, after rpmsg_destroy_ept is executed, release_cb will release ept->priv. If the rpmsg_service thread continues to access ept->priv, a problem may occur.
Therefore, I have changed the original release_cb to incref_cb and decref_cb, allowing the service to control the usage of reference counts. This can mitigate the aforementioned problem. Please @xiaoxiang781216 @arnopo help review this modification,thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
No cristal clear to me: "If the rpmsg_service thread continues to access ept->priv, a problem may occur."
Your last version looks more like a hack to me. If the release_cb does not solve the issue, we probably have to consider another approach |
7ae1d28
to
bfdb274
Compare
hi, @arnopo , I can solve the problem mentioned above at the application layer, so using release_cb and refnt may be a better choice. I has updated this pr. thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 remaining comments not yet addressed + 2 new one
Done, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@@ -515,6 +515,7 @@ static void rpmsg_virtio_rx_callback(struct virtqueue *vq) | |||
/* Get the channel node from the remote device channels list. */ | |||
metal_mutex_acquire(&rdev->lock); | |||
ept = rpmsg_get_ept_from_addr(rdev, rp_hdr->dst); | |||
rpmsg_ept_incref(ept); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yintao707, @arnopo does that make sense to move rpmsg_ept_incref() API within rpmsg_get_endpoint API ? If endpoint is retrieved successfully then we increase refcount. It is possible that get_endpoint API is called in future, so we increase refcount notifying endpoint is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arnopo Looks like refcnt is tracking more if callback is in progress or not rather than how many times endpoint is being used my multiple threads using rpmsg_get_endpoint correct ?
If this is the case, can we update documentation accordingly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yintao707, @arnopo does that make sense to move rpmsg_ept_incref() API within rpmsg_get_endpoint API ? If endpoint is retrieved successfully then we increase refcount. It is possible that get_endpoint API is called in future, so we increase refcount notifying endpoint is being used.
I would prefer not to hide it in rpmsg_get_endpoint
and address this only if we need to export the rpmsg_get_endpoint API in the future.
@arnopo Looks like refcnt is tracking more if callback is in progress or not rather than how many times endpoint is being used my multiple threads using rpmsg_get_endpoint correct ?
If this is the case, can we update documentation accordingly ?
Is the documentation header for rpmsg_ept_incref not explicit enough for you?
Could you provide more details on which part of the documentation you would like to see updated ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation for ept_incref looks good. But same documentation for refcnt variable should be updated:
/** Reference count of the endpoint */
uint32_t refcnt;
Above comment gives impression that refcnt variable is used for endpoint object reference counts. But, refcnt variable is used to track if callback execution is in progress or not. So, above variable documentation should be updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to hide it in
rpmsg_get_endpoint
and address this only if we need to export the rpmsg_get_endpoint API in the future.
Ok sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation for ept_incref looks good. But same documentation for refcnt variable should be updated:
/** Reference count of the endpoint */ uint32_t refcnt;
Above comment gives impression that refcnt variable is used for endpoint object reference counts. But, refcnt variable is used to track if callback execution is in progress or not. So, above variable documentation should be updated accordingly.
@yintao707 , please could you address @tnmysh comment that we can merge it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM |
if rpmsg service free the ept when has got the ept from the ept list in rpmsg_virtio_rx_callback, there is a used after free about the ept, so add refcnt to end point and call the rpmsg service release callback when ept callback fininshed. Signed-off-by: Bowen Wang <[email protected]>
issue description:
In our case, the
rpmsg_virtio_rx_callback()
is called in a rpmsg thread. Let's assume a situation, whenmy_cb
has already got the ept in rpmsg thread, user service thread just right calledmy_deinit
to excute rpmsg_destroy_ept and released ept, at this time, the execution of ept ->cb has not been completed in rpmsg thread. so , there is a used after free about the ept.Assuming we choose to fix this situation at the user level, we can continue to refer to the following examples:
The root cause is that OpenAMP call cb after release lock
Therefore, reference counting in the application layer cannot solve this race condition. Therefore, to avoid race condition, we added refnt to the endpoint and call the release callback when ept callback finished