Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove CoreSuspender from interpose functions #5157

Merged
merged 3 commits into from
Jan 1, 2025

Conversation

myk002
Copy link
Member

@myk002 myk002 commented Dec 30, 2024

since it may end up deadlocking, depending on what's in the call chain

since it may end up deadlocking, depending on what's in the call chain
@myk002 myk002 requested a review from ab9rf December 30, 2024 13:21
@ab9rf
Copy link
Member

ab9rf commented Dec 30, 2024

I am fairly convinced that it is entirely unsafe, and inappropriate, to acquire a CoreSuspender from a vmethod interpose. If the vmethod is being called from the simulation thread, the thread will already hold the core lock (because the simulation thread holds the core mutex whenever DFHack isn't in the "update" callback) and so it's not needed. If it's being called from a DFhack thread, then the caller should have already acquired a core lock because to get to the interpose it would have had to call a DF entry point, which our protocols require obtaining a CoreSuspender so acquiring it again is redundant. Trying to acquire the corelock from the render thread is unsafe, and trying to acquire it from a DF compute worker thread will cause a deadlock.

I am convinced that attempting to acquire a CoreSuspender while in a vmethod interpose is a logic error and risks deadlock.

So exactly why are we acquiring a suspender in these situations in the first place?

@myk002
Copy link
Member Author

myk002 commented Dec 31, 2024

I had thought I found the CoreSuspender to be required in overlay, which interposes the viewscreen methods and calls Lua, but when I tested right now, I did not hit any assert (and I double-checked that this is a non-Release build).

Let me turn this PR into a "remove all interposed CoreSuspenders" change

@myk002 myk002 marked this pull request as draft December 31, 2024 01:07
@ab9rf
Copy link
Member

ab9rf commented Dec 31, 2024

I had thought I found the CoreSuspender to be required in overlay, which interposes the viewscreen methods and calls Lua, but when I tested right now, I did not hit any assert (and I double-checked that this is a non-Release build).

It is likely that I'll add some asserts to monitor for certain thread-safety violations, but this is outside the scope of this PR. We could add an assert-check for a vmethod interpose entered by a thread that doesn't hold the core mutex. This will trigger in the event that such a method is called by a DF threaded worker, but that's intelligence that we could probably stand to learn anyway.

I think we already have an assert for certain Lua operations without holding the core mutex; I have a PR in process that will move Lua::Core::State into Core and it would be fairly trivial after that to have the Core::getLuaState method that that PR will add include an assert for holding the core mutex as well.

@myk002 myk002 changed the title [eventful] don't lock over INTERPOSE_NEXT Remove CoreSuspender from interpose functions Jan 1, 2025
@myk002 myk002 marked this pull request as ready for review January 1, 2025 03:11
Copy link
Member

@ab9rf ab9rf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - did not confirm that this covers all interposes, however

@myk002
Copy link
Member Author

myk002 commented Jan 1, 2025

I didn't do stonesense, but this should be everything in the main repo

@myk002 myk002 merged commit ee86034 into DFHack:develop Jan 1, 2025
14 checks passed
@myk002 myk002 deleted the myk_eventful branch January 1, 2025 05:39
@lethosor
Copy link
Member

lethosor commented Jan 1, 2025

I had thought I found the CoreSuspender to be required in overlay, which interposes the viewscreen methods and calls Lua, but when I tested right now, I did not hit any assert (and I double-checked that this is a non-Release build).

Historically, the problematic assertions were in AssertCoreSuspend() (my guess is this is likely what you are thinking of). This is still called from the following places in LuaTools.cpp:

  • DFHack::Lua::SafeCall()
  • DFHack::Lua::SafeResume()
  • DFHack::Lua::PushModule()
  • DFHack::Lua::SafeCallString()
  • DFHack::Lua::Event::Invoke()

In particular, I suspect we have several places still calling SafeCall() from vmethod hooks, and maybe building-hacks calls Event::Invoke() indirectly. Have these been verified to still work and not trigger AssertCoreSuspend()?

Maybe it's possible that DFHack::Core::getInstance().isSuspended() in AssertCoreSuspend() returns true now when called from vmethod hooks, and didn't before? The implementation of AssertCoreSuspend() hasn't changed since 2012.

@lethosor
Copy link
Member

lethosor commented Jan 1, 2025

Also, stonesense doesn't use vmethod hooks so no changes should be needed there. (it well predates our vmethod machinery, and doesn't really integrate into the game aside from the overlay, which is/was renderer-level, essentially drawing a buffer on top of another without regard for DF's representation of anything).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants