Random Reboots (Spinlock Timeout Panic) on iOS 15 arm64e #274
Comments
Here is an attempt at a more in-depth explanation of the issue, to the best of my current understanding. Keep in mind that it is based on assumptions that are basically impossible to verify.

In a multithreaded system, "locks" are used to prevent two threads from interfering with each other: one thread acquires a lock, makes its modification and unlocks it again. While the lock is held, another thread trying to acquire it will wait until it has been released. A spinlock is essentially the same thing, just used for performance-relevant code; the main difference is that a spinlock can time out if something holds it for too long while another thread is trying to acquire it. So when the object is already locked, the acquiring thread waits for a few ticks, and if the object doesn't get unlocked in that time frame, the acquire times out.

This mechanism by itself is not the issue; the issue has to do with memory pages. Every memory page (which describes an area of 16 kB of RAM) has a spinlock, so that there are no issues when multiple processes try to acquire the same page at the same time. Specific pages can be mapped into multiple processes (e.g. if both load the same library); they reuse the same page in order to save memory. Tweaks want to overwrite such memory on a per-process basis, so they first have to make a process-specific copy of the existing mapping and map it on top, so that e.g. one page can be modified in one process while remaining stock in the other processes. The issue seems to happen specifically when mapping on top of a page that resides inside the dyld_shared_cache.

The problem is that Apple probably never tested this kind of hooking, and apparently when you do it in a lot of processes, it can cause the original page (the one of the shared mapping) to be paged out because it's not actively being used. Paging out a page essentially removes it from RAM, and when it is accessed again it is loaded again. On a stock system this will not happen because nothing has been hooked.

Now the root cause appears to be something trying to page a previously paged-out shared/executable page back in. This triggers a preemption issue where one thread takes the spinlock and, while holding it, gets preempted to a different context which also takes the same spinlock. (Preemption is essentially a mechanism that allows one thread to be used for something else even if it's currently busy; code has to explicitly disable and re-enable it around any piece of code that should always be executed in one go.) So there seems to be one code path, only invoked by this particular behaviour, where Apple does not correctly disable preemption. That leads to one thread taking the same spinlock twice, which makes it time out because the old context isn't executing anymore and can't unlock the spinlock again.

As for mitigating it, I tried messing with spinlock-related variables to raise the timeout threshold; unfortunately Apple screwed us over because everything related to that is KTRR protected, for which we do not have a bypass. I guess the proper fix would be to "wire down" every to-be-hooked page before it's overwritten (wiring down a page prevents it from being paged out), to ensure that the page-out never happens and therefore the code path involved in the issue doesn't trigger. I have tried a bunch of stuff so far, but it seems it's straight up impossible to acquire such a wiring from userspace, so it has to be done inside the kernel. Unfortunately the structures involved in this specific shared mapping are very convoluted, and I have yet to find a way to get the correct page object to apply the wiring to. |
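To make the suspected failure mode concrete, here is a toy userspace model in Python. It is only a sketch and not kernel code: `TimedSpinlock`, `SpinlockTimeout` and `take_lock_twice` are made-up names, the timeout value is arbitrary, and "preemption" is simulated by simply acquiring the same non-reentrant lock a second time on one thread.

```python
import threading
import time


class SpinlockTimeout(Exception):
    """Raised when the lock is not released within the spin budget."""


class TimedSpinlock:
    """Toy spinlock: busy-waits for the lock and gives up after timeout_s."""

    def __init__(self, timeout_s=0.05):
        self._lock = threading.Lock()  # non-reentrant, like the lock described above
        self.timeout_s = timeout_s

    def acquire(self):
        deadline = time.monotonic() + self.timeout_s
        # Spin: retry a non-blocking acquire until it succeeds or time runs out.
        while not self._lock.acquire(blocking=False):
            if time.monotonic() >= deadline:
                raise SpinlockTimeout("spinlock timeout")

    def release(self):
        self._lock.release()


def take_lock_twice(lock):
    """Model the suspected bug: the holder is interrupted and the new context
    asks for the same lock, so the only code that could release it never runs."""
    lock.acquire()          # first context takes the page lock
    try:
        lock.acquire()      # simulated preempting context wants the same lock
        lock.release()
        return "ok"
    except SpinlockTimeout:
        return "panic"      # this is where the real system would panic
    finally:
        lock.release()      # undo the first acquisition
```

Calling `take_lock_twice(TimedSpinlock())` returns `"panic"`: the second acquire can never succeed because the first holder is the one waiting.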
So the next step to try and fix it would be to find the vm_page structure of a DSC page in kernel memory; so far all my attempts at finding such a structure have failed. |
I assume since this is still here it's still an issue in the latest release? Are there any reboot issues that have been resolved with the new release? Sounds difficult. If I ever start doing iPhone programming maybe I'll take a look. Or maybe that would be the most awful first experience of such work I could think of, lol! Well, I'm on my third day without a reboot, which is better than I did on palera1n anyway. If I start to get more than a week without any reboots I'm definitely starting a success thread about it. As far as I can tell it's been an issue at all the rules jailbreaks. The main one in palera1n was solved by automatically scheduling a userspace reboot every 24 hours. |
This issue only affects arm64e and is fixed in 16.0. So if you're coming from palera1n, you don't need to worry. |
You mean it’s fixed after iOS 16.0? Not sure what the 16.0 was. Thinking of buying a new in box iPhone 13 Pro Max because it will be on iOS 15 usually and even some used will be on 16.x, and this is the jailbreak that will support it and all my tweaks! |
Hi, I installed Dopamine on an iPhone 7 32GB on 15.7. I'm a bit afraid of transferring all my data to the new iPhone 12 128GB I received; could the issue repeat on it? I hope my explanation was clear, that it's linked in a way- x) |
What you describe has nothing to do with this issue. A spinlock panic simply shows up as a random reboot, not a spinning wheel. |
Oh okay, thanks for the quick response! Yup, I don't have random reboots, but I have to force reboot sometimes. |
Have you tried
|
You cannot mlock a page from the dyld shared cache, as it is shared between multiple processes. The only way to do so is to use kernel r/w to lock it down in the kernel. |
@opa334 I'm not sure how much this helps, but I actually managed to find a pattern for my reboots. It seems to happen based on geographical location, each time I drive my car around somewhere, but I do have a location-based automation set, so it might be related to that as well. |
there is a reason for why you needed to pull up archive.org to find this... |
@opa334 Couldn't we get an old version of |
That program's purpose was to create or update the dyld shared cache file from framework and library binaries on disk. It doesn't have anything to do with temporary modification of it in memory for function hooking. It's no longer used by either iOS or macOS because those separate binaries don't exist anymore. It wouldn't be possible to use it on modern iOS anyway, because you can't overwrite it since the filesystem it's stored on is read-only and signed.
|
Such tooling never worked on iOS in the first place. Even if I had the ability to replace the dyld_shared_cache, that also wouldn't fix this issue as it's about applying different function hooks in different processes, just changing the global one doesn't matter. |
@opa334 I have an idea. |
How can the So the idea is as follows. Call mmap or sbrk for a given system process to allocate more data segment pages for it. Then, change the function pointers in the in-memory variables in the system processes to call into the functions in our newly-allocated page (we're sort of hacking the binaries here, we would need to write assembly code here), which we then wire down because it's not in the James Pedersen |
You can't "change function pointers", the entire dyld shared cache is mapped r-- so if you want to replace instructions (which you do when hooking a function, as you replace it with a branch to your code), you need to map a different page on top of it. I did get one idea from this though: Maybe it's possible to map in a page from the shared cache directly and maybe then you can wire it down, I will try that soon. |
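As a rough userspace analogy for "mapping a different page on top", the sketch below uses Python's `mmap` with `ACCESS_COPY` (copy-on-write) on a temporary file. The temporary file merely stands in for a shared, file-backed page; it is not the dyld shared cache, `patch_private_copy` is a hypothetical helper, and nothing here touches executable memory.

```python
import mmap
import os
import tempfile


def patch_private_copy(original: bytes, patch: bytes):
    """Patch a copy-on-write mapping and return (private_bytes, shared_bytes)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, original)
        # Read-only view: stands in for the stock page that every process shares.
        shared = mmap.mmap(fd, len(original), access=mmap.ACCESS_READ)
        # Copy-on-write view: writes land in a private copy, never in the file.
        private = mmap.mmap(fd, len(original), access=mmap.ACCESS_COPY)
        private[: len(patch)] = patch  # the "hook": overwrite the first bytes
        result = (private[:], shared[:])
        private.close()
        shared.close()
        return result
    finally:
        os.close(fd)
        os.remove(path)
```

After `patch_private_copy(b"ORIGINAL-CODE", b"HOOK")`, the private view starts with `HOOK` while the shared view still holds the original bytes, which is the per-process behaviour tweaks rely on.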
@opa334 But why can't you change function pointers, because aren't they computed at runtime? |
Because there are no function pointers, what you refer to are direct branches. |
What about the following idea? Clone the page from |
The cloned page is not backed by a file, so it's always going to be "wired". Also, the names of the functions do not matter; it's a direct branch from point a to point b. Anyhow, my previous idea to mmap the dyld_shared_cache myself in order to wire it down does seem to work, so it might be that this issue will be fixed soon. |
This is still an issue for me but I am guessing you are aware it still happens ;) I got my hands on an XR with 15.2 and found out about this curse the hard way. I’ve since tried everything from limiting my usage of tweaks to limiting my choice of tweaks and using choicy to limit what the tweaks can interact with. At best I’ve gotten about two days of uptime. If there is anything I can help with like providing my next panic-full, just say the word. I did verify it is a true spinlock by peeking at the aforementioned log file after crash.
|
Here's my panic log: |
@opa334 I saw that you posted elsewhere that the idea to wire down the dyld shared cache page didn't work. That's unfortunate. Did you also try to wire down the cloned page (both alone, and along with the dyld shared cache page)? You said above that it is always going to be wired because it's not backed by a file, but iOS uses memory compression (like zram) so that is not necessarily true. https://developer.apple.com/videos/play/wwdc2018/416/ |
The idea of wiring down the shared cache did work (at least no tester had a spinlock panic with it), but it wasted so much RAM that on devices with less than 3GB RAM, things would go haywire and even on 3GB RAM devices stuff would still break after a few days of usage. |
That's strange. You weren't wiring down the entire shared cache instead of
just the original pages corresponding to the modified pages, right? That's
the only thing I can think of that could cause so much of an increase in
RAM consumption.
|
No, I was wiring down the entire shared cache since it's very hard to keep track of which pages have been modified. I tried wiring down only the modified ones, but spinlock panics were still occurring. |
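For reference, the address arithmetic behind "only the modified pages" is simple; the hard part described above (knowing which pages were hooked and reaching the right kernel objects) is not covered by it. A minimal sketch, assuming the 16 kB page size mentioned earlier in the thread and a hypothetical list of `(address, length)` hook ranges with positive lengths:

```python
PAGE_SIZE = 16 * 1024  # 16 kB pages, as stated earlier in the thread


def pages_touched(hooks):
    """hooks: iterable of (address, length) byte ranges that were overwritten.
    Returns the sorted page base addresses that cover those ranges."""
    pages = set()
    for addr, length in hooks:
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE  # a range may straddle two pages
        for index in range(first, last + 1):
            pages.add(index * PAGE_SIZE)
    return sorted(pages)


def wired_bytes(hooks):
    """Memory that would stay resident if only the touched pages were wired."""
    return len(pages_touched(hooks)) * PAGE_SIZE
```

The addresses passed in are purely illustrative; a hook that crosses a page boundary counts both pages.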
Hey there, I never actually got a positive confirmation that this is only an issue on iOS 15. In other words, I never got a clarification that this only affects arm64e and only on iOS 15. You said it was fixed with "16", but I don't know what you meant by "16", as you gave the number but didn't say that you meant iOS 16! By the way, I am on iOS 16.6.1 on an 8+ and I've never had a jailbreak this stable. I never get any resprings ever (!) and certainly no reboots. I've had uptime of up to 20 days before having to reboot myself for some other reason. |
Testing the latest version: has anyone verified whether it fixes the spinlock panic issue on iOS 15? I think this needs to be verified. |
Mapping on top of dyld_shared_cache executable pages seems to trigger an edge case behaviour in the PPL that sometimes causes a timeout on the spinlock of a memory page, resulting in a kernel panic.
The more tweaks that hook C functions are installed and the more processes those inject into, the more often this behaviour seems to be triggered.
It appears this issue could be fixed by wiring down all pages that have been hooked, but userspace cannot take such a lock, and finding the vm_page object in kernel memory to flip the wired bit directly is proving to be difficult.