[Windows] Beam crash related to DETS or persistent_term #9222
Comments
I can provide Windows crash dumps, but I'd need guidance on how to set up Windows for OTP debugging.
Which specific 27 versions have you tried? I assume 26 works?
This crash has been failing CI since I added OTP 27 to the matrix 5 months ago: https://github.com/elixir-lsp/elixir-ls/actions/runs/9817589619. The newest version I tried locally was 27.2. I guess all versions are affected.
Yes, I run mostly the same test suite on 22-27. Only 27 on Windows is affected.
I managed to reproduce it locally. The crash happens when garbage collecting literals, which would point to something related to persistent_term deletion, or just literal GC in general. I will continue to dig today, but the Christmas holidays are coming, so it will probably be a couple of weeks before I have time to find out what is going on.
So I'm trying to figure out what this is and I have a question. When it fails, this is printed:
just before Erlang segfaults, while when it does not fail no such message is printed. Do you know what might cause it to enter the path where that is printed? And if so, any ideas on how to make that always happen?
And just after I typed that message, I of course managed to get that printout on a run that did not fail, so that seems to be a red herring. Digging on...
Those are the 2 places this printout may come from. The code reads a binary from a persistent term and either initializes or resets DETS tables.
Adding some notes for myself: The crash happens here:
when erts_bs_start_match_3 is called with a bitstring where the underlying refc binary is a literal that has been GC:ed away. More digging to resume on Monday...
After much digging I finally found the issue. Solution in #9349. The combination of a GC bug and it being Windows-only made this so much harder to find than it needed to be... Thanks for the report!
Fix will be part of Erlang/OTP 27.3.
I've seen the fix is in generic code that has not been changed for ages. Was it a regression introduced in 27? Does the bug affect only Windows?
It was introduced in 27 by a major refactoring of how binaries look inside the VM. The refactoring missed updating this part of the code, which is only used on Windows.
Describe the bug
When the ElixirLS test suite is run on Windows on OTP 27, BEAM crashes. When run from a PowerShell terminal, it corrupts the terminal.
When run from Git Bash, it crashes and writes
No crash dump file is produced.
To Reproduce
Unfortunately, I was not able to isolate this crash to a simple erl script. The steps below require an Elixir 1.17 installation with Hex.
mix deps.get
apps/language_server
mix test test/server_test.exs
or mix test test/providers/workspace_symbols_test.exs
BEAM crashes on almost every run.
Here's an example crash from ElixirLS CI:
https://github.com/elixir-lsp/elixir-ls/actions/runs/12369438492/job/34521367766
Expected behavior
No crash
Affected versions
27 on Windows
The bug was not present in earlier versions.
Linux and macOS are not affected.
Additional context
The bug seems to be some weird combination of DETS and/or persistent_term usage. Removing DETS makes the crash much harder to reproduce. Removing both makes the crash go away.
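To make the usage pattern described above concrete, here is a minimal Elixir sketch of code that reads a binary from a persistent term, matches on it, stores part of it in a DETS table, and then erases the term. It is not a reproducer for this crash (none was isolated), and the key, table name, and file name are made up for illustration only.

```elixir
# Illustrative sketch only: NOT a reproducer for the crash in this issue.
# The key, table name, and file name below are hypothetical.

key = {:example, :blob}

# Store a binary as a persistent term.
:persistent_term.put(key, :binary.copy(<<0>>, 1024))

# Read it back and match out a 16-byte prefix.
blob = :persistent_term.get(key)
<<head::binary-size(16), _rest::binary>> = blob

# Open a DETS table in the temp directory and store the matched prefix.
path = Path.join(System.tmp_dir!(), "example_sketch.dets")
{:ok, table} = :dets.open_file(:example_sketch, file: String.to_charlist(path))
:ok = :dets.insert(table, {:head, head})

# Erasing a persistent term makes the runtime scan processes that may still
# reference it, so that the old literal can be reclaimed.
true = :persistent_term.erase(key)
:erlang.garbage_collect()

# Data matched out earlier must still be readable afterwards.
IO.inspect(byte_size(head), label: "head bytes")
IO.inspect(:dets.lookup(table, :head), label: "dets entry")

# Clean up the table and its backing file.
:ok = :dets.close(table)
File.rm(path)
```

On an unaffected runtime this should simply print the two values and exit; it is meant only to show which calls are involved in the pattern.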