-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-bytecode-sets V3PlusClosures and SistaV1 broken, build issue w/ Clang 18.1.2, CRASH Squeak 5.3 #19486 #679
Comments
This means that all recent CI builds for win32x86 and win64x64 cannot be used with Squeak 5.3 ... |
It would be lovely if I still had a clang17 MSYS2 at hand. But I chose to update it yesterday and now I cannot rebuild that tag with clang17 to verify my assumption... So, can we figure out how to build tag 202312181441 with a recent Clang18+ to still work with Squeak 5.3? There might even a bug in OSVM connected to this... |
Hi Marcel, I’ll take a look v soon._,,,^..^,,,_ (phone)On Apr 5, 2024, at 8:25 AM, Marcel Taeumel ***@***.***> wrote:
So, we had a new OSVM release 2023.12, tag 202312181441. Our CI built that using Clang 17.0.6. We bundled the binary with Squeak 5.3 #19486. Things worked.
Now, I noticed that rebuilding tag 202312181441 with the current Clang 18.1.2 produces binaries that will immediately crash Squeak 5.3 #19486. Squeak 6.0 works and so does 6.1alpha.
crash.dmp
...
Crashed in the VM thread
...
Current byte code: 224
Primitive index: 0
...
Primitive trace:
Stack backtrace:
[00007ff7c554368a] __DTOR_LIST__
+ 0x7ff6853e3fc2 in Squeak.exe
[00007ff7c66442e8] ??? + 0x0 in (null)
Smalltalk stack dump:
0000009a09fce550 I CursorWithMask(Object)>isKindOf: 00007ff7c6cf0210: a(n) CursorWithMask
0000009a09fce598 I Cursor class>currentCursor: 00007ff7c66442e8: a(n) Cursor
0000009a09fce5e0 I CursorWithMask(Cursor)>show 00007ff7c6cf0210: a(n) CursorWithMask
0000009a09fce630 I SmalltalkImage>snapshot:andQuit:withExitCode:embedded: 00007ff7c6688088: a(n) SmalltalkImage
7ff7c8c516d8 s SmalltalkImage>snapshot:andQuit:embedded: 00007ff7c6688088: a(n) SmalltalkImage
7ff7c8c52f28 s SmalltalkImage>snapshot:andQuit: 00007ff7c6688088: a(n) SmalltalkImage
7ff7c8c53098 s [] in ReleaseBuilder class>saveAndQuit 00007ff7c6657cd8: a(n) ReleaseBuilder
7ff7c8c53228 s WorldState>runStepMethodsIn: 00007ff7c6915f38: a(n) WorldState
7ff7c8c53320 s PasteUpMorph>runStepMethods 00007ff7c6734988: a(n) PasteUpMorph
7ff7c8c53490 s WorldState>doOneCycleNowFor: 00007ff7c6915f38: a(n) WorldState
7ff7c8c53548 s WorldState>doOneCycleFor: 00007ff7c6915f38: a(n) WorldState
7ff7c8c53610 s PasteUpMorph>doOneCycle 00007ff7c6734988: a(n) PasteUpMorph
7ff7c8c536c8 s [] in MorphicProject>spawnNewProcess 00007ff7c6af0f80: a(n) MorphicProject
7ff7c8c53780 s [] in BlockClosure>newProcess 00007ff7c8c53838: a(n) BlockClosure
stack page bytes 4096 available headroom 1480 minimum unused headroom 0
What is going on? I cannot rebuild tag 202312181441 with Clang 18.1.2 to then let it run Squeak 5.3 #19486. Neither 32-bit nor 64-bit. Squeak 6.0 and 6.1alpha work. I know that tag 202312181441 used to work with Squeak 5.3 ... that's why I suspect Clang here...
The Smalltalk code after start-up that seems to trigger the crash is more or less isKindOf::
Cursor normal show.
Cursor >> show
"Make the hardware's mouse cursor look like the receiver"
Cursor currentCursor: self
Cursor class >> currentCursor: aCursor
"Make the instance of cursor, aCursor, be the current cursor. Display it.
Create an error if the argument is not a Cursor."
(aCursor isKindOf: self)
ifTrue: [ "..." ]
ifFalse: [self error: 'The new cursor must be an instance of class Cursor']
Between Squeak 5.3 and 6.0, that code path did not change. We are still setting the normal Cursor after start-up, which invokes currentCursor: and then that isKindOf: check.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
Fails for both However: The debug build of Well, the obvious difference would be that we switched to the SistaV1 bytecode set in Squeak 6.0 but not in 5.3 (V3 w/ full closures). Hmm.... |
That reminds me #658 The crash is not necessarily related, but the fact that I got the memory leak only in some specific (old ?) image is a clue:
I gave some snippets in the link above, maybe we can try them... |
I tried the following with my OSVM clang18 fast build:
I then patched out (in 5.3) that cursor-related code from the in-image snapshot method using a debug build. I will then get a CRASH at a different place: crash-nocursor.dmp
That Here is another excerpt when I just use the keyboard shortcut to evaluate snapshot-and-quit:
I do not think that platform-specific cursor code is involved in this issue. Not that the debug build still does not crash. These recent tests involved the 64-bit So, I think/conclude that:
|
Plus: It will almost immediately crash when I start switching back to "V3 plus closures" again. But only in the fast build, the debug build is fine. |
The crash seems to happen during a message send. In my two examples, we had bytecode 224 and 208, which is according to the class comment of
The selector with 1 argument was |
I tried to decompile the binary to check the switch-case in In
In the switch-case there is an extra +256 encoded for SistaV1 bytecodes because that build supports both bytecode sets. I assume that the C compiler tries to re-arrange cases if similar for a better performance. Hmm.... maybe some mixup triggers for a V3 bytecode a SisteV1 case? So, maybe a V3 bytecode 208 triggers Sista's "Pop and Store Temporary Variable #0" instead of the send? I will now try to generate and compile a V3-plus-closures-only VM, which should work with Squeak 5.3. Let's see how clang18 treats those sources. |
Yes, a VM with only support for V3-plus-closures works with Squeak 5.3 using clang18. Hmm.... VMMaker class >> generateSqueakSpurStack64VM
"No primitives since we can use those from the Cog VM. Patched to not include SistaV1 bytecodes."
^VMMaker
generate: StackInterpreter
with: #(ObjectMemory Spur64BitMemoryManager
FailImbalancedPrimitives false
MULTIPLEBYTECODESETS false
bytecodeTableInitializer initializeBytecodeTableForSqueakV3PlusClosures)
to: self sourceTree, '/src/spur64.stack'
platformDir: self sourceTree, '/platforms'
including: #() How can we fix that clang18 switch-case optimization? And what is it actually? Do we cant to bisect with all clang nightly builds between clang 17.0.6 and clang 18.1.2? Uhh.... Is this really a Windows-only problem? That MSYS2/MinGW config I use? |
What if instead of doing a full debug build, you turn off compiler optimizations in the production build? Also: Does this only happen on windows builds? |
Well, |
If Do you have all warnings enabled? |
Yes,
And here grouped by file:
I parsed ^.*/(.*?/.*?[ch]):\d*:\d*: warning: .*\[(.*?)\] After filtering the two groups, putting the resulting lines into the clipboard, I used Squeak to aggregate: | warnings dict |
warnings := (Clipboard clipboardText lines collect: [:line | line asString findTokens]).
dict := Dictionary new.
warnings do: [:ea | (dict at: ea first ifAbsentPut: [Bag new]) add: ea second].
Clipboard clipboardText: (String streamContents: [:s |
dict keysSortedSafely do: [:key |
s crlf; nextPutAll: key.
(dict at: key) sortedCounts do: [:ea | s crlf; tab; nextPutAll: ea key asString; tab; nextPutAll: ea value]] ]). For the simpler, ungrouped aggregation of warnings, I used the following regular expression: warning: .*\[(.*?)\] And aggregated that single group of warnings in Squeak: | warnings |
warnings := (Clipboard clipboardText lines collect: #asString) asBag.
Clipboard clipboardText: (String streamContents: [:s |
warnings sortedCounts do: [:ea |
s crlf; nextPutAll: ea key asString; tab; nextPutAll: ea value]]). |
Here are the numbers for
The big switch-case statement (and the |
Only two warnings are in the
The fall-throughs that seem to be wrongly optimized look like this (note the overlap between V3 and SistaV1 bytecodes): ...
BREAK;
CASE(208)
CASE(209)
CASE(210)
CASE(211)
CASE(212)
CASE(213)
CASE(214)
CASE(215)
CASE(216)
CASE(217)
CASE(218)
CASE(219)
CASE(220)
CASE(221)
CASE(222)
CASE(223)
CASE(384) /*128*/
CASE(385) /*129*/
CASE(386) /*130*/
CASE(387) /*131*/
CASE(388) /*132*/
CASE(389) /*133*/
CASE(390) /*134*/
CASE(391) /*135*/
CASE(392) /*136*/
CASE(393) /*137*/
CASE(394) /*138*/
CASE(395) /*139*/
CASE(396) /*140*/
CASE(397) /*141*/
CASE(398) /*142*/
CASE(399) /*143*/
/* sendLiteralSelector0ArgsBytecode */
{
... We figured out that the macro behind |
This probably affects primarily Windows builds at this point because:
So, local builds on an up-to-date Ubuntu system could also be affected. Well, there might be a way to get more recent clang versions onto macOS outside Xcode. Not sure if this is intended. |
The llvm-dev folks really like https://godbolt.org to reproduce different behavior across compiler versions (e.g. https://groups.google.com/g/llvm-dev/c/_r0D3AkyXFg) Probably that means we would have to extract the function into a standalone test case |
On Wed, Apr 10, 2024 at 2:12 AM Marcel Taeumel ***@***.***> wrote:
Only two warnings are in the interpret() function (lines 2493 to 12984):
../../../src/spur64.stack/gcc3x-interp.c:5623:12: warning: variable 'rcvr' set but not used [-Wunused-but-set-variable]
../../../src/spur64.stack/gcc3x-interp.c:5804:68: warning: left operand of comma operator has no effect [-Wunused-value]
The fall-throughs that seem to be wrongly optimized look like this (note
the overlap between V3 and SistaV1 bytecodes):
...
BREAK;
CASE(208)
CASE(209)
CASE(210)
CASE(211)
CASE(212)
CASE(213)
CASE(214)
CASE(215)
CASE(216)
CASE(217)
CASE(218)
CASE(219)
CASE(220)
CASE(221)
CASE(222)
CASE(223)
CASE(384) /*128*/
CASE(385) /*129*/
CASE(386) /*130*/
CASE(387) /*131*/
CASE(388) /*132*/
CASE(389) /*133*/
CASE(390) /*134*/
CASE(391) /*135*/
CASE(392) /*136*/
CASE(393) /*137*/
CASE(394) /*138*/
CASE(395) /*139*/
CASE(396) /*140*/
CASE(397) /*141*/
CASE(398) /*142*/
CASE(399) /*143*/
/* sendLiteralSelector0ArgsBytecode */
{
...
We figured out that the macro behind CASE is unrelated to the issue. Just
using -O0 will fix it. -O1 and -O2 corrupt these cases somehow...
The thing to do is find out which version of the compiler introduces the
bug and
a) compile with the version before, and
b) report the bug to the compiler maintainers
_,,,^..^,,,_
best, Eliot
|
So, we had a new OSVM release 2023.12, tag 202312181441. Our CI built that using
Clang 17.0.6
. We bundled the binary with Squeak 5.3 #19486. Things worked.Now, I noticed that rebuilding tag 202312181441 with the current
Clang 18.1.2
produces binaries that will immediately crash Squeak 5.3 #19486. Squeak 6.0 works and so does 6.1alpha.crash.dmp
What is going on? I cannot rebuild tag 202312181441 with
Clang 18.1.2
to then let it run Squeak 5.3 #19486. Neither 32-bit nor 64-bit. Squeak 6.0 and 6.1alpha work. I know that tag 202312181441 used to work with Squeak 5.3 ... that's why I suspect Clang here...The Smalltalk code after start-up that seems to trigger the crash is more or less
isKindOf:
:Between Squeak 5.3 and 6.0, that code path did not change. We are still setting the normal Cursor after start-up, which invokes
currentCursor:
and then thatisKindOf:
check.The text was updated successfully, but these errors were encountered: