Linux VM compiled from git commit 0d7eba4a or later fails when fetching updates from source.squeak.org #696

Open
dtlewis290 opened this issue Nov 26, 2024 · 9 comments


@dtlewis290
Contributor

Last good commit: 1af9a9b (HEAD) CogVM source as per VMMaker.oscog-eem.3424
First bad commit: 0d7eba4 CogVM source as per VMMaker.oscog-eem.3444

Steps to reproduce:

  • Compile Linux VM from 0d7eba4 or later
  • Run Squeak trunk (updated to latest with Monticello-dtl.813), fetch updates from any repository
  • Result: ConnectionClosed: Connection closed while waiting for data.

Note: There are no intervening commits between 1af9a9b and 0d7eba4, so the issue is presumed to be related to VMMaker changes rather than to platform code changes.

@eliotmiranda
Contributor

eliotmiranda commented Nov 27, 2024 via email

@dtlewis290
Contributor Author

Hi Eliot, my VM compiled locally from the latest git pull still has the issue. The version info is:

Virtual Machine

/usr/local/lib/squeak/5.0-202411252058-64bit/squeak
Open Smalltalk Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3471]
Unix built on Nov 27 2024 08:25:11 Compiler: 11.4.0
platform sources revision VM: 202411252058 lewis@pop-os:squeak/git/opensmalltalk-vm
Date: Mon Nov 25 12:58:18 2024 CommitHash: 108c8d3
Plugins: 202411252058 lewis@pop-os:squeak/git/opensmalltalk-vm
CoInterpreter VMMaker.oscog-eem.3471 uuid: c3abaac1-cda5-44ec-81c1-55154ab54aef Nov 27 2024
StackToRegisterMappingCogit VMMaker.oscog-eem.3470 uuid: 5eca4261-1c46-4eb4-bd5f-803847c2ab7f Nov 27 2024

Image

/home/lewis/squeak/Squeak6.0/squeak.13.image
Squeak6.1alpha
latest update: #23177
Current Change Set: trunk
Image format 68533 (64 bit)
Preferred bytecode set: SistaV1

@dtlewis290
Contributor Author

Here is a summary of additional test results:

Symptoms: The SocketStream tests have many failures and timeouts. The Socket tests have failures and also crashed the VM. Opening a repository on source.squeak.org fails. Updating Squeak from the update stream fails.

The issue apparently depends on both the C compiler and Slang code generation. With a compiler that exposes the problem, the issue first appears in commit 0d7eba4 "CogVM source as per VMMaker.oscog-eem.3444," and the symptoms do not appear to change in any later commit. The last good commit before that was 1af9a9b (HEAD) "CogVM source as per VMMaker.oscog-eem.3424", and the differences between the two appear to be primarily related to Slang code generation.

I retested this on a much older Linux computer (thankfully rescued just in time from the recycle bin), and the issue does NOT appear there. I also have confirmation from Bruce O'Neel that he has been doing opensmalltalk-vm builds on Linux and has not seen any of the issues reported here.

Finally, I tried changing the gcc optimization level from -O2 to -O0, and this made the problem go away.

The system I am using has an AMD processor and the following version information:

$ cat /proc/version
Linux version 6.9.3-76060903-generic ([email protected]) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #202405300957173214176822.04~f2697e1 SMP PREEMPT_DYNAMIC Wed N

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ spur64 -version
5.0-202412062008 Fri Dec 6 18:49:31 EST 2024 gcc 11 [Production Spur 64-bit x86_64 VM]
CoInterpreter VMMaker.oscog-eem.3471 uuid: c3abaac1-cda5-44ec-81c1-55154ab54aef Dec 6 2024
StackToRegisterMappingCogit VMMaker.oscog-eem.3470 uuid: 5eca4261-1c46-4eb4-bd5f-803847c2ab7f Dec 6 2024
VM: 202412062008 lewis@pop-os:squeak/git/opensmalltalk-vm
Date: Fri Dec 6 17:08:41 2024 CommitHash: 2fc2d0c
Plugins: 202412062008 lewis@pop-os:squeak/git/opensmalltalk-vm
Linux pop-os 6.9.3-76060903-generic #202405300957173214176822.04~f2697e1 SMP PREEMPT_DYNAMIC Wed N x86_64 x86_64 x86_64 GNU/Linux
plugin path: /usr/local/bin/../lib/squeak/5.0-202412062008-64bit [default: /usr/local/lib/squeak/5.0-202412062008-64bit/]

More to follow. With the above information, I hope to be able to track something down in gcc.

@nicolas-cellier-aka-nice
Contributor

I compiled the Cog HEAD revision (squeak.cog.spur) on a legacy macOS 12.7.6 machine and got the same behavior: it was impossible to connect through SSL.

If the compiler optimization level makes a difference, that is most probably a sign that the generated code invokes undefined behavior (UB).
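A minimal illustration of why the optimization level can matter, not taken from the VM sources (both function names are hypothetical): overflow of a signed `int` is undefined in C, so an optimizing compiler is allowed to assume `x + 1 > x` is always true and fold the comparison away at `-O2`, while at `-O0` the addition typically wraps and the comparison yields false for `INT_MAX`. The safe form checks the range before doing any arithmetic, so it has no UB for any input.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example, not VM code.
 * UB when x == INT_MAX: the signed addition overflows, so an optimizing
 * compiler may assume the result is always true and drop the comparison. */
bool successor_is_greater_naive(int x)
{
    return x + 1 > x;
}

/* Well-defined for every int: compare against the limit instead of adding. */
bool successor_is_greater_safe(int x)
{
    return x < INT_MAX;
}
```

Both functions agree for every input except `INT_MAX`, which is exactly the case where the naive version's result is left to the compiler.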

@dtlewis290
Contributor Author

Here is how to show the relevant diffs, which are all in the generated C code from VMMaker.oscog-eem.3424 through VMMaker.oscog-eem.3444. It's a lot of code to look at, but maybe some more eyeballs will help.

$ git diff 1af9a9b 0d7eba4

@dtlewis290
Contributor Author

Recognizing that the issue is apparently related to C undefined behavior, and also associated with CCodeGenerator code generation changes, I used a VMMaker image to generate the code for VMMaker versions from VMMaker.oscog-eem.3424 through VMMaker.oscog-eem.3444.

I can confirm that the issue is introduced in VMMaker.oscog-eem.3444. Source generated from VMMaker.oscog-eem.3443 (into ./src/spur64.cog/) does not exhibit the issue, and code generated from VMMaker.oscog-eem.3444 does (in both cases on my system with -O2 compiler optimization).

So the issue is introduced in VMMaker.oscog-eem.3444, 23-Aug-2024 "Rewrite the Slang transpiler's parse tree and inliner".

This is a large VMMaker commit so we are still looking for a needle in a haystack, but I think the haystack may be a bit smaller now. I note that Eliot specifically asked for review and criticism in that commit, so please consider this as a much belated review :-)

@dtlewis290
Contributor Author

@nicolas-cellier-aka-nice can you please say what compiler (and compiler version) you used on your legacy MacOS 12.7.6? I am not familiar with the Mac environment, but a compiler bug is not out of the question. I have gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on my system, and the bug is present when compiling with -O1 or higher with VM sources generated from VMMaker.oscog-eem.3444 and above. But other compilers (including those used for our GitHub Actions builds) do not show any problem at all.

@dtlewis290
Contributor Author

Additional information, working with a (possibly buggy?) gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 compiler:

The issue is introduced in VMMaker.oscog-eem.3444. Later fixes in VMMaker have no effect on the observed symptoms. The problem goes away with gcc optimization turned off (-O0). The problem is present in both the stack VM and the Cog VM.

The issue is only in the main VM module lib/squeak/5.0-202408232148-64bit/squeak, not in the plugins or the other VM modules. I confirmed this by compiling with almost all plugins external and copying individual compiled files into the last known good build in lib/squeak/5.0-202407312233-64bit/. No other file (including SocketPlugin.so) causes a problem; only the main VM module is at issue.

The VMMaker.oscog-eem.3444 generated sources (in commit 0d7eba4) produce over 90 additional compiler warnings in the ./vm build, mainly associated with function pointer assignments. After hand-editing the generated source files to address the warnings, the problem still exists, so I see no evidence that these warnings point to C undefined behavior.
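For readers unfamiliar with this class of warning, here is a hypothetical sketch (the type and function names are illustrative, not taken from the generated sources). Assigning a function to a pointer of a different function type draws an "incompatible pointer type" warning, and calling a function through a pointer of the wrong type is undefined behavior in C. Giving the pointer the function's exact type, here via a typedef, removes both the warning and the UB. As noted above, fixing the real warnings did not cure this bug, so the sketch only explains what the warnings mean.

```c
#include <stddef.h>

/* Hypothetical function type: takes and returns a machine word. */
typedef long (*word_fn)(long);

/* Both operations are well-defined for every long value. */
static long low_byte(long w) { return w & 0xFF; }
static long halve(long w)    { return w / 2; }

/* The element type matches the functions exactly, so no casts are needed and
 * every call goes through a correctly typed pointer. Declaring the table as,
 * say, void (*table[])(void) instead would draw incompatible-pointer-type
 * warnings, and calling through that type would be undefined behavior. */
static word_fn const word_fns[] = { low_byte, halve };

long call_word_fn(size_t index, long arg)
{
    /* Out-of-range index: return a neutral value rather than read past the end. */
    if (index >= sizeof word_fns / sizeof word_fns[0])
        return 0;
    return word_fns[index](arg);
}
```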

@nicolas-cellier-aka-nice
Contributor

nicolas-cellier-aka-nice commented Dec 16, 2024

I used the native makefile for macOS, which I think relies on CC=clang as defined in ./building/macos64x64/common/Makefile.rules:

% clang --version     
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

For me, the problem disappeared with commit cfd1161 based on VMMaker.oscog-eem.3475.

Since almost every operation on signed integers is potentially subject to undefined behavior, the C compiler won't warn you about it except in the most suspicious cases.
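Shifts are one example beyond plain addition: left-shifting a negative signed value is undefined in C, which matters for any code that tags integers by shifting. The sketch below is hypothetical and does not claim to reproduce Slang's actual tagging code; it assumes a 3-bit tag shift and a tag value of 1 purely for illustration. Doing the shift on the unsigned counterpart is well-defined, and converting the result back to a signed type is implementation-defined rather than undefined (gcc and clang both reduce it modulo 2^N).

```c
#include <stdint.h>

#define TAG_SHIFT 3   /* hypothetical tag width */
#define INT_TAG   1   /* hypothetical tag value */

/* The naive form is UB for negative values:
 *     return (value << TAG_SHIFT) | INT_TAG;
 * Shifting the unsigned representation instead is well-defined (unsigned
 * arithmetic wraps), and the conversion back to intptr_t is
 * implementation-defined (modular on gcc and clang), not undefined. */
intptr_t tag_int(intptr_t value)
{
    uintptr_t bits = ((uintptr_t)value << TAG_SHIFT) | INT_TAG;
    return (intptr_t)bits;
}

/* Right-shifting a negative signed value is implementation-defined, not UB;
 * gcc and clang use an arithmetic shift, which restores the original value
 * and discards the tag bits. */
intptr_t untag_int(intptr_t tagged)
{
    return tagged >> TAG_SHIFT;
}
```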

One possibility is to instrument the generated code to detect UB at run time, at least with clang: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html. The -fsanitize=undefined option also exists in gcc if you want to try it. You should use it with optimization turned off to be sure, because optimization may remove some UB.
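A possible workflow, offered only as a sketch: build the VM with `-O0 -g -fsanitize=undefined` (adding `-fno-sanitize-recover=undefined` makes the first report abort the process, which is easier to spot), run the failing socket tests, and rewrite each expression the sanitizer reports so the overflow case is handled explicitly. The helper below shows one such rewrite using `__builtin_add_overflow`, which gcc (5 and later) and clang both provide; the function name is hypothetical.

```c
#include <limits.h>   /* LONG_MAX, for callers exercising the edge case */
#include <stdbool.h>

/* Hypothetical helper: returns true and stores a + b in *result when the sum
 * fits in a long. On overflow it returns false; *result then holds the
 * wrapped value and should not be used. __builtin_add_overflow computes the
 * sum as if with infinite precision, so no signed-overflow UB occurs and
 * the sanitizer has nothing to report. */
bool checked_add_long(long a, long b, long *result)
{
    return !__builtin_add_overflow(a, b, result);
}
```

For example, `checked_add_long(LONG_MAX, 1, &sum)` returns false instead of overflowing, so the caller can choose an explicit fallback path.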
