cmd/link: unexpected bindingNone in '_go.buildid' #42082

Closed
linzhp opened this issue Oct 20, 2020 · 35 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@linzhp
Contributor

linzhp commented Oct 20, 2020

After upgrading to Go 1.15, the linker fails on some packages with unexpected bindingNone in '_go.buildid'. This happens only on Darwin, not on Linux. We haven't been able to construct a small example that reproduces the issue.

The issue seems to be related to the size of the linked binary. For example, a test binary that fails to link with Go 1.15 is 330MB when linked with Go 1.14.

Similar errors:

What version of Go are you using (go version)?

$ go version
go version go1.15 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/zplin/Library/Caches/go-build"
GOENV="/Users/zplin/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/zplin/gocode/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/zplin/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bf/3ympgpy92txgknkb4z30dldh0000gn/T/go-build303752340=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Building some packages with either go build or bazel build

What did you expect to see?

Build succeeds

What did you see instead?

Linker fails with the following error:

final section layout:
    __TEXT/__text addr=0x04001E00, size=0x092B5F84, fileOffset=0x00001E00, type=1
    __TEXT/__stubs addr=0x0D2B7D84, size=0x00000534, fileOffset=0x092B7D84, type=28
    __TEXT/__stub_helper addr=0x0D2B82B8, size=0x000008BC, fileOffset=0x092B82B8, type=32
    __TEXT/__rodata addr=0x0D2B8B80, size=0x02DE98D8, fileOffset=0x092B8B80, type=0
    __TEXT/__typelink addr=0x100A2460, size=0x00054884, fileOffset=0x0C0A2460, type=0
    __TEXT/__itablink addr=0x100F6CE8, size=0x0002F820, fileOffset=0x0C0F6CE8, type=0
    __TEXT/__gosymtab addr=0x10126508, size=0x00000000, fileOffset=0x0C126508, type=0
    __TEXT/__gopclntab addr=0x10126520, size=0x06EB5A95, fileOffset=0x0C126520, type=0
    __TEXT/__cstring addr=0x16FDBFC0, size=0x000074AC, fileOffset=0x12FDBFC0, type=13
    __TEXT/__const addr=0x16FE3470, size=0x000052A0, fileOffset=0x12FE3470, type=0
    __TEXT/text_env addr=0x16FE8710, size=0x00003340, fileOffset=0x12FE8710, type=0
    __TEXT/__unwind_info addr=0x16FEBA50, size=0x000015A0, fileOffset=0x12FEBA50, type=22
    __DATA/__nl_symbol_ptr addr=0x16FED000, size=0x00000008, fileOffset=0x12FED000, type=29
    __DATA/__got addr=0x16FED008, size=0x00000038, fileOffset=0x12FED008, type=29
    __DATA/__la_symbol_ptr addr=0x16FED040, size=0x000006F0, fileOffset=0x12FED040, type=27
    __DATA/__const addr=0x16FED730, size=0x000026A0, fileOffset=0x12FED730, type=0
    __DATA/__go_buildinfo addr=0x16FEFDD0, size=0x00000020, fileOffset=0x12FEFDD0, type=0
    __DATA/__noptrdata addr=0x16FEFE00, size=0x00092A60, fileOffset=0x12FEFE00, type=0
    __DATA/__data addr=0x17082860, size=0x00036068, fileOffset=0x13082860, type=0
    __DATA/__bss addr=0x170B88E0, size=0x000695F0, fileOffset=0x00000000, type=25
    __DATA/__noptrbss addr=0x17121EE0, size=0x00022080, fileOffset=0x00000000, type=25
    __DATA/__common addr=0x17143F60, size=0x00000010, fileOffset=0x00000000, type=25
ld: unexpected bindingNone in '_go.buildid' from /var/folders/6w/bx57vndx04l6w11tf912xzyc0000gn/T/go-link-018988157/go.o for architecture x86_64
@thanm
Contributor

thanm commented Oct 20, 2020

Thanks for the report.

It would be helpful to include some details on your Darwin system, e.g. OS version, Xcode version, and the output of "clang --version".

@linzhp
Contributor Author

linzhp commented Oct 20, 2020

This is consistently failing on dev laptops with macOS 10.15.7 in Uber.

$ clang --version
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

The detailed link command printed by Bazel rules_go contains some information about the environment:

    ld: unexpected bindingNone in '_go.buildid' from /var/folders/lz/ks5vv2xs5s139vsqzyyyfjww0000gn/T/go-link-559035928/go.o for architecture x86_64
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    
    link: error running the following subcommand: exit status 2
    PATH=external/local_config_cc:/bin:/usr/bin \
    SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk \
    CGO_ENABLED=1 \
    GOARCH=amd64 \
    DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer \
    GOPATH= \
    TMPDIR=/var/folders/lz/ks5vv2xs5s139vsqzyyyfjww0000gn/T/ \
    APPLE_SDK_PLATFORM=MacOSX \
    GOROOT_FINAL=GOROOT \
    XCODE_VERSION_OVERRIDE=12.0.0.12A7209 \
    GOOS=darwin \
    GOROOT=external/go_sdk \
    APPLE_SDK_VERSION_OVERRIDE=10.15 \
    __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0 \
    external/go_sdk/pkg/tool/darwin_amd64/link -importcfg bazel-out/darwin-fastbuild/bin/<some internal path>/go_default_test_/importcfg572378406 -X <some internal path>/go/version.git.BuildHash={STABLE_GIT_COMMIT} -X <some internal path>/go/version.git.buildHost={BUILD_HOST} -X <some internal path>/go/version.git.buildUser={BUILD_USER} -X <some internal path>/go/version.git.buildUnixSeconds={BUILD_TIMESTAMP} -o bazel-out/darwin-fastbuild/bin/<some internal path>/go_default_test_/go_default_test -extld external/local_config_cc/cc_wrapper.sh -s -w -buildid=redacted -extldflags "-fobjc-link-runtime -headerpad_max_install_names -no-canonical-prefixes -mmacosx-version-min=10.15" /private/var/tmp/_bazel_bxun/1f85203e1283ccb97a0066d2da6be00b/sandbox/darwin-sandbox/5843/execroot/__main__/bazel-out/darwin-fastbuild/bin/<some internal path>/go_default_test.a

@thanm
Contributor

thanm commented Oct 20, 2020

@cherrymui

@thanm
Contributor

thanm commented Oct 20, 2020

Thanks. Not sure right off the bat what might be causing this, but if there is any way you can create a shareable reproducer, that would be very helpful. Or possibly share the intermediates being created by the Go linker, if that is an option (via "mkdir /tmp/xxx ; go build ... -ldflags=-tmpdir=/tmp/xxx").

@cagedmantis cagedmantis changed the title unexpected bindingNone in '_go.buildid' cmd/link: unexpected bindingNone in '_go.buildid' Oct 20, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 20, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Oct 20, 2020
@r-hang

r-hang commented Oct 28, 2020

Unfortunately, we're currently not able to publish our existing repro case. We are working on creating a repro case from open source components! I'll look into sharing the intermediates generated by the linker.

@mh-park

mh-park commented Oct 28, 2020

I've been digging around in the linker intermediates of a different project with the same error: the only place with a reference to _go.buildid is towards the end of the go.o file:

... map[uint32]*<internal path>/vendor/github.com/uber/tchannel-go.Connection }).Unlock^@_go.buildid^@_go.func.*^@_go.itab.*archive/tar.Reader,io.Reader^@ ...

I don't know if that's relevant, since I don't have much understanding of how the linker works, but the _go.buildid does seem out of place.

@r-hang

r-hang commented Oct 30, 2020

@thanm, if we can get approval on our side, is there a process by which we can share the linker intermediates through a private channel?

@thanm
Contributor

thanm commented Oct 30, 2020

Is there a process by which we can share the linker intermediates through a private channel?

That would be fine with me, although as far as I know we don't have an official policy or mechanism for such side channel communications. If you'd like to set something up, please email me (userid at google.com is same as github name).

@thanm
Contributor

thanm commented Oct 30, 2020

Another thing that would help would be if you could run the failing link by hand and then inspect the intermediates with "gobjdump".

What I'm looking for is more info about the offending symbol, "_go.buildid". This is a text symbol that is emitted by the Go linker when linking on non-ELF systems. It is placed at the very start of the text segment (offset 0), and will contain a payload built from the value of the -buildid option passed to the linker (normally a hash like -buildid=gxOfactYKvIT8obo4Tx9/BBqcYSuatFUb4E_Y6MER/q64SD6zppB19Re5mq7xb/gxOfactYKvIT8obo4Tx9).

The "_go.buildid" symbol should not have any relocations itself, nor should there be any relocations that point to it as a target symbol.

Here's an example of how to dump the required info.

$ mkdir /tmp/mydir
$ go build -o /tmp/myprogram -ldflags="-tmpdir=/tmp/mydir -v"    ... remainder of 'go build' args ...
$ cd /tmp/mydir
$ ls
000000.o	000003.o	000006.o	000009.o	go.dwarf
000001.o	000004.o	000007.o	000010.o	go.o
000002.o	000005.o	000008.o	a.out		trivial.c
...
$ gobjdump -tr go.o
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid
$

The failure message from the linker refers to bindingNone, which (based on my very limited understanding of the Darwin linker) has to do with relocations -- so what I am trying to determine is if somehow there are relocations on or against the symbol (which seems impossible, but I don't have any other theories at the moment).

@thanm
Contributor

thanm commented Oct 30, 2020

Sorry, that should read

$ gobjdump -tr go.o | fgrep go.buildid
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid
$

in the recipe I sent.

@mh-park

mh-park commented Nov 5, 2020

What would a problematic or unexpected relocation look like?

$ gobjdump -tr go.o | fgrep go.buildid
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid

This was the output for the linker intermediates, which is the same as the example you've given above; I'm guessing that that's the expected output.

@thanm
Contributor

thanm commented Nov 5, 2020

Yes, that's the expected "normal" output. So at this point I'm not sure what the problem might be-- more info needed.

I think we're back where we started then -- to make progress I think we need a smaller reproducer or some way to inspect the linker intermediates.

@robbertvanginkel

We're still looking into sharing artifacts. Meanwhile, although I haven't found a reproducible case yet, I did find some interesting bits while investigating a bit further using zld (as it's the only easily compilable Mach-O linker I know of: https://github.com/michaeleisel/zld).

It seems like the "unexpected bindingNone" exception is actually being thrown while throwing another exception:

#0	0x0000000100296a4c in ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) at zld-1.2.1/ld/src/ld/OutputFile.cpp:537
#1	0x00000001002979f9 in ld::tool::OutputFile::rangeCheckRIP32(long long, ld::Internal&, ld::Atom const*, ld::Fixup const*) at zld-1.2.1/ld/src/ld/OutputFile.cpp:742
#2	0x00000001002990d9 in ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) at zld-1.2.1/ld/src/ld/OutputFile.cpp:1530
#3	0x00000001002a51de in ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke at zld-1.2.1/ld/src/ld/OutputFile.cpp:3046
...

Patching up the code to throw the original exception results in something like:

32-bit RIP relative reference out of range (2191161195 max is +/-2GB): from _go.buildid (0x04011F20) to <addressOf function that throws "unexpected bindingNone">

We have different projects running into this issue: one using go build and one using bazel/rules_go. Using ObjectDump (from the zld repo), it looks like there might be some unexpected fixups generated:

$ ObjectDump -only _go.buildid bazel-fail/go.o
name:     _go.buildid
size:     0x20
align:    0 mod 32
scope:    translation-unit
def:      regular
combine:  never
symbol:   in
attrs:    dont-dead-strip 
section:  __TEXT,__text
fixups:
    0x0000 direct(_internal/cpu.options) + 0x6F4720FF, then store as x86 32-bit pcrel
    0x0000 followed by direct(_internal/cpu.Initialize)

$ ObjectDump -only _go.buildid gobuild-fail/go.o
name:     _go.buildid
size:     0x80
align:    0 mod 32
scope:    translation-unit
def:      regular
combine:  never
symbol:   in
attrs:    dont-dead-strip 
section:  __TEXT,__text
fixups:
    0x0000 direct(_internal/cpu.options) + 0x6F4720FF, then store as x86 32-bit pcrel
    0x0000 followed by direct(_internal/cpu.Initialize)

$ ObjectDump -only _go.buildid gobuild-pass/go.o
name:     _go.buildid
size:     0x80
align:    0 mod 32
scope:    translation-unit
def:      regular
combine:  never
symbol:   in
attrs:    dont-dead-strip 
section:  __TEXT,__text
fixups:
    0x0000 followed by direct(_internal/cpu.Initialize)

(for reference, gobuild-pass is a simple cgo project taken from the cgo docs, see https://gist.github.com/robbertvanginkel/05445fb48537f3b3b1ad8c51630685c4)

I don't think those showed up for gobjdump, but with otool -r I think they're visible:

$ gobjdump -tr gobuild-fail/go.o | grep go.buildid 
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid
$ otool -r gobuild-fail/go.o | grep -A4 __text
Relocation information (__TEXT,__text) 1292512 entries
address  pcrel length extern type    scattered symbolnum/value
00000000 1     2      1      1       0         500856
00000288 1     2      1      1       0         500856
0000028f 1     2      1      1       0         500856
$ gobjdump -tr bazel-fail/go.o | grep go.buildid
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid
$ otool -r bazel-fail/go.o | grep -A4 __text
Relocation information (__TEXT,__text) 1143410 entries
address  pcrel length extern type    scattered symbolnum/value
00000000 1     2      1      1       0         118842
00000228 1     2      1      1       0         118842
0000022f 1     2      1      1       0         118842
$ gobjdump -tr gobuild-pass/go.o | grep go.buildid
0000000000000000 l       0e SECT   01 0000 [.text] _go.buildid
$ otool -r gobuild-pass/go.o | grep -A4 __text
Relocation information (__TEXT,__text) 7450 entries
address  pcrel length extern type    scattered symbolnum/value
0000026d 1     2      1      1       0         433
00000288 1     2      1      1       0         433
0000028f 1     2      1      1       0         433
Some extra disassembly details that might be useful.

I was initially looking at the contents of the object files, which all seemed to be the same when using `hexdump` or `objdump --disassemble=`

$ hexdump -s 4096 -C gobuild-fail/go.o | head -n 1
00001000  ff 20 47 6f 20 62 75 69  6c 64 20 49 44 3a 20 22  |. Go build ID: "|
$ hexdump -s 4096 -C bazel-fail/go.o | head -n 1
00001000  ff 20 47 6f 20 62 75 69  6c 64 20 49 44 3a 20 22  |. Go build ID: "|
$ hexdump -s 4096 -C gobuild-pass/go.o | head -n 1
00001000  ff 20 47 6f 20 62 75 69  6c 64 20 49 44 3a 20 22  |. Go build ID: "|
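
One detail that may connect these dumps: the addend 0x6F4720FF reported by ObjectDump has the same value as the first four bytes shown here (ff 20 47 6f) read as a little-endian 32-bit integer. That would be consistent with a relocation at offset 0 treating the start of the build ID text as its addend, though this is an inference from the numbers rather than something confirmed in the linker source. A small Go check of the arithmetic (addendAt is an illustrative name only):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// addendAt decodes the first four bytes of b as a little-endian uint32,
// which is how a 4-byte in-place addend would be read on x86-64.
// Hypothetical helper for illustration.
func addendAt(b []byte) uint32 {
	return binary.LittleEndian.Uint32(b)
}

func main() {
	// First four bytes at the start of __text, copied from the hexdump output.
	head := []byte{0xff, 0x20, 0x47, 0x6f}
	fmt.Printf("0x%08X\n", addendAt(head)) // prints 0x6F4720FF
}
```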

However, in a more advanced disassembler (Hopper) they looked somewhat different:

/*
--------------------------------------------------------------------------------

        File: gobuild-fail/go.o
        File created with Hopper 4.6.2-demo
        Analysis version 58
        MachO file
        CPU: intel/x86_64
        64 bits addresses (Little Endian)

--------------------------------------------------------------------------------
*/



        ; Segment
        ; Range: [0x0; 0x1346a000[ (323395584 bytes)
        ; File offset : [4096; 322813952[ (322809856 bytes)
        ; Permissions:  - 



        ; Section __text
        ; Range: [0x0; 0x93bd374[ (154915700 bytes)
        ; File offset : [4096; 154919796[ (154915700 bytes)
        ; Flags: 0x80000400
        ;   S_REGULAR
        ;   S_ATTR_PURE_INSTRUCTIONS
        ;   S_ATTR_SOME_INSTRUCTIONS

                     _go.buildid:        // 
0000000000000000         db  0xeb ; '.'
0000000000000001         db  0x31 ; '1'
0000000000000002         db  0x86 ; '.'
0000000000000003         db  0x82 ; '.'
0000000000000004         db  0x20 ; ' '
0000000000000005         db  0x62 ; 'b'
0000000000000006         db  0x75 ; 'u'
0000000000000007         db  0x69 ; 'i'
0000000000000008         db  0x6c ; 'l'
0000000000000009         db  0x64 ; 'd'
000000000000000a         db  0x20 ; ' '
000000000000000b         db  0x49 ; 'I'
000000000000000c         db  0x44 ; 'D'
000000000000000d         db  0x3a ; ':'

/*
--------------------------------------------------------------------------------

        File: gobuild-pass/go.o
        File created with Hopper 4.6.2-demo
        Analysis version 58
        MachO file
        CPU: intel/x86_64
        64 bits addresses (Little Endian)

--------------------------------------------------------------------------------
*/



        ; Segment
        ; Range: [0x0; 0x19f000[ (1699840 bytes)
        ; File offset : [4096; 1499136[ (1495040 bytes)
        ; Permissions:  - 



        ; Section __text
        ; Range: [0x0; 0xa65aa[ (681386 bytes)
        ; File offset : [4096; 685482[ (681386 bytes)
        ; Flags: 0x80000400
        ;   S_REGULAR
        ;   S_ATTR_PURE_INSTRUCTIONS
        ;   S_ATTR_SOME_INSTRUCTIONS

                     _go.buildid:        // 
0000000000000000         db  0xff ; '.'
0000000000000001         db  0x20 ; ' '
0000000000000002         db  0x47 ; 'G'
0000000000000003         db  0x6f ; 'o'
0000000000000004         db  0x20 ; ' '
0000000000000005         db  0x62 ; 'b'
0000000000000006         db  0x75 ; 'u'
0000000000000007         db  0x69 ; 'i'
0000000000000008         db  0x6c ; 'l'
0000000000000009         db  0x64 ; 'd'
000000000000000a         db  0x20 ; ' '
000000000000000b         db  0x49 ; 'I'
000000000000000c         db  0x44 ; 'D'
000000000000000d         db  0x3a ; ':'
000000000000000e         db  0x20 ; ' '
000000000000000f         db  0x22 ; '"'

/*
--------------------------------------------------------------------------------

        File: bazel-fail/go.o
        File created with Hopper 4.6.2-demo
        Analysis version 58
        MachO file
        CPU: intel/x86_64
        64 bits addresses (Little Endian)

--------------------------------------------------------------------------------
*/



        ; Segment
        ; Range: [0x0; 0xf4be000[ (256630784 bytes)
        ; File offset : [4096; 256217088[ (256212992 bytes)
        ; Permissions:  - 



        ; Section __text
        ; Range: [0x0; 0x7cb0fdd[ (130748381 bytes)
        ; File offset : [4096; 130752477[ (130748381 bytes)
        ; Flags: 0x80000400
        ;   S_REGULAR
        ;   S_ATTR_PURE_INSTRUCTIONS
        ;   S_ATTR_SOME_INSTRUCTIONS

                     _go.buildid:        // 
0000000000000000         db  0x4b ; 'K'
0000000000000001         db  0x61 ; 'a'
0000000000000002         db  0x8d ; '.'
0000000000000003         db  0x7e ; '~'
0000000000000004         db  0x20 ; ' '
0000000000000005         db  0x62 ; 'b'
0000000000000006         db  0x75 ; 'u'
0000000000000007         db  0x69 ; 'i'
0000000000000008         db  0x6c ; 'l'
0000000000000009         db  0x64 ; 'd'
000000000000000a         db  0x20 ; ' '
000000000000000b         db  0x49 ; 'I'
000000000000000c         db  0x44 ; 'D'
000000000000000d         db  0x3a ; ':'
000000000000000e         db  0x20 ; ' '
000000000000000f         db  0x22 ; '"'
0000000000000010         db  0x72 ; 'r'
0000000000000011         db  0x65 ; 'e'
0000000000000012         db  0x64 ; 'd'
0000000000000013         db  0x61 ; 'a'
0000000000000014         db  0x63 ; 'c'
0000000000000015         db  0x74 ; 't'
0000000000000016         db  0x65 ; 'e'
0000000000000017         db  0x64 ; 'd'

I suppose that's due to resolving the fixups.

Hopefully this can help in narrowing it down.

@thanm
Contributor

thanm commented Nov 16, 2020

Thanks. Looking at the ObjectDump and "otool -r" output is helpful, it does indeed confirm that there is a (bogus) relocation on _go.buildid; this should never happen. This points to a bug in the Go linker, but without a debuggable/buildable test case, I'm not sure how to go about locating the bug.

One other thing that would be helpful would be if you could try your testcase using a tip version of Go (e.g. download and build Go source from https://go.googlesource.com/go). There have been a number of linker changes between Go 1.15 and the current trunk (Go 1.16 candidate). It's possible that the 1.16/trunk linker might fail in a more descriptive or helpful way. Thanks.

@robbertvanginkel

Using a recent commit (d70a33a40b), the go build project builds successfully! Unfortunately I wasn't able to verify against the bazel project; it seems like there might be some incompatibilities between trunk and rules_go. Although this is good news, the go1.16 release is a bit too far out for us to wait for.

I was considering a bisect, but a quick look through the history shows that the linker work was mostly done on a branch that was merged into master as of 52fe92f. Compiling this project at that commit and its parent on master, it fails at 2bfa45c but passes at 52fe92f.

Would bisecting the dev.link branch be a possible approach to narrow this down further?

@thanm
Contributor

thanm commented Nov 16, 2020

@robbertvanginkel yes, bisecting on the dev.link branch is a viable option in this case. Current dev.link is a bit out of sync with trunk at this point, but in this case it should give you what you need.

My guess at this point (based on the symptoms) is that this was a problem with over-aggressive unreachable method pruning; there was a fair amount of activity related to this on the dev.link branch between 1.14 and 1.15.

@robbertvanginkel

I ran a bisect yesterday between e92be18 and 89f687d, which points to 59a702a as the first commit between 1.15 and trunk at which it no longer fails.

I'll try to run another bisect between 1.14-1.15 to find the first commit in which it starts failing.

@thanm
Contributor

thanm commented Nov 17, 2020

Hmm, 59a702a is a bit surprising as a candidate. That commit shouldn't actually have any change on what the linker emits, it's just adding in some additional parallelism in the way that Macho relocations are emitted.

Maybe the 1.14-1.15 bisect might provide some more insight.

@robbertvanginkel

Tried another bisect between e77c99c and 1667b35, which seems to suggest 7466cad is when this started.

Bisect logs for reference

git bisect start
# bad: [1667b35740bd6974082cba6b48b4ea1881e29088] [dev.link] cmd/link: directly use loader.ExtReloc in ELF relocation generation
git bisect bad 1667b35740bd6974082cba6b48b4ea1881e29088
# good: [e77c99ce4c377a0ea68a3c101ac143e9ae29841b] [dev.link] cmd/link: remove some globals from symtab.go
git bisect good e77c99ce4c377a0ea68a3c101ac143e9ae29841b
# bad: [adea6a90e361629d20a68400c0c5cdcdfcdf087e] [dev.link] cmd/link/internal/loader: fix buglet in section handling
git bisect bad adea6a90e361629d20a68400c0c5cdcdfcdf087e
# good: [ed5233166fd75541d9d2464e1b165079ee948a53] cmd/compile: simplify slicebytes
git bisect good ed5233166fd75541d9d2464e1b165079ee948a53
# good: [38c2c12bc1b3da40e1b33cac9268b7df9fa49a7e] runtime/pprof: plumb labels for goroutine profiles
git bisect good 38c2c12bc1b3da40e1b33cac9268b7df9fa49a7e
# good: [245a2f5780ebc956a84964c25804b50f27c2d984] [dev.link] cmd/link: delete ctxt.Reachparent
git bisect good 245a2f5780ebc956a84964c25804b50f27c2d984
# good: [9f4dd09bf555632a39a01a4c171e713acb55fda9] cmd/compile: refactor variadac call desugaring
git bisect good 9f4dd09bf555632a39a01a4c171e713acb55fda9
# bad: [dee3e3aebd1c26de237f44138406c51c6a162058] [dev.link] cmd/link: clean up some tests
git bisect bad dee3e3aebd1c26de237f44138406c51c6a162058
# good: [45bd3b1bc4aa36ef313899fa372c23d850380b12] [dev.link] cmd/link: create loader-specific version of GCProg
git bisect good 45bd3b1bc4aa36ef313899fa372c23d850380b12
# good: [00723603eb1e183e010371fc5aa76a3d8efda8d1] [dev.link] cmd/link/internal/loader: fix AttrSubSymbol
git bisect good 00723603eb1e183e010371fc5aa76a3d8efda8d1
# bad: [7466cad9c4f9a08133bfb9b3c99c70b4897eed0d] [dev.link] cmd/link: only allow heap area to grow to 10MB
git bisect bad 7466cad9c4f9a08133bfb9b3c99c70b4897eed0d
# first bad commit: [7466cad9c4f9a08133bfb9b3c99c70b4897eed0d] [dev.link] cmd/link: only allow heap area to grow to 10MB

git bisect log for #42082 (comment). Note how good/bad have reversed meaning here as I was searching from broken -> fixed.

# bad: [89f687d6dbc11613f715d1644b4983905293dd33] Merge "cmd/link: merge branch 'dev.link' into master"
# good: [e92be18fd8b525b642ca25bdb3e2056b35d9d73c] runtime: fix typo in FuncForPC doc
git bisect start 'HEAD' 'go1.15beta1'
# bad: [0941fc3f9ff43598d25fa6e964e7829a268102bf] runtime: reduce syscall when call runtime.clone
git bisect bad 0941fc3f9ff43598d25fa6e964e7829a268102bf
# bad: [86f53c2a3c08c416fe62e83db1d1a666b3da5f21] [dev.link] all: merge branch 'master' into dev.link
git bisect bad 86f53c2a3c08c416fe62e83db1d1a666b3da5f21
# good: [574dac9d9707ddd35d57aaea646710dfae67bd89] doc/go1.15: fix TODO about -buildmode=pie
git bisect good 574dac9d9707ddd35d57aaea646710dfae67bd89
# good: [c551318046115104ee4edddf2c5b0e459711bbb2] [dev.link] cmd/link: move macho asmb2 support to generic functions
git bisect good c551318046115104ee4edddf2c5b0e459711bbb2
# good: [d1a186d29ce9d917dda7c66cfaee7788f88e7b9e] [dev.link] cmd/link: parallelize second-stage DWARF generation
git bisect good d1a186d29ce9d917dda7c66cfaee7788f88e7b9e
# good: [b473a1f8da2998be9dee2b0e59a6854a4955dba1] [dev.link] cmd/link: read symbol type only when necessary in elfreloc1
git bisect good b473a1f8da2998be9dee2b0e59a6854a4955dba1
# good: [130ede0d9e01ef53e734371faea080f5301d9c55] [dev.link] cmd/link: remove some unneeded code from writeBlock()
git bisect good 130ede0d9e01ef53e734371faea080f5301d9c55
# bad: [0434d4093458d24db6af1e65fb257cee78512c25] [dev.link] cmd/compile: mark stmp and stkobj symbols as static
git bisect bad 0434d4093458d24db6af1e65fb257cee78512c25
# bad: [59a702aa6aca364eb75f40261fdafe4ae9be153e] [dev.link] cmd/link: emit Mach-O relocations in mmap
git bisect bad 59a702aa6aca364eb75f40261fdafe4ae9be153e
# good: [041d8850a15a4c4af23f8cb21cc47c0b4d85d7fa] [dev.link] cmd/link: run more tests in parallel
git bisect good 041d8850a15a4c4af23f8cb21cc47c0b4d85d7fa
# first bad commit: [59a702aa6aca364eb75f40261fdafe4ae9be153e] [dev.link] cmd/link: emit Mach-O relocations in mmap

@gopherbot
Contributor

Change https://golang.org/cl/270941 mentions this issue: cmd/link: recompute heapPos after copyHeap

@gopherbot
Contributor

Change https://golang.org/cl/270942 mentions this issue: [release-branch.go1.15] cmd/link: recompute heapPos after copyHeap

@robbertvanginkel

I cherry-picked https://golang.org/cl/270942 onto the 1.15.5 tag and rebuilt the failing project to verify, and it seems this solves the issue for our go-build project.

Going to verify the bazel project too, but so far so good!

@robbertvanginkel

robbertvanginkel commented Nov 18, 2020

Using the same go build from cherry-picking https://golang.org/cl/270942 onto the 1.15.5 tag also works for the bazel project, so I'm fairly confident that must have been it.

@cherrymui
Member

@robbertvanginkel thanks for confirming!

gopherbot pushed a commit that referenced this issue Nov 18, 2020
Immediately after a forward Seek, the offset we're writing to is
beyond len(buf)+len(heap):

|<--- buf --->|<--- heap --->|
                                    ^
                                    off

If we do a copyHeap at this point, the new heapPos should not be
0:

|<---------- buf ----------->|<-heap->|
                                    ^
                                    off

Recompute it.

For #42082.

Change-Id: Icb3e4e1c7bf7d1fd3d76a2e0d7dfcb319c661534
Reviewed-on: https://go-review.googlesource.com/c/go/+/270941
Trust: Cherry Zhang <[email protected]>
Run-TryBot: Cherry Zhang <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
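
The arithmetic in the commit message can be illustrated with a toy model. This is a sketch only: the outBuf type below is a simplified stand-in written for this example and is not the linker's actual OutBuf; the field and method names merely mirror the terms used in the message (buf, heap, off, copyHeap, heapPos).

```go
package main

import "fmt"

// outBuf is a toy model of an output buffer with a fixed area (buf)
// followed by an overflow area (heap). Illustrative only.
type outBuf struct {
	buf  []byte // stands in for the mmapped output area
	heap []byte // overflow area used when a write does not fit in buf
	off  int64  // current write offset, counted from the start of buf
}

// heapPos is the position of off relative to the start of the heap area.
func (o *outBuf) heapPos() int64 {
	return o.off - int64(len(o.buf))
}

// copyHeap moves the heap contents to the end of buf and empties the heap.
func (o *outBuf) copyHeap() {
	o.buf = append(o.buf, o.heap...)
	o.heap = o.heap[:0]
}

func main() {
	// A forward Seek has moved off past len(buf)+len(heap) = 150.
	o := &outBuf{buf: make([]byte, 100), heap: make([]byte, 50), off: 170}

	fmt.Println(o.heapPos()) // prints 70

	o.copyHeap()

	// The heap is now empty, but off still lies 20 bytes past the end of
	// buf, so the position in the new heap is 20 rather than 0.
	fmt.Println(o.heapPos()) // prints 20
}
```

In this model, assuming the position is 0 after the copy would place the next write 20 bytes too early, which matches the kind of misplaced output described in the message.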
@cherrymui cherrymui modified the milestones: Backlog, Go1.15.6 Nov 18, 2020
@cherrymui cherrymui added the CherryPickCandidate Used during the release process for point releases label Nov 18, 2020
@dmitshur
Contributor

@cherrymui Should this issue target Go1.16 milestone, and once fixed, a backport of it requested via the process described at https://golang.org/wiki/MinorReleases? Or is there a reason not to follow that process in this case?

@dmitshur
Contributor

From #42082 (comment), I see this issue is already resolved on tip via changes in dev.link that were merged into master. So I understand this issue is about determining what would be needed to be backported to Go 1.15.

@linzhp Is it known whether this problem applies to Go 1.14 as well, or just 1.15?

@robbertvanginkel

@dmitshur this only applies to 1.15, we have not experienced the same for 1.14.

@robbertvanginkel

@dmitshur @cherrymui any idea if/when https://go-review.googlesource.com/c/go/+/270942/ might be approved for cherry-pick and when a 1.15 patch release could be available?

Want to throw a voice out there that this solves a real regression for us that was introduced in 1.15 and we'd like to be able to use the latest release again :)

@cherrymui
Member

I'd like the CL to be cherry-picked, but I don't know when that will happen. Probably soon, as we do minor releases at the beginning of each month, if I understand correctly. @dmitshur may have a better answer.

@dmitshur
Contributor

dmitshur commented Nov 30, 2020

It would help make the cherry-pick review process easier and more consistent if we follow the process documented at https://golang.org/wiki/MinorReleases.

@cherrymui Is it okay to make this issue target the Go 1.16 milestone, mark it as fixed by CL 270941, then request a new backport issue for Go 1.15, and update the commit message of CL 270942 accordingly? Creating a new backport issue gives a chance to highlight the short rationale for why this backport is needed.

@cherrymui
Member

Yeah, feel free to do whatever makes your workflow easier.

I didn't make this issue target Go 1.16 as it just doesn't fail at Go 1.16. (CL 270941 is still nice to have.)

@robbertvanginkel

@gopherbot please consider this for backport to 1.15, it's a linker regression that prevents some programs from linking successfully.

@gopherbot
Contributor

Backport issue(s) opened: #42948 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@robbertvanginkel

I read through https://github.com/golang/go/wiki/MinorReleases and tried to follow the process by having the bot create #42948.

With that I think the CherryPickCandidate/NeedsInvestigation labels can be removed from this issue and it can be resolved as b194b51 is in master.

@dmitshur
Contributor

dmitshur commented Dec 2, 2020

Thank you for doing that @robbertvanginkel. As discussed above, I'll make this issue target Go 1.16 and close it since this issue is resolved on tip.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed CherryPickCandidate Used during the release process for point releases NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 2, 2020
@dmitshur dmitshur modified the milestones: Go1.15.6, Go1.16 Dec 2, 2020
@dmitshur dmitshur closed this as completed Dec 2, 2020
gopherbot pushed a commit that referenced this issue Dec 3, 2020
Immediately after a forward Seek, the offset we're writing to is
beyond len(buf)+len(heap):

|<--- buf --->|<--- heap --->|
                                    ^
                                    off

If we do a copyHeap at this point, the new heapPos should not be
0:

|<---------- buf ----------->|<-heap->|
                                    ^
                                    off

Recompute it.

Updates #42082
Fixes #42948

Change-Id: Icb3e4e1c7bf7d1fd3d76a2e0d7dfcb319c661534
Reviewed-on: https://go-review.googlesource.com/c/go/+/270942
Run-TryBot: Carlos Amedee <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
Reviewed-by: Jeremy Faller <[email protected]>
Trust: Cherry Zhang <[email protected]>
@golang golang locked and limited conversation to collaborators Dec 2, 2021