This repository provides a workaround for a sporadic issue seen on older Linux distributions: MathWorks® products can trigger an assert failure at concurrent pthread_create and dlopen (BZ-19329) in the GNU C Library (glibc).
If you are running
- Ubuntu-based systems and can upgrade to version 22.04 (Jammy Jellyfish), upgrading is the safest and easiest way to alleviate the issue, since that version contains glibc v2.35, in which the underlying issue is completely fixed.
- RHEL-based 8.4 or 8.5 systems (update 27 June 2022), it appears that RHEL has patched the glibc-2.28 packages in release 189 to fix this issue. Ensure that you have installed at least glibc-2.28-189.1.el8.
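As a rough aid for deciding whether a machine needs any action, the following Python sketch reports the glibc version the interpreter is running against and compares it with 2.34. It assumes 2.34 is the first upstream release containing the fix, which is the version this README names. The function names are illustrative only, and the check does not detect distribution backports such as the patched RHEL 2.28 packages.

```python
import platform

# Assumption: 2.34 is the first upstream glibc release with the fix,
# as stated in this README. Distribution backports are NOT detected here.
FIXED_UPSTREAM = (2, 34)


def parse_glibc_version(text):
    """Turn a string such as '2.31' into a tuple of ints, e.g. (2, 31)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def upstream_fix_present(version_text):
    """Return True if the given glibc version is 2.34 or newer."""
    return parse_glibc_version(version_text) >= FIXED_UPSTREAM


if __name__ == "__main__":
    # platform.libc_ver() inspects the running Python executable.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not identify glibc on this system")
    elif upstream_fix_present(version):
        print(f"glibc {version}: the upstream fix should be present")
    else:
        print(f"glibc {version}: predates the upstream fix; check for a distribution backport")
```

Run it inside the machine or container you intend to use, since the result reflects only the glibc that the Python interpreter itself is linked against.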
If instead you want to work around this issue, you can use this repository. It provides a build procedure (in an isolated Docker® container) to produce patched versions of the glibc libraries for recent AlmaLinux, Ubuntu® and Debian® releases. These patched versions incorporate an initial fix proposed on the libc-alpha mailing list that mitigates the issue. In the release area of this repository you can find the Debian package build artefacts produced by running the build on Ubuntu 18.04 & 20.04 as well as Debian 9, 10 & 11. You can install these artefacts on an appropriate Debian-based machine, virtual machine or Docker container, using dpkg -i. For AlmaLinux you can find the appropriate RPMs, which should also work on UBI and CentOS containers.
The "assert failure at concurrent pthread_create and dlopen" glibc bug was first reported in December 2015 and can affect any process on Linux that creates a thread at the same time as opening a dynamic shared object library. Initially the issue was observable with reasonable frequency only on very large-scale systems, such as high-performance computing clusters or cloud-scale deployment platforms, and so did not receive significant attention. However, patches to the library were proposed early on. Large-scale systems applied those patches in-house and saw significant benefit. More recently, a proposed complete fix for this and a set of related issues was reviewed by the glibc team and accepted into version 2.34 of glibc (released in August 2021). The 2.34 version of glibc is available in RHEL 9 beta and Ubuntu 21.10 (Impish Indri). However, there are no plans to backport the fix into previous glibc versions, and previous versions are expected to remain in production use for a significant number of years (e.g. the current end-of-life date for Ubuntu 20.04 is April 2030).
More recently MathWorks products have made extensive use of a C++ micro-services architecture. This architecture leads to a more dynamic system in which library modules are loaded at the point of use. As a result, the MATLAB® process is more likely to load a library at the same time as creating a thread, and so is more likely to encounter this glibc bug. When this issue is encountered the console that opened MATLAB shows a message similar to the following:
Inconsistency detected by ld.so: ../elf/dl-tls.c: 597: _dl_allocate_tls_init: Assertion 'listp != NULL' failed!
or
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
There might also be a stack trace file called matlab_crash_dump.${PID} in the user's home folder or the current working folder. This usually indicates that a segmentation violation has been detected, and the stack trace starts with something similar to the following:
Stack Trace (from fault):
[ 0] 0x00002b661142d5a0 /lib64/ld-linux-x86-64.so.2+00075168 _dl_allocate_tls_init+00000080
[ 1] 0x00002b66120c187c /usr/lib64/libpthread.so.0+00034940 pthread_create+00001884
If you see these or similar signatures at a sufficient frequency on a system, you might want to consider patching glibc on that system, machine or container.
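To judge how often these signatures occur, a small script can count matching lines across console logs or crash dump files. The sketch below is only an illustration: the regular expressions are derived from the example messages above, the function name is hypothetical, and real messages may differ in path, line number or assertion text, so treat the count as a lower bound.

```python
import re
import sys

# Patterns derived from the example messages in this README (an assumption:
# other glibc builds may word or format these lines differently).
SIGNATURES = (
    re.compile(r"Inconsistency detected by ld\.so: .*_dl_allocate_tls_init: Assertion"),
    re.compile(r"_dl_allocate_tls_init\+\d+"),
)


def count_signatures(lines):
    """Count the lines that match any known BZ-19329 signature."""
    return sum(1 for line in lines if any(p.search(line) for p in SIGNATURES))


if __name__ == "__main__":
    # Usage: python this_script.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(f"{path}: {count_signatures(handle)} matching line(s)")
```

Pass it the console logs or matlab_crash_dump files you have collected; each file is reported separately so that affected machines or jobs stand out.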
RHEL has integrated the BZ-19329 patch into glibc-2.28-189.1.el8. It appears that the change actually went into build 2.28-175 and was released with 2.28-189. Unless you need to use a pre-189 release of the package, you should no longer need to use this repository to patch RHEL and AlmaLinux for BZ-19329.
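The following Python sketch shows one way to test an installed glibc package string against the 2.28-189.1 threshold named above. It is a simplification under stated assumptions: it compares only the leading numeric fields rather than implementing the full RPM version comparison rules, it applies only to the 2.28 package series, and the function names are illustrative. A suitable input string can be obtained with rpm -q --queryformat '%{VERSION}-%{RELEASE}' glibc.

```python
import re

# glibc-2.28-189.1.el8 is the minimum patched package named in this README.
MINIMUM_PATCHED = (2, 28, 189, 1)


def numeric_key(version_release):
    """'2.28-189.1.el8' -> (2, 28, 189, 1), using leading numeric fields only."""
    if version_release.startswith("glibc-"):
        version_release = version_release[len("glibc-"):]
    key = []
    for field in re.split(r"[.-]", version_release):
        if not field.isdigit():
            break  # stop at the first non-numeric field such as 'el8'
        key.append(int(field))
    return tuple(key)


def has_rhel_backport(version_release):
    """Return True for 2.28 packages at or above release 189.1 (simplified check)."""
    key = numeric_key(version_release)
    return key[:2] == (2, 28) and key >= MINIMUM_PATCHED


if __name__ == "__main__":
    for candidate in ("2.28-164.el8", "glibc-2.28-189.1.el8"):
        print(candidate, has_rhel_backport(candidate))
```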
These patches all derive from an original patch put together by Szabolcs Nagy in January 2016. The 2.24 to 2.28 patches in this repo are derived from that original e-mail and can be downloaded directly from the archive of the libc-alpha mailing list, where they were proposed:
- https://sourceware.org/legacy-ml/libc-alpha/2016-11/msg01092.html
- https://sourceware.org/legacy-ml/libc-alpha/2016-11/msg01093.html
These 2 patches are directly linked in the original bug report in comment 7 by Pádraig Brady. In addition, the bug report also has a reference to the original Szabolcs Nagy patch in comment 4 (dated January 2016). The 2 messages above refer back to that original patch via a message describing the overall problem in more detail.
In addition, in Sept 2017 Pádraig Brady pointed out an off-by-one error in the original patch. The following fix for it also needs to be included:
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
}
total += cnt;
- if (total >= dtv_slots)
+ if (total > dtv_slots)
break;
/* Synchronize with dl_add_to_slotinfo. */
This is the source for the final unsubmitted-bz19329-fixup.v2.27.patch.
In glibc v2.31 the original source code changed significantly and the patches needed to be slightly adapted so as to match the new codebase. These adapted patches are included here in the patches/2.31 folder and soft-linked from 2.32 and 2.33.
Many thanks to the broader glibc team and particularly Szabolcs Nagy for providing the original patches and for fixing these issues in glibc v2.34.