makeinfo and flex not found even though they're on path #3

timotheecour · 2014-06-04T22:01:09Z

I had to use:make MAKEINFO=/usr/bin/makeinfo FLEX=/usr/bin/flex

…ng unwind. https://sourceware.org/ml/gdb-patches/2014-05/msg00737.html Currently a MEMORY_ERROR raised during unwinding a frame will cause the unwind to stop with an error message, for example: (gdb) bt #0 breakpt () at amd64-invalid-stack-middle.c:27 #1 0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32 #2 0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38 #3 0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44 #4 0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50 Cannot access memory at address 0x2aaaaaab0000 However, frame #4 is marked as being the end of the stack unwind, so a subsequent request for the backtrace looses the error message, such as: (gdb) bt #0 breakpt () at amd64-invalid-stack-middle.c:27 #1 0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32 #2 0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38 #3 0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44 #4 0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50 When fetching the backtrace, or requesting the stack depth using the MI interface the situation is even worse, the first time a request is made we encounter the memory error and so the MI returns an error instead of the correct result, for example: (gdb) -stack-info-depth ^error,msg="Cannot access memory at address 0x2aaaaaab0000" Or, (gdb) -stack-list-frames ^error,msg="Cannot access memory at address 0x2aaaaaab0000" However, once one of these commands has been used gdb has, internally, walked the stack and figured that out that frame #4 is the bottom of the stack, so the second time an MI command is tried you'll get the "expected" result: (gdb) -stack-info-depth ^done,depth="5" Or, (gdb) -stack-list-frames ^done,stack=[frame={level="0", .. snip lots .. }] After this patch the MEMORY_ERROR encountered during the frame unwind is attached to frame #4 as the stop reason, and is displayed in the CLI each time the backtrace is requested. In the MI, catching the error means that the "expected" result is returned the first time the MI command is issued. So, from the CLI the results of the backtrace will be: (gdb) bt #0 breakpt () at amd64-invalid-stack-middle.c:27 #1 0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32 #2 0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38 #3 0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44 #4 0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50 Backtrace stopped: Cannot access memory at address 0x2aaaaaab0000 Each and every time that the backtrace is requested, while the MI output will similarly be consistently: (gdb) -stack-info-depth ^done,depth="5" Or, (gdb) -stack-list-frames ^done,stack=[frame={level="0", .. snip lots .. }] gdb/ChangeLog: * frame.c (struct frame_info): Add stop_string field. (get_prev_frame_always_1): Renamed from get_prev_frame_always. (get_prev_frame_always): Old content moved into get_prev_frame_always_1. Call get_prev_frame_always_1 inside TRY_CATCH, handle MEMORY_ERROR exceptions. (frame_stop_reason_string): New function definition. * frame.h (unwind_stop_reason_to_string): Extend comment to mention frame_stop_reason_string. (frame_stop_reason_string): New function declaration. * stack.c (frame_info): Switch to frame_stop_reason_string. (backtrace_command_1): Switch to frame_stop_reason_string. * unwind_stop_reason.def: Add UNWIND_MEMORY_ERROR. (LAST_ENTRY): Changed to UNWIND_MEMORY_ERROR. * guile/lib/gdb.scm: Add FRAME_UNWIND_MEMORY_ERROR to export list. gdb/doc/ChangeLog: * guile.texi (Frames In Guile): Mention FRAME_UNWIND_MEMORY_ERROR. * python.texi (Frames In Python): Mention gdb.FRAME_UNWIND_MEMORY_ERROR. gdb/testsuite/ChangeLog: * gdb.arch/amd64-invalid-stack-middle.exp: Update expected results. * gdb.arch/amd64-invalid-stack-top.exp: Likewise.

Since target-async was turned on by default, debugging on Windows using GDB+GDBserver sometimes hangs while waiting for a RSP reply. The problem is a race in the gdb_select machinery. This is what we see for a faulty next on the GDB side: (gdb) n infrun: clear_proceed_status_thread (Thread 4424) infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=1) (...) infrun: resume (step=1, signal=GDB_SIGNAL_0), ... Sending packet: $vCont;s:1148;c#5e... *hang* At this point, attaching a debugger to the hanging GDB confirms that it is blocked, waiting for a socket event: #6 0x757841d8 in WaitForMultipleObjects () from C:\Windows\syswow64\kernel32.dll #7 0x004708e7 in gdb_select (n=469, readfds=0x88ca50 <gdb_notifier+784>, writefds=0x88cb54 <gdb_notifier+1044>, exceptfds=0x88cc58 <gdb_notifier+1304>, timeout=0x0) at /[...]/gdb/mingw-hdep.c:172 #8 0x00527926 in gdb_wait_for_event (block=1) at /[...]/gdb/event-loop.c:831 #9 0x00526ff1 in gdb_do_one_event () at /[...]/gdb/event-loop.c:403 However, on the GDBserver side, we see that GDBserver already sent a T05 packet reply: gdbserver: kernel event EXCEPTION_DEBUG_EVENT for pid=4968 tid=1148 EXCEPTION_SINGLE_STEP Child Stopped with signal = 5 Writing resume reply for LWP 4968.4424:1 DEBUG: write_prim ($T0505:c8fe2800;04:a0fe2800;08:38164000;thread:1148;#f0) -> 55 To recap, on Windows, 'select' only works with sockets, so we have a wrapper, gdb_select, that uses the GDB serial abstraction to handle sockets, consoles, pipes, and serial ports. Each serial descriptor has a thread associated (we call those the select threads), and those threads communicate with the main thread by means of standard Windows events. It basically goes like this: gdb_select first loops through all fds of interest, calling their wait_handle hooks, which returns an event that WaitForMultipleObjects can wait on. gdb_select then blocks in WaitForMultipleObjects with all those event handles. The wait_handle hook is responsible for arranging for the returned event to become set once data is available. This is done by setting the descriptor's helper thread running, which itself knows how to wait for data from the type of handle it manages (sockets, pipes, consoles, files, etc.). Once data arrives, the select thread sets the corresponding event which unblocks WaitForMultipleObjects within gdb_select. However, the wait_handle hook can also apply an optimization: if data is already pending, then there's no need to set the thread running, and the descriptors event can be set immediately. It's around this latter aspect that lies the bug/race. Adding some ad hoc debug logs to ser-mingw.c and mingw-hdep.c, we see the following sequence of events, right after sending "$vCont;s:1148;c#5e". Thread 1 is the main thread, and thread 2 is the socket's helper/select thread. gdb_select was only passed one descriptor to wait on, the remote target's socket. net_windows_select_thread is the entry point of the select threads for sockets. #1 - thread 1: gdb_select: enter #2 - thread 2: net_windows_select_thread: WaitForMultipleObjects blocking gdb_select walked over the wait_handle hooks, and woke up the socket's helper thread. The helper thread is now blocked waiting for socket events. #3 - thread 1: gdb_select: WaitForMultipleObjects polling (timeout=0ms) #4 - thread 1: gdb_select: WaitForMultipleObjects returned 102 (WAIT_TIMEOUT) There was no pending data available yet, and gdb_select was passed timeout==0ms, and so WaitForMultipleObjects times out immediately. #5 - thread 2: net_windows_select_thread: WaitForMultipleObjects returned 1 Just afterwards, socket data arrives, and thread 2 wakes up. Thread 2 calls WSAEnumNetworkEvents, which clears state->sock_event, and marks the serial's read_event event, telling the main thread that data is available. #6 - thread 1: gdb_select: call serial_done_wait_handle on each serial gdb_select stops all the helper/select threads. #7 - thread 1: gdb_select: return 0 (WAIT_TIMEOUT) gdb_select in the main thread returns to the caller. Note that at this point, data is pending on the socket, the serial's read_event is set, but the socket's sock_event event is not set, until _further_ data arrives. Now GDB does its thing and goes back to the event loop. That calls gdb_select, but with timeout==INFINITE. Again, gdb_select calls the socket serial's wait_handle hook. It first clears its events, starting from a clean slate: ResetEvent (state->base.read_event); ResetEvent (state->base.except_event); ResetEvent (state->base.stop_select); That cleared read_event, which was previously set in #5 above. And then it checks for pending events, in the sock_event event: /* Check any pending events. This both avoids starting the thread unnecessarily, and handles stray FD_READ events (see below). */ if (WaitForSingleObject (state->sock_event, 0) == WAIT_OBJECT_0) { That also fails because state->sock_event was cleared in #5 too... So the wait_handle hook erroneously decides that it needs to start the helper thread to wait for input: #8 - thread 2: net_windows_select_thread: WaitForMultipleObjects blocking #9 - thread 1: gdb_select: WaitForMultipleObjects blocking (INFINITE) But, GDBserver already sent all it had to send, so both threads waits forever... At first I thought that net_windows_wait_handle shouldn't be resetting state->base.read_event or state->base.except_event, but looking deeper, the pipe and console wait_handle hooks reset all events too. It actually makes sense that way -- consuming an event from different threads is bad practice, and, we should always be able to query pending state without looking at the state->sock_event from within net_windows_wait_handle. The end result is much simpler, and makes net_windows_select_thread look a lot like console_select_thread, actually. gdb/ 2014-06-11 Pedro Alves <[email protected]> PR remote/17028 * ser-mingw.c (net_windows_socket_check_pending): New function. (net_windows_select_thread): Ignore spurious wakeups. Use net_windows_socket_check_pending. (net_windows_wait_handle): Check for pending events with ioctlsocket, through net_windows_socket_check_pending, instead of checking the socket's event.

On async targets, a synchronous attach is done like this: #1 - target_attach is called (PTRACE_ATTACH is issued) #2 - a continuation is installed #3 - we go back to the event loop #4 - target reports stop (SIGSTOP), event loop wakes up, and attach continuation is called #5 - among other things, the continuation calls target_terminal_inferior, which removes stdin from the event loop Note that in #3, GDB is still processing user input. If the user is fast enough, e.g., with something like: echo -e "attach PID\nset xxx=1" | gdb ... then the "set" command is processed before the attach completes. We get worse behavior even, if input is a tty and therefore readline/editing is enabled, with e.g.,: (gdb) attach PID\nset xxx=1 we then crash readline/gdb, with: Attaching to program: attach-wait-input, process 14537 readline: readline_callback_read_char() called with no handler! Aborted $ Fix this by calling target_terminal_inferior before #3 above. The test covers both scenarios by running with editing/readline forced to both on and off. gdb/ 2014-07-09 Pedro Alves <[email protected]> * infcmd.c (attach_command_post_wait): Don't call target_terminal_inferior here. (attach_command): Call it here instead. gdb/testsuite/ 2014-07-09 Pedro Alves <[email protected]> * gdb.base/attach-wait-input.exp: New file. * gdb.base/attach-wait-input.c: New file.

Here's an example, with the new test: gdbserver :9999 gdb.threads/kill gdb gdb.threads/kill (gdb) b 52 Breakpoint 1 at 0x4007f4: file kill.c, line 52. Continuing. Breakpoint 1, main () at kill.c:52 52 return 0; /* set break here */ (gdb) k Kill the program being debugged? (y or n) y gdbserver :9999 gdb.threads/kill Process gdb.base/watch_thread_num created; pid = 9719 Listening on port 1234 Remote debugging from host 127.0.0.1 Killing all inferiors Segmentation fault (core dumped) Backtrace: (gdb) bt #0 0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199 #1 0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966 #2 0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163 #3 0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934 #4 0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57 #5 0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767 #6 0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880 #7 0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434 #8 0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189 #9 0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552 #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283 The problem is that linux_wait_for_event deletes lwps that have exited (even those not passed in as lwps of interest), while the lwp/thread list is being walked on with find_inferior. find_inferior can handle the current iterated inferior being deleted, but not others. When killing lwps, we don't really care about any of the pending status handling of linux_wait_for_event. We can just waitpid the lwps directly, which is also what GDB does (see linux-nat.c:kill_wait_callback). This way the lwps are not deleted while we're walking the list. They'll be deleted by linux_mourn afterwards. This crash triggers several times when running the testsuite against GDBserver with the native-gdbserver board (target remote), but as GDB can't distinguish between GDBserver crashing and "kill" being sucessful, as in both cases the connection is closed (the 'k' packet doesn't require a reply), and the inferior is gone, that results in no FAIL. The patch adds a generic test that catches the issue with extended-remote mode (and works fine with native testing too). Here's how it fails with the native-extended-gdbserver board without the fix: (gdb) info threads Id Target Id Frame 6 Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 5 Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 4 Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 3 Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 2 Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 * 1 Thread 15367.15367 main () at .../gdb.threads/kill.c:52 (gdb) kill Kill the program being debugged? (y or n) y Remote connection closed ^^^^^^^^^^^^^^^^^^^^^^^^ (gdb) FAIL: gdb.threads/kill.exp: kill Extended remote should remain connected after the kill. gdb/gdbserver/ 2014-07-11 Pedro Alves <[email protected]> * linux-low.c (kill_wait_lwp): New function, based on kill_one_lwp_callback, but use my_waitpid directly. (kill_one_lwp_callback, linux_kill): Use it. gdb/testsuite/ 2014-07-11 Pedro Alves <[email protected]> * gdb.threads/kill.c: New file. * gdb.threads/kill.exp: New file.

The tests at <https://sourceware.org/ml/gdb-patches/2014-07/msg00277.html> show that comparing a fully optimized out value's contents with a value that has not been optimized out, or is partially optimized out crashes GDB: (gdb) bt #0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:816 #1 0x00000000005a1914 in memcmp_with_bit_offsets (ptr1=0x202b2f0 "\n", offset1_bits=0, ptr2=0x0, offset2_bits=0, length_bits=32) at /home/pedro/gdb/mygit/build/../src/gdb/value.c:678 #2 0x00000000005a1a05 in value_available_contents_bits_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=32) at /home/pedro/gdb/mygit/build/../src/gdb/value.c:717 #3 0x00000000005a1c09 in value_available_contents_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=4) at /home/pedro/gdb/mygit/build/../src/gdb/value.c:769 #4 0x00000000006033ed in read_frame_arg (sym=0x1b78d20, frame=0x19bca50, argp=0x7fff4aba82b0, entryargp=0x7fff4aba82d0) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:416 #5 0x0000000000603abb in print_frame_args (func=0x1b78cb0, frame=0x19bca50, num=-1, stream=0x1aea450) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:671 #6 0x0000000000604ae8 in print_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, sal=...) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:1205 #7 0x0000000000604050 in print_frame_info (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:857 #8 0x00000000006029b3 in print_stack_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, set_current_sal=1) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:169 #9 0x00000000005fc4b8 in print_stop_event (ws=0x7fff4aba8790) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6068 #10 0x00000000005fc830 in normal_stop () at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6214 The 'ptr2=0x0' in frame #1 is val2->contents, and since git 4f14910: gdb/ChangeLog 2013-11-26 Andrew Burgess <[email protected]> * value.c (allocate_optimized_out_value): Mark value as non-lazy. ... a fully optimized-out value can have it's value contents buffer NULL. As a spotgap fix, revert 4f14910, with a comment. A full fix would be too invasive for 7.8. gdb/ 2014-07-22 Pedro Alves <[email protected]> * value.c (allocate_optimized_out_value): Don't mark value as non-lazy.

The TUI currently crashes when the user types <return> in response to a pagination prompt: $ gdb --tui ... *the TUI is now active* (gdb) set height 2 (gdb) help List of classes of commands: Program received signal SIGSEGV, Segmentation fault. strlen () at ../sysdeps/x86_64/strlen.S:106 106 movdqu (%rax), %xmm12 (top-gdb) bt #0 strlen () at ../sysdeps/x86_64/strlen.S:106 #1 0x000000000086be5f in xstrdup (s=0x0) at ../src/libiberty/xstrdup.c:33 #2 0x00000000005163f9 in tui_prep_terminal (notused1=1) at ../src/gdb/tui/tui-io.c:296 #3 0x000000000077a7ee in _rl_callback_newline () at ../src/readline/callback.c:82 #4 0x000000000077a853 in rl_callback_handler_install (prompt=0x0, linefunc=0x618b60 <command_line_handler>) at ../src/readline/callback.c:102 #5 0x0000000000718a5c in gdb_readline_wrapper_cleanup (arg=0xfd14d0) at ../src/gdb/top.c:788 #6 0x0000000000596d08 in do_my_cleanups (pmy_chain=0xcf0b38 <cleanup_chain>, old_chain=0x1043d10) at ../src/gdb/cleanups.c:155 #7 0x0000000000596d75 in do_cleanups (old_chain=0x1043d10) at ../src/gdb/cleanups.c:177 #8 0x0000000000718bd9 in gdb_readline_wrapper (prompt=0x7fffffffcfa0 "---Type <return> to continue, or q <return> to quit---") at ../src/gdb/top.c:835 #9 0x000000000071cf74 in prompt_for_continue () at ../src/gdb/utils.c:1894 #10 0x000000000071d434 in fputs_maybe_filtered (linebuffer=0x1043db0 "List of classes of commands:\n\n", stream=0xf72e20, filter=1) at ../src/gdb/utils.c:2111 #11 0x000000000071da0f in vfprintf_maybe_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118, filter=1) at ../src/gdb/utils.c:2339 #12 0x000000000071da4a in vfprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118) at ../src/gdb/utils.c:2347 #13 0x000000000071dc72 in fprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n") at ../src/gdb/utils.c:2399 #14 0x00000000004f90ab in help_list (list=0xe6d100, cmdtype=0x89ad8c "", class=all_classes, stream=0xf72e20) at ../src/gdb/cli/cli-decode.c:1038 #15 0x00000000004f8dba in help_cmd (arg=0x0, stream=0xf72e20) at ../src/gdb/cli/cli-decode.c:946 Git 0017922 added: @@ -776,6 +777,12 @@ gdb_readline_wrapper_cleanup (void *arg) gdb_assert (input_handler == gdb_readline_wrapper_line); input_handler = cleanup->handler_orig; + + /* Reinstall INPUT_HANDLER in readline, without displaying a + prompt. */ + if (async_command_editing_p) + rl_callback_handler_install (NULL, input_handler); and tui_prep_terminal simply misses handling the case of a NULL rl_prompt. I also checked that readline's sources do similar checks. gdb/ 2014-07-24 Pedro Alves <[email protected]> * tui/tui-io.c (tui_prep_terminal): Handle NULL rl_prompt.

echo 'void f(char *s){}main(){f((char *)1);}'|gcc -g -x c -;../gdb ./a.out -ex 'b f' -ex r ====ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000aaccf at pc 0x96eea7 bp 0x7fff75bdbc90 sp 0x7fff75bdbc80 READ of size 1 at 0x6020000aaccf thread T0 #0 0x96eea6 in extract_unsigned_integer .../gdb/findvar.c:108 #1 0x9df02b in val_print_string .../gdb/valprint.c:2513 [...] 0x6020000aaccf is located 1 bytes to the left of 8-byte region [0x6020000aacd0,0x6020000aacd8) allocated by thread T0 here: #0 0x7f45fad26b97 in malloc (/lib64/libasan.so.1+0x57b97) #1 0xdb3409 in xmalloc common/common-utils.c:45 #2 0x9d8cf9 in read_string .../gdb/valprint.c:1845 #3 0x9defca in val_print_string .../gdb/valprint.c:2502 [..] ====ABORTING gdb/ 2014-08-18 Jan Kratochvil <[email protected]> Fix -fsanitize=address on unreadable inferior strings. * valprint.c (val_print_string): Fix access before BUFFER.

https://bugzilla.redhat.com/show_bug.cgi?id=1126177 ERROR: AddressSanitizer: SEGV on unknown address 0x000000000050 (pc 0x000000992bef sp 0x7ffff9039530 bp 0x7ffff9039540 T0) #0 0x992bee in value_type .../gdb/value.c:925 #1 0x87c951 in py_print_single_arg python/py-framefilter.c:445 #2 0x87cfae in enumerate_args python/py-framefilter.c:596 #3 0x87e0b0 in py_print_args python/py-framefilter.c:968 It crashes because frame_arg::val is documented it may contain NULL (frame_arg::error is then non-NULL) but the code does not handle it. Another bug is that py_print_single_arg() calls goto out of its TRY_CATCH which messes up GDB cleanup chain crashing GDB later. It is probably 7.7 regression (I have not verified it) due to the introduction of Python frame filters. gdb/ChangeLog PR python/17355 * python/py-framefilter.c (py_print_single_arg): Handle NULL FA->VAL. Fix goto out of TRY_CATCH. gdb/testsuite/ChangeLog PR python/17355 * gdb.python/amd64-py-framefilter-invalidarg.S: New file. * gdb.python/py-framefilter-invalidarg-gdb.py.in: New file. * gdb.python/py-framefilter-invalidarg.exp: New file. * gdb.python/py-framefilter-invalidarg.py: New file.

The test does a backtrace to see which thread (#2 or #3) is assigned to which SIGUSR (1 or 2). If the main thread gets to all_threads_running before the sigusr threads get to their entry point, then the function name isn't in the backtrace and the test fails. Alas this version of the code is within epsilon of what I started with, and then over-simplified things.

…nd crashes readline/GDB Jan caught an intermittent GDB crash with the annota1.exp test: Starting program: .../gdb/testsuite/gdb.base/annota1 ^M [...] FAIL: gdb.base/annota1.exp: run until main breakpoint (timeout) [...] readline: readline_callback_read_char() called with no handler!^M ERROR: Process no longer exists All we need to is to continue the inferior in the foreground, and type a command while the inferior is running. E.g.: (gdb) set annotate 2 ▒▒pre-prompt (gdb) ▒▒prompt c ▒▒post-prompt Continuing. ▒▒starting ▒▒frames-invalid *inferior is running now* p 1<ret> readline: readline_callback_read_char() called with no handler! Aborted (core dumped) $ When we run a foreground execution command we call target_terminal_inferior to stop GDB from processing input, and to put the inferior's terminal settings in effect. Then we tell readline to hide the prompt with display_gdb_prompt, which clears readline's input callback too. When the target stops, we call target_terminal_ours, which re-installs stdin in the event loop, and then we redisplay the prompt, reinstalling the readline callbacks. However, when annotations are in effect, the "frames-invalid" annotation code calls target_terminal_ours after 'resume' had already called target_terminal_inferior: (top-gdb) bt #0 0x000000000056b82f in annotate_frames_invalid () at gdb/annotate.c:219 #1 0x000000000072e6cc in reinit_frame_cache () at gdb/frame.c:1705 #2 0x0000000000594bb9 in registers_changed_ptid (ptid=...) at gdb/regcache.c:612 #3 0x000000000064cca1 in target_resume (ptid=..., step=1, signal=GDB_SIGNAL_0) at gdb/target.c:2136 #4 0x00000000005f57af in resume (step=1, sig=GDB_SIGNAL_0) at gdb/infrun.c:2263 #5 0x00000000005f6051 in proceed (addr=18446744073709551615, siggnal=GDB_SIGNAL_DEFAULT, step=1) at gdb/infrun.c:2613 And then once we hide the prompt and remove readline's input handler callback, we're in a bad state. We end up with the target running supposedly in the foreground, but with stdin still installed on the event loop. Any input then calls into readline, which aborts because no rl_linefunc callback handler is installed: Program received signal SIGABRT, Aborted. 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (top-gdb) bt #0 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x0000003b36a36f68 in __GI_abort () at abort.c:89 During symbol reading, debug info gives source 9 included from file at zero line 0. During symbol reading, debug info gives command-line macro definition with non-zero line 19: _STDC_PREDEF_H 1. #2 0x0000000000784a25 in rl_callback_read_char () at src/readline/callback.c:116 #3 0x0000000000619111 in rl_callback_read_char_wrapper (client_data=0x0) at src/gdb/event-top.c:167 #4 0x00000000006194e7 in stdin_event_handler (error=0, client_data=0x0) at src/gdb/event-top.c:373 #5 0x00000000006180da in handle_file_event (data=...) at src/gdb/event-loop.c:763 #6 0x00000000006175c1 in process_event () at src/gdb/event-loop.c:340 #7 0x0000000000617688 in gdb_do_one_event () at src/gdb/event-loop.c:404 #8 0x00000000006176d8 in start_event_loop () at src/gdb/event-loop.c:429 #9 0x0000000000619143 in cli_command_loop (data=0x0) at src/gdb/event-top.c:182 #10 0x000000000060f4c8 in current_interp_command_loop () at src/gdb/interps.c:318 #11 0x0000000000610691 in captured_command_loop (data=0x0) at src/gdb/main.c:323 #12 0x000000000060c385 in catch_errors (func=0x610676 <captured_command_loop>, func_args=0x0, errstring=0x900241 "", mask=RETURN_MASK_ALL) at src/gdb/exceptions.c:237 #13 0x0000000000611b8f in captured_main (data=0x7fffffffd7b0) at src/gdb/main.c:1151 #14 0x000000000060c385 in catch_errors (func=0x610a8e <captured_main>, func_args=0x7fffffffd7b0, errstring=0x900241 "", mask=RETURN_MASK_ALL) at src/gdb/exceptions.c:237 #15 0x0000000000611bb8 in gdb_main (args=0x7fffffffd7b0) at src/gdb/main.c:1159 #16 0x000000000045ef57 in main (argc=3, argv=0x7fffffffd8b8) at src/gdb/gdb.c:32 The fix is to make the annotation code call target_terminal_inferior again after printing, if the inferior's settings were in effect. While at it, when we're doing output only, instead of target_terminal_ours, we should call target_terminal_ours_for_output. The latter doesn't actually remove stdin from the event loop, and also leaves SIGINT forwarded to the target. New test included. Tested on x86_64 Fedora 20, native and gdbserver. gdb/ 2014-10-17 Pedro Alves <[email protected]> PR gdb/17472 * annotate.c (annotate_breakpoints_invalid): Use target_terminal_our_for_output instead of target_terminal_ours. Give back the terminal to the target. (annotate_frames_invalid): Likewise. gdb/testsuite/ 2014-10-17 Pedro Alves <[email protected]> PR gdb/17472 * gdb.base/annota-input-while-running.c: New file. * gdb.base/annota-input-while-running.exp: New file.

If all threads in the target were already running when the user does "c -a", nothing puts the inferior's terminal settings in effect and removes stdin from the event loop, which we must when running a foreground command. The result is that user input afterwards crashes readline/gdb: (gdb) start Temporary breakpoint 1 at 0x4005d4: file continue-all-already-running.c, line 23. Starting program: continue-all-already-running Temporary breakpoint 1, main () at continue-all-already-running.c:23 23 sleep (10); (gdb) c -a& Continuing. (gdb) c -a Continuing. p 1 readline: readline_callback_read_char() called with no handler! Aborted (core dumped) $ Backtrace: Program received signal SIGABRT, Aborted. 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (top-gdb) p 1 $1 = 1 (top-gdb) bt #0 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x0000003b36a36f68 in __GI_abort () at abort.c:89 #2 0x0000000000784aa9 in rl_callback_read_char () at readline/callback.c:116 #3 0x0000000000619181 in rl_callback_read_char_wrapper (client_data=0x0) at gdb/event-top.c:167 #4 0x0000000000619557 in stdin_event_handler (error=0, client_data=0x0) at gdb/event-top.c:373 #5 0x000000000061814a in handle_file_event (data=...) at gdb/event-loop.c:763 #6 0x0000000000617631 in process_event () at gdb/event-loop.c:340 #7 0x00000000006176f8 in gdb_do_one_event () at gdb/event-loop.c:404 #8 0x0000000000617748 in start_event_loop () at gdb/event-loop.c:429 #9 0x00000000006191b3 in cli_command_loop (data=0x0) at gdb/event-top.c:182 #10 0x000000000060f538 in current_interp_command_loop () at gdb/interps.c:318 #11 0x0000000000610701 in captured_command_loop (data=0x0) at gdb/main.c:323 #12 0x000000000060c3f5 in catch_errors (func=0x6106e6 <captured_command_loop>, func_args=0x0, errstring=0x9002c1 "", mask=RETURN_MASK_ALL) at gdb/exceptions.c:237 #13 0x0000000000611bff in captured_main (data=0x7fffffffd780) at gdb/main.c:1151 #14 0x000000000060c3f5 in catch_errors (func=0x610afe <captured_main>, func_args=0x7fffffffd780, errstring=0x9002c1 "", mask=RETURN_MASK_ALL) at gdb/exceptions.c:237 #15 0x0000000000611c28 in gdb_main (args=0x7fffffffd780) at gdb/main.c:1159 #16 0x000000000045ef97 in main (argc=5, argv=0x7fffffffd888) at gdb/gdb.c:32 (top-gdb) Tested on x86_64 Fedora 20, native and gdbserver. gdb/ 2014-10-17 Pedro Alves <[email protected]> PR gdb/17300 * infcmd.c (continue_1): If continuing all threads in the foreground, make sure the inferior's terminal settings are put in effect. gdb/testsuite/ 2014-10-17 Pedro Alves <[email protected]> PR gdb/17300 * gdb.base/continue-all-already-running.c: New file. * gdb.base/continue-all-already-running.exp: New file.

I noticed that with: $ TERM=dumb ./gdb -q -nx <c-x,a> Cannot enable the TUI: terminal doesn't support cursor addressing [TERM=dumb] (gdb) The next key the user types is silently eaten. The problem is that we're throwing an exception while in a readline callback that isn't prepared for that: (top-gdb) bt #0 tui_enable () at /home/pedro/gdb/mygit/build/../src/gdb/tui/tui.c:388 #1 0x000000000051f47b in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/build/../src/gdb/tui/tui.c:101 #2 0x0000000000768d6f in _rl_dispatch_subseq (key=1, map=0xd069c0 <emacs_ctlx_keymap>, got_subseq=0) at /home/pedro/gdb/mygit/build/../src/readline/readline.c:774 #3 0x0000000000768acb in _rl_dispatch_callback (cxt=0x1ce6190) at /home/pedro/gdb/mygit/build/../src/readline/readline.c:686 #4 0x000000000078120b in rl_callback_read_char () at /home/pedro/gdb/mygit/build/../src/readline/callback.c:170 #5 0x0000000000619445 in rl_callback_read_char_wrapper (client_data=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/event-top.c:166 #6 0x000000000061981b in stdin_event_handler (error=0, client_data=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/event-top.c:372 #7 0x000000000061840e in handle_file_event (data=...) at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:762 #8 0x00000000006178f5 in process_event () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:339 #9 0x00000000006179bc in gdb_do_one_event () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:403 #10 0x0000000000617a0c in start_event_loop () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:428 Here, in _rl_dispatch_subseq: 769 770 rl_executing_keymap = map; 771 772 rl_dispatching = 1; 773 RL_SETSTATE(RL_STATE_DISPATCHING); 774 (*map[key].function)(rl_numeric_arg * rl_arg_sign, key); 775 RL_UNSETSTATE(RL_STATE_DISPATCHING); 776 rl_dispatching = 0; 777 778 /* If we have input pending, then the last command was a prefix 779 command. Don't change the state of rl_last_func. Otherwise, GDB is called from line 774, but longjmp'ing at that point leaves rl_dispatching and RL_STATE_DISPATCHING set. Fix this by wrapping tui_rl_switch_mode in a TRY_CATCH. gdb/ 2014-10-29 Pedro Alves <[email protected]> * tui/tui.c (tui_rl_switch_mode): Wrap tui_enable/tui_disable in TRY_CATCH.

... instead of relying on libthread_db. I wrote a test that attaches to a program that constantly spawns short-lived threads, which exposed several issues. This is one of them. On Linux, we need to attach to all threads of a process (thread group) individually. We currently rely on libthread_db to list the threads, but that is problematic, because libthread_db relies on reading data structures out of the inferior (which may well be corrupted). If threads are being created or exiting just while we try to attach, we may trip on inconsistencies in the inferior's thread list. To work around that, when we see a seemingly corrupt list, we currently retry a few times: static void thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new) { ... if (until_no_new) { /* Require 4 successive iterations which do not find any new threads. The 4 is a heuristic: there is an inherent race here, and I have seen that 2 iterations in a row are not always sufficient to "capture" all threads. */ ... That heuristic may well fail, and when it does, we end up with threads in the program that aren't under GDB's control. That's obviously bad and results in quite mistifying failures, like e.g., the process dying for seeminly no reason when a thread that wasn't attached trips on a breakpoint. There's really no reason to rely on libthread_db for this nowadays when we have /proc mounted. In that case, which is the usual case, we can list the LWPs from /proc/PID/task/. In fact, GDBserver is already doing this. The patch factors out that code that knows to walk the task/ directory out of GDBserver, and makes GDB use it too. Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for them to stop immediately. Instead, we just tag the LWP as having an expected stop. Because we can only set the ptrace options when the thread stops, we need a new flag in the lwp structure to keep track of whether we've already set the ptrace options, just like in GDBserver. Note that nothing issues any ptrace command to the threads between the PTRACE_ATTACH and the stop, so this is safe (unlike one scenario described in gdbserver's linux-low.c). When we attach to a program that has threads exiting while we attach, it's easy to race with a thread just exiting as we try to attach to it, like: #1 - get current list of threads #2 - attach to each listed thread #3 - ooops, attach failed, thread is already gone As this is pretty normal, we shouldn't be issuing a scary warning in step #3. When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes we'll see EPERM as well. That happens when the kernel still has the thread in its task list, but the thread is marked as dead. Unfortunately, EPERM is ambiguous and we'll get it also on other scenarios where the thread isn't dead, and in those cases, it's useful to get a warning. To distiguish the cases, when we get an EPERM failure, we open /proc/PID/status, and check the thread's state -- if the /proc file no longer exists, or the state is "Z (Zombie)" or "X (Dead)", we ignore the EPERM error silently; otherwise, we'll warn. Unfortunately, there seems to be a kernel race here. Sometimes I get EPERM, and then the /proc state still indicates "R (Running)"... If we wait a bit and retry, we do end up seeing X or Z state, or get an ESRCH. I thought of making GDB retry the attach a few times, but even with a 500ms wait and 4 retries, I still see the warning sometimes. I haven't been able to identify the kernel path that causes this yet, but in any case, it looks like a kernel bug to me. As this just results failure to suppress a warning that we've been printing since about forever anyway, I'm just making the test cope with it, and issue an XFAIL. gdb/gdbserver/ 2015-01-09 Pedro Alves <[email protected]> * linux-low.c (linux_attach_fail_reason_string): Move to nat/linux-ptrace.c, and rename. (linux_attach_lwp): Update comment. (attach_proc_task_lwp_callback): New function. (linux_attach): Adjust to rename and use linux_proc_attach_tgid_threads. (linux_attach_fail_reason_string): Delete declaration. gdb/ 2015-01-09 Pedro Alves <[email protected]> * linux-nat.c (attach_proc_task_lwp_callback): New function. (linux_nat_attach): Use linux_proc_attach_tgid_threads. (wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's ptrace option flags. * linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New field. * nat/linux-procfs.c: Include <dirent.h>. (linux_proc_get_int): New parameter "warn". Handle it. (linux_proc_get_tgid): Adjust. (linux_proc_get_tracerpid): Rename to ... (linux_proc_get_tracerpid_nowarn): ... this. (linux_proc_pid_get_state): New function, factored out from (linux_proc_pid_has_state): ... this. Add new parameter "warn" and handle it. (linux_proc_pid_is_gone): New function. (linux_proc_pid_is_stopped): Adjust. (linux_proc_pid_is_zombie_maybe_warn) (linux_proc_pid_is_zombie_nowarn): New functions. (linux_proc_pid_is_zombie): Use linux_proc_pid_is_zombie_maybe_warn. (linux_proc_attach_tgid_threads): New function. * nat/linux-procfs.h (linux_proc_get_tgid): Update comment. (linux_proc_get_tracerpid): Rename to ... (linux_proc_get_tracerpid_nowarn): ... this, and update comment. (linux_proc_pid_is_gone): New declaration. (linux_proc_pid_is_zombie): Update comment. (linux_proc_pid_is_zombie_nowarn): New declaration. (linux_proc_attach_lwp_func): New typedef. (linux_proc_attach_tgid_threads): New declaration. * nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to use nowarn functions. (linux_ptrace_attach_fail_reason_string): Move here from gdbserver/linux-low.c and rename. (ptrace_supports_feature): If the current ptrace options are not known yet, check them now, instead of asserting. * nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string): Declare.

Running the testsuite with a series that reimplements user-visible all-stop behavior on top of a target running in non-stop mode revealed problems related to event starvation avoidance. For example, I see gdb.threads/signal-while-stepping-over-bp-other-thread.exp failing. What happens is that GDB core never gets to see the signal event. It ends up processing the events for the same threads over an over, because Linux's waitpid(-1, ...) returns that first task in the task list that has an event, starving threads on the tail of the task list. So I wrote a non-stop mode test originally inspired by signal-while-stepping-over-bp-other-thread.exp, to stress this independently of all-stop on top of non-stop. Fixing it required the changes described below. The test will be added in a following commit. 1) linux-nat.c has code in place that picks an event LWP at random out of all that have had events. This is because on the kernel side, "waitpid(-1, ...)" just walks the task list linearly looking for the first that had an event. But, this code is currently only used in all-stop mode. So with a multi-threaded program that has multiple events triggering debug events in parallel, GDB ends up starving some threads. To make the event randomization work in non-stop mode too, the patch makes us pull out all the already pending events on the kernel side, with waitpid, before deciding which LWP to report to the core. There's some code in linux_wait that takes care of leaving events pending if they were for LWPs the caller is not interested in. The patch moves that to linux_nat_filter_event, so that we only have one place that leaves events pending. With that in place, conceptually, the flow is simpler and more normalized: #1 - walk the LWP list looking for an LWP with a pending event to report. #2 - if no pending event, pull events out of the kernel, and store them in the LWP structures as pending. #3- goto #1. 2) Then, currently the event randomization code only considers SIGTRAP (or trap-like) events. That means that if e.g., have have multiple threads stepping in parallel that hit a breakpoint that needs stepping over, and one gets a signal, the signal may end up never getting processed, because GDB will always be giving priority to the SIGTRAPs. The patch fixes this by making the randomization code consider all kinds of pending events. 3) If multiple threads hit a breakpoint, we report one of those, and "cancel" the others. Cancelling means decrementing the PC, and discarding the event. If the next time the LWP is resumed the breakpoint is still installed, the LWP should hit it again, and we'll report the hit then. The problem I found is that this delays threads from advancing too much, with the kernel potentially ending up scheduling the same threads over and over, and others not advancing. So the patch switches away from cancelling the breakpoints, and instead remembering that the LWP had stopped for a breakpoint. If on resume the breakpoint is still installed, we report it. If it's no longer installed, we discard the pending event then. This is actually how GDBserver used to handle this before d50171e (Teach linux gdbserver to step-over-breakpoints), but with the difference that back then we'd delay adjusting the PC until resuming, which made it so that "info threads" could wrongly see threads with unadjusted PCs. gdb/ 2015-01-09 Pedro Alves <[email protected]> * breakpoint.c (hardware_breakpoint_inserted_here_p): New function. * breakpoint.h (hardware_breakpoint_inserted_here_p): New declaration. * linux-nat.c (linux_nat_status_is_event): Move higher up in file. (linux_resume_one_lwp): Store the thread's PC. Adjust to clear stop_reason. (check_stopped_by_watchpoint): New function. (save_sigtrap): Reimplement. (linux_nat_stopped_by_watchpoint): Adjust. (linux_nat_lp_status_is_event): Delete. (stop_wait_callback): Only call save_sigtrap after storing the pending status. (status_callback): If the thread had been stopped for a breakpoint that has since been removed, discard the event and resume the LWP. (count_events_callback, select_event_lwp_callback): Use lwp_status_pending_p instead of linux_nat_lp_status_is_event. (cancel_breakpoint): Rename to ... (check_stopped_by_breakpoint): ... this. Record whether the LWP stopped for a software breakpoint or hardware breakpoint. (select_event_lwp): Only give preference to the stepping LWP in all-stop mode. Adjust comments. (stop_and_resume_callback): Remove references to new_pending_p. (linux_nat_filter_event): Likewise. Leave exit events of the leader thread pending here. Handle signal short circuiting here. Only call save_sigtrap after storing the pending waitstatus. (linux_nat_wait_1): Remove 'retry' label. Remove references to new_pending. Don't handle leaving events the caller is not interested in pending here, nor handle signal short-circuiting here. Also give equal priority to all LWPs that have had events in non-stop mode. If reporting a software breakpoint event, unadjust the LWP's PC. * linux-nat.h (enum lwp_stop_reason): New. (struct lwp_info) <stop_pc>: New field. (struct lwp_info) <stopped_by_watchpoint>: Delete field. (struct lwp_info) <stop_reason>: New field. * x86-linux-nat.c (x86_linux_prepare_to_resume): Adjust.

If the user: #1 - disables the TUI #2 - resizes the terminal #3 - and then re-enables the TUI the next wgetch() returns KEY_RESIZE. This indicates to the ncurses client that ncurses detected that the terminal has been resized. We don't handle KEY_RESIZE anywhere, so it gets passed on to readline which interprets it as a multibyte character, and then the end result is that the first key press after enabling the TUI is misinterpreted. We shouldn't really need to handle KEY_RESIZE (and not all ncurses implementations have that). We have our own SIGWINCH handler, and, when we re-enable the TUI, we explicitly detect terminal resizes and resize all windows. The reason ncurses currently does detects a resize is that something within tui_enable forces a refresh/display of some window before we get to do the actual resizing. Setting a break on ncurses' 'resizeterm' function helps find the culprit(s): (top-gdb) bt #0 resizeterm (ToLines=28, ToCols=114) at ../../ncurses/base/resizeterm.c:462 #1 0x0000003b42812f3f in _nc_update_screensize (sp=0x2674730) at ../../ncurses/tinfo/lib_setup.c:443 #2 0x0000003b0821cbe0 in doupdate () at ../../ncurses/tty/tty_update.c:726 #3 0x0000003b08215539 in wrefresh (win=0x2a7bc00) at ../../ncurses/base/lib_refresh.c:65 #4 0x00000000005257cb in tui_refresh_win (win_info=0xd73d60 <_locator>) at /home/pedro/gdb/mygit/src/gdb/tui/tui-wingeneral.c:60 #5 0x000000000052265b in tui_show_locator_content () at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:269 #6 0x00000000005273a6 in tui_set_key_mode (mode=TUI_COMMAND_MODE) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:321 #7 0x00000000005278c7 in tui_enable () at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:494 #8 0x0000000000527011 in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:108 That is, tui_enable calls tui_set_key_mode before we've resized all windows, and that refreshes a window as side effect. And if we're already debugging something (there's a frame), then we'll instead show a window from within tui_show_frame_info: (top-gdb) bt #0 resizeterm (ToLines=28, ToCols=114) at ../../ncurses/base/resizeterm.c:462 #1 0x0000003b42812f3f in _nc_update_screensize (sp=0x202e6c0) at ../../ncurses/tinfo/lib_setup.c:443 #2 0x0000003b0821cbe0 in doupdate () at ../../ncurses/tty/tty_update.c:726 #3 0x0000003b08215539 in wrefresh (win=0x2042890) at ../../ncurses/base/lib_refresh.c:65 #4 0x00000000005257cb in tui_refresh_win (win_info=0xd73d60 <_locator>) at /home/pedro/gdb/mygit/src/gdb/tui/tui-wingeneral.c:60 #5 0x000000000052265b in tui_show_locator_content () at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:269 #6 0x0000000000522931 in tui_show_frame_info (fi=0x16b9cc0) at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:364 #7 0x00000000005278ba in tui_enable () at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:491 #8 0x0000000000527011 in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:108 The fix is to resize windows earlier. gdb/ChangeLog: 2015-02-17 Pedro Alves <[email protected]> * tui/tui.c (tui_enable): Resize windows before anything might show a window.

The attached patch fixes the SEGV and lets GDB successfully load all kernel modules installed by default on RHEL 7. Valgrind on F-21 x86_64 host has shown me more clear what is the problem: Reading symbols from /home/jkratoch/t/cordic.ko...Reading symbols from /home/jkratoch/t/cordic.ko.debug...================================================================= ==22763==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120000461c8 at pc 0x150cdbd bp 0x7fffffffc7e0 sp 0x7fffffffc7d0 READ of size 8 at 0x6120000461c8 thread T0 #0 0x150cdbc in ppc64_elf_get_synthetic_symtab /home/jkratoch/redhat/gdb-test-asan/bfd/elf64-ppc.c:3282 #1 0x8c5274 in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1205 #2 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268 [...] 0x6120000461c8 is located 264 bytes inside of 288-byte region [0x6120000460c0,0x6120000461e0) freed by thread T0 here: #0 0x7ffff715454f in __interceptor_free (/lib64/libasan.so.1+0x5754f) #1 0xde9cde in xfree common/common-utils.c:98 #2 0x9a04f7 in do_my_cleanups common/cleanups.c:155 #3 0x9a05d3 in do_cleanups common/cleanups.c:177 #4 0x8c538a in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1229 #5 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268 [...] previously allocated by thread T0 here: #0 0x7ffff71547c7 in malloc (/lib64/libasan.so.1+0x577c7) #1 0xde9b95 in xmalloc common/common-utils.c:41 #2 0x8c4da2 in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1147 #3 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268 [...] SUMMARY: AddressSanitizer: heap-use-after-free /home/jkratoch/redhat/gdb-test-asan/bfd/elf64-ppc.c:3282 ppc64_elf_get_synthetic_symtab [...] ==22763==ABORTING A similar case a few lines later I have fixed in 2010 by: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3f1eff0a2c7f0e7078f011f55b8e7f710aae0cc2 My testcase does not always reproduce it but at least a bit: * GDB without ppc64 target (even as a secondary one) is reported as "untested" * ASAN-built GDB with ppc64 target always crashes (and PASSes with this fix) * unpatched non-ASAN-built GDB with ppc64 target crashes from commandline * unpatched non-ASAN-built GDB with ppc64 target PASSes from runtest (?) gdb/ChangeLog 2015-02-26 Jan Kratochvil <[email protected]> * elfread.c (elf_read_minimal_symbols): Use bfd_alloc for bfd_canonicalize_symtab. gdb/testsuite/ChangeLog 2015-02-26 Jan Kratochvil <[email protected]> * gdb.arch/cordic.ko.bz2: New file. * gdb.arch/cordic.ko.debug.bz2: New file. * gdb.arch/ppc64-symtab-cordic.exp: New file.

…ume LWP Ref: https://sourceware.org/ml/gdb-patches/2015-03/msg00060.html The record-btrace target can hit an assertion here: Breakpoint 1, record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>, regcache=0x9a0a798, regno=8) at gdb/record-btrace.c:1202 1202 gdb_assert (tp != NULL); (gdb) p regcache->ptid $3 = {pid = 23856, lwp = 0, tid = 0} The problem is that the linux-nat layer converts the ptid to a single-process ptid before passing the request down to the inf-ptrace layer, which loses information, and then record-btrace can't find the corresponding thread in GDB's thread list: (gdb) bt #0 record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>, regcache=0x9a0a798, regno=8) at gdb/record-btrace.c:1202 #1 0x083f4ee2 in delegate_fetch_registers (self=0x974bfc0 <record_btrace_ops>, arg1=0x9a0a798, arg2=8) at gdb/target-delegates.c:149 #2 0x08406562 in target_fetch_registers (regcache=0x9a0a798, regno=8) at gdb/target.c:3279 #3 0x08355255 in regcache_raw_read (regcache=0x9a0a798, regnum=8, buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]") at gdb/regcache.c:643 #4 0x083558a7 in regcache_cooked_read (regcache=0x9a0a798, regnum=8, buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]") at gdb/regcache.c:734 #5 0x08355de3 in regcache_cooked_read_unsigned (regcache=0x9a0a798, regnum=8, val=0xbfffe738) at gdb/regcache.c:838 #6 0x0827a106 in i386_linux_resume (ops=0x9737ca0 <linux_ops_saved>, ptid=..., step=1, signal=GDB_SIGNAL_0) at gdb/i386-linux-nat.c:670 #7 0x08280c12 in linux_resume_one_lwp (lp=0x9a0a5b8, step=1, signo=GDB_SIGNAL_0) at gdb/linux-nat.c:1529 #8 0x08281281 in linux_nat_resume (ops=0x98da608, ptid=..., step=1, signo=GDB_SIGNAL_0) at gdb/linux-nat.c:1708 #9 0x0850738e in record_btrace_resume (ops=0x98da608, ptid=..., step=1, signal=GDB_SIGNAL_0) at gdb/record-btrace.c:1760 ... The fix is just to not lose information, and let the intact ptid reach record-btrace.c. Tested on x86-64 Fedora 20, -m32. gdb/ChangeLog: 2015-03-03 Pedro Alves <[email protected]> * i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of the lwp field of ptid. Pass the full ptid to get_thread_regcache. * inf-ptrace.c (get_ptrace_pid): New function. (inf_ptrace_resume): Use it. * linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified to the lower layer.

This patch changes the heuristic the linespec lexer uses to detect a keyword in the input stream. Currently, the heuristic is: a word is a keyword if it 1) points to a string that is a keyword 2) is followed by a non-identifier character This is strictly more correct than using whitespace. For example, it allows constructs such as "break foo if(i == 1)". However, find_condition_and_thread in breakpoint.c does not support this expanded usage. It requires whitespace to follow the keyword. The proposed new heuristic is: a word is a keyword if it 1) points to a string that is a keyword 2) is followed by whitespace 3) is not followed by another keyword string followed by whitespace This additional complexity allows constructs such as "break thread thread 3" and "break thread 3". In the former case, the actual location is a symbol named "thread" to be set on thread #3. In the later case, the location is NULL, i.e., the default location, to be set on thread #3. In order to pass all the new tests added here, I've also had to add a new feature to parse_breakpoint_sals, which expands recognition of the default location to keywords other than "if", which is the only keyword currently permitted with the default (NULL) location, but there is no reason to exclude other keywords. Consequently, it will be possible to use "break thread 1" or "break task 1". In addition to all of this, it is now possible to remove the keyword_ok state from the linespec parser. gdb/ChangeLog * breakpoint.c (parse_breakpoint_sals): Use linespec_lexer_lex_keyword to ascertain if the user specified a NULL location. * linespec.c [IF_KEYWORD_INDEX]: Define. (linespec_lexer_lex_keyword): Export. (struct ls_parser) <keyword_ok>: Remove. A keyword is only a keyword if not followed by another keyword. (linespec_lexer_lex_one): Remove keyword_ok handling. Add comment explaining why the parsing stream is not advanced when a keyword is seen. (parse_linespec): Remove parser->keyword_ok. * linespec.h (linespec_lexer_lex_keyword): Add declaration. gdb/testsuite/ChangeLog * gdb.linespec/keywords.c: New file. * gdb.linespec/keywords.exp: New file.

On GNU/Linux, if the target reuses the TID of a thread that GDB still has in its list marked as THREAD_EXITED, GDB crashes, like: (gdb) continue Continuing. src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error) Here: (top-gdb) bt #0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.") at src/gdb/common/errors.c:54 #1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789 #2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114 #3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127 #4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478 #5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722 #6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1) at src/gdb/linux-thread-db.c:1525 #7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116 #8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206 #9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275 #10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56 #11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655 I managed to come up with a test that reliably reproduces this. It spawns enough threads for the pid number space to wrap around, so could potentially take a while. On my box that's 4 seconds; on gcc110, a PPC box which has max_pid set to 65536, it's over 10 seconds. So I made the test compute how long that would take, and cap the time waited if it would be unreasonably long. Tested on x86_64 Fedora 20. gdb/ChangeLog: 2015-04-01 Pedro Alves <[email protected]> * linux-thread-db.c (record_thread): Readd the thread to gdb's list if it was marked exited. gdb/testsuite/ChangeLog: 2015-04-01 Pedro Alves <[email protected]> * gdb.threads/tid-reuse.c: New file. * gdb.threads/tid-reuse.exp: New file.

…gnals Both PRs are triggered by the same use case. PR18214 is about software single-step targets. On those, the 'resume' code that detects that we're stepping over a breakpoint and delivering a signal at the same time: /* Currently, our software single-step implementation leads to different results than hardware single-stepping in one situation: when stepping into delivering a signal which has an associated signal handler, hardware single-step will stop at the first instruction of the handler, while software single-step will simply skip execution of the handler. ... Fortunately, we can at least fix this particular issue. We detect here the case where we are about to deliver a signal while software single-stepping with breakpoints removed. In this situation, we revert the decisions to remove all breakpoints and insert single- step breakpoints, and instead we install a step-resume breakpoint at the current address, deliver the signal without stepping, and once we arrive back at the step-resume breakpoint, actually step over the breakpoint we originally wanted to step over. */ doesn't handle the case of _another_ thread also needing to step over a breakpoint. Because the other thread is just resumed at the PC where it had stopped and a breakpoint is still inserted there, the thread immediately re-traps the same breakpoint. This test exercises that. On software single-step targets, it fails like this: KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler gdb.log (simplified): (gdb) continue Continuing. Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66 66 callme (); /* set breakpoint thread 2 here */ (gdb) thread 3 (gdb) queue-signal SIGUSR1 (gdb) thread 1 [Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))] #0 main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106 106 wait_threads (); /* set wait-threads breakpoint here */ (gdb) break sigusr1_handler Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31. (gdb) continue Continuing. [Switching to Thread 0x7ffff7fc0700 (LWP 24828)] Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66 66 callme (); /* set breakpoint thread 2 here */ (gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler For good measure, I made the test try displaced stepping too. And then I found it crashes GDB on x86-64 (a hardware step target), but only when displaced stepping... : KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216) Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964 4964 if (sr_bp->loc->permanent Setting up the environment for debugging gdb. Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54. Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217. (top-gdb) p sr_bp $1 = (struct breakpoint *) 0x0 (top-gdb) bt #0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964 #1 0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715 #2 0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165 #3 0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298 #4 0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56 #5 0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658 #6 0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658 The all-stop-non-stop series fixes this, but meanwhile, this augments the multiple-step-overs.exp test to cover this, KFAILed. gdb/testsuite/ChangeLog: 2015-04-08 Pedro Alves <[email protected]> PR gdb/18214 PR gdb/18216 * gdb.threads/multiple-step-overs.c (sigusr1_handler): New function. (main): Install it as SIGUSR1 handler. * gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix' parameter. Always use "setup" as prefix. Toggle "set displaced-stepping" off/on depending on global. Don't switch to thread 1 here. (top level): Add displaced stepping "off/on" test axis. Update "setup" calls. Wrap each subtest with with_test_prefix. Test continuing with a queued signal in each thread.

…gdbserver Fixes: -FAIL: gdb.trace/mi-tracepoint-changed.exp: reconnect: break-info 1 +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: tracepoint created +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: tracepoint on marker is installed +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: break-info 1 -FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created -FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created +PASS: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created +PASS: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created These tests do something like this: #0 - start gdb/gdbserver normally #1 - setup some things in the debug session #2 - disconnect from gdbserver #3 - restart gdb #4 - reconnect to gdbserver The problem is that the native-extended-gdbserver board always spawns a new gdbserver instance in #3 (and has gdb connect to that). So when the test gets to #4, it connects to that new instance instead of the old one: (gdb) spawn ../gdbserver/gdbserver --multi :2354 Listening on port 2354 target extended-remote localhost:2354 Remote debugging using localhost:2354 ... spawn ../gdbserver/gdbserver --multi :2355 Listening on port 2355 47-target-select extended-remote localhost:2355 =tsv-created,name="trace_timestamp",initial="0"\n 47^connected (gdb) ... 47-target-select extended-remote localhost:2355 47^connected (gdb) FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created testsuite/ChangeLog: 2015-04-16 Pedro Alves <[email protected]> * boards/native-extended-gdbserver.exp (mi_gdb_start): Don't start a new gdbserver if gdbserver_reconnect_p is set.

Hi, I see the fail in gdb.base/relativedebug.exp on aarch64 box on which glibc doesn't have debug info, bt^M #0 0x0000002000061a88 in raise () from /lib/aarch64-linux-gnu/libc.so.6^M #1 0x0000002000064efc in abort () from /lib/aarch64-linux-gnu/libc.so.6^M #2 0x0000000000400640 in handler (signo=14) at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:25^M #3 <signal handler called>^M #4 0x00000020000cc478 in ?? () from /lib/aarch64-linux-gnu/libc.so.6^M #5 0x0000000000400664 in main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:32^M (gdb) FAIL: gdb.base/relativedebug.exp: pause found in backtrace if glibc has debug info, this test doesn't fail. In sysdeps/unix/sysv/linux/generic/pause.c, __libc_pause calls __syscall_pause, static int __syscall_pause (void) { sigset_t set; int rc = INLINE_SYSCALL (rt_sigprocmask, 4, SIG_BLOCK, NULL, &set, _NSIG / 8); if (rc == 0) rc = INLINE_SYSCALL (rt_sigsuspend, 2, &set, _NSIG / 8); return rc; } int __libc_pause (void) { if (SINGLE_THREAD_P) return __syscall_pause (); <--- tail call int oldtype = LIBC_CANCEL_ASYNC (); int result = __syscall_pause (); LIBC_CANCEL_RESET (oldtype); return result; } and GDB unwinder is confused by the GCC optimized code, (gdb) disassemble pause Dump of assembler code for function pause: 0x0000007fb7f274c4 <+0>: stp x29, x30, [sp,#-32]! 0x0000007fb7f274c8 <+4>: mov x29, sp 0x0000007fb7f274cc <+8>: adrp x0, 0x7fb7fd2000 0x0000007fb7f274d0 <+12>: ldr w0, [x0,#364] 0x0000007fb7f274d4 <+16>: stp x19, x20, [sp,#16] 0x0000007fb7f274d8 <+20>: cbnz w0, 0x7fb7f274e8 <pause+36> 0x0000007fb7f274dc <+24>: ldp x19, x20, [sp,#16] 0x0000007fb7f274e0 <+28>: ldp x29, x30, [sp],#32 0x0000007fb7f274e4 <+32>: b 0x7fb7f27434 <---- __syscall_pause 0x0000007fb7f274e8 <+36>: bl 0x7fb7f5e080 Note that the program stops in __syscall_pause, but its symbol is stripped in glibc, so GDB doesn't know where the program stops. __syscall_pause is a tail call in __libc_pause, so it returns to main instead of __libc_pause. As a result, the backtrace is like, #0 0x0000007fb7ebca88 in raise () from /lib/aarch64-linux-gnu/libc.so.6 #1 0x0000007fb7ebfefc in abort () from /lib/aarch64-linux-gnu/libc.so.6 #2 0x0000000000400640 in handler (signo=14) at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:25 #3 <signal handler called> #4 0x0000007fb7f27478 in ?? () from /lib/aarch64-linux-gnu/libc.so.6 <-- [in __syscall_pause] #5 0x0000000000400664 in main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:32 looks GDB does nothing wrong here. I looked back at the test case gdb.base/relativedebug.exp, which was added https://sourceware.org/ml/gdb-patches/2006-10/msg00305.html This test was indented to test the problem that "backtraces no longer display some libc functions" after separate debug info is installed. IOW, it makes few sense to test against libc which doesn't have debug info at all, such as my case. This patch is to tweak the test case to catch the output of "info shared", if "(*)" is found for libc.so, which means libc doesn't have debug info, then skip the test. gdb/testsuite: 2015-04-30 Yao Qi <[email protected]> * gdb.base/relativedebug.exp: Invoke gdb command "info sharedlibrary", and if libc.so doesn't have debug info, skip the test.

(gdb) PASS: gdb.compile/compile.exp: set unwindonsignal on compile code *(volatile int *) 0 = 0; Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7fba426 in _gdb_expr (__regs=0x7ffff7fb8000) at gdb command line:1 1 gdb command line: No such file or directory. ================================================================= ==10462==ERROR: AddressSanitizer: heap-use-after-free on address 0x621000cf7a3d at pc 0x0000004e46b9 bp 0x7ffdeb0f7a40 sp 0x7ffdeb0f71b8 READ of size 10 at 0x621000cf7a3d thread T0 #0 0x4e46b8 in printf_common(void*, char const*, __va_list_tag*) [clone .isra.6] (/home/jkratoch/redhat/gdb-clean-asan/gdb/gdb+0x4e46 b8) #1 0x4f645e in vasprintf (/home/jkratoch/redhat/gdb-clean-asan/gdb/gdb+0x4f645e) #2 0xe5cf00 in xstrvprintf common/common-utils.c:120 #3 0xe74192 in throw_it common/common-exceptions.c:332 #4 0xe742f6 in throw_verror common/common-exceptions.c:361 #5 0xddc89e in verror /home/jkratoch/redhat/gdb-clean-asan/gdb/utils.c:541 #6 0xe734bd in error common/errors.c:43 #7 0xafa1d6 in call_function_by_hand_dummy /home/jkratoch/redhat/gdb-clean-asan/gdb/infcall.c:1031 #8 0xe81858 in compile_object_run compile/compile-object-run.c:119 #9 0xe7733c in eval_compile_command compile/compile.c:577 #10 0xe7541e in compile_code_command compile/compile.c:153 It is obvious why that happens, dummy_frame_pop() will call compile objfile cleanup which will free that objfile and NAME then becomes a stale pointer. > Is there any reason we release OBJFILE in the dummy frame dtor? Why > don't we register a cleanup to release in OBJFILE in compile_object_run? > together with releasing compile_module? 'struct compile_module' has a > field objfile, which should be released together with > 'struct compile_module' instead of dummy_frame. (gdb) break puts Breakpoint 2 at 0x3830c6fd30: file ioputs.c, line 34. (gdb) compile code puts("hello") Breakpoint 2, _IO_puts (str=0x7ffff7ff8000 "hello") at ioputs.c:34 34 { The program being debugged stopped while in a function called from GDB. Evaluation of the expression containing the function (_gdb_expr) will be abandoned. When the function is done executing, GDB will silently stop. (gdb) bt (gdb) _ Now compile_object_run() called from line (gdb) compile code puts("hello") has finished for a long time. But we still need to have that injected code OBJFILE valid when GDB is executing it. Therefore OBJFILE is freed only from destructor of the frame #1. At the patched line of call_function_by_hand_dummy() the dummy frame destructor has not yet been run but it will be run before the fetched NAME will get used. gdb/ChangeLog 2015-05-19 Jan Kratochvil <[email protected]> Fix ASAN crash for gdb.compile/compile.exp. * infcall.c (call_function_by_hand_dummy): Use xstrdup for NAME.

…info In some cases tui_show_frame_info() may get called while the inferior's terminal settings are still in effect. But when we call this function we absolutely need to have our terminal settings in effect because the function is responsible for redrawing TUI's windows following a change in the selected frame or a change in the PC. If our terminal settings are not in effect, the screen does not get redrawn properly, causing temporary display artifacts (which can be fixed via ^L). This scenario happens most prominently when stepping through a program in TUI while a watchpoint is in effect. Here is an example backtrace for when tui_show_frame_info() gets called while target_terminal_is_inferior() == 1: #1 0x00000000004988ee in tui_selected_frame_level_changed_hook (level=0) #2 0x0000000000617b99 in select_frame (fi=0x18c9820) #3 0x0000000000617c3f in get_selected_frame (message=message@entry=0x0) #4 0x00000000004ce534 in update_watchpoint (b=b@entry=0x2d9a760, reparse=reparse@entry=0) #5 0x00000000004d625e in insert_breakpoints () #6 0x0000000000531cfe in keep_going (ecs=ecs@entry=0x7ffea7884ac0) #7 0x00000000005326d7 in process_event_stop_test (ecs=ecs@entry=0x7ffea7884ac0) #8 0x000000000053596e in handle_inferior_event_1 (ecs=0x7ffea7884ac0) The fix is simple: call target_terminal_ours_for_output() before calling tui_show_frame_info() in TUI's frame-changed hook, making sure to restore the original terminal settings afterwards. gdb/ChangeLog: * tui/tui-hooks.c (tui_selected_frame_level_changed_hook): Call target_terminal_ours_for_output() before calling tui_show_frame_info(), and restore the original terminal settings afterwards.

…event The tail end of linux_wait_1 isn't expecting that the select_event_lwp machinery can pick a whole-process exit event to report to GDB. When that happens, both gdb and gdbserver end up quite confused: ... (gdb) [Thread 24971.24971] #1 stopped. 0x0000003615a011f0 in ?? () c& Continuing. (gdb) [New Thread 24971.24981] [New Thread 24983.24983] [New Thread 24971.24982] [Thread 24983.24983] #3 stopped. 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130 130 pid = ARCH_FORK (); [New Thread 24984.24984] Error in re-setting breakpoint -16: PC register is not available Error in re-setting breakpoint -17: PC register is not available Error in re-setting breakpoint -18: PC register is not available Error in re-setting breakpoint -19: PC register is not available Error in re-setting breakpoint -24: PC register is not available Error in re-setting breakpoint -25: PC register is not available Error in re-setting breakpoint -26: PC register is not available Error in re-setting breakpoint -27: PC register is not available Error in re-setting breakpoint -28: PC register is not available Error in re-setting breakpoint -29: PC register is not available Error in re-setting breakpoint -30: PC register is not available PC register is not available (gdb) gdb/gdbserver/ChangeLog: 2015-08-06 Pedro Alves <[email protected]> * linux-low.c (add_lwp): Set waitstatus to TARGET_WAITKIND_IGNORE. (linux_thread_alive): Use lwp_is_marked_dead. (extended_event_reported): Delete. (linux_wait_1): Check if waitstatus is TARGET_WAITKIND_IGNORE instead of extended_event_reported. (mark_lwp_dead): Don't set the 'dead' flag. Store the waitstatus as well. (lwp_is_marked_dead): New function. (lwp_running): Use lwp_is_marked_dead. * linux-low.h: Delete 'dead' field, and update 'waitstatus's comment.

Making all-stop run on top of non-stop caused a small regression in behavior. This was observed on x86_64-linux. The attached testcase is in C whereas the investigation was done with an Ada program, but it's the same scenario, and using a C testcase allows wider testing. Basically: I am debugging a single-threaded program, and currently stopped inside a function provided by a shared-library, at a line calling a subprogram provided by a second shared library, and trying to "next" over that function call. Before we changed the default all-stop behavior, we had: 7 Impl_Initialize; -- Stop here and try "next" over this line (gdb) n 8 return 5; <<-- OK But now, "next" just stops much earlier: (gdb) n 0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so What happens is that next stops at a call instruction, which calls the function's PLT, and GDB fails to notice that the inferior stepped into a subroutine, and so decides that we're done. We can see another symptom of the same issue by looking at the backtrace at the point GDB stopped: (gdb) bt #0 0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so #1 0x00000000f7bd86f9 in ?? () #2 0x00007fffffffdf50 in ?? () #3 0x0000000000401893 in a () at /[...]/a.adb:7 Backtrace stopped: frame did not save the PC With a functioning GDB, the backtrace looks like the following instead: #0 0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so #1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 #2 0x0000000000401893 in a () at /[...]/a.adb:7 Note how, for frame #1, the address looks quite similar, except for the high-order bits not being set: #1 0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7 <<<-- OK #1 0x00000000f7bd86f9 in ?? () <<<-- WRONG ^^^^ |||| Wrong Investigating this further led me to displaced stepping. As we are "next"-ing from a location where a breakpoint is inserted, we need to step out of it, and since we're on non-stop mode, we need to do it using displaced stepping. And looking at amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles the return address: regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp); retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order); retaddr = (retaddr - insn_offset) & 0xffffffffUL; The mask used to compute retaddr looks wrong to me, keeping only 4 bytes instead of 8, and explains why the high order bits of the backtrace are unset. What happens is that, after the displaced stepping has completed, GDB restores that return address at the location where the program expects it. But because the top half bits of the address have been masked out, the return address is now invalid. The incorrect behavior of the "next" command and the backtrace at that location are the first symptoms of that. Another symptom is that this actually alters the behavior of the program, where a "cont" from there soon leads to a SEGV when the inferior tries to jump back to that incorrect return address: (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. 0x00000000f7bd86f9 in ?? () ^^^^^^^^^^^^^^^^^^ This patch fixes the issue by using a mask that seems more appropriate for this architecture. gdb/ChangeLog: * amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to compute RETADDR. gdb/testsuite/ChangeLog: * gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h, gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c, gdb.base/dso2dso.exp: New files. Tested on x86_64-linux, no regression.

…g-bp.exp Running that test in a loop, I found a gdbserver core dump with the following back trace: Core was generated by `../gdbserver/gdbserver --once --multi :2346'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236 236 return inferior->regcache_data; (gdb) up #1 0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31 31 regcache = (struct regcache *) inferior_regcache_data (thread); (gdb) bt #0 0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236 #1 0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31 #2 0x0000000000409271 in prepare_resume_reply (buf=0x20dd593 "", ptid=..., status=0x20edce0) at src/gdb/gdbserver/remote-utils.c:1147 #3 0x000000000040ab0a in vstop_notif_reply (event=0x20edcc0, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/server.c:183 #4 0x0000000000426b38 in notif_write_event (notif=0x66e6c0 <notif_stop>, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/notif.c:69 #5 0x0000000000426c55 in handle_notif_ack (own_buf=0x20dd590 "T05", packet_len=8) at src/gdb/gdbserver/notif.c:113 #6 0x000000000041118f in handle_v_requests (own_buf=0x20dd590 "T05", packet_len=8, new_packet_len=0x7fff742c77b8) at src/gdb/gdbserver/server.c:2862 #7 0x0000000000413850 in process_serial_event () at src/gdb/gdbserver/server.c:4148 #8 0x0000000000413945 in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:4196 #9 0x000000000041a1ef in handle_file_event (event_file_desc=5) at src/gdb/gdbserver/event-loop.c:429 #10 0x00000000004199b6 in process_event () at src/gdb/gdbserver/event-loop.c:184 #11 0x000000000041a735 in start_event_loop () at src/gdb/gdbserver/event-loop.c:547 #12 0x00000000004123d2 in captured_main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3562 #13 0x000000000041252e in main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3631 Clearly this means that a thread pushed a stop reply in the event queue, and then before GDB confused the event, the whole process died, along with its thread. But the pending thread event was left dangling. When GDB fetched that event, gdbserver looked up the corresponding thread, but found NULL; not expecting this, gdbserver crashes when it tries to read this thread's registers. gdb/gdbserver/ 2015-08-21 Pedro Alves <[email protected]> PR gdb/18749 * inferiors.c (remove_thread): Discard any pending stop reply for this thread. * server.c (remove_all_on_match_pid): Rename to ... (remove_all_on_match_ptid): ... this. Work with a filter ptid instead of a pid. (discard_queued_stop_replies): Change parameter to a ptid. Now extern. (handle_v_kill, kill_inferior_callback) (process_serial_event): Adjust. (captured_main): Call initialize_notif before starting the program, thus before threads are created. * server.h (discard_queued_stop_replies): Declare.

I build GDB with asan, and run test case hook-stop.exp, and threadapply.exp, I got the following asan error, =================================================================^M ^[[1m^[[31m==2291==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000999c4 at pc 0x000000826022 bp 0x7ffd28a8ff70 sp 0x7ffd28a8ff60^M ^[[1m^[[0m^[[1m^[[34mREAD of size 4 at 0x6160000999c4 thread T0^[[1m^[[0m^M #0 0x826021 in release_stop_context_cleanup ../../binutils-gdb/gdb/infrun.c:8203^M #1 0x72798a in do_my_cleanups ../../binutils-gdb/gdb/common/cleanups.c:154^M #2 0x727a32 in do_cleanups(cleanup*) ../../binutils-gdb/gdb/common/cleanups.c:176^M #3 0x826895 in normal_stop() ../../binutils-gdb/gdb/infrun.c:8381^M #4 0x815208 in fetch_inferior_event(void*) ../../binutils-gdb/gdb/infrun.c:4011^M #5 0x868aca in inferior_event_handler(inferior_event_type, void*) ../../binutils-gdb/gdb/inf-loop.c:44^M .... ^[[1m^[[32m0x6160000999c4 is located 68 bytes inside of 568-byte region [0x616000099980,0x616000099bb8)^M ^[[1m^[[0m^[[1m^[[35mfreed by thread T0 here:^[[1m^[[0m^M #0 0x7fb0bc1312ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)^M #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100^M #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207^M #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223^M #4 0x805494 in kill_command ../../binutils-gdb/gdb/infcmd.c:2595^M .... Detaching from program: /home/yao.qi/SourceCode/gnu/build-with-asan/gdb/testsuite/outputs/gdb.threads/threadapply/threadapply, process 2399^M =================================================================^M ^[[1m^[[31m==2387==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000a98c0 at pc 0x00000083fd28 bp 0x7ffd401c3110 sp 0x7ffd401c3100^M ^[[1m^[[0m^[[1m^[[34mREAD of size 4 at 0x6160000a98c0 thread T0^[[1m^[[0m^M #0 0x83fd27 in thread_alive ../../binutils-gdb/gdb/thread.c:741^M #1 0x844277 in thread_apply_all_command ../../binutils-gdb/gdb/thread.c:1804^M .... ^M ^[[1m^[[32m0x6160000a98c0 is located 64 bytes inside of 568-byte region [0x6160000a9880,0x6160000a9ab8)^M ^[[1m^[[0m^[[1m^[[35mfreed by thread T0 here:^[[1m^[[0m^M #0 0x7f59a7e322ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)^M #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100^M #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207^M #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223^M This patch fixes the issue by deleting thread_info object if it is deletable, otherwise, mark it as exited (by set_thread_exited). Function set_thread_exited is shared from delete_thread_1. This patch also moves field "refcount" to private and methods incref and decref. Additionally, we stop using "ptid_t" in "struct current_thread_cleanup" to reference threads, instead we use "thread_info" directly. Due to this change, we don't need restore_current_thread_ptid_changed anymore. gdb: 2017-04-10 Yao Qi <[email protected]> PR gdb/19942 * gdbthread.h (thread_info::deletable): New method. (thread_info::incref): New method. (thread_info::decref): New method. (thread_info::refcount): Move it to private. * infrun.c (save_stop_context): Call inc_refcount. (release_stop_context_cleanup): Likewise. * thread.c (set_thread_exited): New function. (init_thread_list): Delete "tp" only it is deletable, otherwise call set_thread_exited. (delete_thread_1): Call set_thread_exited. (current_thread_cleanup) <inferior_pid>: Remove. <thread>: New field. (restore_current_thread_ptid_changed): Removed. (do_restore_current_thread_cleanup): Adjust. (restore_current_thread_cleanup_dtor): Don't call find_thread_ptid. (set_thread_refcount): Use dec_refcount. (make_cleanup_restore_current_thread): Adjust. (thread_apply_all_command): Call inc_refcount. (_initialize_thread): Don't call observer_attach_thread_ptid_changed.

katastic mentioned this issue Nov 23, 2017

Fresh compile from git, Python spams error messages when running gdb #7

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

makeinfo and flex not found even though they're on path #3

makeinfo and flex not found even though they're on path #3

timotheecour commented Jun 4, 2014

makeinfo and flex not found even though they're on path #3

makeinfo and flex not found even though they're on path #3

Comments

timotheecour commented Jun 4, 2014