Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeinfo and flex not found even though they're on path #3

Open
timotheecour opened this issue Jun 4, 2014 · 0 comments
Open

makeinfo and flex not found even though they're on path #3

timotheecour opened this issue Jun 4, 2014 · 0 comments

Comments

@timotheecour
Copy link

I had to use:make MAKEINFO=/usr/bin/makeinfo FLEX=/usr/bin/flex

ibuclaw pushed a commit that referenced this issue Aug 4, 2014
…ng unwind.

  https://sourceware.org/ml/gdb-patches/2014-05/msg00737.html

Currently a MEMORY_ERROR raised during unwinding a frame will cause the
unwind to stop with an error message, for example:

  (gdb) bt
  #0  breakpt () at amd64-invalid-stack-middle.c:27
  #1  0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32
  #2  0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38
  #3  0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44
  #4  0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50
  Cannot access memory at address 0x2aaaaaab0000

However, frame #4 is marked as being the end of the stack unwind, so a
subsequent request for the backtrace looses the error message, such as:

  (gdb) bt
  #0  breakpt () at amd64-invalid-stack-middle.c:27
  #1  0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32
  #2  0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38
  #3  0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44
  #4  0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50

When fetching the backtrace, or requesting the stack depth using the MI
interface the situation is even worse, the first time a request is made
we encounter the memory error and so the MI returns an error instead of
the correct result, for example:

  (gdb) -stack-info-depth
  ^error,msg="Cannot access memory at address 0x2aaaaaab0000"

Or,

  (gdb) -stack-list-frames
  ^error,msg="Cannot access memory at address 0x2aaaaaab0000"

However, once one of these commands has been used gdb has, internally,
walked the stack and figured that out that frame #4 is the bottom of the
stack, so the second time an MI command is tried you'll get the "expected"
result:

  (gdb) -stack-info-depth
  ^done,depth="5"

Or,

  (gdb) -stack-list-frames
  ^done,stack=[frame={level="0", .. snip lots .. }]

After this patch the MEMORY_ERROR encountered during the frame unwind is
attached to frame #4 as the stop reason, and is displayed in the CLI each
time the backtrace is requested.  In the MI, catching the error means that
the "expected" result is returned the first time the MI command is issued.
So, from the CLI the results of the backtrace will be:

  (gdb) bt
  #0  breakpt () at amd64-invalid-stack-middle.c:27
  #1  0x00000000004008f0 in func5 () at amd64-invalid-stack-middle.c:32
  #2  0x0000000000400900 in func4 () at amd64-invalid-stack-middle.c:38
  #3  0x0000000000400910 in func3 () at amd64-invalid-stack-middle.c:44
  #4  0x0000000000400928 in func2 () at amd64-invalid-stack-middle.c:50
  Backtrace stopped: Cannot access memory at address 0x2aaaaaab0000

Each and every time that the backtrace is requested, while the MI output
will similarly be consistently:

  (gdb) -stack-info-depth
  ^done,depth="5"

Or,

  (gdb) -stack-list-frames
  ^done,stack=[frame={level="0", .. snip lots .. }]

gdb/ChangeLog:

	* frame.c (struct frame_info): Add stop_string field.
	(get_prev_frame_always_1): Renamed from get_prev_frame_always.
	(get_prev_frame_always): Old content moved into
	get_prev_frame_always_1.  Call get_prev_frame_always_1 inside
	TRY_CATCH, handle MEMORY_ERROR exceptions.
	(frame_stop_reason_string): New function definition.
	* frame.h (unwind_stop_reason_to_string): Extend comment to
	mention frame_stop_reason_string.
	(frame_stop_reason_string): New function declaration.
	* stack.c (frame_info): Switch to frame_stop_reason_string.
	(backtrace_command_1): Switch to frame_stop_reason_string.
	* unwind_stop_reason.def: Add UNWIND_MEMORY_ERROR.
	(LAST_ENTRY): Changed to UNWIND_MEMORY_ERROR.
	* guile/lib/gdb.scm: Add FRAME_UNWIND_MEMORY_ERROR to export list.

gdb/doc/ChangeLog:

	* guile.texi (Frames In Guile): Mention FRAME_UNWIND_MEMORY_ERROR.
	* python.texi (Frames In Python): Mention
	gdb.FRAME_UNWIND_MEMORY_ERROR.

gdb/testsuite/ChangeLog:

	* gdb.arch/amd64-invalid-stack-middle.exp: Update expected results.
	* gdb.arch/amd64-invalid-stack-top.exp: Likewise.
ibuclaw pushed a commit that referenced this issue Aug 4, 2014
Since target-async was turned on by default, debugging on Windows
using GDB+GDBserver sometimes hangs while waiting for a RSP reply.

The problem is a race in the gdb_select machinery.

This is what we see for a faulty next on the GDB side:

    (gdb) n
    infrun: clear_proceed_status_thread (Thread 4424)
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=1)
    (...)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), ...
    Sending packet: $vCont;s:1148;c#5e...
    *hang*

At this point, attaching a debugger to the hanging GDB confirms that
it is blocked, waiting for a socket event:

    #6  0x757841d8 in WaitForMultipleObjects ()
       from C:\Windows\syswow64\kernel32.dll
    #7  0x004708e7 in gdb_select (n=469, readfds=0x88ca50 <gdb_notifier+784>,
        writefds=0x88cb54 <gdb_notifier+1044>,
        exceptfds=0x88cc58 <gdb_notifier+1304>, timeout=0x0)
        at /[...]/gdb/mingw-hdep.c:172
    #8  0x00527926 in gdb_wait_for_event (block=1)
        at /[...]/gdb/event-loop.c:831
    #9  0x00526ff1 in gdb_do_one_event ()
        at /[...]/gdb/event-loop.c:403

However, on the GDBserver side, we see that GDBserver already sent a
T05 packet reply:

    gdbserver: kernel event EXCEPTION_DEBUG_EVENT for pid=4968 tid=1148
    EXCEPTION_SINGLE_STEP
    Child Stopped with signal = 5
    Writing resume reply for LWP 4968.4424:1
    DEBUG: write_prim ($T0505:c8fe2800;04:a0fe2800;08:38164000;thread:1148;#f0)
           -> 55

To recap, on Windows, 'select' only works with sockets, so we have a
wrapper, gdb_select, that uses the GDB serial abstraction to handle
sockets, consoles, pipes, and serial ports.  Each serial descriptor
has a thread associated (we call those the select threads), and those
threads communicate with the main thread by means of standard Windows
events.

It basically goes like this: gdb_select first loops through all fds of
interest, calling their wait_handle hooks, which returns an event that
WaitForMultipleObjects can wait on.  gdb_select then blocks in
WaitForMultipleObjects with all those event handles.  The wait_handle
hook is responsible for arranging for the returned event to become set
once data is available.  This is done by setting the descriptor's
helper thread running, which itself knows how to wait for data from
the type of handle it manages (sockets, pipes, consoles, files, etc.).
Once data arrives, the select thread sets the corresponding event
which unblocks WaitForMultipleObjects within gdb_select.  However, the
wait_handle hook can also apply an optimization: if data is already
pending, then there's no need to set the thread running, and the
descriptors event can be set immediately.  It's around this latter
aspect that lies the bug/race.

Adding some ad hoc debug logs to ser-mingw.c and mingw-hdep.c, we see
the following sequence of events, right after sending
"$vCont;s:1148;c#5e".  Thread 1 is the main thread, and thread 2 is
the socket's helper/select thread.  gdb_select was only passed one
descriptor to wait on, the remote target's socket.
net_windows_select_thread is the entry point of the select threads for
sockets.

 #1 - thread 1: gdb_select: enter
 #2 - thread 2: net_windows_select_thread: WaitForMultipleObjects blocking

gdb_select walked over the wait_handle hooks, and woke up the socket's
helper thread.  The helper thread is now blocked waiting for socket
events.

 #3 - thread 1: gdb_select: WaitForMultipleObjects polling (timeout=0ms)
 #4 - thread 1: gdb_select: WaitForMultipleObjects returned 102 (WAIT_TIMEOUT)

There was no pending data available yet, and gdb_select was passed
timeout==0ms, and so WaitForMultipleObjects times out immediately.

 #5 - thread 2: net_windows_select_thread: WaitForMultipleObjects returned 1

Just afterwards, socket data arrives, and thread 2 wakes up.  Thread 2
calls WSAEnumNetworkEvents, which clears state->sock_event, and marks
the serial's read_event event, telling the main thread that data is
available.

 #6 - thread 1: gdb_select: call serial_done_wait_handle on each serial

gdb_select stops all the helper/select threads.

 #7 - thread 1: gdb_select: return 0 (WAIT_TIMEOUT)

gdb_select in the main thread returns to the caller.

Note that at this point, data is pending on the socket, the serial's
read_event is set, but the socket's sock_event event is not set, until
_further_ data arrives.

Now GDB does its thing and goes back to the event loop.  That calls
gdb_select, but with timeout==INFINITE.

Again, gdb_select calls the socket serial's wait_handle hook.  It
first clears its events, starting from a clean slate:

  ResetEvent (state->base.read_event);
  ResetEvent (state->base.except_event);
  ResetEvent (state->base.stop_select);

That cleared read_event, which was previously set in #5 above.  And
then it checks for pending events, in the sock_event event:

  /* Check any pending events.  This both avoids starting the thread
     unnecessarily, and handles stray FD_READ events (see below).  */
  if (WaitForSingleObject (state->sock_event, 0) == WAIT_OBJECT_0)
    {

That also fails because state->sock_event was cleared in #5 too...

So the wait_handle hook erroneously decides that it needs to start the
helper thread to wait for input:

 #8 - thread 2: net_windows_select_thread: WaitForMultipleObjects blocking
 #9 - thread 1: gdb_select: WaitForMultipleObjects blocking (INFINITE)

But, GDBserver already sent all it had to send, so both threads waits
forever...

At first I thought that net_windows_wait_handle shouldn't be resetting
state->base.read_event or state->base.except_event, but looking
deeper, the pipe and console wait_handle hooks reset all events too.
It actually makes sense that way -- consuming an event from different
threads is bad practice, and, we should always be able to query
pending state without looking at the state->sock_event from within
net_windows_wait_handle.  The end result is much simpler, and makes
net_windows_select_thread look a lot like console_select_thread,
actually.

gdb/
2014-06-11  Pedro Alves  <[email protected]>

	PR remote/17028
	* ser-mingw.c (net_windows_socket_check_pending): New function.
	(net_windows_select_thread): Ignore spurious wakeups.  Use
	net_windows_socket_check_pending.
	(net_windows_wait_handle): Check for pending events with
	ioctlsocket, through net_windows_socket_check_pending, instead of
	checking the socket's event.
ibuclaw pushed a commit that referenced this issue Aug 4, 2014
On async targets, a synchronous attach is done like this:

   #1 - target_attach is called (PTRACE_ATTACH is issued)
   #2 - a continuation is installed
   #3 - we go back to the event loop
   #4 - target reports stop (SIGSTOP), event loop wakes up, and
        attach continuation is called
   #5 - among other things, the continuation calls
        target_terminal_inferior, which removes stdin from the event
        loop

Note that in #3, GDB is still processing user input.  If the user is
fast enough, e.g., with something like:

  echo -e "attach PID\nset xxx=1" | gdb

... then the "set" command is processed before the attach completes.

We get worse behavior even, if input is a tty and therefore
readline/editing is enabled, with e.g.,:

 (gdb) attach PID\nset xxx=1

we then crash readline/gdb, with:

 Attaching to program: attach-wait-input, process 14537
 readline: readline_callback_read_char() called with no handler!
 Aborted
 $

Fix this by calling target_terminal_inferior before #3 above.

The test covers both scenarios by running with editing/readline forced
to both on and off.

gdb/
2014-07-09  Pedro Alves  <[email protected]>

	* infcmd.c (attach_command_post_wait): Don't call
	target_terminal_inferior here.
	(attach_command): Call it here instead.

gdb/testsuite/
2014-07-09  Pedro Alves  <[email protected]>

	* gdb.base/attach-wait-input.exp: New file.
	* gdb.base/attach-wait-input.c: New file.
ibuclaw pushed a commit that referenced this issue Aug 4, 2014
Here's an example, with the new test:

 gdbserver :9999 gdb.threads/kill
 gdb gdb.threads/kill
 (gdb) b 52
 Breakpoint 1 at 0x4007f4: file kill.c, line 52.
 Continuing.

 Breakpoint 1, main () at kill.c:52
 52        return 0; /* set break here */
 (gdb) k
 Kill the program being debugged? (y or n) y

 gdbserver :9999 gdb.threads/kill
 Process gdb.base/watch_thread_num created; pid = 9719
 Listening on port 1234
 Remote debugging from host 127.0.0.1
 Killing all inferiors
 Segmentation fault (core dumped)

Backtrace:

 (gdb) bt
 #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
 #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
 #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
 #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
 #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
 #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
 #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
 #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
 #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
 #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
 #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283

The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior.  find_inferior can handle
the current iterated inferior being deleted, but not others.

When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event.  We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
while we're walking the list.  They'll be deleted by linux_mourn
afterwards.

This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.

The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too).  Here's
how it fails with the native-extended-gdbserver board without the fix:

 (gdb) info threads
   Id   Target Id         Frame
   6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
 (gdb) kill
 Kill the program being debugged? (y or n) y
 Remote connection closed
 ^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) FAIL: gdb.threads/kill.exp: kill

Extended remote should remain connected after the kill.

gdb/gdbserver/
2014-07-11  Pedro Alves  <[email protected]>

	* linux-low.c (kill_wait_lwp): New function, based on
	kill_one_lwp_callback, but use my_waitpid directly.
	(kill_one_lwp_callback, linux_kill): Use it.

gdb/testsuite/
2014-07-11  Pedro Alves  <[email protected]>

	* gdb.threads/kill.c: New file.
	* gdb.threads/kill.exp: New file.
ibuclaw pushed a commit that referenced this issue Aug 4, 2014
The tests at
<https://sourceware.org/ml/gdb-patches/2014-07/msg00277.html> show
that comparing a fully optimized out value's contents with a value
that has not been optimized out, or is partially optimized out crashes
GDB:

 (gdb) bt
 #0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:816
 #1  0x00000000005a1914 in memcmp_with_bit_offsets (ptr1=0x202b2f0 "\n", offset1_bits=0, ptr2=0x0, offset2_bits=0, length_bits=32)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:678
 #2  0x00000000005a1a05 in value_available_contents_bits_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=32)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:717
 #3  0x00000000005a1c09 in value_available_contents_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=4)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:769
 #4  0x00000000006033ed in read_frame_arg (sym=0x1b78d20, frame=0x19bca50, argp=0x7fff4aba82b0, entryargp=0x7fff4aba82d0)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:416
 #5  0x0000000000603abb in print_frame_args (func=0x1b78cb0, frame=0x19bca50, num=-1, stream=0x1aea450) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:671
 #6  0x0000000000604ae8 in print_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, sal=...)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:1205
 #7  0x0000000000604050 in print_frame_info (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:857
 #8  0x00000000006029b3 in print_stack_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, set_current_sal=1)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:169
 #9  0x00000000005fc4b8 in print_stop_event (ws=0x7fff4aba8790) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6068
 #10 0x00000000005fc830 in normal_stop () at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6214

The 'ptr2=0x0' in frame #1 is val2->contents, and since git 4f14910:

    gdb/ChangeLog
    2013-11-26  Andrew Burgess  <[email protected]>

        * value.c (allocate_optimized_out_value): Mark value as non-lazy.

... a fully optimized-out value can have it's value contents buffer
NULL.

As a spotgap fix, revert 4f14910, with a comment.  A full fix would
be too invasive for 7.8.

gdb/
2014-07-22  Pedro Alves  <[email protected]>

	* value.c (allocate_optimized_out_value): Don't mark value as
	non-lazy.
ibuclaw pushed a commit that referenced this issue Aug 4, 2014
The TUI currently crashes when the user types <return> in response to
a pagination prompt:

  $ gdb --tui ...
  *the TUI is now active*
  (gdb) set height 2
  (gdb) help
  List of classes of commands:

  Program received signal SIGSEGV, Segmentation fault.
  strlen () at ../sysdeps/x86_64/strlen.S:106
  106             movdqu  (%rax), %xmm12

  (top-gdb) bt
  #0  strlen () at ../sysdeps/x86_64/strlen.S:106
  #1  0x000000000086be5f in xstrdup (s=0x0) at ../src/libiberty/xstrdup.c:33
  #2  0x00000000005163f9 in tui_prep_terminal (notused1=1) at ../src/gdb/tui/tui-io.c:296
  #3  0x000000000077a7ee in _rl_callback_newline () at ../src/readline/callback.c:82
  #4  0x000000000077a853 in rl_callback_handler_install (prompt=0x0, linefunc=0x618b60 <command_line_handler>) at ../src/readline/callback.c:102
  #5  0x0000000000718a5c in gdb_readline_wrapper_cleanup (arg=0xfd14d0) at ../src/gdb/top.c:788
  #6  0x0000000000596d08 in do_my_cleanups (pmy_chain=0xcf0b38 <cleanup_chain>, old_chain=0x1043d10) at ../src/gdb/cleanups.c:155
  #7  0x0000000000596d75 in do_cleanups (old_chain=0x1043d10) at ../src/gdb/cleanups.c:177
  #8  0x0000000000718bd9 in gdb_readline_wrapper (prompt=0x7fffffffcfa0 "---Type <return> to continue, or q <return> to quit---")
      at ../src/gdb/top.c:835
  #9  0x000000000071cf74 in prompt_for_continue () at ../src/gdb/utils.c:1894
  #10 0x000000000071d434 in fputs_maybe_filtered (linebuffer=0x1043db0 "List of classes of commands:\n\n", stream=0xf72e20, filter=1)
      at ../src/gdb/utils.c:2111
  #11 0x000000000071da0f in vfprintf_maybe_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118, filter=1)
      at ../src/gdb/utils.c:2339
  #12 0x000000000071da4a in vfprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118)
      at ../src/gdb/utils.c:2347
  #13 0x000000000071dc72 in fprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n") at ../src/gdb/utils.c:2399
  #14 0x00000000004f90ab in help_list (list=0xe6d100, cmdtype=0x89ad8c "", class=all_classes, stream=0xf72e20)
      at ../src/gdb/cli/cli-decode.c:1038
  #15 0x00000000004f8dba in help_cmd (arg=0x0, stream=0xf72e20) at ../src/gdb/cli/cli-decode.c:946

Git 0017922 added:

    @@ -776,6 +777,12 @@ gdb_readline_wrapper_cleanup (void *arg)

     gdb_assert (input_handler == gdb_readline_wrapper_line);
     input_handler = cleanup->handler_orig;
  +
  +  /* Reinstall INPUT_HANDLER in readline, without displaying a
  +     prompt.  */
  +  if (async_command_editing_p)
  +    rl_callback_handler_install (NULL, input_handler);

and tui_prep_terminal simply misses handling the case of a NULL
rl_prompt.

I also checked that readline's sources do similar checks.

gdb/
2014-07-24  Pedro Alves  <[email protected]>

	* tui/tui-io.c (tui_prep_terminal): Handle NULL rl_prompt.
ibuclaw pushed a commit that referenced this issue Sep 28, 2014
echo 'void f(char *s){}main(){f((char *)1);}'|gcc -g -x c -;../gdb ./a.out -ex 'b f' -ex r
====ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000aaccf at pc 0x96eea7 bp 0x7fff75bdbc90 sp 0x7fff75bdbc80
READ of size 1 at 0x6020000aaccf thread T0
    #0 0x96eea6 in extract_unsigned_integer .../gdb/findvar.c:108
    #1 0x9df02b in val_print_string .../gdb/valprint.c:2513
[...]
0x6020000aaccf is located 1 bytes to the left of 8-byte region [0x6020000aacd0,0x6020000aacd8)
allocated by thread T0 here:
    #0 0x7f45fad26b97 in malloc (/lib64/libasan.so.1+0x57b97)
    #1 0xdb3409 in xmalloc common/common-utils.c:45
    #2 0x9d8cf9 in read_string .../gdb/valprint.c:1845
    #3 0x9defca in val_print_string .../gdb/valprint.c:2502
[..]
====ABORTING

gdb/
2014-08-18  Jan Kratochvil  <[email protected]>

	Fix -fsanitize=address on unreadable inferior strings.
	* valprint.c (val_print_string): Fix access before BUFFER.
ibuclaw pushed a commit that referenced this issue Sep 28, 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1126177

ERROR: AddressSanitizer: SEGV on unknown address 0x000000000050 (pc 0x000000992bef sp 0x7ffff9039530 bp 0x7ffff9039540
T0)
    #0 0x992bee in value_type .../gdb/value.c:925
    #1 0x87c951 in py_print_single_arg python/py-framefilter.c:445
    #2 0x87cfae in enumerate_args python/py-framefilter.c:596
    #3 0x87e0b0 in py_print_args python/py-framefilter.c:968

It crashes because frame_arg::val is documented it may contain NULL
(frame_arg::error is then non-NULL) but the code does not handle it.

Another bug is that py_print_single_arg() calls goto out of its TRY_CATCH
which messes up GDB cleanup chain crashing GDB later.

It is probably 7.7 regression (I have not verified it) due to the introduction
of Python frame filters.

gdb/ChangeLog

	PR python/17355
	* python/py-framefilter.c (py_print_single_arg): Handle NULL FA->VAL.
	Fix goto out of TRY_CATCH.

gdb/testsuite/ChangeLog

	PR python/17355
	* gdb.python/amd64-py-framefilter-invalidarg.S: New file.
	* gdb.python/py-framefilter-invalidarg-gdb.py.in: New file.
	* gdb.python/py-framefilter-invalidarg.exp: New file.
	* gdb.python/py-framefilter-invalidarg.py: New file.
ibuclaw pushed a commit that referenced this issue Sep 28, 2014
The test does a backtrace to see which thread (#2 or #3) is assigned
to which SIGUSR (1 or 2).  If the main thread gets to all_threads_running
before the sigusr threads get to their entry point, then the function
name isn't in the backtrace and the test fails.

Alas this version of the code is within epsilon of what I started with,
and then over-simplified things.
ibuclaw pushed a commit that referenced this issue Nov 30, 2014
…nd crashes readline/GDB

Jan caught an intermittent GDB crash with the annota1.exp test:

 Starting program: .../gdb/testsuite/gdb.base/annota1 ^M
 [...]
 FAIL: gdb.base/annota1.exp: run until main breakpoint (timeout)
 [...]
 readline: readline_callback_read_char() called with no handler!^M
 ERROR: Process no longer exists

All we need to is to continue the inferior in the foreground, and type
a command while the inferior is running.  E.g.:

 (gdb) set annotate 2

 ▒▒pre-prompt
 (gdb)
 ▒▒prompt
 c

 ▒▒post-prompt
 Continuing.

 ▒▒starting

 ▒▒frames-invalid

 *inferior is running now*

 p 1<ret>

 readline: readline_callback_read_char() called with no handler!
 Aborted (core dumped)
 $


When we run a foreground execution command we call
target_terminal_inferior to stop GDB from processing input, and to put
the inferior's terminal settings in effect.  Then we tell readline to
hide the prompt with display_gdb_prompt, which clears readline's input
callback too.  When the target stops, we call target_terminal_ours,
which re-installs stdin in the event loop, and then we redisplay the
prompt, reinstalling the readline callbacks.

However, when annotations are in effect, the "frames-invalid"
annotation code calls target_terminal_ours after 'resume' had already
called target_terminal_inferior:

 (top-gdb) bt
 #0  0x000000000056b82f in annotate_frames_invalid () at gdb/annotate.c:219
 #1  0x000000000072e6cc in reinit_frame_cache () at gdb/frame.c:1705
 #2  0x0000000000594bb9 in registers_changed_ptid (ptid=...) at gdb/regcache.c:612
 #3  0x000000000064cca1 in target_resume (ptid=..., step=1, signal=GDB_SIGNAL_0) at gdb/target.c:2136
 #4  0x00000000005f57af in resume (step=1, sig=GDB_SIGNAL_0) at gdb/infrun.c:2263
 #5  0x00000000005f6051 in proceed (addr=18446744073709551615, siggnal=GDB_SIGNAL_DEFAULT, step=1) at gdb/infrun.c:2613

And then once we hide the prompt and remove readline's input handler
callback, we're in a bad state.  We end up with the target running
supposedly in the foreground, but with stdin still installed on the
event loop.  Any input then calls into readline, which aborts because
no rl_linefunc callback handler is installed:

 Program received signal SIGABRT, Aborted.
 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

 (top-gdb) bt
 #0  0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x0000003b36a36f68 in __GI_abort () at abort.c:89
 During symbol reading, debug info gives source 9 included from file at zero line 0.
 During symbol reading, debug info gives command-line macro definition with non-zero line 19: _STDC_PREDEF_H 1.
 #2  0x0000000000784a25 in rl_callback_read_char () at src/readline/callback.c:116
 #3  0x0000000000619111 in rl_callback_read_char_wrapper (client_data=0x0) at src/gdb/event-top.c:167
 #4  0x00000000006194e7 in stdin_event_handler (error=0, client_data=0x0) at src/gdb/event-top.c:373
 #5  0x00000000006180da in handle_file_event (data=...) at src/gdb/event-loop.c:763
 #6  0x00000000006175c1 in process_event () at src/gdb/event-loop.c:340
 #7  0x0000000000617688 in gdb_do_one_event () at src/gdb/event-loop.c:404
 #8  0x00000000006176d8 in start_event_loop () at src/gdb/event-loop.c:429
 #9  0x0000000000619143 in cli_command_loop (data=0x0) at src/gdb/event-top.c:182
 #10 0x000000000060f4c8 in current_interp_command_loop () at src/gdb/interps.c:318
 #11 0x0000000000610691 in captured_command_loop (data=0x0) at src/gdb/main.c:323
 #12 0x000000000060c385 in catch_errors (func=0x610676 <captured_command_loop>, func_args=0x0, errstring=0x900241 "", mask=RETURN_MASK_ALL)
     at src/gdb/exceptions.c:237
 #13 0x0000000000611b8f in captured_main (data=0x7fffffffd7b0) at src/gdb/main.c:1151
 #14 0x000000000060c385 in catch_errors (func=0x610a8e <captured_main>, func_args=0x7fffffffd7b0, errstring=0x900241 "", mask=RETURN_MASK_ALL)
     at src/gdb/exceptions.c:237
 #15 0x0000000000611bb8 in gdb_main (args=0x7fffffffd7b0) at src/gdb/main.c:1159
 #16 0x000000000045ef57 in main (argc=3, argv=0x7fffffffd8b8) at src/gdb/gdb.c:32

The fix is to make the annotation code call target_terminal_inferior
again after printing, if the inferior's settings were in effect.

While at it, when we're doing output only, instead of
target_terminal_ours, we should call target_terminal_ours_for_output.
The latter doesn't actually remove stdin from the event loop, and also
leaves SIGINT forwarded to the target.

New test included.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-10-17  Pedro Alves  <[email protected]>

	PR gdb/17472
	* annotate.c (annotate_breakpoints_invalid): Use
	target_terminal_our_for_output instead of target_terminal_ours.
	Give back the terminal to the target.
	(annotate_frames_invalid): Likewise.

gdb/testsuite/
2014-10-17  Pedro Alves  <[email protected]>

	PR gdb/17472
	* gdb.base/annota-input-while-running.c: New file.
	* gdb.base/annota-input-while-running.exp: New file.
ibuclaw pushed a commit that referenced this issue Nov 30, 2014
If all threads in the target were already running when the user does
"c -a", nothing puts the inferior's terminal settings in effect and
removes stdin from the event loop, which we must when running a
foreground command.  The result is that user input afterwards crashes
readline/gdb:

 (gdb) start
 Temporary breakpoint 1 at 0x4005d4: file continue-all-already-running.c, line 23.
 Starting program: continue-all-already-running

 Temporary breakpoint 1, main () at continue-all-already-running.c:23
 23        sleep (10);
 (gdb) c -a&
 Continuing.
 (gdb) c -a
 Continuing.
 p 1
 readline: readline_callback_read_char() called with no handler!
 Aborted (core dumped)
 $

Backtrace:

 Program received signal SIGABRT, Aborted.
 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
 (top-gdb) p 1
 $1 = 1
 (top-gdb) bt
 #0  0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x0000003b36a36f68 in __GI_abort () at abort.c:89
 #2  0x0000000000784aa9 in rl_callback_read_char () at readline/callback.c:116
 #3  0x0000000000619181 in rl_callback_read_char_wrapper (client_data=0x0) at gdb/event-top.c:167
 #4  0x0000000000619557 in stdin_event_handler (error=0, client_data=0x0) at gdb/event-top.c:373
 #5  0x000000000061814a in handle_file_event (data=...) at gdb/event-loop.c:763
 #6  0x0000000000617631 in process_event () at gdb/event-loop.c:340
 #7  0x00000000006176f8 in gdb_do_one_event () at gdb/event-loop.c:404
 #8  0x0000000000617748 in start_event_loop () at gdb/event-loop.c:429
 #9  0x00000000006191b3 in cli_command_loop (data=0x0) at gdb/event-top.c:182
 #10 0x000000000060f538 in current_interp_command_loop () at gdb/interps.c:318
 #11 0x0000000000610701 in captured_command_loop (data=0x0) at gdb/main.c:323
 #12 0x000000000060c3f5 in catch_errors (func=0x6106e6 <captured_command_loop>, func_args=0x0, errstring=0x9002c1 "", mask=RETURN_MASK_ALL)
     at gdb/exceptions.c:237
 #13 0x0000000000611bff in captured_main (data=0x7fffffffd780) at gdb/main.c:1151
 #14 0x000000000060c3f5 in catch_errors (func=0x610afe <captured_main>, func_args=0x7fffffffd780, errstring=0x9002c1 "", mask=RETURN_MASK_ALL)
     at gdb/exceptions.c:237
 #15 0x0000000000611c28 in gdb_main (args=0x7fffffffd780) at gdb/main.c:1159
 #16 0x000000000045ef97 in main (argc=5, argv=0x7fffffffd888) at gdb/gdb.c:32
 (top-gdb)

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-10-17  Pedro Alves  <[email protected]>

	PR gdb/17300
	* infcmd.c (continue_1): If continuing all threads in the
	foreground, make sure the inferior's terminal settings are put in
	effect.

gdb/testsuite/
2014-10-17  Pedro Alves  <[email protected]>

	PR gdb/17300
	* gdb.base/continue-all-already-running.c: New file.
	* gdb.base/continue-all-already-running.exp: New file.
ibuclaw pushed a commit that referenced this issue Nov 30, 2014
I noticed that with:

 $ TERM=dumb ./gdb -q -nx
 <c-x,a>
 Cannot enable the TUI: terminal doesn't support cursor addressing [TERM=dumb]
 (gdb)

The next key the user types is silently eaten.

The problem is that we're throwing an exception while in a readline
callback that isn't prepared for that:

(top-gdb) bt
#0  tui_enable () at /home/pedro/gdb/mygit/build/../src/gdb/tui/tui.c:388
#1  0x000000000051f47b in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/build/../src/gdb/tui/tui.c:101
#2  0x0000000000768d6f in _rl_dispatch_subseq (key=1, map=0xd069c0 <emacs_ctlx_keymap>, got_subseq=0) at /home/pedro/gdb/mygit/build/../src/readline/readline.c:774
#3  0x0000000000768acb in _rl_dispatch_callback (cxt=0x1ce6190) at /home/pedro/gdb/mygit/build/../src/readline/readline.c:686
#4  0x000000000078120b in rl_callback_read_char () at /home/pedro/gdb/mygit/build/../src/readline/callback.c:170
#5  0x0000000000619445 in rl_callback_read_char_wrapper (client_data=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/event-top.c:166
#6  0x000000000061981b in stdin_event_handler (error=0, client_data=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/event-top.c:372
#7  0x000000000061840e in handle_file_event (data=...) at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:762
#8  0x00000000006178f5 in process_event () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:339
#9  0x00000000006179bc in gdb_do_one_event () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:403
#10 0x0000000000617a0c in start_event_loop () at /home/pedro/gdb/mygit/build/../src/gdb/event-loop.c:428

Here, in _rl_dispatch_subseq:

769
770               rl_executing_keymap = map;
771
772               rl_dispatching = 1;
773               RL_SETSTATE(RL_STATE_DISPATCHING);
774               (*map[key].function)(rl_numeric_arg * rl_arg_sign, key);
775               RL_UNSETSTATE(RL_STATE_DISPATCHING);
776               rl_dispatching = 0;
777
778               /* If we have input pending, then the last command was a prefix
779                  command.  Don't change the state of rl_last_func.  Otherwise,

GDB is called from line 774, but longjmp'ing at that point leaves
rl_dispatching and RL_STATE_DISPATCHING set.

Fix this by wrapping tui_rl_switch_mode in a TRY_CATCH.

gdb/
2014-10-29  Pedro Alves  <[email protected]>

	* tui/tui.c (tui_rl_switch_mode): Wrap tui_enable/tui_disable in
	TRY_CATCH.
ibuclaw pushed a commit that referenced this issue Jan 14, 2015
... instead of relying on libthread_db.

I wrote a test that attaches to a program that constantly spawns
short-lived threads, which exposed several issues.  This is one of
them.

On Linux, we need to attach to all threads of a process (thread group)
individually.  We currently rely on libthread_db to list the threads,
but that is problematic, because libthread_db relies on reading data
structures out of the inferior (which may well be corrupted).  If
threads are being created or exiting just while we try to attach, we
may trip on inconsistencies in the inferior's thread list.  To work
around that, when we see a seemingly corrupt list, we currently retry
a few times:

 static void
 thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
 {
 ...
   if (until_no_new)
     {
       /* Require 4 successive iterations which do not find any new threads.
	  The 4 is a heuristic: there is an inherent race here, and I have
	  seen that 2 iterations in a row are not always sufficient to
	  "capture" all threads.  */
 ...

That heuristic may well fail, and when it does, we end up with threads
in the program that aren't under GDB's control.  That's obviously bad
and results in quite mistifying failures, like e.g., the process dying
for seeminly no reason when a thread that wasn't attached trips on a
breakpoint.

There's really no reason to rely on libthread_db for this nowadays
when we have /proc mounted.  In that case, which is the usual case, we
can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
doing this.  The patch factors out that code that knows to walk the
task/ directory out of GDBserver, and makes GDB use it too.

Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
them to stop immediately.  Instead, we just tag the LWP as having an
expected stop.  Because we can only set the ptrace options when the
thread stops, we need a new flag in the lwp structure to keep track of
whether we've already set the ptrace options, just like in GDBserver.
Note that nothing issues any ptrace command to the threads between the
PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
described in gdbserver's linux-low.c).

When we attach to a program that has threads exiting while we attach,
it's easy to race with a thread just exiting as we try to attach to
it, like:

  #1 - get current list of threads
  #2 - attach to each listed thread
  #3 - ooops, attach failed, thread is already gone

As this is pretty normal, we shouldn't be issuing a scary warning in
step #3.

When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
we'll see EPERM as well.  That happens when the kernel still has the
thread in its task list, but the thread is marked as dead.
Unfortunately, EPERM is ambiguous and we'll get it also on other
scenarios where the thread isn't dead, and in those cases, it's useful
to get a warning.  To distiguish the cases, when we get an EPERM
failure, we open /proc/PID/status, and check the thread's state -- if
the /proc file no longer exists, or the state is "Z (Zombie)" or "X
(Dead)", we ignore the EPERM error silently; otherwise, we'll warn.
Unfortunately, there seems to be a kernel race here.  Sometimes I get
EPERM, and then the /proc state still indicates "R (Running)"...  If
we wait a bit and retry, we do end up seeing X or Z state, or get an
ESRCH.  I thought of making GDB retry the attach a few times, but even
with a 500ms wait and 4 retries, I still see the warning sometimes.  I
haven't been able to identify the kernel path that causes this yet,
but in any case, it looks like a kernel bug to me.  As this just
results failure to suppress a warning that we've been printing since
about forever anyway, I'm just making the test cope with it, and issue
an XFAIL.

gdb/gdbserver/
2015-01-09  Pedro Alves  <[email protected]>

	* linux-low.c (linux_attach_fail_reason_string): Move to
	nat/linux-ptrace.c, and rename.
	(linux_attach_lwp): Update comment.
	(attach_proc_task_lwp_callback): New function.
	(linux_attach): Adjust to rename and use
	linux_proc_attach_tgid_threads.
	(linux_attach_fail_reason_string): Delete declaration.

gdb/
2015-01-09  Pedro Alves  <[email protected]>

	* linux-nat.c (attach_proc_task_lwp_callback): New function.
	(linux_nat_attach): Use linux_proc_attach_tgid_threads.
	(wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's
	ptrace option flags.
	* linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New
	field.
	* nat/linux-procfs.c: Include <dirent.h>.
	(linux_proc_get_int): New parameter "warn".  Handle it.
	(linux_proc_get_tgid): Adjust.
	(linux_proc_get_tracerpid): Rename to ...
	(linux_proc_get_tracerpid_nowarn): ... this.
	(linux_proc_pid_get_state): New function, factored out from
	(linux_proc_pid_has_state): ... this.  Add new parameter "warn"
	and handle it.
	(linux_proc_pid_is_gone): New function.
	(linux_proc_pid_is_stopped): Adjust.
	(linux_proc_pid_is_zombie_maybe_warn)
	(linux_proc_pid_is_zombie_nowarn): New functions.
	(linux_proc_pid_is_zombie): Use
	linux_proc_pid_is_zombie_maybe_warn.
	(linux_proc_attach_tgid_threads): New function.
	* nat/linux-procfs.h (linux_proc_get_tgid): Update comment.
	(linux_proc_get_tracerpid): Rename to ...
	(linux_proc_get_tracerpid_nowarn): ... this, and update comment.
	(linux_proc_pid_is_gone): New declaration.
	(linux_proc_pid_is_zombie): Update comment.
	(linux_proc_pid_is_zombie_nowarn): New declaration.
	(linux_proc_attach_lwp_func): New typedef.
	(linux_proc_attach_tgid_threads): New declaration.
	* nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to
	use nowarn functions.
	(linux_ptrace_attach_fail_reason_string): Move here from
	gdbserver/linux-low.c and rename.
	(ptrace_supports_feature): If the current ptrace options are not
	known yet, check them now, instead of asserting.
	* nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string):
	Declare.
ibuclaw pushed a commit that referenced this issue Jan 14, 2015
Running the testsuite with a series that reimplements user-visible
all-stop behavior on top of a target running in non-stop mode revealed
problems related to event starvation avoidance.

For example, I see
gdb.threads/signal-while-stepping-over-bp-other-thread.exp failing.
What happens is that GDB core never gets to see the signal event.  It
ends up processing the events for the same threads over an over,
because Linux's waitpid(-1, ...) returns that first task in the task
list that has an event, starving threads on the tail of the task list.

So I wrote a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, to stress this
independently of all-stop on top of non-stop.  Fixing it required the
changes described below.  The test will be added in a following
commit.

1) linux-nat.c has code in place that picks an event LWP at random out
of all that have had events.  This is because on the kernel side,
"waitpid(-1, ...)"  just walks the task list linearly looking for the
first that had an event.  But, this code is currently only used in
all-stop mode.  So with a multi-threaded program that has multiple
events triggering debug events in parallel, GDB ends up starving some
threads.

To make the event randomization work in non-stop mode too, the patch
makes us pull out all the already pending events on the kernel side,
with waitpid, before deciding which LWP to report to the core.

There's some code in linux_wait that takes care of leaving events
pending if they were for LWPs the caller is not interested in.  The
patch moves that to linux_nat_filter_event, so that we only have one
place that leaves events pending.  With that in place, conceptually,
the flow is simpler and more normalized:

 #1 - walk the LWP list looking for an LWP with a pending event to report.
 #2 - if no pending event, pull events out of the kernel, and store
      them in the LWP structures as pending.
 #3- goto #1.

2) Then, currently the event randomization code only considers SIGTRAP
(or trap-like) events.  That means that if e.g., have have multiple
threads stepping in parallel that hit a breakpoint that needs stepping
over, and one gets a signal, the signal may end up never getting
processed, because GDB will always be giving priority to the SIGTRAPs.
The patch fixes this by making the randomization code consider all
kinds of pending events.

3) If multiple threads hit a breakpoint, we report one of those, and
"cancel" the others.  Cancelling means decrementing the PC, and
discarding the event.  If the next time the LWP is resumed the
breakpoint is still installed, the LWP should hit it again, and we'll
report the hit then.  The problem I found is that this delays threads
from advancing too much, with the kernel potentially ending up
scheduling the same threads over and over, and others not advancing.
So the patch switches away from cancelling the breakpoints, and
instead remembering that the LWP had stopped for a breakpoint.  If on
resume the breakpoint is still installed, we report it.  If it's no
longer installed, we discard the pending event then.  This is actually
how GDBserver used to handle this before d50171e (Teach linux
gdbserver to step-over-breakpoints), but with the difference that back
then we'd delay adjusting the PC until resuming, which made it so that
"info threads" could wrongly see threads with unadjusted PCs.

gdb/
2015-01-09  Pedro Alves  <[email protected]>

	* breakpoint.c (hardware_breakpoint_inserted_here_p): New
	function.
	* breakpoint.h (hardware_breakpoint_inserted_here_p): New
	declaration.
	* linux-nat.c (linux_nat_status_is_event): Move higher up in file.
	(linux_resume_one_lwp): Store the thread's PC.  Adjust to clear
	stop_reason.
	(check_stopped_by_watchpoint): New function.
	(save_sigtrap): Reimplement.
	(linux_nat_stopped_by_watchpoint): Adjust.
	(linux_nat_lp_status_is_event): Delete.
	(stop_wait_callback): Only call save_sigtrap after storing the
	pending status.
	(status_callback): If the thread had been stopped for a breakpoint
	that has since been removed, discard the event and resume the LWP.
	(count_events_callback, select_event_lwp_callback): Use
	lwp_status_pending_p instead of linux_nat_lp_status_is_event.
	(cancel_breakpoint): Rename to ...
	(check_stopped_by_breakpoint): ... this.  Record whether the LWP
	stopped for a software breakpoint or hardware breakpoint.
	(select_event_lwp): Only give preference to the stepping LWP in
	all-stop mode.  Adjust comments.
	(stop_and_resume_callback): Remove references to new_pending_p.
	(linux_nat_filter_event): Likewise.  Leave exit events of the
	leader thread pending here.  Handle signal short circuiting here.
	Only call save_sigtrap after storing the pending waitstatus.
	(linux_nat_wait_1): Remove 'retry' label.  Remove references to
	new_pending.  Don't handle leaving events the caller is not
	interested in pending here, nor handle signal short-circuiting
	here.  Also give equal priority to all LWPs that have had events
	in non-stop mode.  If reporting a software breakpoint event,
	unadjust the LWP's PC.
	* linux-nat.h (enum lwp_stop_reason): New.
	(struct lwp_info) <stop_pc>: New field.
	(struct lwp_info) <stopped_by_watchpoint>: Delete field.
	(struct lwp_info) <stop_reason>: New field.
	* x86-linux-nat.c (x86_linux_prepare_to_resume): Adjust.
ibuclaw pushed a commit that referenced this issue May 11, 2015
If the user:

   #1 - disables the TUI
   #2 - resizes the terminal
   #3 - and then re-enables the TUI

the next wgetch() returns KEY_RESIZE.  This indicates to the ncurses
client that ncurses detected that the terminal has been resized.  We
don't handle KEY_RESIZE anywhere, so it gets passed on to readline
which interprets it as a multibyte character, and then the end result
is that the first key press after enabling the TUI is misinterpreted.

We shouldn't really need to handle KEY_RESIZE (and not all ncurses
implementations have that).  We have our own SIGWINCH handler, and,
when we re-enable the TUI, we explicitly detect terminal resizes and
resize all windows.  The reason ncurses currently does detects a
resize is that something within tui_enable forces a refresh/display of
some window before we get to do the actual resizing.  Setting a break
on ncurses' 'resizeterm' function helps find the culprit(s):

 (top-gdb) bt
 #0  resizeterm (ToLines=28, ToCols=114) at ../../ncurses/base/resizeterm.c:462
 #1  0x0000003b42812f3f in _nc_update_screensize (sp=0x2674730) at ../../ncurses/tinfo/lib_setup.c:443
 #2  0x0000003b0821cbe0 in doupdate () at ../../ncurses/tty/tty_update.c:726
 #3  0x0000003b08215539 in wrefresh (win=0x2a7bc00) at ../../ncurses/base/lib_refresh.c:65
 #4  0x00000000005257cb in tui_refresh_win (win_info=0xd73d60 <_locator>) at /home/pedro/gdb/mygit/src/gdb/tui/tui-wingeneral.c:60
 #5  0x000000000052265b in tui_show_locator_content () at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:269
 #6  0x00000000005273a6 in tui_set_key_mode (mode=TUI_COMMAND_MODE) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:321
 #7  0x00000000005278c7 in tui_enable () at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:494
 #8  0x0000000000527011 in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:108

That is, tui_enable calls tui_set_key_mode before we've resized all
windows, and that refreshes a window as side effect.

And if we're already debugging something (there's a frame), then we'll
instead show a window from within tui_show_frame_info:

 (top-gdb) bt
 #0  resizeterm (ToLines=28, ToCols=114) at ../../ncurses/base/resizeterm.c:462
 #1  0x0000003b42812f3f in _nc_update_screensize (sp=0x202e6c0) at ../../ncurses/tinfo/lib_setup.c:443
 #2  0x0000003b0821cbe0 in doupdate () at ../../ncurses/tty/tty_update.c:726
 #3  0x0000003b08215539 in wrefresh (win=0x2042890) at ../../ncurses/base/lib_refresh.c:65
 #4  0x00000000005257cb in tui_refresh_win (win_info=0xd73d60 <_locator>) at /home/pedro/gdb/mygit/src/gdb/tui/tui-wingeneral.c:60
 #5  0x000000000052265b in tui_show_locator_content () at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:269
 #6  0x0000000000522931 in tui_show_frame_info (fi=0x16b9cc0) at /home/pedro/gdb/mygit/src/gdb/tui/tui-stack.c:364
 #7  0x00000000005278ba in tui_enable () at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:491
 #8  0x0000000000527011 in tui_rl_switch_mode (notused1=1, notused2=1) at /home/pedro/gdb/mygit/src/gdb/tui/tui.c:108

The fix is to resize windows earlier.

gdb/ChangeLog:
2015-02-17  Pedro Alves  <[email protected]>

	* tui/tui.c (tui_enable): Resize windows before anything
	might show a window.
ibuclaw pushed a commit that referenced this issue May 11, 2015
The attached patch fixes the SEGV and lets GDB successfully
load all kernel modules installed by default on RHEL 7.

Valgrind on F-21 x86_64 host has shown me more clear what is the problem:

Reading symbols from /home/jkratoch/t/cordic.ko...Reading symbols from
/home/jkratoch/t/cordic.ko.debug...=================================================================
==22763==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120000461c8 at pc 0x150cdbd bp 0x7fffffffc7e0 sp 0x7fffffffc7d0
READ of size 8 at 0x6120000461c8 thread T0
    #0 0x150cdbc in ppc64_elf_get_synthetic_symtab /home/jkratoch/redhat/gdb-test-asan/bfd/elf64-ppc.c:3282
    #1 0x8c5274 in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1205
    #2 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268
[...]
0x6120000461c8 is located 264 bytes inside of 288-byte region [0x6120000460c0,0x6120000461e0)
freed by thread T0 here:
    #0 0x7ffff715454f in __interceptor_free (/lib64/libasan.so.1+0x5754f)
    #1 0xde9cde in xfree common/common-utils.c:98
    #2 0x9a04f7 in do_my_cleanups common/cleanups.c:155
    #3 0x9a05d3 in do_cleanups common/cleanups.c:177
    #4 0x8c538a in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1229
    #5 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268
[...]
previously allocated by thread T0 here:
    #0 0x7ffff71547c7 in malloc (/lib64/libasan.so.1+0x577c7)
    #1 0xde9b95 in xmalloc common/common-utils.c:41
    #2 0x8c4da2 in elf_read_minimal_symbols /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1147
    #3 0x8c55e7 in elf_symfile_read /home/jkratoch/redhat/gdb-test-asan/gdb/elfread.c:1268
[...]
SUMMARY: AddressSanitizer: heap-use-after-free /home/jkratoch/redhat/gdb-test-asan/bfd/elf64-ppc.c:3282 ppc64_elf_get_synthetic_symtab
[...]
==22763==ABORTING

A similar case a few lines later I have fixed in 2010 by:
        https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=3f1eff0a2c7f0e7078f011f55b8e7f710aae0cc2

My testcase does not always reproduce it but at least a bit:
 * GDB without ppc64 target (even as a secondary one) is reported as "untested"
 * ASAN-built GDB with ppc64 target always crashes (and PASSes with this fix)
 * unpatched non-ASAN-built GDB with ppc64 target crashes from commandline
 * unpatched non-ASAN-built GDB with ppc64 target PASSes from runtest (?)

gdb/ChangeLog
2015-02-26  Jan Kratochvil  <[email protected]>

	* elfread.c (elf_read_minimal_symbols): Use bfd_alloc for
	bfd_canonicalize_symtab.

gdb/testsuite/ChangeLog
2015-02-26  Jan Kratochvil  <[email protected]>

	* gdb.arch/cordic.ko.bz2: New file.
	* gdb.arch/cordic.ko.debug.bz2: New file.
	* gdb.arch/ppc64-symtab-cordic.exp: New file.
ibuclaw pushed a commit that referenced this issue May 11, 2015
…ume LWP

Ref: https://sourceware.org/ml/gdb-patches/2015-03/msg00060.html

The record-btrace target can hit an assertion here:

 Breakpoint 1, record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>,
     regcache=0x9a0a798, regno=8) at gdb/record-btrace.c:1202
 1202	  gdb_assert (tp != NULL);

 (gdb) p regcache->ptid
 $3 = {pid = 23856, lwp = 0, tid = 0}

The problem is that the linux-nat layer converts the ptid to a
single-process ptid before passing the request down to the inf-ptrace
layer, which loses information, and then record-btrace can't find the
corresponding thread in GDB's thread list:

 (gdb) bt
 #0  record_btrace_fetch_registers (ops=0x974bfc0 <record_btrace_ops>, regcache=0x9a0a798, regno=8)
     at gdb/record-btrace.c:1202
 #1  0x083f4ee2 in delegate_fetch_registers (self=0x974bfc0 <record_btrace_ops>, arg1=0x9a0a798,
     arg2=8) at gdb/target-delegates.c:149
 #2  0x08406562 in target_fetch_registers (regcache=0x9a0a798, regno=8)
     at gdb/target.c:3279
 #3  0x08355255 in regcache_raw_read (regcache=0x9a0a798, regnum=8,
     buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
     at gdb/regcache.c:643
 #4  0x083558a7 in regcache_cooked_read (regcache=0x9a0a798, regnum=8,
     buf=0xbfffe6c0 "¨\003\222\tÀ8kIøæÿ¿HO5\b\035]")
     at gdb/regcache.c:734
 #5  0x08355de3 in regcache_cooked_read_unsigned (regcache=0x9a0a798, regnum=8, val=0xbfffe738)
     at gdb/regcache.c:838
 #6  0x0827a106 in i386_linux_resume (ops=0x9737ca0 <linux_ops_saved>, ptid=..., step=1,
     signal=GDB_SIGNAL_0) at gdb/i386-linux-nat.c:670
 #7  0x08280c12 in linux_resume_one_lwp (lp=0x9a0a5b8, step=1, signo=GDB_SIGNAL_0)
     at gdb/linux-nat.c:1529
 #8  0x08281281 in linux_nat_resume (ops=0x98da608, ptid=..., step=1, signo=GDB_SIGNAL_0)
     at gdb/linux-nat.c:1708
 #9  0x0850738e in record_btrace_resume (ops=0x98da608, ptid=..., step=1, signal=GDB_SIGNAL_0)
     at gdb/record-btrace.c:1760
 ...

The fix is just to not lose information, and let the intact ptid reach
record-btrace.c.

Tested on x86-64 Fedora 20, -m32.

gdb/ChangeLog:
2015-03-03  Pedro Alves  <[email protected]>

	* i386-linux-nat.c (i386_linux_resume): Get the ptrace PID out of
	the lwp field of ptid.  Pass the full ptid to get_thread_regcache.
	* inf-ptrace.c (get_ptrace_pid): New function.
	(inf_ptrace_resume): Use it.
	* linux-nat.c (linux_resume_one_lwp): Pass the LWP's ptid ummodified
	to the lower layer.
ibuclaw pushed a commit that referenced this issue May 11, 2015
    This patch changes the heuristic the linespec lexer uses to
    detect a keyword in the input stream.

    Currently, the heuristic is: a word is a keyword if it
    1) points to a string that is a keyword
    2) is followed by a non-identifier character

    This is strictly more correct than using whitespace. For example,
    it allows constructs such as "break foo if(i == 1)". However,
    find_condition_and_thread in breakpoint.c does not support this expanded
    usage. It requires whitespace to follow the keyword.

    The proposed new heuristic is: a word is a keyword if it
    1) points to a string that is a keyword
    2) is followed by whitespace
    3) is not followed by another keyword string followed by whitespace

    This additional complexity allows constructs such as
    "break thread thread 3" and "break thread 3".  In the former case,
    the actual location is a symbol named "thread" to be set on thread #3.
    In the later case, the location is NULL, i.e., the default location,
    to be set on thread #3.

    In order to pass all the new tests added here, I've also had to add a
    new feature to parse_breakpoint_sals, which expands recognition of the
    default location to keywords other than "if", which is the only keyword
    currently permitted with the default (NULL) location, but there is no
    reason to exclude other keywords.

    Consequently, it will be possible to use "break thread 1" or
    "break task 1".

    In addition to all of this, it is now possible to remove the keyword_ok
    state from the linespec parser.

    gdb/ChangeLog

    	* breakpoint.c (parse_breakpoint_sals): Use
    	linespec_lexer_lex_keyword to ascertain if the user specified
    	a NULL location.
    	* linespec.c [IF_KEYWORD_INDEX]: Define.
    	(linespec_lexer_lex_keyword): Export.
    	(struct ls_parser) <keyword_ok>: Remove.
    	A keyword is only a keyword if not followed by another keyword.
    	(linespec_lexer_lex_one): Remove keyword_ok handling.
    	Add comment explaining why the parsing stream is not advanced
    	when a keyword is seen.
    	(parse_linespec): Remove parser->keyword_ok.
    	* linespec.h (linespec_lexer_lex_keyword): Add declaration.

    gdb/testsuite/ChangeLog

    	* gdb.linespec/keywords.c: New file.
    	* gdb.linespec/keywords.exp: New file.
ibuclaw pushed a commit that referenced this issue May 11, 2015
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:

 (gdb) continue
 Continuing.
 src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)

Here:

 (top-gdb) bt
 #0  internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
     at src/gdb/common/errors.c:54
 #1  0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
 #2  0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
 #3  0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
 #4  0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
 #5  0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
 #6  0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
     at src/gdb/linux-thread-db.c:1525
 #7  0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
 #8  0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
 #9  0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
 #10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655

I managed to come up with a test that reliably reproduces this.  It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while.  On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds.  So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-04-01  Pedro Alves  <[email protected]>

	* linux-thread-db.c (record_thread): Readd the thread to gdb's
	list if it was marked exited.

gdb/testsuite/ChangeLog:
2015-04-01  Pedro Alves  <[email protected]>

	* gdb.threads/tid-reuse.c: New file.
	* gdb.threads/tid-reuse.exp: New file.
ibuclaw pushed a commit that referenced this issue May 11, 2015
…gnals

Both PRs are triggered by the same use case.

PR18214 is about software single-step targets.  On those, the 'resume'
code that detects that we're stepping over a breakpoint and delivering
a signal at the same time:

  /* Currently, our software single-step implementation leads to different
     results than hardware single-stepping in one situation: when stepping
     into delivering a signal which has an associated signal handler,
     hardware single-step will stop at the first instruction of the handler,
     while software single-step will simply skip execution of the handler.
...
     Fortunately, we can at least fix this particular issue.  We detect
     here the case where we are about to deliver a signal while software
     single-stepping with breakpoints removed.  In this situation, we
     revert the decisions to remove all breakpoints and insert single-
     step breakpoints, and instead we install a step-resume breakpoint
     at the current address, deliver the signal without stepping, and
     once we arrive back at the step-resume breakpoint, actually step
     over the breakpoint we originally wanted to step over.  */

doesn't handle the case of _another_ thread also needing to step over
a breakpoint.  Because the other thread is just resumed at the PC
where it had stopped and a breakpoint is still inserted there, the
thread immediately re-traps the same breakpoint.  This test exercises
that.  On software single-step targets, it fails like this:

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler

gdb.log (simplified):

 (gdb) continue
 Continuing.

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) thread 3
 (gdb) queue-signal SIGUSR1
 (gdb) thread 1
 [Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))]
 #0  main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106
 106       wait_threads (); /* set wait-threads breakpoint here */
 (gdb) break sigusr1_handler
 Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31.
 (gdb) continue
 Continuing.
 [Switching to Thread 0x7ffff7fc0700 (LWP 24828)]

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler


For good measure, I made the test try displaced stepping too.  And
then I found it crashes GDB on x86-64 (a hardware step target), but
only when displaced stepping... :

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216)

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 4964          if (sr_bp->loc->permanent
 Setting up the environment for debugging gdb.
 Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54.
 Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217.
 (top-gdb) p sr_bp
 $1 = (struct breakpoint *) 0x0
 (top-gdb) bt
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 #1  0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715
 #2  0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165
 #3  0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298
 #4  0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #5  0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658
 #6  0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658

The all-stop-non-stop series fixes this, but meanwhile, this augments
the multiple-step-overs.exp test to cover this, KFAILed.

gdb/testsuite/ChangeLog:
2015-04-08  Pedro Alves  <[email protected]>

	PR gdb/18214
	PR gdb/18216
	* gdb.threads/multiple-step-overs.c (sigusr1_handler): New
	function.
	(main): Install it as SIGUSR1 handler.
	* gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix'
	parameter.  Always use "setup" as prefix.  Toggle "set
	displaced-stepping" off/on depending on global.  Don't switch to
	thread 1 here.
	(top level): Add displaced stepping "off/on" test axis.  Update
	"setup" calls.  Wrap each subtest with with_test_prefix.  Test
	continuing with a queued signal in each thread.
ibuclaw pushed a commit that referenced this issue May 11, 2015
…gdbserver

Fixes:

 -FAIL: gdb.trace/mi-tracepoint-changed.exp: reconnect: break-info 1
 +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: tracepoint created
 +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: tracepoint on marker is installed
 +PASS: gdb.trace/mi-tracepoint-changed.exp: reconnect: break-info 1


 -FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created
 -FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created
 +PASS: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created
 +PASS: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created

These tests do something like this:

 #0 - start gdb/gdbserver normally
 #1 - setup some things in the debug session
 #2 - disconnect from gdbserver
 #3 - restart gdb
 #4 - reconnect to gdbserver

The problem is that the native-extended-gdbserver board always spawns
a new gdbserver instance in #3 (and has gdb connect to that).  So when
the test gets to #4, it connects to that new instance instead of the
old one:

 (gdb) spawn ../gdbserver/gdbserver --multi :2354
 Listening on port 2354
 target extended-remote localhost:2354
 Remote debugging using localhost:2354
 ...
 spawn ../gdbserver/gdbserver --multi :2355
 Listening on port 2355
 47-target-select extended-remote localhost:2355
 =tsv-created,name="trace_timestamp",initial="0"\n
 47^connected
 (gdb)
 ...
 47-target-select extended-remote localhost:2355
 47^connected
 (gdb)
 FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv1 created
 FAIL: gdb.trace/mi-tsv-changed.exp: upload: tsv2 created

testsuite/ChangeLog:
2015-04-16  Pedro Alves  <[email protected]>

	* boards/native-extended-gdbserver.exp (mi_gdb_start): Don't start
	a new gdbserver if gdbserver_reconnect_p is set.
ibuclaw pushed a commit that referenced this issue May 11, 2015
Hi,
I see the fail in gdb.base/relativedebug.exp on aarch64 box on which
glibc doesn't have debug info,

 bt^M
 #0 0x0000002000061a88 in raise () from /lib/aarch64-linux-gnu/libc.so.6^M
 #1 0x0000002000064efc in abort () from /lib/aarch64-linux-gnu/libc.so.6^M
 #2 0x0000000000400640 in handler (signo=14) at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:25^M
 #3 <signal handler called>^M
 #4 0x00000020000cc478 in ?? () from /lib/aarch64-linux-gnu/libc.so.6^M
 #5 0x0000000000400664 in main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:32^M
 (gdb) FAIL: gdb.base/relativedebug.exp: pause found in backtrace

if glibc has debug info, this test doesn't fail.

In sysdeps/unix/sysv/linux/generic/pause.c, __libc_pause calls
__syscall_pause,

  static int
  __syscall_pause (void)
  {
    sigset_t set;

    int rc =
      INLINE_SYSCALL (rt_sigprocmask, 4, SIG_BLOCK, NULL, &set, _NSIG / 8);
    if (rc == 0)
      rc = INLINE_SYSCALL (rt_sigsuspend, 2, &set, _NSIG / 8);

    return rc;
  }

  int
  __libc_pause (void)
  {
    if (SINGLE_THREAD_P)
      return __syscall_pause ();     <--- tail call

    int oldtype = LIBC_CANCEL_ASYNC ();

    int result = __syscall_pause ();

    LIBC_CANCEL_RESET (oldtype);

    return result;
  }

and GDB unwinder is confused by the GCC optimized code,

(gdb) disassemble pause
Dump of assembler code for function pause:
   0x0000007fb7f274c4 <+0>:     stp     x29, x30, [sp,#-32]!
   0x0000007fb7f274c8 <+4>:     mov     x29, sp
   0x0000007fb7f274cc <+8>:     adrp    x0, 0x7fb7fd2000
   0x0000007fb7f274d0 <+12>:    ldr     w0, [x0,#364]
   0x0000007fb7f274d4 <+16>:    stp     x19, x20, [sp,#16]
   0x0000007fb7f274d8 <+20>:    cbnz    w0, 0x7fb7f274e8 <pause+36>

   0x0000007fb7f274dc <+24>:    ldp     x19, x20, [sp,#16]
   0x0000007fb7f274e0 <+28>:    ldp     x29, x30, [sp],#32
   0x0000007fb7f274e4 <+32>:    b       0x7fb7f27434    <---- __syscall_pause

   0x0000007fb7f274e8 <+36>:    bl      0x7fb7f5e080

Note that the program stops in __syscall_pause, but its symbol is
stripped in glibc, so GDB doesn't know where the program stops.
__syscall_pause is a tail call in __libc_pause, so it returns to main
instead of __libc_pause.  As a result, the backtrace is like,

 #0  0x0000007fb7ebca88 in raise () from /lib/aarch64-linux-gnu/libc.so.6
 #1  0x0000007fb7ebfefc in abort () from /lib/aarch64-linux-gnu/libc.so.6
 #2  0x0000000000400640 in handler (signo=14) at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:25
 #3  <signal handler called>
 #4  0x0000007fb7f27478 in ?? () from /lib/aarch64-linux-gnu/libc.so.6   <-- [in __syscall_pause]
 #5  0x0000000000400664 in main () at ../../../binutils-gdb/gdb/testsuite/gdb.base/relativedebug.c:32

looks GDB does nothing wrong here.  I looked back at the test case
gdb.base/relativedebug.exp, which was added
https://sourceware.org/ml/gdb-patches/2006-10/msg00305.html
This test was indented to test the problem that "backtraces no longer
display some libc functions" after separate debug info is installed.
IOW, it makes few sense to test against libc which doesn't have debug
info at all, such as my case.

This patch is to tweak the test case to catch the output of
"info shared", if "(*)" is found for libc.so, which means libc doesn't
have debug info, then skip the test.

gdb/testsuite:

2015-04-30  Yao Qi  <[email protected]>

	* gdb.base/relativedebug.exp: Invoke gdb command
	"info sharedlibrary", and if libc.so doesn't have debug info,
	skip the test.
ibuclaw pushed a commit that referenced this issue Jun 24, 2015
(gdb) PASS: gdb.compile/compile.exp: set unwindonsignal on
compile code *(volatile int *) 0 = 0;
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fba426 in _gdb_expr (__regs=0x7ffff7fb8000) at gdb command line:1
1	gdb command line: No such file or directory.
=================================================================
==10462==ERROR: AddressSanitizer: heap-use-after-free on address 0x621000cf7a3d at pc 0x0000004e46b9 bp 0x7ffdeb0f7a40 sp 0x7ffdeb0f71b8
READ of size 10 at 0x621000cf7a3d thread T0
    #0 0x4e46b8 in printf_common(void*, char const*, __va_list_tag*) [clone .isra.6] (/home/jkratoch/redhat/gdb-clean-asan/gdb/gdb+0x4e46
b8)
    #1 0x4f645e in vasprintf (/home/jkratoch/redhat/gdb-clean-asan/gdb/gdb+0x4f645e)
    #2 0xe5cf00 in xstrvprintf common/common-utils.c:120
    #3 0xe74192 in throw_it common/common-exceptions.c:332
    #4 0xe742f6 in throw_verror common/common-exceptions.c:361
    #5 0xddc89e in verror /home/jkratoch/redhat/gdb-clean-asan/gdb/utils.c:541
    #6 0xe734bd in error common/errors.c:43
    #7 0xafa1d6 in call_function_by_hand_dummy /home/jkratoch/redhat/gdb-clean-asan/gdb/infcall.c:1031
    #8 0xe81858 in compile_object_run compile/compile-object-run.c:119
    #9 0xe7733c in eval_compile_command compile/compile.c:577
    #10 0xe7541e in compile_code_command compile/compile.c:153

It is obvious why that happens, dummy_frame_pop() will call compile objfile
cleanup which will free that objfile and NAME then becomes a stale pointer.

> Is there any reason we release OBJFILE in the dummy frame dtor?  Why
> don't we register a cleanup to release in OBJFILE in compile_object_run?
> together with releasing compile_module?  'struct compile_module' has a
> field objfile, which should be released together with
> 'struct compile_module' instead of dummy_frame.

(gdb) break puts
Breakpoint 2 at 0x3830c6fd30: file ioputs.c, line 34.
(gdb) compile code puts("hello")
Breakpoint 2, _IO_puts (str=0x7ffff7ff8000 "hello") at ioputs.c:34
34      {
The program being debugged stopped while in a function called from GDB.
Evaluation of the expression containing the function
(_gdb_expr) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) bt
(gdb) _

Now compile_object_run() called from line
	(gdb) compile code puts("hello")
has finished for a long time.  But we still need to have that injected code
OBJFILE valid when GDB is executing it.  Therefore OBJFILE is freed only from
destructor of the frame #1.

At the patched line of call_function_by_hand_dummy() the dummy frame
destructor has not yet been run but it will be run before the fetched NAME
will get used.

gdb/ChangeLog
2015-05-19  Jan Kratochvil  <[email protected]>

	Fix ASAN crash for gdb.compile/compile.exp.
	* infcall.c (call_function_by_hand_dummy): Use xstrdup for NAME.
ibuclaw pushed a commit that referenced this issue Jun 24, 2015
…info

In some cases tui_show_frame_info() may get called while the inferior's
terminal settings are still in effect.  But when we call this function
we absolutely need to have our terminal settings in effect because the
function is responsible for redrawing TUI's windows following a change
in the selected frame or a change in the PC.  If our terminal settings
are not in effect, the screen does not get redrawn properly, causing
temporary display artifacts (which can be fixed via ^L).

This scenario happens most prominently when stepping through a program
in TUI while a watchpoint is in effect.

Here is an example backtrace for when tui_show_frame_info() gets called
while target_terminal_is_inferior() == 1:

  #1  0x00000000004988ee in tui_selected_frame_level_changed_hook (level=0)
  #2  0x0000000000617b99 in select_frame (fi=0x18c9820)
  #3  0x0000000000617c3f in get_selected_frame (message=message@entry=0x0)
  #4  0x00000000004ce534 in update_watchpoint (b=b@entry=0x2d9a760,
      reparse=reparse@entry=0)
  #5  0x00000000004d625e in insert_breakpoints ()
  #6  0x0000000000531cfe in keep_going (ecs=ecs@entry=0x7ffea7884ac0)
  #7  0x00000000005326d7 in process_event_stop_test (ecs=ecs@entry=0x7ffea7884ac0)
  #8  0x000000000053596e in handle_inferior_event_1 (ecs=0x7ffea7884ac0)

The fix is simple: call target_terminal_ours_for_output() before calling
tui_show_frame_info() in TUI's frame-changed hook, making sure to
restore the original terminal settings afterwards.

gdb/ChangeLog:

	* tui/tui-hooks.c (tui_selected_frame_level_changed_hook): Call
	target_terminal_ours_for_output() before calling
	tui_show_frame_info(), and restore the original terminal
	settings afterwards.
ibuclaw pushed a commit that referenced this issue Aug 11, 2015
…event

The tail end of linux_wait_1 isn't expecting that the select_event_lwp
machinery can pick a whole-process exit event to report to GDB.  When
that happens, both gdb and gdbserver end up quite confused:

 ...
 (gdb)
 [Thread 24971.24971] #1 stopped.
 0x0000003615a011f0 in ?? ()
 c&
 Continuing.
 (gdb) [New Thread 24971.24981]
 [New Thread 24983.24983]
 [New Thread 24971.24982]

 [Thread 24983.24983] #3 stopped.
 0x0000003615ebc7cc in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/fork.c:130
 130       pid = ARCH_FORK ();
 [New Thread 24984.24984]
 Error in re-setting breakpoint -16: PC register is not available
 Error in re-setting breakpoint -17: PC register is not available
 Error in re-setting breakpoint -18: PC register is not available
 Error in re-setting breakpoint -19: PC register is not available
 Error in re-setting breakpoint -24: PC register is not available
 Error in re-setting breakpoint -25: PC register is not available
 Error in re-setting breakpoint -26: PC register is not available
 Error in re-setting breakpoint -27: PC register is not available
 Error in re-setting breakpoint -28: PC register is not available
 Error in re-setting breakpoint -29: PC register is not available
 Error in re-setting breakpoint -30: PC register is not available
 PC register is not available
 (gdb)

gdb/gdbserver/ChangeLog:
2015-08-06  Pedro Alves  <[email protected]>

	* linux-low.c (add_lwp): Set waitstatus to TARGET_WAITKIND_IGNORE.
	(linux_thread_alive): Use lwp_is_marked_dead.
	(extended_event_reported): Delete.
	(linux_wait_1): Check if waitstatus is TARGET_WAITKIND_IGNORE
	instead of extended_event_reported.
	(mark_lwp_dead): Don't set the 'dead' flag.  Store the waitstatus
	as well.
	(lwp_is_marked_dead): New function.
	(lwp_running): Use lwp_is_marked_dead.
	* linux-low.h: Delete 'dead' field, and update 'waitstatus's
	comment.
ibuclaw pushed a commit that referenced this issue Aug 13, 2015
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program, and currently
stopped inside a function provided by a shared-library, at a line
calling a subprogram provided by a second shared library, and trying
to "next" over that function call.

Before we changed the default all-stop behavior, we had:

    7             Impl_Initialize;  -- Stop here and try "next" over this line
    (gdb) n
    8             return 5;  <<-- OK

But now, "next" just stops much earlier:

    (gdb) n
    0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so

What happens is that next stops at a call instruction, which calls
the function's PLT, and GDB fails to notice that the inferior stepped
into a subroutine, and so decides that we're done. We can see another
symptom of the same issue by looking at the backtrace at the point
GDB stopped:

    (gdb) bt
    #0  0x00007ffff7bd8560 in impl.initialize@plt ()
       from /[...]/lib/libpck.so
    #1  0x00000000f7bd86f9 in ?? ()
    #2  0x00007fffffffdf50 in ?? ()
    #3  0x0000000000401893 in a () at /[...]/a.adb:7
    Backtrace stopped: frame did not save the PC

With a functioning GDB, the backtrace looks like the following instead:

    #0  0x00007ffff7bd8560 in impl.initialize@plt ()
       from /[...]/lib/libpck.so
    #1  0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
    #2  0x0000000000401893 in a () at /[...]/a.adb:7

Note how, for frame #1, the address looks quite similar, except
for the high-order bits not being set:

    #1  0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7   <<<--  OK
    #1  0x00000000f7bd86f9 in ?? ()                        <<<--  WRONG
              ^^^^
              ||||
              Wrong

Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're on non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:

    regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
    retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
    retaddr = (retaddr - insn_offset) & 0xffffffffUL;

The mask used to compute retaddr looks wrong to me, keeping only
4 bytes instead of 8, and explains why the high order bits of
the backtrace are unset. What happens is that, after the displaced
stepping has completed, GDB restores that return address at the location
where the program expects it.  But because the top half bits of
the address have been masked out, the return address is now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that.  Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:

    (gdb) c
    Continuing.

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000f7bd86f9 in ?? ()
    ^^^^^^^^^^^^^^^^^^

This patch fixes the issue by using a mask that seems more appropriate
for this architecture.

gdb/ChangeLog:

        * amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
        compute RETADDR.

gdb/testsuite/ChangeLog:

        * gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
        gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
        gdb.base/dso2dso.exp: New files.

Tested on x86_64-linux, no regression.
ibuclaw pushed a commit that referenced this issue Sep 26, 2015
…g-bp.exp

Running that test in a loop, I found a gdbserver core dump with the
following back trace:

 Core was generated by `../gdbserver/gdbserver --once --multi :2346'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236
 236       return inferior->regcache_data;
 (gdb) up
 #1  0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31
 31        regcache = (struct regcache *) inferior_regcache_data (thread);
 (gdb) bt
 #0  0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236
 #1  0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31
 #2  0x0000000000409271 in prepare_resume_reply (buf=0x20dd593 "", ptid=..., status=0x20edce0) at src/gdb/gdbserver/remote-utils.c:1147
 #3  0x000000000040ab0a in vstop_notif_reply (event=0x20edcc0, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/server.c:183
 #4  0x0000000000426b38 in notif_write_event (notif=0x66e6c0 <notif_stop>, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/notif.c:69
 #5  0x0000000000426c55 in handle_notif_ack (own_buf=0x20dd590 "T05", packet_len=8) at src/gdb/gdbserver/notif.c:113
 #6  0x000000000041118f in handle_v_requests (own_buf=0x20dd590 "T05", packet_len=8, new_packet_len=0x7fff742c77b8)
     at src/gdb/gdbserver/server.c:2862
 #7  0x0000000000413850 in process_serial_event () at src/gdb/gdbserver/server.c:4148
 #8  0x0000000000413945 in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:4196
 #9  0x000000000041a1ef in handle_file_event (event_file_desc=5) at src/gdb/gdbserver/event-loop.c:429
 #10 0x00000000004199b6 in process_event () at src/gdb/gdbserver/event-loop.c:184
 #11 0x000000000041a735 in start_event_loop () at src/gdb/gdbserver/event-loop.c:547
 #12 0x00000000004123d2 in captured_main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3562
 #13 0x000000000041252e in main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3631

Clearly this means that a thread pushed a stop reply in the event
queue, and then before GDB confused the event, the whole process died,
along with its thread.  But the pending thread event was left
dangling.  When GDB fetched that event, gdbserver looked up the
corresponding thread, but found NULL; not expecting this, gdbserver
crashes when it tries to read this thread's registers.

gdb/gdbserver/
2015-08-21  Pedro Alves  <[email protected]>

	PR gdb/18749
	* inferiors.c (remove_thread): Discard any pending stop reply for
	this thread.
	* server.c (remove_all_on_match_pid): Rename to ...
	(remove_all_on_match_ptid): ... this.  Work with a filter ptid
	instead of a pid.
	(discard_queued_stop_replies): Change parameter to a ptid.  Now
	extern.
	(handle_v_kill, kill_inferior_callback)
	(process_serial_event): Adjust.
	(captured_main): Call initialize_notif before starting the
	program, thus before threads are created.
	* server.h (discard_queued_stop_replies): Declare.
ibuclaw pushed a commit that referenced this issue Apr 10, 2017
I build GDB with asan, and run test case hook-stop.exp, and threadapply.exp,
I got the following asan error,

=================================================================^M
^[[1m^[[31m==2291==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000999c4 at pc 0x000000826022 bp 0x7ffd28a8ff70 sp 0x7ffd28a8ff60^M
^[[1m^[[0m^[[1m^[[34mREAD of size 4 at 0x6160000999c4 thread T0^[[1m^[[0m^M
    #0 0x826021 in release_stop_context_cleanup ../../binutils-gdb/gdb/infrun.c:8203^M
    #1 0x72798a in do_my_cleanups ../../binutils-gdb/gdb/common/cleanups.c:154^M
    #2 0x727a32 in do_cleanups(cleanup*) ../../binutils-gdb/gdb/common/cleanups.c:176^M
    #3 0x826895 in normal_stop() ../../binutils-gdb/gdb/infrun.c:8381^M
    #4 0x815208 in fetch_inferior_event(void*) ../../binutils-gdb/gdb/infrun.c:4011^M
    #5 0x868aca in inferior_event_handler(inferior_event_type, void*) ../../binutils-gdb/gdb/inf-loop.c:44^M
....
^[[1m^[[32m0x6160000999c4 is located 68 bytes inside of 568-byte region [0x616000099980,0x616000099bb8)^M
^[[1m^[[0m^[[1m^[[35mfreed by thread T0 here:^[[1m^[[0m^M
    #0 0x7fb0bc1312ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)^M
    #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100^M
    #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207^M
    #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223^M
    #4 0x805494 in kill_command ../../binutils-gdb/gdb/infcmd.c:2595^M
....

Detaching from program: /home/yao.qi/SourceCode/gnu/build-with-asan/gdb/testsuite/outputs/gdb.threads/threadapply/threadapply, process 2399^M
=================================================================^M
^[[1m^[[31m==2387==ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000a98c0 at pc 0x00000083fd28 bp 0x7ffd401c3110 sp 0x7ffd401c3100^M
^[[1m^[[0m^[[1m^[[34mREAD of size 4 at 0x6160000a98c0 thread T0^[[1m^[[0m^M
    #0 0x83fd27 in thread_alive ../../binutils-gdb/gdb/thread.c:741^M
    #1 0x844277 in thread_apply_all_command ../../binutils-gdb/gdb/thread.c:1804^M
....
^M
^[[1m^[[32m0x6160000a98c0 is located 64 bytes inside of 568-byte region [0x6160000a9880,0x6160000a9ab8)^M
^[[1m^[[0m^[[1m^[[35mfreed by thread T0 here:^[[1m^[[0m^M
    #0 0x7f59a7e322ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)^M
    #1 0xb8c62f in xfree(void*) ../../binutils-gdb/gdb/common/common-utils.c:100^M
    #2 0x83df67 in free_thread ../../binutils-gdb/gdb/thread.c:207^M
    #3 0x83dfd2 in init_thread_list() ../../binutils-gdb/gdb/thread.c:223^M

This patch fixes the issue by deleting thread_info object if it is
deletable, otherwise, mark it as exited (by set_thread_exited).
Function set_thread_exited is shared from delete_thread_1.  This patch
also moves field "refcount" to private and methods incref and
decref.  Additionally, we stop using "ptid_t" in
"struct current_thread_cleanup" to reference threads, instead we use
"thread_info" directly.  Due to this change, we don't need
restore_current_thread_ptid_changed anymore.

gdb:

2017-04-10  Yao Qi  <[email protected]>

	PR gdb/19942
	* gdbthread.h (thread_info::deletable): New method.
	(thread_info::incref): New method.
	(thread_info::decref): New method.
	(thread_info::refcount): Move it to private.
	* infrun.c (save_stop_context): Call inc_refcount.
	(release_stop_context_cleanup): Likewise.
	* thread.c (set_thread_exited): New function.
	(init_thread_list): Delete "tp" only it is deletable, otherwise
	call set_thread_exited.
	(delete_thread_1): Call set_thread_exited.
	(current_thread_cleanup) <inferior_pid>: Remove.
	<thread>: New field.
	(restore_current_thread_ptid_changed): Removed.
	(do_restore_current_thread_cleanup): Adjust.
	(restore_current_thread_cleanup_dtor): Don't call
	find_thread_ptid.
	(set_thread_refcount): Use dec_refcount.
	(make_cleanup_restore_current_thread): Adjust.
	(thread_apply_all_command): Call inc_refcount.
	(_initialize_thread): Don't call
	observer_attach_thread_ptid_changed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant