Add netstacklat tool #125

Open
wants to merge 3 commits into
base: main

simosund
Contributor

Initial implementation of the netstacklat tool, which is based on Jesper Brouer's bpftrace script for measuring latency up to certain points in the Linux network stack.

The eBPF portion of the program is designed to be compatible with ebpf_exporter. In theory one only has to drop the netstack.bpf.c file into the ebpf_exporter/examples directory and add the corresponding yaml config file (which I will probably add to this repository as well later), although I have not yet verified that it actually works as intended there.

In addition to finishing up the existing commits, some points I wish to address before attempting to merge this request are:

  • Figure out how to use vmlinux.h properly in this setup (or if it's better to get the definition of sk_buff some other way)
  • Add option to the userspace program to enable software timestamping on select interface(s) to make it a bit easier to use? Currently you're unlikely to actually get any useful data out of this unless you enable software timestamping on relevant interfaces externally (as skb->tstamp will just be 0).
  • Verify that it works with ebpf_exporter and add yaml config file.
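
For the software timestamping point, one possible approach is sketched below. It assumes that requesting `SOF_TIMESTAMPING_RX_SOFTWARE` on any socket makes the kernel start software-timestamping received packets for as long as that socket stays open. Note that this would be a system-wide switch rather than a per-interface one, and the function name `enable_sw_rx_timestamping` is hypothetical, not part of this PR.

```c
#include <linux/net_tstamp.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a dummy UDP socket and request software RX timestamps on it.
 * Assumption: while this socket is open, the kernel fills in
 * skb->tstamp for received packets. Returns the socket fd (keep it
 * open for as long as timestamps are needed), or -1 on error.
 */
int enable_sw_rx_timestamping(void)
{
	int flags = SOF_TIMESTAMPING_RX_SOFTWARE;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags,
		       sizeof(flags)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

The loader would call this once at startup and close the returned descriptor on shutdown.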

Add the tool netstacklat, which measures latency up to different parts
in the Linux ingress network stack. Base the initial implementation on
a bpftrace script from Jesper Dangaard Brouer, which requires the
kernel to timestamp ingress packets (i.e. set the tstamp member of the
skb). Hence, the latency recorded by the tool is the difference
between the kernel timestamping point and various later points in the
network stack.
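
The latency calculation described above can be sketched in plain C as follows. This is an illustration only, not the code in this commit: the function name and the way the current time is passed in are hypothetical, and the check for timestamps in the future is an extra assumption beyond the zero check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the time elapsed between the kernel's ingress timestamp and
 * the current hook point. Both values must use the same clock.
 * Returns false (no sample) if the packet was never timestamped
 * (tstamp == 0) or if the timestamp lies after now_ns.
 */
bool ingress_latency_ns(uint64_t skb_tstamp_ns, uint64_t now_ns,
			uint64_t *latency_ns)
{
	if (skb_tstamp_ns == 0)
		return false; /* timestamping not enabled for this packet */

	if (skb_tstamp_ns > now_ns)
		return false; /* clock mismatch, discard the sample */

	*latency_ns = now_ns - skb_tstamp_ns;
	return true;
}
```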

In this initial commit, include the eBPF programs for recording the
network stack latency at the start of the kernel functions
tcp_v4_do_rcv(), tcp_data_queue(), and udp_queue_rcv_one_skb(). Use a
structure making it easy to extend the tool with additional hook
points in the future. Make the eBPF programs compatible with
Cloudflare's ebpf_exporter, and use the map helpers (maps.bpf.h) from
ebpf_exporter to ensure maps are used in a compatible way.

Open code the histogram maps for different hook points as entirely
separate maps, instead of encoding the different hook points in the
key of a separate map as ebpf_exporter often does. This avoids costly
hashmap lookups, as simple array maps can be used instead of hash
maps.
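
As a userspace illustration of this layout (not the BPF code itself), each hook point gets its own plain array indexed directly by bin number, so recording a value needs no hashing. The bin count `HIST_NBINS`, the power-of-two binning, and all names below are assumptions made for the sketch.

```c
#include <stdint.h>

#define HIST_NBINS 30 /* hypothetical number of bins */

/* One independent histogram per hook point, mirroring separate maps */
uint64_t tcp_v4_do_rcv_hist[HIST_NBINS];
uint64_t tcp_data_queue_hist[HIST_NBINS];
uint64_t udp_queue_rcv_one_skb_hist[HIST_NBINS];

/* Bin index = floor(log2(value)), clamped to the last bin */
unsigned int log2_bin(uint64_t value)
{
	unsigned int bin = 0;

	while (value >>= 1)
		bin++;

	return bin < HIST_NBINS ? bin : HIST_NBINS - 1;
}

/* Recording is a direct array index, no key hashing involved */
void record_latency(uint64_t *hist, uint64_t latency_ns)
{
	hist[log2_bin(latency_ns)]++;
}
```

With a single shared map, the key would instead have to carry both the hook identifier and the bin index, which is what forces a hash map in that design.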

Also include a minimal user space loader, which loads and attaches the
eBPF programs. Later commits will extend this loader to also report
the recorded latencies stored in the BPF maps.

Signed-off-by: Simon Sundberg <[email protected]>
Add functionality to the user space component to periodically fetch
the BPF maps netstacklat records the values in and print them out.

Base the core program loop on the same setup as pping, where a single
epoll instance is used to support multiple different types of
events. So far it only deals with signal handling (for clean shutdown)
plus a timer (for periodical reporting), but the setup can easily be
extended if the program grows more complex in the future.

Use the (somewhat complicated) bpf_map_lookup_batch to fetch the
entire histogram maps in a single system call (instead of performing a
lookup for each bin index individually).

Signed-off-by: Simon Sundberg <[email protected]>
@simosund
Contributor Author

So, I now believe I have a full "minimal working example" that quite closely replicates the bpftrace script and prints out its output in a similar format. There are still many missing features that would be nice to have, such as:

  • Parse some command-line arguments, letting the user configure e.g. the reporting interval (currently hard coded to 10s)
  • Automatically enable software RX timestamping (on all interfaces or interfaces provided as command line arguments)
  • Automatically figure out the TAI-offset at start up (currently hard coded as 37 seconds)
  • Figure out clock basis for skb->tstamp (currently assumes CLOCK_REALTIME, but on at least one machine that did not seem to be the case), and possibly if it's valid (right now it assumes any non-zero timestamps are valid)

But I figure that this should be sufficient for an early review at least.

One issue worth highlighting is that it's currently not fully compatible with ebpf-exporter in the sense that you can't just drop in the netstacklat.bpf.c file. First off, there are some shared defines and enums in netstacklat.h that netstacklat.bpf.c depends on, so unless you also copy over that header (which seems to go against the style of the existing examples at ebpf-exporter/examples), you first need to insert those into the netstacklat.bpf.c file. Furthermore, netstacklat.bpf.c does #include "vmlinux_local.h" as expected for this bpf-examples repository, but the ebpf-exporter examples seem to expect #include <vmlinux.h> instead. Not sure if there's any clever pre-processor magic or other tricks you could use to solve this.
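
One untested possibility for the include mismatch is the `__has_include` preprocessor operator, which recent clang and GCC versions support. The sketch below picks whichever header the build environment provides. The `VMLINUX_HEADER` macro and the `"none"` fallback are hypothetical additions that only let the fragment compile outside either repository; a real BPF build would more likely use `#error` there.

```c
/* Pick whichever vmlinux header the build environment provides */
#if defined(__has_include)
#  if __has_include("vmlinux_local.h")
#    include "vmlinux_local.h"
#    define VMLINUX_HEADER "vmlinux_local.h"
#  elif __has_include(<vmlinux.h>)
#    include <vmlinux.h>
#    define VMLINUX_HEADER "<vmlinux.h>"
#  endif
#endif

/* Fallback so this sketch still compiles when neither header exists */
#ifndef VMLINUX_HEADER
#  define VMLINUX_HEADER "none"
#endif

/* Report which header was selected at compile time */
const char *vmlinux_header_used(void)
{
	return VMLINUX_HEADER;
}
```

This would not address the shared defines in netstacklat.h, which would still have to be copied into the file or shipped alongside it.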

For licensing, I've put my own files under GPL-2.0 as most other examples here. However, I've also included the same version of bits.bpf.h header from bcc/libbpf-tools (same version as ebpf-exporter uses), which then retains its dual LGPL/BSD license, and maps.bpf.h from ebpf-exporter that I put under MIT as that seems to be what ebpf-exporter in general uses.

As I mentioned during the meeting, I intentionally opted to use separate BPF maps for each histogram rather than multiplex multiple histograms in the same map as most of the ebpf-exporter examples seem to do, e.g. here. By doing so I can simply use array maps instead of hashmaps and thereby reduce the overhead for map lookups a bit.

@simosund simosund marked this pull request as ready for review January 14, 2025 10:46
Add a README briefly explaining what this example does.

Signed-off-by: Simon Sundberg <[email protected]>
@simosund
Contributor Author

Oh, and ping @netoptimizer if GitHub hasn't already done so (not intending to rush you, just want to make sure you're aware this PR is ready to be reviewed).
