Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler: Batch API fixes #395

Merged
merged 1 commit into from
May 5, 2022
Merged

Conversation

javierhonduco
Copy link
Contributor

@javierhonduco javierhonduco commented May 5, 2022

This PR is based off @kakkoyun's work to use libbpf(go)'s batch APIs.

Context

The main issue we found while working with this API was that they were erroring with EPERM.
After some debugging, we realised that libbpf wasn't handling errors in the way we
expected.

The debugging write-up and more context can be found here,
and the fix is in this PR.

After landing these changes upstream, we pointed to the updated libbpfgo version, as well as added
some regression tests to ensure that libbpfgo
behaves as expected, and to make it easier in the future to write further compatibility tests.

Note: the rest of the batch APIs error handling is still unfixed. Tracking in aquasecurity/libbpfgo#162.

Test plan

Checking that elements are deleted

$ GODEBUG=cgocheck=2 make ENABLE_ASAN=yes && sudo dist/parca-agent --node=test --log-level=debug --kubernetes=false --insecure --store-address=localhost:7070 --systemd-units=docker-0cd1ac1dee8df9859b5dd10b794b38b235ef5745cedfde653896dd09264cf615.scope |& grep "alue and delete bat"
mkdir -p dist/parca-agent.bpf

entries are deleted in batch:

level=debug ts=2022-05-05T15:15:48.916016494Z caller=profiler.go:426 labels="{__cgroup_path__=\"/sys/fs/cgroup/system.slice/docker-0cd1ac1dee8df9859b5dd10b794b38b235ef5745cedfde653896dd09264cf615.scope/\", node=\"test\", systemd_unit=\"docker-0cd1ac1dee8df9859b5dd10b794b38b235ef5745cedfde653896dd09264cf615.scope\"}" msg="get value and delete batch" batchSize=10240 processedCount=143
level=debug ts=2022-05-05T15:15:58.916015714Z caller=profiler.go:426 labels="{__cgroup_path__=\"/sys/fs/cgroup/system.slice/docker-0cd1ac1dee8df9859b5dd10b794b38b235ef5745cedfde653896dd09264cf615.scope/\", node=\"test\", systemd_unit=\"docker-0cd1ac1dee8df9859b5dd10b794b38b235ef5745cedfde653896dd09264cf615.scope\"}" msg="get value and delete batch" batchSize=10240 processedCount=148

Crosschecked with bpftool:

$ sudo bpftool prog | grep parca-agent -C5
	xlated 552B  jited 488B  memlock 4096B
387: perf_event  name do_sample  tag 293576d9ee42b903  gpl
	loaded_at 2022-05-05T08:15:38-0700  uid 0
	xlated 504B  jited 492B  memlock 4096B  map_ids 631,632,634
	btf_id 277
	pids parca-agent(599216)
	metadata:
		name = "parca-agent (https://github.com/parca-dev/parca-agent)"
$ sudo bpftool map dump  id 631 | grep "Found"
Found 114 elements
$ sudo bpftool map dump  id 631 | grep "Found"
Found 130 elements
$ sudo bpftool map dump  id 631 | grep "Found"
Found 0 elements
$ sudo bpftool map dump  id 631 | grep "Found"
Found 0 elements
$ sudo bpftool map dump  id 632 | jq length
144
$ sudo bpftool map dump  id 632 | jq length
2
$ sudo bpftool map dump  id 632 | jq length
15

CPU profiles are still working fine

With:

#include <iostream>
#include <unistd.h>
#include <fcntl.h>


int __attribute__ ((noinline))  top()
{
   for(int i=0; i<100000; i++) {
      int fd = open("/", O_DIRECTORY);
      close(fd);

   }
   return 1;
}


int __attribute__ ((noinline)) c1()
{
   return top();
}

int __attribute__ ((noinline))  b1()
{
   return c1();
}


int __attribute__ ((noinline))  a1()
{
   return b1();
}

int __attribute__ ((noinline)) c2()
{
   return top();
}

int __attribute__ ((noinline))  b2()
{
   return c2();
}


int __attribute__ ((noinline))  a2()
{
   return b2();
}

int __attribute__ ((noinline)) c3()
{
  sleep(1);
   return 1;
}

int __attribute__ ((noinline))  b3()
{
   return c3();
}


int __attribute__ ((noinline))  a3()
{
   return b3();
}


int main()
{
    while(true) {
        std::cout << "Calling a" << std::endl;
        a1();
        a2();
	a3();
    }
    return 0;
}

image

Which is accurate and correct 🎉

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing 💯 Thanks for taking care of it.

I have my comments inlined.

pkg/profiler/profiler.go Outdated Show resolved Hide resolved
pkg/profiler/profiler.go Show resolved Hide resolved
pkg/profiler/profiler.go Show resolved Hide resolved
pkg/profiler/profiler.go Show resolved Hide resolved
pkg/profiler/profiler.go Outdated Show resolved Hide resolved
pkg/profiler/profiler.go Outdated Show resolved Hide resolved
Details in PR (to avoid push forces creating repeating events on the GitHub UI)
@javierhonduco
Copy link
Contributor Author

Review feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants