tetragon: Add cgroup rate support #2177

Merged: olsajiri merged 17 commits into main from pr/olsajiri/cgroup_rate on May 6, 2024

@olsajiri olsajiri commented Mar 3, 2024

Adding support to monitor cgroup rate for exec, clone and exit events. The rate is configurable.

@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch from 9ef5863 to 9087d81 Compare March 4, 2024 09:57
@olsajiri olsajiri added the release-note/minor This PR introduces a minor user-visible change label Mar 4, 2024
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 6 times, most recently from 7c42c49 to 0745c09 Compare March 10, 2024 22:20
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 4 times, most recently from 1504be7 to 21d0856 Compare March 12, 2024 14:20
@olsajiri olsajiri changed the title Pr/olsajiri/cgroup rate tetragon: Add cgroup rate support Mar 12, 2024
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 6 times, most recently from d61b1f4 to 362bca3 Compare March 13, 2024 11:55
@kkourt kkourt self-requested a review March 13, 2024 12:31
@kkourt kkourt left a comment

Cool! left some first comments :)

Three review threads on bpf/process/bpf_rate.h (two marked outdated).
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 8 times, most recently from ff897c8 to 7a0bc3f Compare March 25, 2024 10:49
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 3 times, most recently from 5aa8c90 to 8f9f4c8 Compare April 30, 2024 18:31
@olsajiri olsajiri marked this pull request as draft April 30, 2024 21:07
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch 2 times, most recently from 0985904 to 3ee02af Compare May 4, 2024 10:30
olsajiri added 17 commits May 6, 2024 10:58
Adding a throttle message that is sent when the cgroup
crosses the rate limit or drops back under it.

  message ProcessThrottle {
    // Throttle type
    ThrottleType type = 1;
    // Cgroup name
    string cgroup = 2;
  }

Signed-off-by: Jiri Olsa <[email protected]>
Adding cgrouprate eBPF code that keeps track of the cgroup rate
in a per-CPU hash.

We keep track of received events in 2 intervals and compute
a sliding window rate based on the current time.

If the limit is crossed we throttle the cgroup, and in the following
changes we will also send an event to user space.

Signed-off-by: Jiri Olsa <[email protected]>
Adding the --cgroup-rate option to configure the allowed cgroup
rate for exec/fork/exit events.

The option has the following format:

  'events,interval'

Example:

  --cgroup-rate="1000,1s"      # 1000/second rate
  --cgroup-rate="10000,5s"     # 10000/5 seconds rate

Signed-off-by: Jiri Olsa <[email protected]>
Adding a cgroup rate rmdir sensor to clean up cgroup_rate_map entries.

At the moment it only removes the cgroup_rate_map entry,
so it is done unconditionally, because the overhead is negligible.

Signed-off-by: Jiri Olsa <[email protected]>
Adding base cgrouprate object which will be updated in
following changes

Signed-off-by: Jiri Olsa <[email protected]>
Adding cgrouprate logic for checking the cgroup rate limit.

  - the cgrouprate channel is fed with cgroups that crossed the
    rate limit through the Check function
  - cgrouprate keeps a list of active (throttled) cgroups and
    periodically checks whether the cgroup rate has dropped
  - a throttle stop event is sent if the rate drops under the
    limit or the cgroup has no traffic for more than 5 seconds

Signed-off-by: Jiri Olsa <[email protected]>
Adding throttle event that is sent from cgroup_rate function
when the rate limit is crossed and the cgroup is not throttled.

Signed-off-by: Jiri Olsa <[email protected]>
Adding several metrics to help debug/diagnose the cgrouprate code.

Signed-off-by: Jiri Olsa <[email protected]>
Creating a cgrouprate instance for tetragon and for the observer test
helper, plus adding and configuring the related maps in the base sensor.

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to control cgroup rate for execve events.

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to control cgroup rate for clone events.

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to control cgroup rate for exit events.

Signed-off-by: Jiri Olsa <[email protected]>
The base sensor now depends on config options, but a test can
enable an option without the base sensor being recalculated.

Adding a base sensor for the observer (sensorTest) that contains all
possible programs and maps.

Signed-off-by: Jiri Olsa <[email protected]>
Adding a throttle exec/fork event test that sets a cgroup rate
limit, spawns slightly more processes per second than the limit
allows to trigger the throttle start event, and then waits for
the throttle stop event.

Signed-off-by: Jiri Olsa <[email protected]>
Adding a throttle processCgroup test that models the possible cases
and checks whether the throttle stop event is sent when expected.

Signed-off-by: Jiri Olsa <[email protected]>
Adding test for ParseCgroupRate function.

Signed-off-by: Jiri Olsa <[email protected]>
Adding support to display throttle events in getevents like:

  💥 exit    ubuntu-22 /usr/bin/sleep 0.1s 0
  🚀 process ubuntu-22 /usr/bin/sleep 0.1s
  🧬 throttle START session-130.scope-10741
  🧬 throttle STOP  session-130.scope-10741
  🚀 process ubuntu-22 /usr/bin/git diff
  🚀 process ubuntu-22 /usr/bin/pager

Signed-off-by: Jiri Olsa <[email protected]>
@olsajiri olsajiri force-pushed the pr/olsajiri/cgroup_rate branch from 3ee02af to 35dcbb8 Compare May 6, 2024 10:58
@tixxdz tixxdz left a comment

Great work, if Kornilios and Kev are good, LGTM for me ;-)

@olsajiri olsajiri marked this pull request as ready for review May 6, 2024 15:06
@olsajiri olsajiri merged commit 933b576 into main May 6, 2024
47 checks passed
@olsajiri olsajiri deleted the pr/olsajiri/cgroup_rate branch May 6, 2024 15:34
Trung-DV commented May 7, 2024

kkourt commented May 10, 2024

@Trung-DV We intentionally split changes into multiple patches. This makes things easier to understand when looking at git history.
