Create proposal for modern-bpf kernel-side filtering #1867
Signed-off-by: Steven Brzozowski <[email protected]>
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Welcome @stevenbrz! It looks like this is your first PR to falcosecurity/libs 🎉 |
Amazing, we will get to it soon, likely after the next release is out. Thanks a bunch for all the effort you are already putting into this! |
First of all, thank you for this proposal! My 2 cents:
At the moment I have no precise idea about the implementation but I would like to better understand the data we see here in this proposal. Starting from (
Looking at the mean event rate I see ~
As for drops, it would be interesting to understand how the evt rate changes... I expect 2 things:
What is not clear to me at the moment is why we have worse 90th-99th percentile performance... maybe it is due to a slightly different load on the machine? If I'm not missing something, the kernel filtering patch should always perform better if we only look at userspace statistics like evt/s and drop percentage, as we are doing here... Do you have an explanation for this?
If I understood well, you obtained these results by using a simple filter like:
filters:
  - syscall: 257 # syscall number for `openat` on `x86_64`
    arg: 1 # file path
    prefixes: ["1", "2", "3", "4", "5", "6"]
Is this true? |
Just another minor question, have you ever tried this Falco config |
These datapoints are pulled from all of our hosts ranging in size from 32-128 CPUs. Here are some crude charts with the data from the filtering patch. It doesn't look like there's a clear correlation between event rate and drop rate:
We're using the default rule set with about 300 lines of override conditions filtering out noise applicable to our systems. Previously we ran a test with our custom rule set vs a single dummy rule against a synthetic workload and we didn't notice a significant difference in drop rate, so we concluded we should look elsewhere than the ruleset. If you think that could be contributing, I could revisit doing a more realistic test.
Yeah, the stats in the OP are for vanilla Falco 0.36.2 with the same ruleset as mentioned above.
I agree, I'd expect the same result. The only explanation I can think of is a performance regression introduced between the two versions. The only functional difference between the vanilla version and our patch is some additional config processing at startup and some additional compute per hook, but that would result in an overall system slowdown.
Yeah that's right.
Yeah we use this feature as it seemed like a good way to squeeze out some extra performance. |
Ok, thank you for the additional info! Since the problem is already complex and there is a lot of entropy, I would like to focus on a single node, WDYT? Evaluating metrics across nodes with different specs/load could become more difficult than it should be. What we could do is:
The load during the tests should be similar, otherwise the data won't be very relevant. What I'm curious to understand is how the evt/s rate changes with the kernel filtering solution. The idea is to understand how effective the solution is in real-world scenarios. I would also be curious to see some real filters that you could use in production, again to understand the real effectiveness of the solution. In the proposal I've seen some example paths like "/proc" and "/sys/fs/cgroup"; are these real paths that you are using in production? BTW the current approach seems fine to me. Of course, I would prefer a whitelist over a blacklist approach; for example, we could take only paths under
The other thing that I had in mind was a filter on the thread-id. Let's say you want only some processes with a certain command line: you could check the command line in userspace and add the tid to a hash table in the kernel, like many tools already do. This filter is more complex to manage, because in some way you need to free the hash table from stale entries and you need to synchronize it with userspace. On the other hand, it would give us a lot of flexibility: you can filter processes using almost any filter we have in userspace, for example, you can filter out the children of a certain parent, or other things like that.
Of course, here too we need to understand if we have real, concrete use cases in production. Do you see any application of this approach for your use case? Moreover, this other approach is not in conflict with the one you proposed and could also be built easily on top of it. |
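The thread-id idea above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the class `TidFilter` and its method names are hypothetical, the Python `set` stands in for a kernel hash map keyed by tid, and the sketch uses drop semantics (tids in the table are suppressed); inverting `should_drop` would give the allow-list variant.

```python
class TidFilter:
    """Userspace model of a kernel-side tid hash table (all names hypothetical)."""

    def __init__(self, cmdline_patterns):
        self.cmdline_patterns = tuple(cmdline_patterns)
        # Stands in for a kernel hash map keyed by tid.
        self.suppressed_tids = set()

    def on_exec(self, tid, cmdline):
        # Userspace side: inspect the command line and publish the tid if it matches.
        if any(pattern in cmdline for pattern in self.cmdline_patterns):
            self.suppressed_tids.add(tid)

    def on_clone(self, parent_tid, child_tid):
        # Children of a suppressed parent inherit the suppression.
        if parent_tid in self.suppressed_tids:
            self.suppressed_tids.add(child_tid)

    def on_exit(self, tid):
        # Remove stale entries so a reused tid is not filtered by accident.
        self.suppressed_tids.discard(tid)

    def should_drop(self, tid):
        # Kernel side: a single hash lookup per event.
        return tid in self.suppressed_tids


tid_filter = TidFilter(["backup-agent"])
tid_filter.on_exec(100, "/usr/bin/backup-agent --full")
tid_filter.on_clone(100, 101)
print(tid_filter.should_drop(101))  # child of a suppressed parent
tid_filter.on_exit(100)
print(tid_filter.should_drop(100))  # entry removed on exit
```

The `on_exit` hook is the part the comment flags as hard: in a real driver the table would have to stay in sync with process lifetimes, otherwise stale tids could hide events from unrelated processes.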
I was looking again at some of your data here (https://gist.github.com/stevenbrz/69391aa71b22d205b6add88ae10fb905#file-gistfile1-txt-L124). In more detail, I can see that in many cases, it is the
You have 222665600 close exit events in 115 sec... and the |
We urgently need such capabilities. I just wanted to reiterate this!
I agree, bursts are likely a bigger problem.
I agree with all of the above.
I agree as well. It could be useful to base the high-volume tid identification on a robust combination of fields. However, the downside is that the events would need to reach userspace first. From a security perspective, you need to be careful not to accidentally filter out security-relevant information. For a kernel-side filtering goal, especially in cases with high kernel-side drops, it would be more robust to avoid relying on userspace.
This is a general problem for kernel-side filtering, as you only have certain information available at a given time. During our last core maintainer call, Andrea, others, and I discussed the proposal a bit. We generally agreed that a hashing approach could be more powerful than string comparison. It would also align this approach with the existing userspace tid suppression and the userspace anomaly detection approach that is currently being developed.
Based on that, I researched a bit and would like to share my initial ideas: starting with kernel 5.16 (would this work for you btw?), we can use an official eBPF-supported approach, https://docs.kernel.org/bpf/map_bloom_filter.html. The advantages include using an established, official method instead of something custom. Additionally, a bloom filter is a proven algorithm that is useful for many problems and works effectively in production. For example, with your current approach, you are limited in the number of patterns you can push into the kernel (I read a maximum of 12). This limitation will hinder your ability to achieve the desired results. With a bloom filter, instead, you could load many distinct patterns into the filter(s), not limited to string types either.
However, I also read that you wish to filter on variable-length file prefixes. What I was thinking is to traverse the filename and perform a lookup in the bloom filter after each sub-directory, and also for the full path. For example, for
As Andrea pointed out, we would also need correlated filtering, for example not sending the close event for the fds for which we dropped the open call.
nit: we have robust methods in place to map the syscall string name to the syscall id, hence we can pass the syscall name in the config. |
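A minimal sketch of the per-directory lookup described above, written as a userspace model rather than eBPF code. Everything here is an assumption for illustration: `BloomFilter` is a toy filter built on `hashlib`, not the kernel's `BPF_MAP_TYPE_BLOOM_FILTER`, and `path_is_filtered` is a hypothetical helper. The prefixes "/proc" and "/sys/fs/cgroup" are the example paths mentioned earlier in this thread.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter; stands in for a kernel bloom filter map."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def path_is_filtered(bloom, path):
    """Look up every directory prefix of `path`, then the full path."""
    prefix = ""
    for part in path.strip("/").split("/"):
        prefix += "/" + part
        if bloom.might_contain(prefix.encode()):
            return True
    return False


bloom = BloomFilter()
for filtered_prefix in ("/proc", "/sys/fs/cgroup"):
    bloom.add(filtered_prefix.encode())

print(path_is_filtered(bloom, "/proc/1/status"))
print(path_is_filtered(bloom, "/sys/fs/cgroup/memory.max"))
```

Because a bloom filter can return false positives, a match here means "probably filtered". From a security perspective that matters: a false positive would silently drop an event that no configured prefix covers, so the filter size and hash count would need to be chosen with that risk in mind.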
Yeah, I can work on doing some more formulaic tests, given the bulk results were quite inconclusive. Since our systems run varied workloads, I think it may be better to use synthetic tests like stress-ng, so I'll pursue that. As for the feedback on the filtering implementation, I agree hashing is vastly preferable to string comparisons. I feel a bloom filter may be overkill though, since we could just put every prefix into a hash table and check for its presence kernel-side one directory at a time, like you highlighted above. The limitation on the number of prefixes comes from the verifier's complexity bound when doing linear string comparisons. We'd still have to do string parsing there, but it should be a lot more efficient given we won't have a nested loop. |
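The hash-table alternative can be sketched as a single pass over the path with an incrementally updated hash, which is what removes the nested loop. This is a userspace illustration only: the function names, the `max_len` bound (standing in for a verifier-friendly loop limit), and the choice of 64-bit FNV-1a are all assumptions, not part of the proposal.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_step(state, byte):
    # One FNV-1a round: xor in the byte, multiply by the prime, keep 64 bits.
    return ((state ^ byte) * FNV_PRIME) & MASK64


def hash_prefix(prefix):
    state = FNV_OFFSET
    for byte in prefix:
        state = fnv1a_step(state, byte)
    return state


def build_prefix_table(prefixes):
    # Userspace side: store only the hashes; a set models the kernel hash map.
    return {hash_prefix(p.rstrip("/").encode()) for p in prefixes}


def path_matches(table, path, max_len=256):
    """Single bounded pass: test the running hash at every '/' boundary."""
    state = FNV_OFFSET
    for index, byte in enumerate(path.encode()[:max_len]):
        if byte == ord("/") and index > 0 and state in table:
            return True  # the directory prefix seen so far is in the table
        state = fnv1a_step(state, byte)
    return state in table  # finally, test the (possibly truncated) full path


table = build_prefix_table(["/proc", "/sys/fs/cgroup"])
print(path_matches(table, "/proc/1/status"))
print(path_matches(table, "/procfs/x"))
```

Each byte is visited once and each directory boundary costs one table lookup, so the work is linear in the path length regardless of how many prefixes are configured. Storing only hashes means a hash collision could cause a false match, which is the same trade-off the bloom filter makes, just with a much lower probability per lookup.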
More thoughts in favor of using the eBPF bloom filter:
On that note: I would be curious to see if you could move the needle even more with many more prefixes. It could also help to avoid accidentally filtering out important information, given that the prefixes are high-level (for example filtering out the entire
Yes, plus this approach can easily be extended to non-strings; for example, IPs could be a very valuable use case for many adopters. |
Issues go stale after 90d of inactivity. Mark the issue as fresh with Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle stale |
/remove-lifecycle stale |
What type of PR is this?
/kind documentation
Any specific area of the project related to this PR?
/area libscap-engine-modern-bpf
/area proposals
What this PR does / why we need it:
In response to feedback in #1557, this adds a proposal for introducing a kernel-side event filtering feature to the modern-bpf driver. I hope this helps facilitate discussion on how we can best implement this feature.
Further detail for motivation and potential implementation can be found in the proposal.
cc. @bajzekm