feat(stdlib): Add `parse_auditd` vrl function #1010
Conversation
Awesome! I'm happy to see this work started, as it is an oft-requested feature. I left a few comments on the TODOs. I also marked this as a draft for now until it is ready for final review. It seems like the biggest blocker at the moment is the licensing of the linux-audit-parser-rs crate.
```rust
fn parse_auditd(bytes: Value) -> Resolved {
    let bytes = bytes.try_bytes()?;
    // check if bytes ends with a newline, otherwise append it
    // TODO: make the parser accept bytes without a newline in the linux_audit_parser crate (to-be-contributed)
```
Agreed, it seems like the parser should work without trailing newlines.
depends on hillu/linux-audit-parser-rs#3
```rust
// Should we use UTC as the timezone? The logs are generated with the system's
// timezone... What is the correct behavior? Maybe the system where Vector is running
// has a different timezone than the system that generated the logs, so it is
// not correct to assume that the current system's timezone is the correct one
// (with TimeZone::timestamp_millis_opt).
// TODO: we should document that the timestamp is parsed into the UTC timezone.
// TODO: Maybe we should accept a custom timezone as a parameter to parse the timestamp?
```
I think we should rely on the timezone that is injected into the VRL runtime. See an example in the parse_syslog function:
vrl/src/stdlib/parse_syslog.rs
Lines 6 to 19 in c6f22a1
```rust
pub(crate) fn parse_syslog(value: Value, ctx: &Context) -> Resolved {
    let message = value.try_bytes_utf8_lossy()?;
    let timezone = match ctx.timezone() {
        TimeZone::Local => None,
        TimeZone::Named(tz) => Some(*tz),
    };
    let parsed = syslog_loose::parse_message_with_year_exact_tz(
        &message,
        resolve_year,
        timezone,
        Variant::Either,
    )?;
    Ok(message_to_value(parsed))
}
```
This timezone is configurable in Vector (e.g. on the remap transform: https://vector.dev/docs/reference/configuration/transforms/remap/#timezone).
```rust
let (enrichment, body): (ObjectMap, ObjectMap) = parsed
    .body
    .into_iter()
    // TODO: improve this auditd crate with an IntoIter implementation for Body and not only
```
Agreed 👍
depends on hillu/linux-audit-parser-rs#2
```rust
// TODO: should we store hexadecimals as their integer value or as a hexadecimal string?
// TODO: uppercase or lowercase hex format?
```
Good question. Do you happen to know how users would typically want to consume these values or what fields typically contain them? I think the answer to this question likely depends on what sorts of operations users need to do on these values (if any).
I don't really know... In the raw log lines this is shown as `a0=55b26d44a6a0`, so maybe it makes sense to display it as `"a0": "0x55b26d44a6a0"` in the output value, in my opinion. But I would love to know what requesters think about this...
Hi @jszwedko, I'm planning to address the review comments now that the licensing issue has been discussed. How can we address the feature-flag thing to comply with the license, as you depicted in hillu/linux-audit-parser-rs#1 (comment)? I've added it as an optional dependency in this PR and added it to the stdlib feature flag (Line 87 in 95d6951).
```rust
"grantors" => "pam_unix,pam_permit,pam_time",
"acct" => "vrl",
"exe" => "/usr/bin/sudo",
"hostname" => Value::Null,
```
Fields that are missing in the audit log (for example, `hostname="?"`) are parsed into `Null` values. They are shown like this in Vector:
```json
{
  "body": {
    "auid": 4294967295,
    "msg": {
      "addr": null,
      "comm": "systemd",
      "exe": "/usr/lib/systemd/systemd",
      "hostname": null,
      "res": "success",
      "terminal": null,
      "unit": "wpa_supplicant"
    },
    "pid": 1,
    "ses": 4294967295,
    "uid": 0
  },
  "enrichment": {
    "AUID": "unset",
    "UID": "root"
  },
  "sequence": 165,
  "timestamp": "2024-08-30T18:14:22.240Z",
  "type": "SERVICE_STOP"
}
```
Should we filter out those fields, or leave the entry with a null value so people know that it is present but with no value?
```rust
log.insert("body".into(), Value::from(body));

// TODO: should we downcase the keys of the enrichment so that they match their body counterparts?
if !enrichment.is_empty() {
```
In auditd, the association between fields and their enriched values is done with capital letters (see this documentation).
But that limitation exists because the log format is plain, and it really seems like a workaround.
As we can have a more structured representation and the enrichment fields are separated into another map, maybe we can downcase the keys and have something like this instead:
(deleted some object keys to simplify the example)
```json
{
  "body": {
    "auid": 4294967295,
    "uid": 0
  },
  "enrichment": {
    "auid": "unset",
    "uid": "root"
  }
}
```
so people can link the body fields and the enrichment fields easily; this way, the enriched counterparts exactly match the raw keys, instead of requiring users to capitalize the keys before looking them up in the enrichment.
What do you think? I don't really know how people would consume this, so I would appreciate some feedback from people who would actually use this (#66 requesters).
hillu/linux-audit-parser-rs@8f39651 added support for nested parsing. Logs are now parsed into:

```json
{
  "body": {
    "auid": 4294967295,
    "msg": {
      "addr": null,
      "comm": "systemd",
      "exe": "/usr/lib/systemd/systemd",
      "hostname": null,
      "res": "success",
      "terminal": null,
      "unit": "systemd-update-utmp"
    },
    "pid": 1,
    "ses": 4294967295,
    "uid": 0
  },
  "enrichment": {
    "AUID": "unset",
    "UID": "root"
  },
  "sequence": 174,
  "timestamp": "2024-08-30T18:14:22.454Z",
  "type": "SERVICE_STOP"
}
```

Note that the `msg` field is now more similar to what is requested in #66.

Tested everything with this config:

```toml
[sources.file]
type = "file"
include = ["/var/log/audit/audit.log"]
data_dir = "./data"

[transforms.auditd]
type = "remap"
inputs = ["file"]
source = """
. = parse_auditd!(.message)
"""

[sinks.console]
type = "console"
inputs = ["auditd"]
encoding.codec = "json"
```

And it parsed a bunch of auditd logs with no errors at all! (in order to test it, you would need to enable the
```rust
})
.collect::<Result<ObjectMap, _>>()?,
),
// There are a few values that `linux-audit-parser` does not return in its parsing
```
depends on hillu/linux-audit-parser-rs#4
Hey! Thanks for raising this again. After thinking about it some more, I'd prefer to avoid including LGPL dependencies in Vector, even as optional ones, since they run the risk of someone accidentally including them without realizing it (or us expanding usage without realizing it). I'd (unfortunately 🙁) prefer to see us find another dependency to parse auditd logs. We could even just leverage the existing support for grok parsing by using specific patterns to parse auditd logs. Example: https://gist.github.com/artbikes/2313040

Apologies, I know this has been a bit of a long road to get to this point 😓 I'm curious what you think.
Hi @jszwedko!

I totally agree with that. It also was one of my main concerns about this...

I don't think there is anything in Rust yet. If I have enough time, I may end up writing my own parser with an MPL-compatible license. Maybe I can draw inspiration from https://github.com/elastic/go-libaudit/blob/main/auparse/auparse.go, as it is licensed as Apache-2.0 and I think rewriting it (or taking inspiration from it) in Rust could be feasible. However, this may delay things a little bit, but I'll give it a try when I find enough time.

Thanks! I may give this a bump in a few weeks/months if I manage to implement the parser, but I'm pausing it for the moment.
@jorgehermo9 thanks for understanding. I do think we might be able to get by with grok parsing instead of a native Rust parser. It'll be less efficient, but might be easier to get going. Worth considering anyway :)
Closes #66

I have a few things left (that I'm aware of):

- The linux-auditd-parse crate should have a license compatible with Vector (see "Issues with the chosen license (LGPL3+)", hillu/linux-audit-parser-rs#1)
- Contribute to linux-auditd-parse for a few things that are left as TODOs

I added a few TODOs in the code for some things I had doubts about. I'd appreciate feedback on those.

Currently, logs are parsed into something similar to

Which is a bit more structured than what has been proposed in #66, but I'm open to any suggestion about this format.