Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds the draft of the XDP scheduler testing tool #40

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

freysteinn
Copy link
Contributor

This commit contains the XDP scheduling framework. It consists of a
testing program called xdp_scheduler_tester, XDP and DEQUEUE schedulers,
and trace files that the xdp_scheduler_tester program uses to check the
XDP schedulers for correctness.

The FIFO and PIFO schedulers are fully functional in this commit.
However, it defines flows as UDP port numbers. However, future commits
will change the flow definition to a five-tuple instead.

The WFQ implementation is partially complete. It needs changes on the
dequeue hook and has a few verifier issues preventing the current
version from loading.

Signed-off-by: Frey Alfredsson [email protected]

Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent start! There are some minor-ish comments on the BPF code, and then there's the bit where you're being a bit hacky to work around the lack of the API you need from dequeue. That last bit I think we should just fix by rebasing the kernel code instead of hacking around it; but there's plenty of other stuff for you to update in the meantime :)

As for the test runner, while I think the basic idea of the command file structure is good, I fear the recursive parser/runner thing is a bit too clever. The code is quite hard to follow, and I think it will be too easy to mess it up in subtle ways going forward. It also uses a bunch of unsafe string operations, and it doesn't handle errors properly.

So I'd like to see this restructured in a way that focuses on being obviously correct and readable. I left a bunch of comments throughout with concrete suggestions.

xdp-scheduler-tester/Makefile Outdated Show resolved Hide resolved
xdp-scheduler-tester/bpf_local_helpers.h Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_fifo.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_fifo.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_pifo.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_tester.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_tester.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_tester.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_tester.c Outdated Show resolved Hide resolved
{ "udp", cmd_udp, cmd_udp_init_fn, cmd_udp_fn, at_subcommand },
{ "dequeue", cmd_dequeue, NULL, NULL, at_subcommand },
{}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the commands basically describe a tree structure of allowed commands, right? Would be nice if we could visualise this somehow, it's a bit difficult to follow as-is. Maybe in a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. It describes the tree structure of allowed commands and parameter types.
My first version had the struct's member names in the declaration to see which parameters are what. I think it makes it easier to understand the fields, but it becomes harder to follow the tree's structure.

Yes, I agree. I think a clear comment at the beginning of the code with a visualization would help to explain the command structure and how the different structs chain together. The question is if it would help to change the definitions of the struct in a way to make it more transparent as well.

@freysteinn
Copy link
Contributor Author

This is an excellent start! There are some minor-ish comments on the BPF code, and then there's the bit where you're being a bit hacky to work around the lack of the API you need from dequeue. That last bit I think we should just fix by rebasing the kernel code instead of hacking around it; but there's plenty of other stuff for you to update in the meantime :)

Yes, absolutely. The hacky parts did not work anyway, but I wanted the core structure to see what comments you could give me on it. The rebase will definitely fix that.

As for the test runner, while I think the basic idea of the command file structure is good, I fear the recursive parser/runner thing is a bit too clever. The code is quite hard to follow, and I think it will be too easy to mess it up in subtle ways going forward. It also uses a bunch of unsafe string operations, and it doesn't handle errors properly.

Yes, I think it makes more sense to write a proper parser; it would be cleaner and easier to understand. It should not be that much more work either. I would say that the current version was my way of trying to avoid writing a proper parser, which makes the code harder to follow and hard to change and follow. I have had a few issues with the parser, and I suspect that there are more issues than I found.

So I'd like to see this restructured in a way that focuses on being obviously correct and readable. I left a bunch of comments throughout with concrete suggestions.
The main parts are making the parser code clearer and adding more comments on how the tables and commands work.

The comments have been constructive too. Thanks! :)

@tohojo
Copy link
Member

tohojo commented Mar 13, 2022

Yes, I think it makes more sense to write a proper parser; it would be cleaner and easier to understand. It should not be that much more work either. I would say that the current version was my way of trying to avoid writing a proper parser, which makes the code harder to follow and hard to change and follow. I have had a few issues with the parser, and I suspect that there are more issues than I found.

So before you go down this rabbit hole, I suggest you take a step back and re-evaluate the problem; spending a lot of time writing a parser for a custom DSL is definitely not a productive use of your time. See also Greenspun's tenth rule.

What you're doing with the command language is basically providing a scripting environment to enqueue and dequeue packets and inspect their contents. And you're already exposing functions that the scripting language can call to achieve this...

...so why not just go straight to a full scripting language and simply embed a Lua interpreter? That's really easy to do (see an example here, for instance: https://lucasklassmann.com/blog/2019-02-02-how-to-embeddeding-lua-in-c/ ) and you get things like loops etc for free (so you don't need the 'repeat' command, for instance). I wouldn't even try to do anything fancy with data types, just exposing your existing top-level commands (load_bpf(), enqueue_udp() and dequeue_udp()) should get you the functionality you need to implement your test framework :)

@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch from 8416368 to 8c1bbdf Compare March 28, 2022 12:59
@tohojo
Copy link
Member

tohojo commented Mar 28, 2022

Alright, so there were a couple of reasons for the verifier errors;

  • Failure to check the return value of parse_ip{,6}hdr()
  • Passing pointers into the packet around (so the verifier isn't satisfied that you did bounds checks at the point where you deref them)
  • Failure to drop the packet on error

With the following diff, the program loads for me (but outputs an error):

diff --git a/xdp-scheduler-tester/bpf_local_helpers.h b/xdp-scheduler-tester/bpf_local_helpers.h
index 953d476a12c9..2affc2d1669b 100644
--- a/xdp-scheduler-tester/bpf_local_helpers.h
+++ b/xdp-scheduler-tester/bpf_local_helpers.h
@@ -17,6 +17,7 @@
  *      the dequeued packet on success, or a negative error code on error.
  */
 static long (*bpf_packet_dequeue)(void *ctx, void *map, __u64 flags, __u64 *rank) = (void *) 194;
+static long (*bpf_packet_drop)(void *ctx, void *pkt) = (void *) 195;
 
 
 struct flow_address {
diff --git a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
index 776c58337ee1..d8a8fbe09551 100644
--- a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
+++ b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
@@ -63,10 +63,14 @@ static __always_inline int parse_packet(struct parsing_context *pctx, struct pac
        p_info->eth_type = parse_ethhdr(&pctx->nh, pctx->data_end, &p_info->eth);
        if (p_info->eth_type == bpf_htons(ETH_P_IP)) {
                p_info->ip_type = parse_iphdr(&pctx->nh, pctx->data_end, &p_info->iph);
+               if (p_info->ip_type < 0)
+                       goto err;
                map_ipv4_to_ipv6(&p_info->flow.saddr.ip, p_info->iph->saddr);
                map_ipv4_to_ipv6(&p_info->flow.daddr.ip, p_info->iph->daddr);
        } else if (p_info->eth_type == bpf_htons(ETH_P_IPV6)) {
                p_info->ip_type = parse_ip6hdr(&pctx->nh, pctx->data_end, &p_info->ip6h);
+               if (p_info->ip_type < 0)
+                       goto err;
                p_info->flow.saddr.ip = p_info->ip6h->saddr;
                p_info->flow.daddr.ip = p_info->ip6h->daddr;
        } else {
@@ -102,7 +106,7 @@ static __always_inline void set_tuple(struct network_tuple *nt, struct packet_in
 
 static __always_inline int schedule_packet(struct parsing_context *pctx)
 {
-       struct packet_info p_info = {0};
+       struct packet_info p_info = {};
        struct network_tuple nt;
        struct flow_state *flow;
        struct flow_state new_flow;
@@ -113,7 +117,7 @@ static __always_inline int schedule_packet(struct parsing_context *pctx)
        if (parse_packet(pctx, &p_info) < 0)
                goto err;
 
-       set_tuple(&nt, &p_info);
+       nt = p_info.flow;
 
        flow = bpf_map_lookup_elem(&flow_state, &nt);
        if (!flow) {
@@ -146,7 +150,7 @@ int enqueue_prog(struct xdp_md *xdp)
                .data = (void *)(long)xdp->data,
                .data_end = (void *)(long)xdp->data_end,
                .pkt_len = (xdp->data_end - xdp->data) & 0xffff,
-               .nh = { .pos = xdp->data },
+               .nh = { .pos = (void *)(long)xdp->data },
        };
        return schedule_packet(&pctx);
 }
@@ -160,7 +164,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        struct flow_state *flow;
        __u64 prio = 0;
 
-       struct xdp_md *pkt;
+       struct xdp_md *pkt = NULL;
 
        pkt = (void *)bpf_packet_dequeue(ctx, &pifo_map, 0, &prio);
        if (!pkt)
@@ -174,7 +178,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        if (parse_packet(&pctx, &p_info) < 0)
                goto err;
 
-       set_tuple(&nt, &p_info);
+       nt = p_info.flow;
 
        flow = bpf_map_lookup_elem(&flow_state, &nt);
        if (!flow)
@@ -189,6 +193,8 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
        bpf_printk("DEQUEUE WFQ with priority %d", prio);
        return pkt;
 err:
+       if (pkt)
+               bpf_packet_drop(ctx, pkt);
        bpf_printk("DEQUEUE packet failed");
        return NULL;
 }

@tohojo
Copy link
Member

tohojo commented Apr 4, 2022

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

@freysteinn
Copy link
Contributor Author

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

Yes, I am close to pushing a new update with most of the issues here fixed and a functioning WFQ. The only thing missing so far is a way to assign flows weights. My thought was to define flows using a five-tuple, but if one of the fields is all ones, then it would be a wild card for any value for that field. Therefore, if I set the flow weight with UDP packets with src IP ~0, dst IP ~0, src port ~0, and port 8000. Then the flow would be assigned to all packets with the UDP port 8000. Does that sound like a good idea? Would zero be better? Or should I include a bitfield that's part of the tuple that defines which values to ignore?

@tohojo
Copy link
Member

tohojo commented Apr 5, 2022

That would require quite a few lookups for every packet, though? You'd have to check each of the wildcard possibilities for that flow tuple.

Doing this right (i.e., efficiently) is a whole subject in itself, so I'd say drop the wildcards for now and come back to it once everything else works :)

@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch 4 times, most recently from 144e7f1 to 23cf4f3 Compare April 6, 2022 14:21
@freysteinn
Copy link
Contributor Author

That would require quite a few lookups for every packet, though? You'd have to check each of the wildcard possibilities for that flow tuple.

Doing this right (i.e., efficiently) is a whole subject in itself, so I'd say drop the wildcards for now and come back to it once everything else works :)

Yes, you are right. I also talked to Anna and Per about this at our supervisor meeting, and we came to the same conclusion. I would still think an API with options for those that make schedulers in the future would be a good idea. For now, I agree that I should not spend time on it.

@freysteinn
Copy link
Contributor Author

BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first...

I have updated this pull request with the newest changes. I would appreciate you looking at this one and checking the issues I left in the commit message.

@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch from 23cf4f3 to e6bc5cb Compare April 15, 2022 13:12
Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments to the BPF code. It looks like you didn't address the previous comments to the userspace code, so I didn't review that again, just removed the "resolved" mark from a few of the old comments.

As for the logging.h inclusion, I merged the PR adding libxdp support to the repository, and also included logging.{h,c} as part of that. So if you rebase against master (in this repo) you should be able to just #include "logging.h" and use it :)

trace_fn init_func;
trace_fn func;
enum argument_type type;
} trace_cmd;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still an issue...

xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_fifo.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_pifo.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_pifo.bpf.c Outdated Show resolved Hide resolved
if (fd < 0) {
fprintf(stderr, "socket failed...\n");
exit(EXIT_FAILURE);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not, in fact, seem to be resolved...

xdp-scheduler-tester/bpf_local_helpers.h Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c Outdated Show resolved Hide resolved
xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c Outdated Show resolved Hide resolved
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch 5 times, most recently from 4f244fd to d402e53 Compare April 27, 2022 11:52
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch 2 times, most recently from 65f86f1 to 826ea2f Compare May 20, 2022 14:53
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch from 826ea2f to 31dc573 Compare June 10, 2022 18:55
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch from 31dc573 to 3c1fe0f Compare June 23, 2022 07:21
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch 2 times, most recently from 1c2080d to f31e548 Compare October 14, 2022 09:00
This commit contains the XDP scheduling framework. It consists of a
testing program called xdq-tester used to test schedulers using the XDP
and DEQUEUE hooks. It uses trace files written in Lua that the
xdq-tester program uses to check the XDP schedulers for correctness.

The FIFO, SPRIO, and WFQ are fully functional in this commit. The SPRIO
and WFQ have an API to set the weights from the Lua scripts.

This commit's FQ-CoDel contains sparse flow handling and testing. The
CoDel part has a bug in how it adds the metadata section in this version.

Signed-off-by: Frey Alfredsson <[email protected]>
@freysteinn freysteinn force-pushed the xdp_scheduler_tester branch from f31e548 to f305651 Compare October 24, 2022 12:59
I am not sure if I am doing something wrong or if XDQ is not storing the
adjusted metadata pointer after the bpf_xdp_adjust_meta function call.

When I try to access the data_meta pointer in the dequeue hook, I get a
different metadata pointer address from the XDP hook. I expect it to
point eight bytes back from the data pointer. However, it points to the
same address as the data pointer.

Signed-off-by: Frey Alfredsson <[email protected]>
This version gives a full stack trace when there is an issue. It also
has enumeration to check that the order of the packets is correct.

Signed-off-by: Frey Alfredsson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants