-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds the draft of the XDP scheduler testing tool #40
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent start! There are some minor-ish comments on the BPF code, and then there's the bit where you're being a bit hacky to work around the lack of the API you need from dequeue. That last bit I think we should just fix by rebasing the kernel code instead of hacking around it; but there's plenty of other stuff for you to update in the meantime :)
As for the test runner, while I think the basic idea of the command file structure is good, I fear the recursive parser/runner thing is a bit too clever. The code is quite hard to follow, and I think it will be too easy to mess it up in subtle ways going forward. It also uses a bunch of unsafe string operations, and it doesn't handle errors properly.
So I'd like to see this restructured in a way that focuses on being obviously correct and readable. I left a bunch of comments throughout with concrete suggestions.
{ "udp", cmd_udp, cmd_udp_init_fn, cmd_udp_fn, at_subcommand }, | ||
{ "dequeue", cmd_dequeue, NULL, NULL, at_subcommand }, | ||
{} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the commands basically describe a tree structure of allowed commands, right? Would be nice if we could visualise this somehow, it's a bit difficult to follow as-is. Maybe in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. It describes the tree structure of allowed commands and parameter types.
My first version had the struct's member names in the declaration to see which parameters are what. I think it makes it easier to understand the fields, but it becomes harder to follow the tree's structure.
Yes, I agree. I think a clear comment at the beginning of the code with a visualization would help to explain the command structure and how the different structs chain together. The question is if it would help to change the definitions of the struct in a way to make it more transparent as well.
Yes, absolutely. The hacky parts did not work anyway, but I wanted the core structure to see what comments you could give me on it. The rebase will definitely fix that.
Yes, I think it makes more sense to write a proper parser; it would be cleaner and easier to understand. It should not be that much more work either. I would say that the current version was my way of trying to avoid writing a proper parser, which makes the code harder to follow and hard to change and follow. I have had a few issues with the parser, and I suspect that there are more issues than I found.
The comments have been constructive too. Thanks! :) |
So before you go down this rabbit hole, I suggest you take a step back and re-evaluate the problem; spending a lot of time writing a parser for a custom DSL is definitely not a productive use of your time. See also Greenspun's tenth rule. What you're doing with the command language is basically providing a scripting environment to enqueue and dequeue packets and inspect their contents. And you're already exposing functions that the scripting language can call to achieve this... ...so why not just go straight to a full scripting language and simply embed a Lua interpreter? That's really easy to do (see an example here, for instance: https://lucasklassmann.com/blog/2019-02-02-how-to-embeddeding-lua-in-c/ ) and you get things like loops etc for free (so you don't need the 'repeat' command, for instance). I wouldn't even try to do anything fancy with data types, just exposing your existing top-level commands ( |
8416368
to
8c1bbdf
Compare
Alright, so there were a couple of reasons for the verifier errors;
With the following diff, the program loads for me (but outputs an error): diff --git a/xdp-scheduler-tester/bpf_local_helpers.h b/xdp-scheduler-tester/bpf_local_helpers.h
index 953d476a12c9..2affc2d1669b 100644
--- a/xdp-scheduler-tester/bpf_local_helpers.h
+++ b/xdp-scheduler-tester/bpf_local_helpers.h
@@ -17,6 +17,7 @@
* the dequeued packet on success, or a negative error code on error.
*/
static long (*bpf_packet_dequeue)(void *ctx, void *map, __u64 flags, __u64 *rank) = (void *) 194;
+static long (*bpf_packet_drop)(void *ctx, void *pkt) = (void *) 195;
struct flow_address {
diff --git a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
index 776c58337ee1..d8a8fbe09551 100644
--- a/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
+++ b/xdp-scheduler-tester/xdp_scheduler_wfq.bpf.c
@@ -63,10 +63,14 @@ static __always_inline int parse_packet(struct parsing_context *pctx, struct pac
p_info->eth_type = parse_ethhdr(&pctx->nh, pctx->data_end, &p_info->eth);
if (p_info->eth_type == bpf_htons(ETH_P_IP)) {
p_info->ip_type = parse_iphdr(&pctx->nh, pctx->data_end, &p_info->iph);
+ if (p_info->ip_type < 0)
+ goto err;
map_ipv4_to_ipv6(&p_info->flow.saddr.ip, p_info->iph->saddr);
map_ipv4_to_ipv6(&p_info->flow.daddr.ip, p_info->iph->daddr);
} else if (p_info->eth_type == bpf_htons(ETH_P_IPV6)) {
p_info->ip_type = parse_ip6hdr(&pctx->nh, pctx->data_end, &p_info->ip6h);
+ if (p_info->ip_type < 0)
+ goto err;
p_info->flow.saddr.ip = p_info->ip6h->saddr;
p_info->flow.daddr.ip = p_info->ip6h->daddr;
} else {
@@ -102,7 +106,7 @@ static __always_inline void set_tuple(struct network_tuple *nt, struct packet_in
static __always_inline int schedule_packet(struct parsing_context *pctx)
{
- struct packet_info p_info = {0};
+ struct packet_info p_info = {};
struct network_tuple nt;
struct flow_state *flow;
struct flow_state new_flow;
@@ -113,7 +117,7 @@ static __always_inline int schedule_packet(struct parsing_context *pctx)
if (parse_packet(pctx, &p_info) < 0)
goto err;
- set_tuple(&nt, &p_info);
+ nt = p_info.flow;
flow = bpf_map_lookup_elem(&flow_state, &nt);
if (!flow) {
@@ -146,7 +150,7 @@ int enqueue_prog(struct xdp_md *xdp)
.data = (void *)(long)xdp->data,
.data_end = (void *)(long)xdp->data_end,
.pkt_len = (xdp->data_end - xdp->data) & 0xffff,
- .nh = { .pos = xdp->data },
+ .nh = { .pos = (void *)(long)xdp->data },
};
return schedule_packet(&pctx);
}
@@ -160,7 +164,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
struct flow_state *flow;
__u64 prio = 0;
- struct xdp_md *pkt;
+ struct xdp_md *pkt = NULL;
pkt = (void *)bpf_packet_dequeue(ctx, &pifo_map, 0, &prio);
if (!pkt)
@@ -174,7 +178,7 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
if (parse_packet(&pctx, &p_info) < 0)
goto err;
- set_tuple(&nt, &p_info);
+ nt = p_info.flow;
flow = bpf_map_lookup_elem(&flow_state, &nt);
if (!flow)
@@ -189,6 +193,8 @@ void *dequeue_prog(struct dequeue_ctx *ctx)
bpf_printk("DEQUEUE WFQ with priority %d", prio);
return pkt;
err:
+ if (pkt)
+ bpf_packet_drop(ctx, pkt);
bpf_printk("DEQUEUE packet failed");
return NULL;
} |
BTW, let me know when you think this PR is ready for another review (in its entirety); I assumed you wanted to incorporate fixes for those verifier errors first... |
Yes, I am close to pushing a new update with most of the issues here fixed and a functioning WFQ. The only thing missing so far is a way to assign flows weights. My thought was to define flows using a five-tuple, but if one of the fields is all ones, then it would be a wild card for any value for that field. Therefore, if I set the flow weight with UDP packets with src IP ~0, dst IP ~0, src port ~0, and port 8000. Then the flow would be assigned to all packets with the UDP port 8000. Does that sound like a good idea? Would zero be better? Or should I include a bitfield that's part of the tuple that defines which values to ignore? |
That would require quite a few lookups for every packet, though? You'd have to check each of the wildcard possibilities for that flow tuple. Doing this right (i.e., efficiently) is a whole subject in itself, so I'd say drop the wildcards for now and come back to it once everything else works :) |
144e7f1
to
23cf4f3
Compare
Yes, you are right. I also talked to Anna and Per about this at our supervisor meeting, and we came to the same conclusion. I would still think an API with options for those that make schedulers in the future would be a good idea. For now, I agree that I should not spend time on it. |
I have updated this pull request with the newest changes. I would appreciate you looking at this one and checking the issues I left in the commit message. |
23cf4f3
to
e6bc5cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments to the BPF code. It looks like you didn't address the previous comments to the userspace code, so I didn't review that again, just removed the "resolved" mark from a few of the old comments.
As for the logging.h inclusion, I merged the PR adding libxdp support to the repository, and also included logging.{h,c}
as part of that. So if you rebase against master (in this repo) you should be able to just #include "logging.h"
and use it :)
trace_fn init_func; | ||
trace_fn func; | ||
enum argument_type type; | ||
} trace_cmd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still an issue...
if (fd < 0) { | ||
fprintf(stderr, "socket failed...\n"); | ||
exit(EXIT_FAILURE); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not, in fact, seem to be resolved...
4f244fd
to
d402e53
Compare
65f86f1
to
826ea2f
Compare
826ea2f
to
31dc573
Compare
31dc573
to
3c1fe0f
Compare
1c2080d
to
f31e548
Compare
This commit contains the XDP scheduling framework. It consists of a testing program called xdq-tester used to test schedulers using the XDP and DEQUEUE hooks. It uses trace files written in Lua that the xdq-tester program uses to check the XDP schedulers for correctness. The FIFO, SPRIO, and WFQ are fully functional in this commit. The SPRIO and WFQ have an API to set the weights from the Lua scripts. This commit's FQ-CoDel contains sparse flow handling and testing. The CoDel part has a bug in how it adds the metadata section in this version. Signed-off-by: Frey Alfredsson <[email protected]>
f31e548
to
f305651
Compare
I am not sure if I am doing something wrong or if XDQ is not storing the adjusted metadata pointer after the bpf_xdp_adjust_meta function call. When I try to access the data_meta pointer in the dequeue hook, I get a different metadata pointer address from the XDP hook. I expect it to point eight bytes back from the data pointer. However, it points to the same address as the data pointer. Signed-off-by: Frey Alfredsson <[email protected]>
This version gives a full stack trace when there is an issue. It also has enumeration to check that the order of the packets is correct. Signed-off-by: Frey Alfredsson <[email protected]>
This commit contains the XDP scheduling framework. It consists of a
testing program called xdp_scheduler_tester, XDP and DEQUEUE schedulers,
and trace files that the xdp_scheduler_tester program uses to check the
XDP schedulers for correctness.
The FIFO and PIFO schedulers are fully functional in this commit.
However, it defines flows as UDP port numbers. However, future commits
will change the flow definition to a five-tuple instead.
The WFQ implementation is partially complete. It needs changes on the
dequeue hook and has a few verifier issues preventing the current
version from loading.
Signed-off-by: Frey Alfredsson [email protected]