Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(iroh): AsyncUdpSocket backpressure when connection type is *not* mixed #3117

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

matheus23
Copy link
Contributor

@matheus23 matheus23 commented Jan 10, 2025

Description

With the previous logic, when relay_url is None, then relay_pending is false.
The same is true with udp_addr and udp_pending.
However, we only ever report WouldBlock if udp_pending and relay_pending.
So this means we only ever report WouldBlock if relay_url isn't None and udp_addr isn't None, so when we're in mixed connection mode.
Because in e.g. direct connection mode, relay_url is None, thus relay_pending is false and we never return Err(WouldBlock).

Notes & open questions

I'm curious about netsim numbers. No changes

Maybe this should have a test. EDIT: Hm this is really hard to test. I can't find a way to 'induce' WouldBlock in the relay and/or socket without fairly invasive changes that would allow replacing the socket & relay implementations.

Change checklist

  • Self-review.
  • [ ] Documentation updates following the style guide, if relevant.
  • [ ] Tests if relevant. Very hard to test this.
  • All breaking changes documented.

@matheus23 matheus23 changed the title fix: Fix AsyncUdpSocket backpressure when connection type is mixed fix: Fix AsyncUdpSocket backpressure when connection type is *not* mixed Jan 10, 2025
@matheus23 matheus23 requested a review from flub January 10, 2025 10:36
Copy link

github-actions bot commented Jan 10, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3117/docs/iroh/

Last updated: 2025-01-13T16:16:01Z

@matheus23 matheus23 self-assigned this Jan 10, 2025
Copy link

github-actions bot commented Jan 10, 2025

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 6c9d9b1

@matheus23
Copy link
Contributor Author

matheus23 commented Jan 10, 2025

Netsim numbers:

test case throughput_gbps throughput_transfer
iroh 1_to_1 1.38 1.38
iroh 1_to_3 4.52 4.52
iroh 1_to_5 6.90 6.91
iroh 1_to_10 11.89 11.91
iroh 2_to_2 2.77 2.78
iroh 2_to_4 5.57 5.58
iroh 2_to_6 8.26 8.27
iroh 2_to_10 13.56 13.59

vs. latest merged PR:

test case throughput_gbps throughput_transfer
iroh 1_to_1 1.38 1.38
iroh 1_to_3 4.27 4.28
iroh 1_to_5 6.99 7.00
iroh 1_to_10 12.01 12.03
iroh 2_to_2 3.06 3.07
iroh 2_to_4 5.86 5.87
iroh 2_to_6 9.20 9.22
iroh 2_to_10 13.62 13.65

@matheus23
Copy link
Contributor Author

matheus23 commented Jan 10, 2025

I haven't managed to get the send_pending metric to increase yet.

This means we practically (?) never run into the udp_pending && relay_pending case.

I still think this code is more correct, but it really doesn't make much of a difference.

@matheus23 matheus23 changed the title fix: Fix AsyncUdpSocket backpressure when connection type is *not* mixed fix: AsyncUdpSocket backpressure when connection type is *not* mixed Jan 10, 2025
@flub
Copy link
Contributor

flub commented Jan 10, 2025

I had previously held off this because I was worried that you do not know the next packet that is going to be sent. But I think my logic was wrong: you do know the next packet: it will be the same one. So I think it's interesting to try this out.

However, does it make #3067 more critical?

Looking at the perf numbers I assume this has no impact and is all just noise?

You might have more luck seeing the blocked metric go up if you test relay-only since then you have a single path. You can even go further and setup a relay server which rate-limits client connections so it is easier to fill up the magicsock pipes.

I think it makes some sense that this is difficult to trigger, because Quinn is using congestion control and packet pacing to control the speed at which packets are being sent as well. Before WouldBlock comes into the picture. So you need to have a place where your flow control, congestion control and packet pacing all allow for more then the internal magicsock buffers are able to flush.

iroh/src/magicsock.rs Outdated Show resolved Hide resolved
@matheus23
Copy link
Contributor Author

You might have more luck seeing the blocked metric go up if you test relay-only since then you have a single path. You can even go further and setup a relay server which rate-limits client connections so it is easier to fill up the magicsock pipes.

Tried relay-only with rate limits. No effect :/ send_pending == 0.

@matheus23
Copy link
Contributor Author

I had previously held off this because I was worried that you do not know the next packet that is going to be sent. But I think my logic was wrong: you do know the next packet: it will be the same one. So I think it's interesting to try this out.

But then why return Err(WouldBlock) at all? And why do it for only the case where the connection type is mixed?

@matheus23 matheus23 changed the title fix: AsyncUdpSocket backpressure when connection type is *not* mixed fix(iroh): AsyncUdpSocket backpressure when connection type is *not* mixed Jan 13, 2025
@matheus23 matheus23 requested a review from flub January 13, 2025 10:03
@matheus23 matheus23 added this to the v0.31.0 milestone Jan 13, 2025
iroh/src/magicsock.rs Outdated Show resolved Hide resolved
iroh/src/magicsock/metrics.rs Outdated Show resolved Hide resolved
@matheus23 matheus23 modified the milestones: v0.31.0, v0.32.0 Jan 14, 2025
@flub
Copy link
Contributor

flub commented Jan 14, 2025

I had previously held off this because I was worried that you do not know the next packet that is going to be sent. But I think my logic was wrong: you do know the next packet: it will be the same one. So I think it's interesting to try this out.

But then why return Err(WouldBlock) at all? And why do it for only the case where the connection type is mixed?

Good questions.

I think my above statement is a bit wrong. The reason we didn't do this is because we do not know where the next packet is going to be sent. If you have a direct path to one node, but relay paths another node you do not want to be blocking the direct path because the relay path is congested.

Which still makes this change somewhat debatable (and I'm not sure I should have approved). We should certainly do a lot more testing before we commit to a change like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

3 participants