Research the hole-punching solution for DSN. #2111
Comments
Hi, after digging into the
or are there any other solutions?
You have enumerated possible options. However, centralized infrastructure (like bootstrap nodes) is a single point of failure. It's not critical though, because relays only improve network capabilities rather than provide core services. Embedded relay functions for nodes or farmers with strict limits seem like a more robust solution. Hybrid solutions are also possible, for example: we start testing with dedicated relay nodes, measure the effect, and upgrade with mass relays.
Thank you for responding. I'm also thinking of embedded relays for nodes & farmers.
I'm wondering if you guys have made some progress on this topic? I've already looked through some related networking issues and PRs but got little help; please kindly point me to it if I missed something.
This status should be derived automatically using the autonat protocol.
I don't think libp2p supports QoS for protocols.
It's a possible security check. However, each relay will have a "connection limit per peer" as each peer has now. It will prevent reconnection from the same malicious peer.
We haven't made any progress here besides adding
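For reference, a minimal sketch of how such a per-peer connection limit can be expressed with libp2p's connection_limits behaviour (the numbers and the helper name are illustrative, not this project's actual configuration):

```rust
use libp2p::connection_limits::{self, ConnectionLimits};

// Hypothetical limits; real values would come from the node/farmer config.
fn limits_behaviour() -> connection_limits::Behaviour {
    let limits = ConnectionLimits::default()
        // Cap established connections per remote peer so a single malicious
        // peer cannot hold many reservations/circuits at once.
        .with_max_established_per_peer(Some(3))
        // Optional global cap on established connections.
        .with_max_established(Some(200));
    connection_limits::Behaviour::new(limits)
}
```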
I mean, the behaviour should be something like
Yes, and this may require customized relay behaviour, for example, checking the supported protocols in
I was worried that a node would make a reservation request to every node it reaches, but after thinking about the situation once again, I realized that this seems to be normal in p2p networking, so just forget this.
Ok, I will check the source code, the examples, and other projects that depend on it. Thank you very much for the information.
Hi, after reading rust-libp2p/examples/relay-server, rust-libp2p/examples/dcutr, go-libp2p/p2p/host/autorelay, and kubo/core/node/libp2p/relay.go, and testing, I have some thoughts about hole-punching. First, I would describe the two basic roles as:
Some conclusions:
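For context on the two roles mentioned above, a rough sketch of how the examples listed above typically split the behaviours between a public relay server and a NAT'ed client (the struct names and exact composition are illustrative, not this project's code):

```rust
use libp2p::{dcutr, identify, ping, relay, swarm::NetworkBehaviour};

// Public node acting as a circuit relay v2 server, as in the
// rust-libp2p relay-server example.
#[derive(NetworkBehaviour)]
struct RelayServerBehaviour {
    relay: relay::Behaviour,
    identify: identify::Behaviour,
    ping: ping::Behaviour,
}

// Private (NAT'ed) node acting as a relay client that can later upgrade
// relayed connections via DCUTR, as in the dcutr example.
#[derive(NetworkBehaviour)]
struct RelayClientBehaviour {
    relay_client: relay::client::Behaviour,
    dcutr: dcutr::Behaviour,
    identify: identify::Behaviour,
    ping: ping::Behaviour,
}
```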
I would start by following this document and its stages: https://docs.libp2p.io/concepts/nat/hole-punching/ We have already added the first ingredient (autonat) for NAT detection. The next logical step is enabling relayed connections without DCUTR.
The end goal is to improve network connectivity by including private farmers' caches in the global pool.
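As a rough illustration of that first step (relayed connections without DCUTR), the private peer makes a reservation by listening on the relay's circuit address. A minimal sketch, assuming the relay client behaviour is already part of the swarm, with a placeholder address:

```rust
use libp2p::{multiaddr::Protocol, swarm::NetworkBehaviour, Multiaddr, Swarm};

// Hypothetical helper: ask the relay for a reservation so other peers can
// reach us via .../p2p/<relay-id>/p2p-circuit/p2p/<our-id>.
fn reserve_on_relay<B: NetworkBehaviour>(
    swarm: &mut Swarm<B>,
    relay_addr: Multiaddr, // e.g. /ip4/1.2.3.4/tcp/30333/p2p/<relay-peer-id>
) -> Result<(), Box<dyn std::error::Error>> {
    swarm.listen_on(relay_addr.with(Protocol::P2pCircuit))?;
    Ok(())
}
```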
I'll clarify that the goal is to use relays ONLY for hole punching purposes and not to send traffic through those relays in case a direct connection is possible. This is important.
There are some default configurable limits in the relay protocol's Config, but no direct bandwidth controls.
That's correct. My previous concern here was: how will the relay act if 2 transports are enabled (TCP & QUIC) while only 1 public address is reported by
However, after reading the source code, I think there will not be a problem. I'll check again later.
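For reference on those limits, a sketch using the field names of libp2p's relay::Config in the versions discussed here (values are illustrative; the per-circuit byte and duration caps are the closest thing to a bandwidth control):

```rust
use std::time::Duration;
use libp2p::{relay, PeerId};

fn relay_server_behaviour(local_peer_id: PeerId) -> relay::Behaviour {
    let mut config = relay::Config::default();
    // How many reservations/circuits this relay will serve.
    config.max_reservations = 128;
    config.max_reservations_per_peer = 4;
    config.max_circuits = 16;
    config.max_circuits_per_peer = 4;
    // Per-circuit caps rather than a true bandwidth limit.
    config.max_circuit_duration = Duration::from_secs(2 * 60);
    config.max_circuit_bytes = 1 << 17; // 128 KiB, well under 1 MiB
    relay::Behaviour::new(local_peer_id, config)
}
```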
I have the same view here: in our case, relay peers only help others to negotiate and get connected directly. But default
It will require a workaround to stop relaying piece traffic before we add DCUTR, but it seems possible.
FYI: we're disabling QUIC here: #2647
Again, about the port usage in dcutr:
So, the listener port is always used in dcutr instead of the dialer port, and I think that's why port_reuse is required. There may be a workaround if we handle the dialer port manually, but that seems a little bit complicated.
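For context, port_reuse here refers to the builder flag on libp2p's TCP transport config in the versions used at the time (newer releases handle this differently, as discussed towards the end of the thread); roughly:

```rust
use libp2p::tcp;

// TCP transport configured the way the upstream dcutr example did at the
// time: reuse the local listen port for outgoing dials, so the observed
// (address, port) matches the listener and hole punching can succeed.
fn tcp_config_with_port_reuse() -> tcp::Config {
    tcp::Config::default().port_reuse(true).nodelay(true)
}
```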
Maybe we can just manually close the relayed circuit connection on any dcutr error to enforce direct connections? However, this method will not work if both ends disable dcutr. In this case, we can set
Setting the number of bytes well under 1M will make it useless for piece retrieval, but should be sufficient for hole-punching purposes.
I'm not sure I follow your 'port reuse' statements. We use the events you described as well as port translation; however, if you set
Using bandwidth limits to prevent piece transfer will lead to a massive amount of errors and significant network degradation (we should set the limits anyway). I suggest implementing a "connection-check" in the
Step 3) could be optimized: we can either try to reuse DCUTR or, as you suggested previously, implement our own similar protocol.
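One way such a "connection-check" could look (purely illustrative, the actual hook point in subspace-networking is not shown here): inspect the remote address of the connection and refuse heavy piece traffic when it goes through a relay circuit.

```rust
use libp2p::{multiaddr::Protocol, Multiaddr};

// True if the remote is reached through a circuit relay hop, i.e. the
// multiaddr contains a /p2p-circuit component.
fn is_relayed(remote_addr: &Multiaddr) -> bool {
    remote_addr.iter().any(|p| matches!(p, Protocol::P2pCircuit))
}

// Hypothetical guard in a request handler: allow piece requests only on
// direct connections, keeping relayed ones for hole-punching signalling.
fn allow_piece_request(remote_addr: &Multiaddr) -> bool {
    !is_relayed(remote_addr)
}
```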
A default DCUTR upgrading case:
So, IMO, the main point is:
I hope I've made my thoughts clear. Correct me if I made any mistakes.
Yes,
For the hole-punching/port_reuse/autonat topic, I found another discussion from libp2p/specs: Consider only reusing TCP port when hole punching #389
Maybe we can force relayed connections to upgrade to dcutr, regardless of what kind of requests they will send, so that we don't do any checks?
I think your workflow misses the
Autonat will likely stop functioning correctly. Consider the following situation outside of the relay-dcutr case: a public peer A establishes a connection to another public peer B; having received a new connection, peer B will try to confirm its observed address. The current autonat settings lead to choosing the server from the connected peer list. B will request A to connect again (autonat probe). If we set port reuse, then the probe will issue a second connection with a duplicate tuple (address A, port A, address B, port B) and fail for that reason. We encountered this error when we were adding autonat. I found a comment from a DCUTR specification contributor, without details, in the discussion you noted: libp2p/specs#389 (comment)
We likely can force the upgrade. But how will you prevent piece exchanges via the relayed connection during the upgrade? When we issue a Kademlia request and establish a connection, we immediately start getting a heavy piece from the remote peer. Did you mean another synchronization here?
I think I know what we missed before:
So if we can find an easy way to make the
Agreed, and that's why I suggest dropping the already-known addresses from hole punching for the autonat probe.
I have no idea about this part now 😂
Can we wrap the
AFAIK, kad manages its connected peers based on
There's also one concern: is the kad protocol the only one we don't want on relayed connections?
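One possible shape of such a restriction, sketched under the assumption that addresses are handed to Kademlia explicitly (MemoryStore and the helper are placeholders, not how subspace-networking is actually wired):

```rust
use libp2p::{kad, multiaddr::Protocol, Multiaddr, PeerId};

// Only pass non-relayed addresses to Kademlia so its routing table (and the
// connections it opens) never point at /p2p-circuit addresses.
fn add_direct_addresses(
    kademlia: &mut kad::Behaviour<kad::store::MemoryStore>,
    peer: PeerId,
    addrs: impl IntoIterator<Item = Multiaddr>,
) {
    for addr in addrs {
        let relayed = addr.iter().any(|p| matches!(p, Protocol::P2pCircuit));
        if !relayed {
            let _ = kademlia.add_address(&peer, addr);
        }
    }
}
```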
I started a discussion in the upstream repository with both our questions about DCUTR and autonat: libp2p/rust-libp2p#5291
Kademlia is not an issue here because it uses a rather small number of bytes. I meant this line: https://github.com/subspace/subspace/blob/443b30652f64da1d91fe630758d5ee4168b565b9/crates/subspace-networking/src/utils/piece_provider.rs#L72 Piece requests consume most of the traffic.
Ok, it finally calls |
To keep the other issue clean, I'll just comment here. It seems that iroh-p2p uses port_reuse as well in the latest code, see build_transport. Did you see any previous version that enables DCUTR without port_reuse? Please share the commit / tag / branch, so maybe I can test the code.
Hmm, that's correct. My local iroh code turned out to be very old. That version had dcutr and no port_reuse.
FYI: We don't use
Got it. I followed this line. However, they all end up in the handle_command call, and I think there won't be much difference. I'll be more careful next time.
Hi, I wrote a demo to show one possible way of solving the hole-punch vs autonat problem and of avoiding any kad traffic over relayed connections. The demo includes:
A minimal showcase group of peers can be started by:
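A bare-bones sketch of the listening setup the demo ends up with (as discussed further below, peers with dcutr enabled listen on a regular port plus a separate port for hole punching); addresses and ports are placeholders:

```rust
use libp2p::{swarm::NetworkBehaviour, Multiaddr, Swarm};

// Hypothetical setup: one "regular" listener for normal protocols and a
// second, dedicated listener whose port is reused for DCUTR dials, so
// autonat probes against the regular port are unaffected.
fn listen_on_two_ports<B: NetworkBehaviour>(
    swarm: &mut Swarm<B>,
) -> Result<(), Box<dyn std::error::Error>> {
    let regular: Multiaddr = "/ip4/0.0.0.0/tcp/30333".parse()?;
    let hole_punch: Multiaddr = "/ip4/0.0.0.0/tcp/30433".parse()?;
    swarm.listen_on(regular)?;
    swarm.listen_on(hole_punch)?;
    Ok(())
}
```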
First of all, thank you for the demo - I appreciate the effort. Did you try it within a distributed cluster (i.e. AWS + home)? I understood your solution as follows:
If there are other key pieces, please feel free to add them.
Peer A (private), peer B (any), peer R (public relay), other peers. A listens on a DCUTR port and a "regular port".
Yes, I myself started the showcase group of peers above on two machines for testing:
The peers worked as expected.
Most of this is correct except for 2 points:
The main reason for 1) is:
That is to say, for now, the HolePunchTransport is just a simple wrapper that receives
The bad part of this decision is: all the peers with dcutr enabled in my demo actually listen on 2 ports (one regular & one for dcutr).
Yes, I tried to do so, but all the protocols that can be used in a multiaddr are predefined as an enum, and there is no such thing as
Do you mean whether we can format a multiaddr with
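For reference, the kind of address composition being discussed: the Protocol enum is closed, so a relayed address is built from the predefined P2p/P2pCircuit components rather than any custom protocol tag (peer IDs and the base address are placeholders; in recent multiaddr versions Protocol::P2p takes a PeerId directly):

```rust
use libp2p::{multiaddr::Protocol, Multiaddr, PeerId};

// Compose /ip4/<relay-ip>/tcp/<port>/p2p/<relay-id>/p2p-circuit/p2p/<target-id>
// from predefined Protocol variants; custom components cannot be added.
fn relayed_addr(relay_addr: Multiaddr, relay_id: PeerId, target_id: PeerId) -> Multiaddr {
    relay_addr
        .with(Protocol::P2p(relay_id))
        .with(Protocol::P2pCircuit)
        .with(Protocol::P2p(target_id))
}
```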
An application-level protocol can manage connections: track existing ones and start new ones. Potentially, it can contain all the logic that you have spread between swarm and behaviour events. I wonder, are there any drawbacks to using that? Maybe it's too inconvenient.
@nazar-pc What do you think about having an additional separate port for DCUTR?
Ah, you mean implementing a new application-level
The only problem is the cost & difficulty of maintaining our customized
The demo was aimed at finding the key points & edge cases, and it is obviously not the final solution.
I don't like it, to be honest, and after skimming the above it will not help with hole punching on the ports we want; it would basically be a separate port with 1 connection per peer on that port, is that correct? If so, then this is not how it should work IMO.
I think I should list some conclusions here after all these long comments.
Basic conclusions
In the above, the
How could we achieve the goals?
First of all, the basic process:
If we don't want to change the process too much, then we should:
Two questions:
I suggest i) or ii)
If you guys agree with the above, or tell me which option you prefer, I think I can complete a new demo in a couple of days.
TL;DR
After the research, conducted with a major contribution from @dtynn, we chose not to proceed with DCUTR at this moment.
Rationale
Summary
With a perfect implementation we would have a more robust network (+15% of accessible peers) but decreased expected performance.
Special thanks to @dtynn, who showed the possibility of a DCUTR solution in our case with the demo.
@shamil-gadelshin we had port reuse enabled by default with the latest libp2p upgrade; also, we will use connections for a little longer with the piece retrieval improvements. Would it make sense to revisit this, in your opinion?
The main recent change is "potentially several pieces from a single peer (and connection)". The previous assumptions are valid as well: potentially many connection errors and delays in piece retrieval. Not every DSN access pattern will benefit from the change (the gateway will likely suffer from it), but it can still be beneficial as an option. Statistics on "average downloaded pieces per connection" could help us decide whether to return to this feature.
The change I was talking about is port reuse, not the way we retrieve pieces. |
It seems that libp2p/rust-libp2p#4568 made "port reuse" safe for autonat. We had 2 connections between peers reserved (+1 for autonat), but I believe the benefits of that extra connection are small.
Relates to #1022