loadbalancer: Add a P2C host selector #2743

bryce-anderson · 2023-11-07T19:00:06Z

Motivation

We have a round-robin selector but also want to support
a form of random selection. P2C is a common random
selection algorithm that gives us the ability to also distinguish
hosts based on a load metric, or score, which can help us
distribute load more evenly and avoid hosts that can be
considered outliers by a variety of mechanisms.

Modifications

Make Host an interface and rename the existing version
to DefaultHost to make testing easier and the contract
explicit. This doesn't mean the contract won't change as we
continue to iterate.
Add the P2CSelector type and it's tests.

Note: we are not yet integrating this selection algorithm into the
NewRoundRobinLoadBalancer which will entail a bunch of
questions around the API for doing so. That will be done as a
followup to minimize the patch size.

bryce-anderson · 2023-11-07T19:01:05Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/DefaultHost.java

+import static java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater;
+import static java.util.stream.Collectors.toList;
+
+final class DefaultHost<Addr, C extends LoadBalancedConnection> implements Host<Addr, C> {


With the exception of implementing address(), marking the methods public and @Override, this should just be a rename.

bryce-anderson · 2023-11-07T20:25:14Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+            final boolean t1Healthy = t1.isActiveAndHealthy();
+            final boolean t2Healthy = t2.isActiveAndHealthy();


Right now this follows the pattern set by the RoundRobinSelector which is to consider health only for making new connections. I think that is born of the only current circuit breaker being the L4 consecutive failures and the current hesitancy to mark a connection as inactive.
I think we should change that behavior in that we should consider health for any connections and also allow an unhealthy connection to be selected as a last resort (likely configurable). This allows us to have the same expected behavior for all circuit breakers and we don't need to special case the L4. Additionally, I think the L4 case is even improved because if we can't connect to a host there is a good chance existing connections are either not responsive or draining.
However, that is a reasonably big reinterpretation of what health means. I'd like to change it for RR and P2C at the same time, or maybe switch RR first and then merge this. Feedback appreciated on that.

I agree on that - RR is a good battle tested source for feedback, but if we are designing building blocks of a LB abstraction we should re-consider some of the design decisions and evaluate tradeoffs.
My mental model about this was

List of hosts as seen from SD

Subset of hosts as identified by any feedback controller (e.g., L4 / L7)

Select among these

Fallback strategy (default to what RR is doing)

I agree. I think we can potentially go further than RR and offer a panic mode, maybe in two flavors: existing connections only and even for new connections, that at least lets the LB try. At least for the latter it's probably important to add some restrictions at the connection factory level to avoid connection storms so we should proceed with caution there.

+1 for improving and interating on this area and shaping definition of "unhealthy" state.

if we can't connect to a host there is a good chance existing connections are either not responsive or draining

In order to keep track of use-cases and make sure we don't miss anything, it's worth mentioning that ServerListenContext API allows users to pause acceptance of new connections without affecting pre-existing connections. We should account for that as well.

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/Host.java

tkountis · 2023-11-09T17:15:52Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/Host.java

+     * Whether the host is both considered active by service discovery and healthy by the failure
+     * detection mechanisms.
+     */
+    boolean isActiveAndHealthy();


It's not a comment for this PR but the overall effort.
I think we can provide a richer API that closer resembles what a control plane could also advertise about an endpoint (ie. Condition enum -- Terminating, NotReady, Ready) and a separate binary data plane status (healthy / unhealthy)

+1. I had some thoughts about already but agree that we should change it in it's own PR.

Agree as well. To add some context from xDS/EDS we definitely have this information available. "Health" from the perspective of the protocol is richer than a binary decision so it could be an interesting exercise understanding if we care.

tkountis · 2023-11-09T17:21:45Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+            final boolean t1Healthy = t1.isActiveAndHealthy();
+            final boolean t2Healthy = t2.isActiveAndHealthy();


I agree on that - RR is a good battle tested source for feedback, but if we are designing building blocks of a LB abstraction we should re-consider some of the design decisions and evaluate tradeoffs.
My mental model about this was

List of hosts as seen from SD

Subset of hosts as identified by any feedback controller (e.g., L4 / L7)

Select among these

Fallback strategy (default to what RR is doing)

mgodave · 2023-11-09T18:50:18Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+                // try to make a new one if the host is healthy. If it's not healthy, we fail
+                // and let the higher level retries decide what to do.
+                if (!host.isActiveAndHealthy()) {
+                    return noActiveHostsException(hosts);


nit: this isn't an exception it's a failed Single. I might expect return noActiveHosts(hosts).

Sorry, I didn't see this until I had merged. I'll clean it up in a follow up.

bryce-anderson · 2023-11-09T19:11:33Z

The failing checks were flaky tests, so merging.

idelpivnitskiy

LGTM, have 2 questions:

idelpivnitskiy · 2023-11-11T18:54:43Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+        }
+    }
+
+    @Nullable


This method always returns a non-null value, this annotation is not required

idelpivnitskiy · 2023-11-11T18:54:50Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+    @Nullable
+    private Single<C> p2c(int size, List<Host<ResolvedAddress, C>> hosts, Random random, Predicate<C> selector,
+                          boolean forceNewConnectionAndReserve, @Nullable ContextMap contextMap) {
+        for (int j = maxEffort; j > 0; j--) {


Will it make sense to limit efforts based on number of hosts? For example, if there are only 2 hosts with maxEffort > 1, should we try again or fail fast?

idelpivnitskiy · 2023-11-11T18:54:59Z

servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/P2CSelector.java

+            if (t1Healthy) {
+                return t1.newConnection(selector, forceNewConnectionAndReserve, contextMap);
+            } else if (t2Healthy) {
+                return t2.newConnection(selector, forceNewConnectionAndReserve, contextMap);


While you clarified reasoning to move re-iteration in a separate big redesign, will it make sense for the time being to follow the following logic?

Try t1.pickConnection, if not

Check t1 health, if ok do t1.newConnection if not:

Try t2.pickConnection, if not

Check t2 health, if ok do t1.newConnection if not:

go to the next effort

bryce-anderson added 2 commits November 2, 2023 18:05

Add P2C selection

652ec78

Add tests for the P2CSelector

905e488

bryce-anderson requested review from daschl, mgodave, tkountis, idelpivnitskiy and Scottmitch November 7, 2023 19:00

bryce-anderson commented Nov 7, 2023

View reviewed changes

Make quality checkers happy

3acafcc

bryce-anderson changed the title ~~Bl anderson/add p2c~~ loadbalancer: Add a P2C selector type for the new load balancing architecture Nov 7, 2023

bryce-anderson changed the title ~~loadbalancer: Add a P2C selector type for the new load balancing architecture~~ loadbalancer: Add a P2C host selector Nov 7, 2023

tkountis approved these changes Nov 9, 2023

View reviewed changes

bryce-anderson added 2 commits November 9, 2023 11:16

Feedback

8fed41e

package private method

b33fbbd

mgodave reviewed Nov 9, 2023

View reviewed changes

mgodave approved these changes Nov 9, 2023

View reviewed changes

bryce-anderson merged commit 34da727 into apple:main Nov 9, 2023
14 of 15 checks passed

bryce-anderson deleted the bl_anderson/add_p2c branch November 9, 2023 19:12

idelpivnitskiy reviewed Nov 11, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

loadbalancer: Add a P2C host selector #2743

loadbalancer: Add a P2C host selector #2743

bryce-anderson commented Nov 7, 2023 •

edited

Loading

bryce-anderson Nov 7, 2023

bryce-anderson Nov 7, 2023 •

edited

Loading

tkountis Nov 9, 2023

bryce-anderson Nov 9, 2023

idelpivnitskiy Nov 11, 2023

tkountis Nov 9, 2023

bryce-anderson Nov 9, 2023

mgodave Nov 9, 2023

tkountis Nov 9, 2023

mgodave Nov 9, 2023

bryce-anderson Nov 9, 2023

bryce-anderson commented Nov 9, 2023

idelpivnitskiy left a comment

idelpivnitskiy Nov 11, 2023

idelpivnitskiy Nov 11, 2023

idelpivnitskiy Nov 11, 2023

		final boolean t1Healthy = t1.isActiveAndHealthy();
		final boolean t2Healthy = t2.isActiveAndHealthy();

loadbalancer: Add a P2C host selector #2743

loadbalancer: Add a P2C host selector #2743

Conversation

bryce-anderson commented Nov 7, 2023 • edited Loading

Choose a reason for hiding this comment

bryce-anderson Nov 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bryce-anderson commented Nov 9, 2023

idelpivnitskiy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bryce-anderson commented Nov 7, 2023 •

edited

Loading

bryce-anderson Nov 7, 2023 •

edited

Loading