loadbalancer-experimental: Add some documentation for DefaultLoadBalancer (#2837)

Motivation:

We want to provide some documentation for the intended purpose,
capabilities, etc for DefaultLoadBalancer.

Modifications:

- Add a readme.adoc
bryce-anderson authored Feb 19, 2024
1 parent 13ec9b2 commit d2794c4
Showing 1 changed file with 209 additions and 0 deletions.
209 changes: 209 additions & 0 deletions servicetalk-loadbalancer-experimental/README.adoc
@@ -0,0 +1,209 @@
= DefaultLoadBalancer

== What is DefaultLoadBalancer?

https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java[DefaultLoadBalancer]
is a refactor of the ServiceTalk
https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/RoundRobinLoadBalancer.java[RoundRobinLoadBalancer]
that is intended to provide a more flexible foundation for building load balancers. It also serves as the basis for new
features that were not possible with RoundRobinLoadBalancer including host scoring, additional selection algorithms, and
outlier detectors.

=== Relationship Between the LoadBalancer and Connections

The load balancer is structured as a collection of https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/Host.java[Host] objects,
and each host can have many connections. The number of connections to each Host is driven by the number of connections
required to satisfy the request load. Using the HTTP/2 protocol, which multiplexes many concurrent requests over a single
connection, dramatically shrinks the necessary number of connections to each backend, often to 1, and is strongly encouraged.

[source,mermaid]
----
flowchart LR
Dispatcher --> |select connection| LoadBalancer
ServiceDiscovery --> |update host set| LoadBalancer
LoadBalancer --> H0[Host:addr-0]
LoadBalancer --> H1[Host:addr-1]
LoadBalancer --> H2[Host:addr-2]
H0 --> C00[Connection0:addr-0]
H0 --> C01[ConnectionN:addr-0]
H1 --> C10[Connection0:addr-1]
H1 --> C11[ConnectionN:addr-1]
H2 --> C20[Connection0:addr-2]
H2 --> C21[ConnectionN:addr-2]
----

=== More Host Selection Algorithms

A primary goal of DefaultLoadBalancer was to open the door to alternative host selection algorithms. The
RoundRobinLoadBalancer is limited to round-robin while DefaultLoadBalancer comes out of the box with multiple choices
and the flexibility to add more if necessary.

==== Power of Two Choices (P2C)

Power of two choices (https://ieeexplore.ieee.org/document/963420[P2C]) is an algorithm that allows load balancers to
bias traffic toward hosts with a better 'score' while retaining the fairness of random selection. It is a random
selection process with a twist: it selects two hosts at random and from those two picks the 'best', where best is
defined by a scoring system. ServiceTalk uses an Exponentially Weighted Moving Average
(http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf[EWMA]) scoring system by default.
EWMA takes into account the recent latency, failures, and outstanding request load of the destination host when
computing a host's score.
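
As a rough illustration of how an EWMA-style score could be maintained, the following is a minimal sketch and not
ServiceTalk's implementation. The class name `EwmaSketch`, the half-life parameter, and the way outstanding requests are
folded into the result are all assumptions made for this example. In this sketch a lower cost is better.

```java
/** Hypothetical EWMA-based cost tracker for a single host; lower cost is better. */
final class EwmaSketch {
    private final double halfLifeNanos;   // assumed tuning knob: time for old samples to lose half their weight
    private double ewmaLatencyNanos;      // decayed average of observed latencies
    private long lastUpdateNanos;
    private int pendingRequests;          // outstanding request load

    EwmaSketch(long halfLifeNanos, long nowNanos) {
        this.halfLifeNanos = halfLifeNanos;
        this.lastUpdateNanos = nowNanos;
    }

    synchronized void onRequestStart() {
        pendingRequests++;
    }

    /** Records a completed request. A failure can be recorded by passing a large penalty latency. */
    synchronized void onRequestFinish(long latencyNanos, long nowNanos) {
        pendingRequests--;
        long elapsed = Math.max(0L, nowNanos - lastUpdateNanos);
        lastUpdateNanos = nowNanos;
        // The weight of the previous average halves every halfLifeNanos of elapsed time.
        double decay = Math.pow(0.5, elapsed / halfLifeNanos);
        ewmaLatencyNanos = ewmaLatencyNanos * decay + latencyNanos * (1.0 - decay);
    }

    /** Scales the latency average by the in-flight load so busy hosts look more expensive. */
    synchronized double cost() {
        return ewmaLatencyNanos * (pendingRequests + 1);
    }
}
```

Because the decay depends on elapsed time rather than on the number of samples, the average remains meaningful when
requests arrive at uneven intervals.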

The P2C selection algorithm has the following positive properties:

* significantly fairer request distribution between hosts than simple random selection
* biasing away from low-performing hosts
* avoiding the order coalescing problems associated with round-robin

The major downside of P2C is that it is less trivial to understand.

[source,mermaid]
----
stateDiagram-v2
state "pick 2 hosts" as p2c
state "evaluate hosts health" as evalhealth
state "pick by score" as pickscore
[*] --> p2c
p2c --> evalhealth
evalhealth --> p2c: neither host healthy
evalhealth --> [*]: one host healthy
evalhealth --> pickscore: both healthy
pickscore --> [*]: select best scored host
----
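
The state diagram above can be sketched in a few lines of Java. This is an illustrative sketch only: `P2CSketch`, the
`isHealthy` and `score` callbacks, and the `maxAttempts` bound are hypothetical names chosen for the example, and a
higher score is assumed to be better.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical power-of-two-choices selector; not the ServiceTalk API. */
final class P2CSketch {
    private P2CSketch() {
    }

    /** Returns the selected host, or null if no healthy host was found within maxAttempts picks. */
    static <H> H select(List<H> hosts, Predicate<H> isHealthy, ToDoubleFunction<H> score,
                        Random random, int maxAttempts) {
        int n = hosts.size();
        if (n == 0) {
            return null;
        }
        if (n == 1) {
            H only = hosts.get(0);
            return isHealthy.test(only) ? only : null;
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Pick two distinct indices uniformly at random.
            int i = random.nextInt(n);
            int j = random.nextInt(n - 1);
            if (j >= i) {
                j++;
            }
            H a = hosts.get(i);
            H b = hosts.get(j);
            boolean aHealthy = isHealthy.test(a);
            boolean bHealthy = isHealthy.test(b);
            if (aHealthy && bHealthy) {
                // Both healthy: pick by score (higher is better in this sketch).
                return score.applyAsDouble(a) >= score.applyAsDouble(b) ? a : b;
            }
            if (aHealthy) {
                return a;
            }
            if (bHealthy) {
                return b;
            }
            // Neither host is healthy: pick another pair.
        }
        return null;
    }
}
```

Bounding the number of attempts is a design choice of this sketch; it keeps selection from looping indefinitely when
most of the host set is unhealthy.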

==== Round-Robin

Round-robin is a very common algorithm that is easy to understand. From a local perspective it is an extremely
fair algorithm, assuming each request has essentially the same 'cost'. Its downsides include unwanted correlation
effects due to its sequential ordering and its inability to score hosts on anything other than outright failure.

[source,mermaid]
----
stateDiagram-v2
state "advance index (i)" as select_index
state "evaluate health" as eval_health
[*] --> select_index
select_index --> eval_health: select host(i)
eval_health --> select_index: host unhealthy
eval_health --> [*]: host healthy
----
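
The loop in the diagram above might look roughly like the following sketch. `RoundRobinSketch` and the `isHealthy`
callback are hypothetical names for this example; the sketch gives up after one full pass over the host list, which is
an assumption rather than documented behavior.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical round-robin selector; not the ServiceTalk API. */
final class RoundRobinSketch<H> {
    private final List<H> hosts;
    private final AtomicInteger index = new AtomicInteger();

    RoundRobinSketch(List<H> hosts) {
        this.hosts = List.copyOf(hosts);
    }

    /** Returns the next healthy host in sequence, or null after one full pass without finding one. */
    H select(Predicate<H> isHealthy) {
        int n = hosts.size();
        for (int attempt = 0; attempt < n; attempt++) {
            // floorMod keeps the index non-negative even after the counter overflows.
            int i = Math.floorMod(index.getAndIncrement(), n);
            H candidate = hosts.get(i);
            if (isHealthy.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```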

== xDS Outlier Detection

=== What is xDS?

In addition to being more modular, DefaultLoadBalancer is being introduced with resiliency features found in the xDS
specification, which was pioneered by the Envoy project and is being https://github.com/cncf/xds[advanced by the CNCF].
xDS defines a control plane that allows for the distributed configuration of key components such as certificates and
load balancers. This documentation focuses on the elements that are relevant to load balancing, referred to as the
Cluster Discovery Service (CDS).

=== Failure Detection Algorithms

DefaultLoadBalancer was designed to mitigate failures at both layer 4 (connection layer) and layer 7 (request layer).
It supports the following xDS-compatible outlier detectors:

* Consecutive failures: ejects hosts that exceed the configured number of consecutive failures.
* Success rate failure detection: statistical outlier ejection based on success rate.
* Failure percentage outlier detection: ejects hosts with a failure rate that exceeds a fixed threshold.
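
The three detectors listed above can be sketched as simple functions over per-host counters. This is a simplified
illustration, not ServiceTalk's implementation: `OutlierDetectionSketch`, `HostStats`, and all parameter names are
hypothetical. The success rate detector assumes an ejection threshold of the mean success rate minus a configurable
multiple of the standard deviation, in the style of Envoy's outlier detection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical outlier detectors operating on one interval of per-host counters. */
final class OutlierDetectionSketch {
    static final class HostStats {
        final String address;
        final long successes;
        final long failures;
        final int consecutiveFailures;

        HostStats(String address, long successes, long failures, int consecutiveFailures) {
            this.address = address;
            this.successes = successes;
            this.failures = failures;
            this.consecutiveFailures = consecutiveFailures;
        }

        double successRate() {
            long total = successes + failures;
            return total == 0 ? 1.0 : (double) successes / total;
        }
    }

    /** Consecutive failures: eject hosts whose failure streak has reached the limit (assumed semantics). */
    static List<String> byConsecutiveFailures(List<HostStats> hosts, int limit) {
        List<String> ejected = new ArrayList<>();
        for (HostStats h : hosts) {
            if (h.consecutiveFailures >= limit) {
                ejected.add(h.address);
            }
        }
        return ejected;
    }

    /** Failure percentage: eject hosts whose failure rate exceeds a fixed threshold. */
    static List<String> byFailurePercentage(List<HostStats> hosts, double maxFailurePercent) {
        List<String> ejected = new ArrayList<>();
        for (HostStats h : hosts) {
            if ((1.0 - h.successRate()) * 100.0 > maxFailurePercent) {
                ejected.add(h.address);
            }
        }
        return ejected;
    }

    /** Success rate: eject hosts that fall below mean - stdevFactor * stdev across all hosts. */
    static List<String> bySuccessRate(List<HostStats> hosts, double stdevFactor) {
        List<String> ejected = new ArrayList<>();
        if (hosts.isEmpty()) {
            return ejected;
        }
        double mean = 0;
        for (HostStats h : hosts) {
            mean += h.successRate();
        }
        mean /= hosts.size();
        double variance = 0;
        for (HostStats h : hosts) {
            double d = h.successRate() - mean;
            variance += d * d;
        }
        variance /= hosts.size();
        double threshold = mean - stdevFactor * Math.sqrt(variance);
        for (HostStats h : hosts) {
            if (h.successRate() < threshold) {
                ejected.add(h.address);
            }
        }
        return ejected;
    }
}
```

Note that the success rate detector is relative: a host is only ejected if it performs noticeably worse than its
peers, whereas the other two detectors compare each host against a fixed limit.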

In addition to the xDS-defined outlier detectors, DefaultLoadBalancer continues to support consecutive connection
failure detection with active probing: when a host is found to be unreachable, it is marked as unhealthy until a new
connection can be established to the host outside the request path.

=== Connection Selection with Outlier Detectors

Connection acquisition failures, request failures, and response latency are used to steer traffic away from unhealthy
hosts and to _bias_ it toward hosts with better response times, which lets the load balancer adjust to the observed
capacity of backends. The relationship is shown in the following diagram:

[source,mermaid]
----
flowchart TD
classDef unhealthy stroke:#f00
classDef healthy stroke:#0f0
HttpRequester
ServiceDiscovery
subgraph LoadBalancer
direction TB
HostSet[Host Set]
HostSelector[P2C HostSelector\npicks candidate hosts\nand selects the best option]
HealthChecker[Outlier Detector\ncomputes host health based\non statistics from all hosts]
end
subgraph Host0
HealthIndicator0[HealthIndicator\nmonitors individual host statistics]:::unhealthy
subgraph ConnectionPool0[Connection Pool]
Connection00[Connection0]
Connection01[Connection1]
end
%% helps orient the chart
HealthIndicator0 ~~~ ConnectionPool0
end
subgraph Host1
HealthIndicator1[HealthIndicator\nmonitors individual host statistics]:::healthy
subgraph ConnectionPool1[Connection Pool]
Connection10[Connection0]
Connection11[Connection1]
end
%% helps to orient the chart
HealthIndicator1 ~~~ ConnectionPool1
end
%% make sure our hosts are close to the LoadBalancer
LoadBalancer ~~~ Host0 & Host1
HttpRequester ~~~ LoadBalancer
HostSelector -.-x |unhealthy host\navoided| HealthIndicator0
HttpRequester -.-> |select connection| HostSelector -.-> |healthy host\npreferred| HealthIndicator1
HealthIndicator1 -.-> Connection11
%% to make the boxes the same size
HealthIndicator0 ~~~ Connection01
HealthChecker <--> HealthIndicator0 & HealthIndicator1
ServiceDiscovery --> |update host set| HostSet --> |rebuild HostSelector \nwith new host set| HostSelector
----

=== Connection Acquisition Workflow

By default, ServiceTalk attempts to minimize the connection load to each host. If no existing connection is capable of
serving a request, connection acquisition can happen on the request path. The connection acquisition flow is roughly
as follows:

[source,mermaid]
----
sequenceDiagram
participant requester
participant load-balancer
participant host
participant connection-factory
requester->>load-balancer: request a connection
load-balancer->>host: select a connection from the host
host->>host: use connection from pool
host-->>requester: connection reuse
host->>connection-factory: create new connection
connection-factory->>host: connection created and added to pool
host->>requester: connection returned
----
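
The reuse-or-create decision in the sequence above can be sketched as follows. This is a synchronous simplification
for illustration only; real connection acquisition is asynchronous. `HostSketch`, `selectConnection`, and the
`canServeRequest` callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical host holding a connection pool; not the ServiceTalk API. */
final class HostSketch<C> {
    private final List<C> pool = new ArrayList<>();
    private final Supplier<C> connectionFactory;

    HostSketch(Supplier<C> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Reuses a pooled connection that can serve the request, otherwise creates and pools a new one. */
    synchronized C selectConnection(Predicate<C> canServeRequest) {
        for (C connection : pool) {
            if (canServeRequest.test(connection)) {
                return connection; // connection reuse
            }
        }
        C created = connectionFactory.get(); // creation happens on the request path
        pool.add(created);
        return created;
    }

    synchronized int poolSize() {
        return pool.size();
    }
}
```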

== Future Capabilities

=== Weighted Load Balancing

Not all hosts are created equal! Due to different underlying hardware platforms, other tenants on the same host, or even
just a bad cache day, we often find that not all instances of a service have the same capacity. The P2C selection
algorithm can approximate these differences, but only by inferring them from observed behavior. With
https://github.com/bryce-anderson/servicetalk/blob/bl_anderson/default_loadbalancer_docs/servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java[ServiceDiscoverer]
or control-plane support we can explicitly propagate weight information to ServiceTalk's DefaultLoadBalancer. Adding
weight support to the host selection process will let users leverage this data.
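
Since this feature is not yet implemented, the following is only a sketch of one common way explicit weights could
influence selection: weighted random choice over cumulative weights. `WeightedSelectionSketch` and `pickIndex` are
hypothetical names, and nothing here reflects a committed design.

```java
import java.util.Random;

/** Hypothetical weighted random selection; one possible approach, not a committed design. */
final class WeightedSelectionSketch {
    private WeightedSelectionSketch() {
    }

    /** Returns index i with probability weights[i] / sum(weights). Weights must be non-negative. */
    static int pickIndex(double[] weights, Random random) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        double point = random.nextDouble() * total;
        int lastPositive = -1;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] > 0) {
                lastPositive = i;
                point -= weights[i];
                if (point < 0) {
                    return i;
                }
            }
        }
        // Floating point rounding can leave a tiny remainder; fall back to the last weighted host.
        return lastPositive;
    }
}
```

A host with weight 3 would receive roughly three times the traffic of a host with weight 1 under this scheme.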

=== Priority Groups

Priority groups are another notion of weights in load balancing. Priority groups are a feature of the xDS protocol that
lets users partition backend instances into groups that have a relative priority to each other. The load balancer will
use hosts from as many priority groups as necessary to maintain a minimum specified number of healthy backends. A
common use case is to specify a primary destination, usually a local one, for latency and transit cost reasons while
maintaining a set of backup destinations to use in the case of local disruptions.
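
As a sketch of the rule described above, and assuming groups are ordered from highest to lowest priority, the active
host set could be computed as follows. `PriorityGroupsSketch`, `activeHosts`, and `minHealthy` are hypothetical names;
this feature is not yet implemented and the real design may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical priority group spill-over; not a committed design. */
final class PriorityGroupsSketch {
    private PriorityGroupsSketch() {
    }

    /**
     * Adds whole groups in priority order until at least minHealthy healthy hosts are available.
     * Returns every host from the groups that were activated.
     */
    static <H> List<H> activeHosts(List<List<H>> groupsByPriority, Predicate<H> isHealthy, int minHealthy) {
        List<H> active = new ArrayList<>();
        int healthy = 0;
        for (List<H> group : groupsByPriority) {
            for (H host : group) {
                active.add(host);
                if (isHealthy.test(host)) {
                    healthy++;
                }
            }
            if (healthy >= minHealthy) {
                break; // enough healthy backends, lower priority groups stay idle
            }
        }
        return active;
    }
}
```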

=== Subsetting

When the sizes of two connected clusters grow, the number of connections can become burdensome if the load balancer
maintains a full mesh network. Subsetting can reduce the connection count by only creating connections to a subset of
backends. There are a number of ways to determine the subset, ranging from simple random subsetting, which is
trivial to implement but suffers from load variance, to more intricate models.
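
The simple random variant mentioned above can be sketched as a seeded shuffle followed by truncation.
`SubsettingSketch`, `randomSubset`, and the per-client seed are hypothetical; this feature is not yet implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical random subsetting; not a committed design. */
final class SubsettingSketch {
    private SubsettingSketch() {
    }

    /** Returns up to subsetSize backends, chosen by a shuffle that is stable for a given client seed. */
    static <H> List<H> randomSubset(List<H> backends, int subsetSize, long clientSeed) {
        List<H> shuffled = new ArrayList<>(backends);
        Collections.shuffle(shuffled, new Random(clientSeed));
        int size = Math.min(subsetSize, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, size));
    }
}
```

Because each client picks independently, some backends may be chosen by many clients and others by few, which is the
load variance noted above.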
