loadbalancer-experimental: Add some documentation for DefaultLoadBalancer (#2837)

Motivation: We want to provide some documentation for the intended purpose, capabilities, etc. for DefaultLoadBalancer.

Modifications:

- Add a readme.adoc

= DefaultLoadBalancer

== What is DefaultLoadBalancer?

https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java[DefaultLoadBalancer]
is a refactor of the ServiceTalk
https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/RoundRobinLoadBalancer.java[RoundRobinLoadBalancer]
that is intended to provide a more flexible foundation for building load balancers. It also serves as the basis for new
features that were not possible with RoundRobinLoadBalancer, including host scoring, additional selection algorithms, and
outlier detectors.

=== Relationship Between the LoadBalancer and Connections

The load balancer is structured as a collection of
https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/Host.java[Host]s,
and each host can have many connections. The number of connections to each Host is driven by the number of connections
required to satisfy the request load. Using the HTTP/2 protocol dramatically reduces the number of connections needed
for each backend, often to one, and is strongly encouraged.

[source,mermaid]
----
flowchart LR
    Dispatcher --> |select connection| LoadBalancer
    ServiceDiscovery --> |update host set| LoadBalancer
    LoadBalancer --> H0[Host:addr-0]
    LoadBalancer --> H1[Host:addr-1]
    LoadBalancer --> H2[Host:addr-2]
    H0 --> C00[Connection0:addr-0]
    H0 --> C01[ConnectionN:addr-0]
    H1 --> C10[Connection0:addr-1]
    H1 --> C11[ConnectionN:addr-1]
    H2 --> C20[Connection0:addr-2]
    H2 --> C21[ConnectionN:addr-2]
----

=== More Host Selection Algorithms

A primary goal of DefaultLoadBalancer was to open the door to alternative host selection algorithms. The
RoundRobinLoadBalancer is limited to round-robin, while DefaultLoadBalancer comes out of the box with multiple choices
and the flexibility to add more if necessary.

==== Power of Two Choices (P2C)

Power of two choices (https://ieeexplore.ieee.org/document/963420[P2C]) is an algorithm that allows load balancers to
bias traffic toward hosts with a better 'score' while retaining the fairness of random selection. It is a random
selection process with a twist: it selects two hosts at random and picks the 'best' of the two, where 'best' is defined
by a scoring system. ServiceTalk uses an Exponentially Weighted Moving Average
(http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf[EWMA]) scoring system by default.
EWMA takes into account the recent latency, failures, and outstanding request load of the destination host when
computing a host's score.
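
For intuition, below is a minimal sketch of an EWMA latency tracker. The decay constant, class, and method names are
illustrative assumptions, not ServiceTalk's actual implementation.

[source,java]
----
// Illustrative EWMA latency tracker (a sketch, not ServiceTalk's implementation).
// Recent samples are weighted more heavily; the decay is derived from the elapsed
// time between samples, so irregularly spaced measurements are handled gracefully.
final class EwmaLatencyTracker {
    private final long tauNanos; // decay time constant (assumed tuning parameter)
    private double ewmaNanos;
    private long lastUpdateNanos;

    EwmaLatencyTracker(long tauNanos) {
        this.tauNanos = tauNanos;
        this.lastUpdateNanos = System.nanoTime();
    }

    synchronized void observe(long latencyNanos) {
        long now = System.nanoTime();
        // The old average's weight decays exponentially with elapsed time.
        double decay = Math.exp(-(now - lastUpdateNanos) / (double) tauNanos);
        ewmaNanos = ewmaNanos * decay + latencyNanos * (1 - decay);
        lastUpdateNanos = now;
    }

    // Lower latency means a better host, so negate for a "higher is better" score.
    synchronized double score() {
        return -ewmaNanos;
    }
}
----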

The P2C selection algorithm has the following positive properties:

* significantly fairer request distribution between hosts than simple random selection
* biasing traffic away from poorly performing hosts
* avoiding the ordering correlation problems associated with round-robin

The major downside of P2C is that it is less trivial to understand. The selection flow looks like this, with a sketch
of the loop after the diagram:

[source,mermaid]
----
stateDiagram-v2
    state "pick 2 hosts" as p2c
    state "evaluate hosts health" as evalhealth
    state "pick by score" as pickscore
    [*] --> p2c
    p2c --> evalhealth
    evalhealth --> p2c: neither host healthy
    evalhealth --> [*]: one host healthy
    evalhealth --> pickscore: both healthy
    pickscore --> [*]: select best scored host
----
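
Below is a hedged sketch of that selection loop. The `Host` interface, the bounded retry count, and the method names
are simplified stand-ins, not the actual DefaultLoadBalancer internals.

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative P2C selection (simplified; not the actual DefaultLoadBalancer code).
final class P2CSelector {
    private static final int MAX_ATTEMPTS = 5; // assumed bound on the retry loop

    interface Host {
        boolean isHealthy();
        double score(); // higher is better, e.g. derived from EWMA
    }

    Host select(List<Host> hosts) {
        if (hosts.isEmpty()) {
            return null;
        }
        if (hosts.size() == 1) {
            Host only = hosts.get(0);
            return only.isHealthy() ? only : null;
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            // Pick two distinct hosts at random.
            int i = rnd.nextInt(hosts.size());
            int j = rnd.nextInt(hosts.size() - 1);
            if (j >= i) {
                j++;
            }
            Host a = hosts.get(i);
            Host b = hosts.get(j);
            if (a.isHealthy() && b.isHealthy()) {
                return a.score() >= b.score() ? a : b; // both healthy: best score wins
            } else if (a.isHealthy()) {
                return a; // only one healthy: use it
            } else if (b.isHealthy()) {
                return b;
            }
            // Neither host healthy: try another random pair.
        }
        return null; // no healthy host found within the attempt budget
    }
}
----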

==== Round-Robin

Round-robin is a very common algorithm that is easy to understand. From a local perspective it is an extremely fair
algorithm, assuming each request has essentially the same 'cost'. Its downsides include unwanted correlation effects
due to its sequential ordering and an inability to score hosts on anything other than outright failure.

[source,mermaid]
----
stateDiagram-v2
    state "advance index (i)" as select_index
    state "evaluate health" as eval_health
    [*] --> select_index
    select_index --> eval_health: select host(i)
    eval_health --> select_index: host unhealthy
    eval_health --> [*]: host healthy
----
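
A minimal sketch of that loop follows; the shared atomic index and the single-pass bound are illustrative assumptions.

[source,java]
----
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin selection that skips unhealthy hosts (simplified).
final class RoundRobinSelector {
    private final AtomicInteger index = new AtomicInteger();

    interface Host {
        boolean isHealthy();
    }

    Host select(List<Host> hosts) {
        // Advance the shared index, trying each host at most once per pass.
        for (int attempt = 0; attempt < hosts.size(); attempt++) {
            // floorMod keeps the index valid even after integer overflow.
            int i = Math.floorMod(index.getAndIncrement(), hosts.size());
            Host candidate = hosts.get(i);
            if (candidate.isHealthy()) {
                return candidate;
            }
        }
        return null; // every host was unhealthy
    }
}
----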

== xDS Outlier Detection

=== What is xDS?

In addition to being more modular, DefaultLoadBalancer is being introduced with resiliency features found in the xDS
specification, which was pioneered by the Envoy project and is being https://github.com/cncf/xds[advanced by the CNCF].
xDS defines a control plane that allows for the distributed configuration of key components like certificates,
load balancers, etc. In this documentation we will focus on the elements that are relevant to load balancing, referred
to as CDS.

=== Failure Detection Algorithms

DefaultLoadBalancer was designed to mitigate failures at both layer 4 (connection layer) and layer 7 (request layer).
It supports the following xDS-compatible outlier detectors:

* Consecutive failures: ejects hosts that exceed the configured number of consecutive failures.
* Success rate failure detection: statistical outlier ejection based on success rate.
* Failure percentage outlier detection: ejects hosts with a failure rate that exceeds a fixed threshold.

In addition to the xDS-defined outlier detectors, DefaultLoadBalancer continues to support consecutive connection
failure detection with active probing: when a host is found to be unreachable it is marked as unhealthy until a new
connection can be established to the host outside the request path.
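
To make the first detector concrete, here is a sketch of consecutive-failure tracking with ejection. The threshold
and reset semantics are assumptions for illustration, not the xDS or ServiceTalk defaults.

[source,java]
----
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative consecutive-failure outlier detection (simplified).
final class ConsecutiveFailureDetector {
    private final int maxConsecutiveFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean ejected;

    ConsecutiveFailureDetector(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    void onSuccess() {
        consecutiveFailures.set(0); // any success resets the failure streak
    }

    void onFailure() {
        if (consecutiveFailures.incrementAndGet() >= maxConsecutiveFailures) {
            ejected = true; // host is ejected until a probe re-admits it
        }
    }

    void onProbeSuccess() {
        // Active probing restores the host outside the request path.
        ejected = false;
        consecutiveFailures.set(0);
    }

    boolean isHealthy() {
        return !ejected;
    }
}
----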

=== Connection Selection with Outlier Detectors

Connection acquisition failures, request failures, and response latency are used to optimize traffic flows: traffic
avoids unhealthy hosts and is _biased_ toward hosts with better response times in order to adjust to the observed
capacity of backends. The relationship is shown in the following diagram:

[source,mermaid]
----
flowchart TD
    classDef unhealthy stroke:#f00
    classDef healthy stroke:#0f0
    HttpRequester
    ServiceDiscovery
    subgraph LoadBalancer
        direction TB
        HostSet[Host Set]
        HostSelector[P2C HostSelector\npicks candidate hosts\nand selects the best option]
        HealthChecker[Outlier Detector\ncomputes host health based\non statistics from all hosts]
    end
    subgraph Host0
        HealthIndicator0[HealthIndicator\nmonitors individual host statistics]:::unhealthy
        subgraph ConnectionPool0[Connection Pool]
            Connection00[Connection0]
            Connection01[Connection1]
        end
        %% helps orient the chart
        HealthIndicator0 ~~~ ConnectionPool0
    end
    subgraph Host1
        HealthIndicator1[HealthIndicator\nmonitors individual host statistics]:::healthy
        subgraph ConnectionPool1[Connection Pool]
            Connection10[Connection0]
            Connection11[Connection1]
        end
        %% helps orient the chart
        HealthIndicator1 ~~~ ConnectionPool1
    end
    %% make sure our hosts are close to the LoadBalancer
    LoadBalancer ~~~ Host0 & Host1
    HttpRequester ~~~ LoadBalancer
    HostSelector -.-x |unhealthy host\navoided| HealthIndicator0
    HttpRequester -.-> |select connection| HostSelector -.-> |healthy host\npreferred| HealthIndicator1
    HealthIndicator1 -.-> Connection11
    %% make the boxes the same size
    HealthIndicator0 ~~~ Connection01
    HealthChecker <--> HealthIndicator0 & HealthIndicator1
    ServiceDiscovery --> |update host set| HostSet --> |rebuild HostSelector\nwith new host set| HostSelector
----
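
The service discovery edge in the diagram deserves a note: when the host set changes, the selector is rebuilt over a
snapshot of the new host set. A minimal sketch of that flow is below; the type names and the placeholder strategy are
assumptions, not the actual DefaultLoadBalancer code.

[source,java]
----
import java.util.List;

// Illustrative host-set update flow: swap in a freshly built, immutable selector.
final class LoadBalancerCore {
    interface Host { /* connection pool, health indicator, ... */ }

    interface HostSelector {
        Host selectHost();
    }

    private volatile HostSelector currentSelector = () -> null;

    // Called from the ServiceDiscovery subscription when the host set changes.
    void onHostSetUpdate(List<Host> newHostSet) {
        // Copy the host set so the selector works over an immutable snapshot.
        currentSelector = buildSelector(List.copyOf(newHostSet));
    }

    private HostSelector buildSelector(List<Host> hosts) {
        // Placeholder strategy; a real build would wrap P2C or round-robin.
        return () -> hosts.isEmpty() ? null : hosts.get(0);
    }

    Host selectHost() {
        // Requests always read the latest published selector.
        return currentSelector.selectHost();
    }
}
----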

=== Connection Acquisition Workflow

By default, ServiceTalk attempts to minimize the connection load to each host. If no existing connection is capable of
serving a request, connection acquisition can happen on the request path. The connection acquisition flow is roughly
as follows:

[source,mermaid]
----
sequenceDiagram
    participant requester
    participant load-balancer
    participant host
    participant connection-factory
    requester->>load-balancer: request a connection
    load-balancer->>host: select a connection from the host
    host->>host: use connection from pool
    host-->>requester: connection reuse
    host->>connection-factory: create new connection
    connection-factory->>host: connection created and added to pool
    host->>requester: connection returned
----
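
In code, the same two paths (reuse versus dial) might look like the sketch below. The pool operations are elided and
the types are assumptions; ServiceTalk's real API is fully asynchronous and uses its own reactive types rather than
`CompletableFuture`.

[source,java]
----
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Illustrative connection acquisition: prefer reuse, dial only when necessary.
final class HostConnections {
    interface Connection { }

    interface ConnectionFactory {
        CompletableFuture<Connection> newConnection();
    }

    private final ConnectionFactory factory;

    HostConnections(ConnectionFactory factory) {
        this.factory = factory;
    }

    CompletableFuture<Connection> acquire() {
        Optional<Connection> pooled = tryReuseFromPool();
        if (pooled.isPresent()) {
            // Fast path: reuse an existing connection, no network round trip.
            return CompletableFuture.completedFuture(pooled.get());
        }
        // Slow path: create a new connection and add it to the pool for reuse.
        return factory.newConnection().thenApply(this::addToPool);
    }

    private Optional<Connection> tryReuseFromPool() {
        return Optional.empty(); // pool lookup elided
    }

    private Connection addToPool(Connection c) {
        return c; // pool insertion elided
    }
}
----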

== Future Capabilities

=== Weighted Load Balancing

Not all hosts are created equal! Due to different underlying hardware platforms, other tenants on the same host, or even
just a bad cache day, we often find that not all instances of a service have the same capacity. The P2C selection
algorithm can approximate relative capacity, but only by inference. With
https://github.com/bryce-anderson/servicetalk/blob/bl_anderson/default_loadbalancer_docs/servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java[ServiceDiscoverer]
or control-plane support we can explicitly propagate weight information to ServiceTalk's DefaultLoadBalancer. Adding
weight support to the host selection process will let users leverage this data.
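
As a sketch of what weight-aware selection could look like (this is a future capability, so everything here is an
illustrative assumption), weighted random selection picks hosts in proportion to their advertised weight:

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative weighted random selection: a host's chance of being picked is
// proportional to its weight. A sketch only, not an implemented ServiceTalk API.
// Assumes a non-empty host list with at least one positive weight.
final class WeightedSelector {
    record WeightedHost(String address, double weight) { }

    WeightedHost select(List<WeightedHost> hosts) {
        double total = hosts.stream().mapToDouble(WeightedHost::weight).sum();
        double target = ThreadLocalRandom.current().nextDouble(total);
        for (WeightedHost host : hosts) {
            target -= host.weight();
            if (target < 0) {
                return host;
            }
        }
        // Guard against floating point drift leaving target slightly >= 0.
        return hosts.get(hosts.size() - 1);
    }
}
----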

=== Priority Groups

Priority groups are another notion of weights in load balancing. Priority groups are a feature of the xDS protocol that
lets users partition backend instances into groups that have a relative priority to each other. The load balancer will
use hosts from as many priority groups as necessary to maintain a minimum specified number of healthy backends. A
common use case is to specify a primary destination, usually local, for latency and transit cost reasons while
maintaining a set of backup destinations to use in the case of local disruptions.
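
The spill-over behavior might look like the following sketch; the types, ordering assumptions, and fallback policy are
all illustrative, since this is not yet an implemented feature.

[source,java]
----
import java.util.ArrayList;
import java.util.List;

// Illustrative priority-group selection: accumulate groups in priority order
// until the minimum number of healthy backends is reached.
final class PriorityGroups {
    record Host(String address, boolean healthy) { }
    record Group(int priority, List<Host> hosts) { }

    // groupsByPriority is assumed sorted, highest priority first.
    List<Host> eligibleHosts(List<Group> groupsByPriority, int minHealthy) {
        List<Host> result = new ArrayList<>();
        long healthy = 0;
        for (Group group : groupsByPriority) {
            result.addAll(group.hosts());
            healthy += group.hosts().stream().filter(Host::healthy).count();
            if (healthy >= minHealthy) {
                break; // enough healthy backends; lower priorities stay idle
            }
        }
        return result; // may still fall short if every group is degraded
    }
}
----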

=== Subsetting

As the sizes of two connected clusters grow, the number of connections can become burdensome if the load balancer
maintains a full mesh network. Subsetting can reduce the connection count by only creating connections to a subset of
backends. There are a number of ways to determine this subset, ranging from simple random subsetting, which is
trivial to implement but suffers from load variance, to more intricate models.
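
The simple random variant mentioned above fits in a few lines; the per-client seed and fixed subset size are
illustrative assumptions:

[source,java]
----
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative random subsetting: each client connects to a fixed-size random
// subset of the backends instead of the full mesh. Simple, but load variance
// between backends can be significant.
final class RandomSubsetter {
    List<String> subset(List<String> allBackends, int subsetSize, long clientSeed) {
        List<String> shuffled = new ArrayList<>(allBackends);
        // Seeding per client keeps this client's subset stable across rebuilds.
        Collections.shuffle(shuffled, new Random(clientSeed));
        return List.copyOf(shuffled.subList(0, Math.min(subsetSize, shuffled.size())));
    }
}
----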