From d2794c47afaf6324924de6725d9dc79951c34027 Mon Sep 17 00:00:00 2001
From: Bryce Anderson
Date: Mon, 19 Feb 2024 14:24:15 -0700
Subject: [PATCH] loadbalancer-experimental: Add some documentation for DefaultLoadBalancer (#2837)

Motivation:

We want to provide some documentation for the intended purpose, capabilities, etc. for DefaultLoadBalancer.

Modifications:

- Add a README.adoc
---
 .../README.adoc | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100644 servicetalk-loadbalancer-experimental/README.adoc

diff --git a/servicetalk-loadbalancer-experimental/README.adoc b/servicetalk-loadbalancer-experimental/README.adoc
new file mode 100644
index 0000000000..57a06264c6
--- /dev/null
+++ b/servicetalk-loadbalancer-experimental/README.adoc
@@ -0,0 +1,209 @@
= DefaultLoadBalancer

== What is DefaultLoadBalancer?

https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java[DefaultLoadBalancer]
is a refactor of the ServiceTalk
https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/RoundRobinLoadBalancer.java[RoundRobinLoadBalancer]
that is intended to provide a more flexible foundation for building load balancers. It also serves as the basis for new
features that were not possible with RoundRobinLoadBalancer, including host scoring, additional selection algorithms, and
outlier detectors.

=== Relationship Between the LoadBalancer and Connections

The load balancer is structured as a series of https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/Host.java[Host]s,
and each host can have many connections. The number of connections to each Host is driven by the number of connections
required to satisfy the request load. Using the HTTP/2 protocol dramatically shrinks the necessary number of
connections to each backend, often to 1, and is strongly encouraged.

[source,mermaid]
----
flowchart LR
    Dispatcher --> |select connection| LoadBalancer
    ServiceDiscovery --> |update host set| LoadBalancer
    LoadBalancer --> H0[Host:addr-0]
    LoadBalancer --> H1[Host:addr-1]
    LoadBalancer --> H2[Host:addr-2]
    H0 --> C00[Connection0:addr-0]
    H0 --> C01[ConnectionN:addr-0]
    H1 --> C10[Connection0:addr-1]
    H1 --> C11[ConnectionN:addr-1]
    H2 --> C20[Connection0:addr-2]
    H2 --> C21[ConnectionN:addr-2]
----

=== More Host Selection Algorithms

A primary goal of DefaultLoadBalancer is to open the door to alternative host selection algorithms. The
RoundRobinLoadBalancer is limited to round-robin, while DefaultLoadBalancer comes out of the box with multiple choices
and the flexibility to add more if necessary.

==== Power of Two Choices (P2C)

Power of two choices (https://ieeexplore.ieee.org/document/963420[P2C]) is an algorithm that allows load balancers to
bias traffic toward hosts with a better 'score' while retaining the fairness of random selection. The algorithm is
a random selection process with a twist: it selects two hosts at random and from those two picks the 'best', where
best is defined by a scoring system. ServiceTalk uses an Exponentially Weighted Moving Average
(http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf[EWMA]) scoring system by default.
EWMA takes into account the recent latency, failures, and outstanding request load of the destination host when
computing a host's score.
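To make the selection step concrete, here is a minimal sketch of P2C over scored hosts. The `ScoredHost` interface and
`selectP2c` method are hypothetical names used only for illustration, not ServiceTalk APIs, and the real implementation
additionally retries selection and cooperates with the outlier detector described later in this document.

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class P2cSelectionSketch {
    // Hypothetical host abstraction: higher score is better, e.g. derived from an EWMA of latency.
    interface ScoredHost {
        boolean isHealthy();
        double score();
    }

    static ScoredHost selectP2c(List<? extends ScoredHost> hosts) {
        if (hosts.size() == 1) {
            ScoredHost only = hosts.get(0);
            return only.isHealthy() ? only : null;
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // Pick two distinct hosts at random.
        int i = rnd.nextInt(hosts.size());
        int j = rnd.nextInt(hosts.size() - 1);
        if (j >= i) {
            j++;
        }
        ScoredHost a = hosts.get(i);
        ScoredHost b = hosts.get(j);
        // Prefer a healthy host; if both are healthy, take the better score.
        if (!a.isHealthy()) {
            // If neither host is healthy the real algorithm retries a bounded number of times.
            return b.isHealthy() ? b : null;
        }
        if (!b.isHealthy()) {
            return a;
        }
        return a.score() >= b.score() ? a : b;
    }
}
----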
The P2C selection algorithm has several desirable properties:

* it distributes requests between hosts significantly more fairly than simple random selection
* it biases traffic away from low-performing hosts
* it avoids the order coalescing problems associated with round-robin

The major downside of P2C is that it is less trivial to understand.

[source,mermaid]
----
stateDiagram-v2
    state "pick 2 hosts" as p2c
    state "evaluate hosts health" as evalhealth
    state "pick by score" as pickscore
    [*] --> p2c
    p2c --> evalhealth
    evalhealth --> p2c: neither host healthy
    evalhealth --> [*]: one host healthy
    evalhealth --> pickscore: both healthy
    pickscore --> [*]: select best scored host
----

==== Round-Robin

Round-robin is a very common algorithm that is easy to understand. From a local perspective it is an extremely
fair algorithm, assuming each request has essentially the same 'cost'. Its downsides include unwanted correlation effects
due to its sequential ordering and an inability to score hosts on anything other than outright failure.

[source,mermaid]
----
stateDiagram-v2
    state "advance index (i)" as select_index
    state "evaluate health" as eval_health
    [*] --> select_index
    select_index --> eval_health: select host(i)
    eval_health --> select_index: host unhealthy
    eval_health --> [*]: host healthy
----

== xDS Outlier Detection

=== What is xDS?

In addition to being more modular, DefaultLoadBalancer is being introduced with resiliency features found in the xDS
specification that was pioneered by the Envoy project and is being https://github.com/cncf/xds[advanced by the CNCF].
xDS defines a control plane that allows for the distributed configuration of key components such as certificates,
load balancers, etc. This documentation focuses on the elements that are relevant to load balancing, referred
to as CDS.

=== Failure Detection Algorithms

DefaultLoadBalancer was designed to mitigate failures at both layer 4 (connection layer) and layer 7 (request layer).
It supports the following xDS compatible outlier detectors:

* Consecutive failures: ejects hosts that exceed the configured number of consecutive failures.
* Success rate failure detection: statistical outlier ejection based on success rate.
* Failure percentage outlier detection: ejects hosts whose failure rate exceeds a fixed threshold.

In addition to the xDS defined outlier detectors, DefaultLoadBalancer continues to support consecutive connection failure
detection with active probing: when a host is found to be unreachable it is marked as unhealthy until a new connection
can be established to the host outside the request path.
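As a rough illustration of the first detector listed above, here is a minimal sketch of consecutive-failure ejection.
The class and its callbacks are hypothetical, not the ServiceTalk API; the real xDS-style detector also accounts for
ejection duration, a maximum ejection percentage, and periodic re-evaluation before a host is reinstated.

[source,java]
----
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: names and structure are assumptions, not the ServiceTalk implementation.
final class ConsecutiveFailureDetector {
    private final int maxConsecutiveFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean ejected;

    ConsecutiveFailureDetector(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    void onSuccess() {
        // Any success resets the streak.
        consecutiveFailures.set(0);
    }

    void onFailure() {
        if (consecutiveFailures.incrementAndGet() >= maxConsecutiveFailures) {
            // A real detector would also schedule re-evaluation after an ejection period.
            ejected = true;
        }
    }

    boolean isEjected() {
        return ejected;
    }
}
----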
=== Connection Selection with Outlier Detectors

Connection acquisition failures, request failures, and response latency are used to optimize traffic flows to avoid
unhealthy hosts and _bias_ traffic toward hosts with better response times in order to adjust to the observed capacity of
backends. The relationship is shown in the following diagram:

[source,mermaid]
----
flowchart TD
    classDef unhealthy stroke:#f00
    classDef healthy stroke:#0f0
    HttpRequester
    ServiceDiscovery
    subgraph LoadBalancer
        direction TB
        HostSet[Host Set]
        HostSelector[P2C HostSelector\npicks candidate hosts\nand selects the best option]
        HealthChecker[Outlier Detector\ncomputes host health based\non statistics from all hosts]
    end
    subgraph Host0
        HealthIndicator0[HealthIndicator\nmonitors individual host statistics]:::unhealthy
        subgraph ConnectionPool0[Connection Pool]
            Connection00[Connection0]
            Connection01[Connection1]
        end
        %% helps orient the chart
        HealthIndicator0 ~~~ ConnectionPool0
    end
    subgraph Host1
        HealthIndicator1[HealthIndicator\nmonitors individual host statistics]:::healthy
        subgraph ConnectionPool1[Connection Pool]
            Connection10[Connection0]
            Connection11[Connection1]
        end
        %% helps to orient the chart
        HealthIndicator1 ~~~ ConnectionPool1
    end
    %% make sure our hosts are close to the LoadBalancer
    LoadBalancer ~~~ Host0 & Host1
    HttpRequester ~~~ LoadBalancer
    HostSelector -.-x |unhealthy host\navoided| HealthIndicator0
    HttpRequester -.-> |select connection| HostSelector -.-> |healthy host\npreferred| HealthIndicator1
    HealthIndicator1 -.-> Connection11
    %% to make the boxes the same size
    HealthIndicator0 ~~~ Connection01
    HealthChecker <--> HealthIndicator0 & HealthIndicator1
    ServiceDiscovery --> |update host set| HostSet --> |rebuild HostSelector \nwith new host set| HostSelector
----

=== Connection Acquisition Workflow

By default, ServiceTalk attempts to minimize the connection load to each host. If no session capable of serving a
request is available, connection acquisition can happen on the request path. The session acquisition flow is roughly
as follows:

[source,mermaid]
----
sequenceDiagram
    participant requester
    participant load-balancer
    participant host
    participant connection-factory
    requester->>load-balancer: request a connection
    load-balancer->>host: select a connection from the host
    host->>host: use connection from pool
    host-->>requester: connection reuse
    host->>connection-factory: create new connection
    connection-factory->>host: connection created and added to pool
    host->>requester: connection returned
----

== Future Capabilities

=== Weighted Load Balancing

Not all hosts are created equal! Due to different underlying hardware platforms, other tenants on the same host, or even
just a bad cache day, we often find that not all instances of a service have the same capacity. The P2C selection
algorithm can approximate this, but the difference in capacity is only inferred. With
https://github.com/bryce-anderson/servicetalk/blob/bl_anderson/default_loadbalancer_docs/servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java[ServiceDiscoverer]
or control-plane support we can explicitly propagate weight information to ServiceTalk's DefaultLoadBalancer. Adding
weight support to the host selection process will let users leverage this data.
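As a rough sketch of how explicit weights might bias selection, the following shows a plain weighted random pick. The
`WeightedHost` type and `selectWeighted` method are hypothetical and exist only to illustrate the general technique;
the eventual DefaultLoadBalancer integration may look quite different.

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: assumes all weights are positive.
final class WeightedSelectionSketch {
    record WeightedHost(String address, double weight) { }

    static WeightedHost selectWeighted(List<WeightedHost> hosts) {
        double totalWeight = 0;
        for (WeightedHost host : hosts) {
            totalWeight += host.weight();
        }
        // Pick a point in [0, totalWeight) and walk the hosts until it is covered.
        double target = ThreadLocalRandom.current().nextDouble(totalWeight);
        for (WeightedHost host : hosts) {
            target -= host.weight();
            if (target < 0) {
                return host;
            }
        }
        return hosts.get(hosts.size() - 1); // guard against floating point rounding
    }
}
----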
=== Priority Groups

Priority groups are another notion of weights in load balancing. Priority groups are a feature of the xDS protocol that
lets users partition backend instances into groups that have a relative priority to each other. The load balancer will
use hosts from as many priority groups as necessary to maintain a minimum specified number of healthy backends. A
common use case is to specify a primary destination, usually local, for latency and transit cost reasons, while
maintaining a set of backup destinations to use in the case of local disruptions.

=== Subsetting

When the sizes of two connected clusters grow, the number of connections can become burdensome if the load balancer
maintains a full mesh network. Subsetting can reduce the connection count by only creating connections to a subset of
backends. There are a number of ways to determine this subset, ranging from simple random subsetting, which is
trivial to implement but suffers from load variance, to more intricate models.
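For illustration, a naive random subsetting step might look like the sketch below. The subset size and per-client
seeding are assumptions made for the example and are not ServiceTalk APIs.

[source,java]
----
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: a naive random subset of the discovered hosts.
final class SubsettingSketch {
    static List<String> randomSubset(List<String> allHosts, int subsetSize, long clientSeed) {
        List<String> shuffled = new ArrayList<>(allHosts);
        // Seeding with a per-client value keeps each client's subset stable across rebuilds.
        Collections.shuffle(shuffled, new Random(clientSeed));
        return shuffled.subList(0, Math.min(subsetSize, shuffled.size()));
    }
}
----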