From d2794c47afaf6324924de6725d9dc79951c34027 Mon Sep 17 00:00:00 2001
From: Bryce Anderson
Date: Mon, 19 Feb 2024 14:24:15 -0700
Subject: [PATCH] loadbalancer-experimental: Add some documentation for DefaultLoadBalancer (#2837)

Motivation:

We want to provide some documentation for the intended purpose, capabilities, etc. for DefaultLoadBalancer.

Modifications:

- Add a README.adoc
---
 .../README.adoc | 209 ++++++++++++++++++
 1 file changed, 209 insertions(+)
 create mode 100644 servicetalk-loadbalancer-experimental/README.adoc

diff --git a/servicetalk-loadbalancer-experimental/README.adoc b/servicetalk-loadbalancer-experimental/README.adoc
new file mode 100644
index 0000000000..57a06264c6
--- /dev/null
+++ b/servicetalk-loadbalancer-experimental/README.adoc
@@ -0,0 +1,209 @@
= DefaultLoadBalancer

== What is DefaultLoadBalancer?

https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/DefaultLoadBalancer.java[DefaultLoadBalancer]
is a refactor of the ServiceTalk
https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer/src/main/java/io/servicetalk/loadbalancer/RoundRobinLoadBalancer.java[RoundRobinLoadBalancer]
that is intended to provide a more flexible foundation for building load balancers. It also serves as the basis for new
features that were not possible with RoundRobinLoadBalancer, including host scoring, additional selection algorithms, and
outlier detectors.

=== Relationship Between the LoadBalancer and Connections

The load balancer is structured as a series of https://github.com/apple/servicetalk/blob/main/servicetalk-loadbalancer-experimental/src/main/java/io/servicetalk/loadbalancer/Host.java[Host]s,
and each host can have many connections. The number of connections to each Host is driven by the number of connections
required to satisfy the request load. Using the HTTP/2 protocol dramatically shrinks the necessary number of
connections to each backend, often to 1, and is strongly encouraged.

[source,mermaid]
----
flowchart LR
    Dispatcher --> |select connection| LoadBalancer
    ServiceDiscovery --> |update host set| LoadBalancer
    LoadBalancer --> H0[Host:addr-0]
    LoadBalancer --> H1[Host:addr-1]
    LoadBalancer --> H2[Host:addr-2]
    H0 --> C00[Connection0:addr-0]
    H0 --> C01[ConnectionN:addr-0]
    H1 --> C10[Connection0:addr-1]
    H1 --> C11[ConnectionN:addr-1]
    H2 --> C20[Connection0:addr-2]
    H2 --> C21[ConnectionN:addr-2]
----

=== More Host Selection Algorithms

A primary goal of DefaultLoadBalancer is to open the door to alternative host selection algorithms. The
RoundRobinLoadBalancer is limited to round-robin, while DefaultLoadBalancer comes out of the box with multiple choices
and the flexibility to add more if necessary.

==== Power of Two Choices (P2C)

Power of two choices (https://ieeexplore.ieee.org/document/963420[P2C]) is an algorithm that allows load balancers to
bias traffic toward hosts with a better 'score' while retaining the fairness of random selection. The algorithm is
a random selection process with a twist: it selects two hosts at random and from those two picks the 'best', where
best is defined by a scoring system. ServiceTalk uses an Exponentially Weighted Moving Average
(http://www.eckner.com/papers/Algorithms%20for%20Unevenly%20Spaced%20Time%20Series.pdf[EWMA]) scoring system by default.
EWMA takes into account the recent latency, failures, and outstanding request load of the destination host when
computing a host's score.
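To make the selection step concrete, here is a minimal sketch of P2C over scored hosts. The `ScoredHost` interface and
`selectP2c` method are hypothetical names used only for illustration, not ServiceTalk APIs, and the real implementation
additionally retries selection and cooperates with the outlier detector described later in this document.

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class P2cSelectionSketch {
    // Hypothetical host abstraction: higher score is better, e.g. derived from an EWMA of latency.
    interface ScoredHost {
        boolean isHealthy();
        double score();
    }

    static ScoredHost selectP2c(List<? extends ScoredHost> hosts) {
        if (hosts.size() == 1) {
            ScoredHost only = hosts.get(0);
            return only.isHealthy() ? only : null;
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // Pick two distinct hosts at random.
        int i = rnd.nextInt(hosts.size());
        int j = rnd.nextInt(hosts.size() - 1);
        if (j >= i) {
            j++;
        }
        ScoredHost a = hosts.get(i);
        ScoredHost b = hosts.get(j);
        // Prefer a healthy host; if both are healthy, take the better score.
        if (!a.isHealthy()) {
            // If neither host is healthy the real algorithm retries a bounded number of times.
            return b.isHealthy() ? b : null;
        }
        if (!b.isHealthy()) {
            return a;
        }
        return a.score() >= b.score() ? a : b;
    }
}
----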
The P2C selection algorithm has several desirable properties:

* it distributes requests between hosts significantly more fairly than simple random selection
* it biases traffic away from low-performing hosts
* it avoids the order coalescing problems associated with round-robin

The major downside of P2C is that it is less trivial to understand.

[source,mermaid]
----
stateDiagram-v2
    state "pick 2 hosts" as p2c
    state "evaluate hosts health" as evalhealth
    state "pick by score" as pickscore
    [*] --> p2c
    p2c --> evalhealth
    evalhealth --> p2c: neither host healthy
    evalhealth --> [*]: one host healthy
    evalhealth --> pickscore: both healthy
    pickscore --> [*]: select best scored host
----

==== Round-Robin

Round-robin is a very common algorithm that is easy to understand. From a local perspective it is an extremely
fair algorithm, assuming each request has essentially the same 'cost'. Its downsides include unwanted correlation effects
due to its sequential ordering and an inability to score hosts on anything other than outright failure.

[source,mermaid]
----
stateDiagram-v2
    state "advance index (i)" as select_index
    state "evaluate health" as eval_health
    [*] --> select_index
    select_index --> eval_health: select host(i)
    eval_health --> select_index: host unhealthy
    eval_health --> [*]: host healthy
----

== xDS Outlier Detection

=== What is xDS?

In addition to being more modular, DefaultLoadBalancer is being introduced with resiliency features found in the xDS
specification that was pioneered by the Envoy project and is being https://github.com/cncf/xds[advanced by the CNCF].
xDS defines a control plane that allows for the distributed configuration of key components such as certificates,
load balancers, etc. This documentation focuses on the elements that are relevant to load balancing, referred
to as CDS.

=== Failure Detection Algorithms

DefaultLoadBalancer was designed to mitigate failures at both layer 4 (connection layer) and layer 7 (request layer).
It supports the following xDS compatible outlier detectors:

* Consecutive failures: ejects hosts that exceed the configured number of consecutive failures.
* Success rate failure detection: statistical outlier ejection based on success rate.
* Failure percentage outlier detection: ejects hosts whose failure rate exceeds a fixed threshold.

In addition to the xDS defined outlier detectors, DefaultLoadBalancer continues to support consecutive connection failure
detection with active probing: when a host is found to be unreachable it is marked as unhealthy until a new connection
can be established to the host outside the request path.
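As a rough illustration of the first detector listed above, here is a minimal sketch of consecutive-failure ejection.
The class and its callbacks are hypothetical, not the ServiceTalk API; the real xDS-style detector also accounts for
ejection duration, a maximum ejection percentage, and periodic re-evaluation before a host is reinstated.

[source,java]
----
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: names and structure are assumptions, not the ServiceTalk implementation.
final class ConsecutiveFailureDetector {
    private final int maxConsecutiveFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean ejected;

    ConsecutiveFailureDetector(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    void onSuccess() {
        // Any success resets the streak.
        consecutiveFailures.set(0);
    }

    void onFailure() {
        if (consecutiveFailures.incrementAndGet() >= maxConsecutiveFailures) {
            // A real detector would also schedule re-evaluation after an ejection period.
            ejected = true;
        }
    }

    boolean isEjected() {
        return ejected;
    }
}
----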
=== Connection Selection with Outlier Detectors

Connection acquisition failures, request failures, and response latency are used to optimize traffic flows to avoid
unhealthy hosts and _bias_ traffic toward hosts with better response times in order to adjust to the observed capacity of
backends. The relationship is shown in the following diagram:

[source,mermaid]
----
flowchart TD
    classDef unhealthy stroke:#f00
    classDef healthy stroke:#0f0
    HttpRequester
    ServiceDiscovery
    subgraph LoadBalancer
        direction TB
        HostSet[Host Set]
        HostSelector[P2C HostSelector\npicks candidate hosts\nand selects the best option]
        HealthChecker[Outlier Detector\ncomputes host health based\non statistics from all hosts]
    end
    subgraph Host0
        HealthIndicator0[HealthIndicator\nmonitors individual host statistics]:::unhealthy
        subgraph ConnectionPool0[Connection Pool]
            Connection00[Connection0]
            Connection01[Connection1]
        end
        %% helps orient the chart
        HealthIndicator0 ~~~ ConnectionPool0
    end
    subgraph Host1
        HealthIndicator1[HealthIndicator\nmonitors individual host statistics]:::healthy
        subgraph ConnectionPool1[Connection Pool]
            Connection10[Connection0]
            Connection11[Connection1]
        end
        %% helps to orient the chart
        HealthIndicator1 ~~~ ConnectionPool1
    end
    %% make sure our hosts are close to the LoadBalancer
    LoadBalancer ~~~ Host0 & Host1
    HttpRequester ~~~ LoadBalancer
    HostSelector -.-x |unhealthy host\navoided| HealthIndicator0
    HttpRequester -.-> |select connection| HostSelector -.-> |healthy host\npreferred| HealthIndicator1
    HealthIndicator1 -.-> Connection11
    %% to make the boxes the same size
    HealthIndicator0 ~~~ Connection01
    HealthChecker <--> HealthIndicator0 & HealthIndicator1
    ServiceDiscovery --> |update host set| HostSet --> |rebuild HostSelector \nwith new host set| HostSelector
----

=== Connection Acquisition Workflow

By default, ServiceTalk attempts to minimize the connection load to each host. If no session capable of serving a
request is available, connection acquisition can happen on the request path. The session acquisition flow is roughly
as follows:

[source,mermaid]
----
sequenceDiagram
    participant requester
    participant load-balancer
    participant host
    participant connection-factory
    requester->>load-balancer: request a connection
    load-balancer->>host: select a connection from the host
    host->>host: use connection from pool
    host-->>requester: connection reuse
    host->>connection-factory: create new connection
    connection-factory->>host: connection created and added to pool
    host->>requester: connection returned
----

== Future Capabilities

=== Weighted Load Balancing

Not all hosts are created equal! Due to different underlying hardware platforms, other tenants on the same host, or even
just a bad cache day, we often find that not all instances of a service have the same capacity. The P2C selection
algorithm can approximate this, but the difference in capacity is only inferred. With
https://github.com/bryce-anderson/servicetalk/blob/bl_anderson/default_loadbalancer_docs/servicetalk-client-api/src/main/java/io/servicetalk/client/api/ServiceDiscoverer.java[ServiceDiscoverer]
or control-plane support we can explicitly propagate weight information to ServiceTalk's DefaultLoadBalancer. Adding
weight support to the host selection process will let users leverage this data.
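As a rough sketch of how explicit weights might bias selection, the following shows a plain weighted random pick. The
`WeightedHost` type and `selectWeighted` method are hypothetical and exist only to illustrate the general technique;
the eventual DefaultLoadBalancer integration may look quite different.

[source,java]
----
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: assumes all weights are positive.
final class WeightedSelectionSketch {
    record WeightedHost(String address, double weight) { }

    static WeightedHost selectWeighted(List<WeightedHost> hosts) {
        double totalWeight = 0;
        for (WeightedHost host : hosts) {
            totalWeight += host.weight();
        }
        // Pick a point in [0, totalWeight) and walk the hosts until it is covered.
        double target = ThreadLocalRandom.current().nextDouble(totalWeight);
        for (WeightedHost host : hosts) {
            target -= host.weight();
            if (target < 0) {
                return host;
            }
        }
        return hosts.get(hosts.size() - 1); // guard against floating point rounding
    }
}
----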
=== Priority Groups

Priority groups are another notion of weights in load balancing. Priority groups are a feature of the xDS protocol that
lets users partition backend instances into groups that have a relative priority to each other. The load balancer will
use hosts from as many priority groups as necessary to maintain a minimum specified number of healthy backends. A
common use case is to specify a primary destination, usually local, for latency and transit cost reasons, while
maintaining a set of backup destinations to use in the case of local disruptions.

=== Subsetting

When the sizes of two connected clusters grow, the number of connections can become burdensome if the load balancer
maintains a full mesh network. Subsetting can reduce the connection count by only creating connections to a subset of
backends. There are a number of ways to determine this subset, ranging from simple random subsetting, which is
trivial to implement but suffers from load variance, to more intricate models.
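For illustration, a naive random subsetting step might look like the sketch below. The subset size and per-client
seeding are assumptions made for the example and are not ServiceTalk APIs.

[source,java]
----
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: a naive random subset of the discovered hosts.
final class SubsettingSketch {
    static List<String> randomSubset(List<String> allHosts, int subsetSize, long clientSeed) {
        List<String> shuffled = new ArrayList<>(allHosts);
        // Seeding with a per-client value keeps each client's subset stable across rebuilds.
        Collections.shuffle(shuffled, new Random(clientSeed));
        return shuffled.subList(0, Math.min(subsetSize, shuffled.size()));
    }
}
----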