Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate the fast and slow commands in the second phase, dynamically adjust the fast and slow command settings according to the access time, and dynamically adjust the number of coroutines and pika threads #2566

Open
chejinge opened this issue Mar 31, 2024 · 19 comments
Assignees
Labels
4.0.0 ✏️ Feature New feature or request

Comments

@chejinge
Copy link
Collaborator

Which PikiwiDB functionalities are relevant/related to the feature request?

No response

Description

动态调整运行过程中的快慢命令

Proposed solution

动态调整运行过程中的快慢命令

Alternatives considered

动态调整运行过程中的快慢命令

@chejinge chejinge added the ✏️ Feature New feature or request label Mar 31, 2024
@chejinge
Copy link
Collaborator Author

刘欢负责推进

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Liu Huan is responsible for promoting

@happy-v587
Copy link
Contributor

关于快慢命令的几个问题和方案需要确定下:

方案:

1、计划添加是否开启自动发现慢命令参数,默认开启,也支持关闭。最终实现三种模式的慢命令设置:纯自动配置(程序自动哦发现)、纯手动配置(现在的方式)、手动配置 + 自动配置。这样是否可以?

2、关于慢命令执行时间设置:添加慢命令执行时间阈值参数,默认1s,是否可行?未来这个值可基于当前负载动态计算出来

3、慢命令如何更优判定 如果一个cmd的请求超出这个时间阈值,就立刻判断为慢命令吗?还是做一些平滑计算?同理如何从慢命令集合摘除?目前想到的一些策略如下:

  • 一次执行时间 大于/小于 阈值就 添加/摘除
  • 连续N次执行时间 大于/小于 阈值就 添加/摘除
  • 滑动窗口内N次执行时间 大于/小于 阈值就 添加/摘除
  • 滑动窗口内平均执行时间 大于/小于 阈值就 添加/摘除

性能:

4、对于第3的实现,可能需要申请N个全局map+atomic(提高并发)做数据统计,这样pika性能应该会有下降,需要权衡下

其他:

5、如果一个 cmd 的某个 key 是 bigkey,那么这个 cmd 就被判定为慢命令是否合理?未来是否会考虑到key维度?

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Several questions and solutions regarding fast and slow commands need to be determined:

plan:

  1. Plan to add the parameter Whether to enable automatic discovery of slow commands. It is enabled by default and can also be disabled. Finally, three modes of slow command settings are implemented: pure automatic configuration (automatic discovery by the program), pure manual configuration (current method), manual configuration + automatic configuration. Is this okay?

  2. Regarding the slow command execution time setting: Add the slow command execution time threshold parameter, the default is 1s, is it feasible? In the future this value can be calculated dynamically based on the current load

  3. How to better determine slow commands? If a cmd request exceeds this time threshold, will it be immediately determined to be a slow command? Or do some smoothing calculations? In the same way, how to remove it from the slow command set? Some strategies that come to mind so far include:

  • One execution time greater than/less than threshold is add/remove?
  • N consecutive execution times greater than/less than, the threshold is add/remove?
  • The threshold of N execution time greater than/less than within the sliding window is add/remove?
  • The average execution time in the sliding window is greater than/less than, and the threshold is add/remove?

#Performance:

  1. For the third implementation, it may be necessary to apply for N global map+atomic (to improve concurrency) for data statistics. In this way, the performance of Pika should be reduced, and it needs to be weighed.

other:

  1. If a key of a cmd is bigkey, is it reasonable for this cmd to be judged as a slow command? Will the key dimension be considered in the future?

@happy-v587
Copy link
Contributor

关于动态调整线程数的一些问题和方案:

方案

1、线程数动态调整策略:判断线程池中queue的task个数,如果挤压超过某个阈值就添加新的thread,如果长时间空闲就回收thread。添加可以一次多个,回收逐次-1

2、快慢命令使用统一线程池:仿照rocksdb,支持Priority::HIGH、Priority::LOW

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Some issues and solutions about dynamically adjusting the number of threads:

plan

  1. Dynamic adjustment strategy for the number of threads: determine the number of tasks in the queue in the thread pool, add new threads if the squeeze exceeds a certain threshold, and recycle threads if they are idle for a long time. You can add more than one at a time, and recycle -1 each time.

  2. Fast and slow commands use a unified thread pool: modeled after rocksdb, supporting Priority::HIGH and Priority::LOW

@happy-v587
Copy link
Contributor

@AlexStocks @chejinge PTAL

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


@AlexStocks @chejinge ASKED

@AlexStocks
Copy link
Contributor

@lidongxu

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


@李东兴

@AlexStocks
Copy link
Contributor

AlexStocks commented Jun 14, 2024

wsy:
1 hgetall 这种天然当做慢命令;
2 采样计算平均时长;
3 一些热点 key 访问频繁,则隔离出来当做慢命令处理;

refer doc:
1 https://tech.meituan.com/2020/07/01/kv-squirrel-cellar.html
2 https://tech.meituan.com/2024/03/15/kv-squirrel-cellar.html

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


wsy:
1 hgetall is naturally used as a slow command;
2 Sampling calculation average duration;
3. Some hot keys are frequently accessed, so they are isolated and processed as slow commands;

@D-Seeker
Copy link

etcd3.5的读写性能优化方案有参考价值吗
refer doc: https://www.cnblogs.com/tencent-cloud-native/p/14893209.html

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Does etcd3.5’s read and write performance optimization plan have any reference value?
refer doc: https://www.cnblogs.com/tencent-cloud-native/p/14893209.html

@happy-v587
Copy link
Contributor

happy-v587 commented Jun 21, 2024

快慢key

https://tech.meituan.com/2020/07/01/kv-squirrel-cellar.html
image

限流:

  • 访问量
  • 带宽的占用量

热点key(怎么判断出来的?就是上面的限流key吗?)调度:

  • 把热点 Slot 迁移到这个热点主从

队列 + 线程池 + 动态调整

https://tech.meituan.com/2020/07/01/kv-squirrel-cellar.html
image
https://tech.meituan.com/2024/03/15/kv-squirrel-cellar.html
image
image

多队列:4 个队列 + 4 个线程池的结构,将请求分成 4 类

  • 读快
  • 读慢
  • 写快
  • 写慢

线程池数量调整

  • 调度线程:实时统计每个线程池的负载,来评估每个线程池是否需要增加或减少线程并做出调度动作
  • 空闲线程池:空闲线程池用来存放当前空闲的可用于调配的线程资源

当调度线程评估后决定做线程资源调配时,它就会发送调度指令到相应队列中,当线程池里的线程获取并执行了这个指令后,就实现了线程资源的调配。比如,它想给读快线程池增加线程,就会给空闲线程池的队列发送一个调度指令,空闲线程池的线程取到这个指令后,就会将自己加入到读快队列的线程池里面,去处理读快队列的请求。

当调度线程想对读慢线程池调减线程时,它会向读慢队列发送一个调度指令,读慢队列的线程获取到这个指令后,就会离开读慢线程池加入到空闲线程池。通过调度线程准实时的毫秒级负载统计、调度,我们实现了线程池资源的快速动态分配。对于每一个线程池的共享线程,也不再需要去轮询其他线程池的队列了,只需要专心处理自己队列的请求即可,大幅降低了线程池资源调度的 CPU 消耗。

@bigdaronlee163
Copy link
Contributor

喜马拉雅:快慢命令分离

为什么要做快慢命令分离?

我们发现线上很多执行本应该很快的命令也会经常超时,原因是早期的 Pika 线程模型是通过 Dispatch 线程分发客户端连接请求给 worker 线程,然后 worker 线程负责同步处理命令请求。这就带来两个问题:

  1. 如果一个客户端的命令阻塞,那么这个 worker 线程上所有客户端发起的命令都会被阻塞。
  2. worker 线程负载不均衡。假如有多个客户端,但只有一个大流量的客户端发送命令,那么底层也只有一个 worker 线程处于高负载状态,其它 worker 线程则都处于低负载状态,发挥不了 Pika 多线程的优势。

针对上述问题,其实大家很容易想到使用线程池模型,但光线程池模型也不能完全解决问题。举个例子,假如有一个客户端执行的都是比较耗时的命令(如 HGETALL),这时候线程池中的线程还是全都会被耗时的命令阻塞,那么那些执行快的命令也会被阻塞。所以,我们想到的解决方案是采用快慢双线程池模型。

如何做快慢命令分离

如下图,创建两个线程池,快慢命令根据不同业务场景可灵活配置,假设一个用户执行的都是 get/set 比较快的命令,另一个用户执行的都是类似 hgetall 很慢的命令,那么两个命令会分发到不同的线程执行,即使 hgetlall 命令导致执行的线程池阻塞,也完全不会影响 get/set 命令的响应时间。这样就降低了快慢命令之间的互相影响。

cf9e4a01c72d45e1efyy3dac59da5f62

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Himalaya: Separation of fast and slow commands

Why do we need to separate fast and slow commands?

We found that many online commands that should be executed quickly often times out. The reason is that the early Pika thread model distributed client connection requests to worker threads through Dispatch threads, and then the worker threads were responsible for synchronously processing command requests. This brings up two problems:

  1. If a client's command is blocked, then all commands initiated by the client on this worker thread will be blocked.
  2. The load of worker threads is unbalanced. If there are multiple clients, but only one high-traffic client sends commands, then only one worker thread on the bottom layer is in a high-load state, and the other worker threads are in a low-load state, and the advantages of Pika's multi-threading cannot be used.

In response to the above problems, it is easy for everyone to think of using the thread pool model, but the thread pool model alone cannot completely solve the problem. For example, if there is a client that executes relatively time-consuming commands (such as HGETALL), then all threads in the thread pool will still be blocked by the time-consuming commands, then those commands that execute quickly will will also be blocked. Therefore, the solution we came up with is to use the fast and slow dual thread pool model.

How to separate fast and slow commands

As shown in the figure below, create two thread pools. The fast and slow commands can be flexibly configured according to different business scenarios. Assume that one user executes relatively fast get/set commands, and the other user executes very slow commands like hgetall. Then both Each command will be distributed to different threads for execution. Even if the hgetlall command causes the execution thread pool to be blocked, it will not affect the response time of the get/set command at all. This reduces the interaction between fast and slow commands.

cf9e4a01c72d45e1efyy3dac59da5f62

@QlQlqiqi
Copy link
Contributor

QlQlqiqi commented Aug 2, 2024

动态调整 thread pool 大小

思路:

  1. 将原来的快慢 2 个线程池分为 1 个(工作)线程池,也就是如上 5 个队列共用一个线程池,每个队列默认的线程数可被设置。
  2. 在 PikaClientConn::DoBackgroundTask 中记录 cmd 执行时间到 g_pika_server 中,记录数据为:
std::map<std::string, std::atomic<uint32_t>> op2duration_sum;
std::map<std::string, std::atomic<uint32_t>> op2num;

即:记录每个操作的执行时间总和和执行次数。

  1. 在定时任务(设置 1s 间隔)中增加一个调度功能,根据从上次调度到这次调度之间记录的数据,
    1. 通过每种类型的队列的 avg_duration 决定给哪个队列增加/减少 1 个线程,且如果这次增加一个线程后,下次的 avg_duration 比这次多 1/n*0.5 (n 为增加前的线程数)及其以上,则继续增加一个线程,直到不满足条件或没有空闲线程为止;
    2. 如果某个队列减少 1 个线程,但是下次的 avg_duration 却比这次少 1/n*0.2 以下,则减少 1 个线程,直到最低线程数或不满足要求。
  2. (原来是一个线程池用一个锁)线程池中使用线程数个锁,每个线程对应一个锁和一种状态(读快,读慢,写快,写慢,空闲),调度线程通过设置对应线程的状态来实现调度操作。
  3. ThreadPool 中 Schedule 函数增加一个参数,标记该命令属于哪个队列。

问题:

  1. 因为要多记录一些数据,那么性能损耗是一定会有的。

@Issues-translate-bot
Copy link

Bot detected the issue body's language is not English, translate it automatically.


Dynamically adjust thread pool size

Idea:

  1. Divide the original two thread pools, fast and slow, into one (worker) thread pool, that is, the five queues above share one thread pool, and the default number of threads for each queue can be set.
  2. Record the cmd execution time in PikaClientConn::DoBackgroundTask to g_pika_server. The recorded data is:
std::map<std::string, std::atomic<uint32_t>> op2duration_sum;
std::map<std::string, std::atomic<uint32_t>> op2num;

That is: record the total execution time and number of executions of each operation.

  1. Add a scheduling function to the scheduled task (set 1s interval). Based on the data recorded between the last scheduling and this scheduling,
    1. Use the avg_duration of each type of queue to decide which queue to add/reduce 1 thread. If a thread is added this time, the next avg_duration will be 1/n*0.5 more than this time (n is the thread before the increase) number) and above, continue to add a thread until the condition is not met or there are no idle threads;
    2. If a queue is reduced by 1 thread, but the next avg_duration is less than 1/n*0.2 this time, then 1 thread is reduced until the minimum number of threads or the requirement is not met.
  2. (It turns out that one thread pool uses one lock) Several thread locks are used in the thread pool. Each thread corresponds to a lock and a state (fast reading, slow reading, fast writing, slow writing, idle). The scheduling thread passes Set the status of the corresponding thread to implement scheduling operations.
  3. The Schedule function in ThreadPool adds a parameter to mark which queue the command belongs to.

question:

  1. Because more data needs to be recorded, there will definitely be performance loss.
  2. You can construct a default command-to-queue type mapping data structure based on the data in pika_command.cc.
  3. I feel that pika does not need to distinguish the instructions as finely as a certain group. It can only be divided into two types: fast and slow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4.0.0 ✏️ Feature New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants