The Feature
When using the litellm router, I noticed that router initialization takes quite a while, scaling roughly linearly with the size of the input model_list.
After profiling the code, the bottleneck appears to be in set_model_list, where _create_deployment is called in a loop. Perhaps we could use a ProcessPoolExecutor to parallelize this part, so that machines with more CPU cores benefit from faster initialization.
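To make the idea concrete, here is a minimal sketch of what the parallelized loop could look like. This is not litellm's actual implementation: `create_deployment` below is a hypothetical stand-in for the router's internal `_create_deployment`, whose exact signature I'm assuming for illustration.

```python
# Sketch only: a process-pool version of the per-deployment setup loop.
# `create_deployment` is a placeholder for Router._create_deployment.
from concurrent.futures import ProcessPoolExecutor

def create_deployment(model_config: dict) -> dict:
    # Placeholder for the real per-model setup work (client creation, etc.).
    return {"model_name": model_config["model_name"], "deployment": model_config}

def set_model_list_parallel(model_list: list[dict]) -> list[dict]:
    # Fan the per-model work out across CPU cores instead of a serial loop.
    # Caveat: arguments and return values must be picklable for process pools.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(create_deployment, model_list))

if __name__ == "__main__":
    models = [{"model_name": f"model-{i}", "litellm_params": {}} for i in range(50)]
    deployments = set_model_list_parallel(models)
    print(f"created {len(deployments)} deployments")
```

One caveat with this approach: anything created per deployment (e.g. client objects) has to survive pickling back to the parent process, so the real change may need to keep non-picklable setup in the main process.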
Motivation, pitch
Our team is mainly trying to create a router per group (one router for each group), and as the number of models grows, the initialization time becomes noticeable, so we would like to speed it up if possible. Based on my testing, naively wrapping the loop in a ProcessPoolExecutor already brings a benefit to initialization time.
Here are the test results I have on a 6-core VM (a sketch for reproducing these timings follows the numbers below):
Time to create router with 10 models: 1.3750 seconds
Time to create router with 20 models: 2.8792 seconds
Time to create router with 30 models: 3.9720 seconds
Time to create router with 40 models: 5.1541 seconds
Time to create router with 50 models: 6.8256 seconds
After parallelization:
Time to create router with 10 models: 0.7394 seconds
Time to create router with 20 models: 0.8583 seconds
Time to create router with 30 models: 1.0763 seconds
Time to create router with 40 models: 1.5589 seconds
Time to create router with 50 models: 1.8502 seconds
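For reference, a rough sketch of how timings like the above could be reproduced, using litellm's documented Router(model_list=...) constructor. The model entries and API key are placeholders, and the exact numbers will of course vary by machine:

```python
# Sketch: time Router construction for growing model lists.
import time
from litellm import Router

for n in (10, 20, 30, 40, 50):
    model_list = [
        {
            "model_name": f"group-model-{i}",
            "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-..."},
        }
        for i in range(n)
    ]
    start = time.perf_counter()
    Router(model_list=model_list)
    elapsed = time.perf_counter() - start
    print(f"Time to create router with {n} models: {elapsed:.4f} seconds")
```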
Are you an ML Ops Team?
Yes
Twitter / LinkedIn details
No response