What behavior of the library made you think about the improvement?
The Outlines library currently doesn't support efficient batch processing of multiple prompts with different schemas, especially when using the vLLM integration. Each prompt must be processed individually, which doesn't take advantage of vLLM's batch processing capabilities. The current API structure requires creating a Generator object for each unique schema, making it cumbersome to process a batch of prompts with varying schemas.
For example, with the current implementation:
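Since the original snippet was not preserved, here is a minimal, runnable sketch of the sequential workflow being described. The `build_generator` helper and the generators it returns are stand-ins for real Outlines generator objects (no actual Outlines or vLLM calls are made); only the control flow is the point: one generator per unique schema, and prompts processed one at a time.

```python
import json

def build_generator(schema):
    """Hypothetical stand-in for creating an Outlines generator for one schema."""
    def generate(prompt):
        # A real generator would return model output constrained to `schema`.
        return {"schema": schema["title"], "prompt": prompt}
    return generate

user_schema = {"title": "User", "type": "object"}
order_schema = {"title": "Order", "type": "object"}

prompts_and_schemas = [
    ("Describe user Alice", user_schema),
    ("Summarise order #42", order_schema),
    ("Describe user Bob", user_schema),
]

# Current workflow: a separate generator per schema, prompts run one by one,
# so vLLM never sees more than one request at a time.
generators = {}
results = []
for prompt, schema in prompts_and_schemas:
    key = schema["title"]
    if key not in generators:  # one generator object per unique schema
        generators[key] = build_generator(schema)
    results.append(generators[key](prompt))

print(json.dumps(results, indent=2))
```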
Another example, specifically for JSON function calling, looks like this: we have to create a separate generator for each schema, which is not ideal and does not use vLLM's multithreading or batch-processing capabilities.
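As the original function-calling snippet was also lost, the sketch below illustrates the same per-schema pattern with two function-calling schemas. `json_generator` is a hypothetical stand-in for a JSON-constrained Outlines generator; the schemas are illustrative only.

```python
import json

# Two function-calling schemas, as they might be passed to a JSON generator.
get_weather = {
    "name": "get_weather",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}}},
}
get_stock = {
    "name": "get_stock",
    "parameters": {"type": "object",
                   "properties": {"ticker": {"type": "string"}}},
}

def json_generator(schema):
    """Hypothetical stand-in for a JSON-constrained generator."""
    def generate(prompt):
        # A real generator would emit arguments valid under schema["parameters"].
        return json.dumps({"function": schema["name"], "prompt": prompt})
    return generate

# Today, one generator must be built per function schema, and the calls run
# sequentially instead of as a single vLLM batch.
calls = [
    ("What's the weather in Paris?", get_weather),
    ("Quote AAPL for me.", get_stock),
]
outputs = [json_generator(schema)(prompt) for prompt, schema in calls]
for out in outputs:
    print(out)
```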
How would you like it to behave?
We would like Outlines to support batch processing of multiple prompts with different schemas in a single call, especially when using vLLM. This would allow users to leverage vLLM's efficient batch processing capabilities and simplify the API for handling multiple prompts with varying schemas.
A potential API could look like this:
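One possible shape for such an API, sketched below with a thread pool fanning prompts out over per-schema generators. All names here (`batch_generate`, `build_generator`) are hypothetical and not part of Outlines today; a real implementation would hand the whole batch to vLLM rather than use Python threads.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def build_generator(schema):
    """Hypothetical per-schema generator factory (stand-in for Outlines)."""
    def generate(prompt):
        return {"schema": schema["title"], "prompt": prompt}
    return generate

def batch_generate(pairs):
    """Proposed API: accept (prompt, schema) pairs in a single call.

    This sketch reuses one generator per unique schema and fans the
    prompts out across threads, purely to make the intended interface
    and return shape concrete.
    """
    generators = {}
    for _, schema in pairs:
        generators.setdefault(schema["title"], build_generator(schema))
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(generators[s["title"]], p) for p, s in pairs]
        # Results come back in the same order as the input pairs.
        return [f.result() for f in futures]

user = {"title": "User", "type": "object"}
order = {"title": "Order", "type": "object"}
batch = batch_generate([
    ("Describe user Alice", user),
    ("Summarise order #42", order),
])
print(json.dumps(batch, indent=2))
```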
For JSON function calling, a potential API might look like this:
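For the JSON / function-calling case, the call site might look like the sketch below, where `generate_json_batch` is a hypothetical entry point taking a prompt list and a schema list of equal length, with each prompt constrained by its own schema in one batched call.

```python
import json

def generate_json_batch(prompts, schemas):
    """Hypothetical batched JSON API: len(prompts) == len(schemas).

    A real implementation would submit all prompts to vLLM at once, each
    constrained by its own schema; this stub just pairs them up so the
    proposed signature and return shape are concrete.
    """
    assert len(prompts) == len(schemas)
    return [
        {"function": schema["name"], "prompt": prompt}
        for prompt, schema in zip(prompts, schemas)
    ]

weather = {"name": "get_weather"}
stock = {"name": "get_stock"}

results = generate_json_batch(
    ["Weather in Paris?", "Quote AAPL."],
    [weather, stock],
)
print(json.dumps(results, indent=2))
```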