
Add support for batch processing with multiple prompts and schemas, especially for vLLM integration #1373

Open
soumyasmruti opened this issue Jan 11, 2025 · 0 comments


What behavior of the library made you think about the improvement?

The Outlines library currently doesn't support efficient batch processing of multiple prompts with different schemas, especially when using the vLLM integration. Each prompt must be processed individually, which doesn't take advantage of vLLM's batch processing capabilities. The current API structure requires creating a Generator object for each unique schema, making it cumbersome to process a batch of prompts with varying schemas.

For example, with the current implementation:

from outlines import Generator, models
from pydantic import BaseModel

class Schema1(BaseModel):
    field1: str

class Schema2(BaseModel):
    field2: int

model = models.vllm("model_name")

generator1 = Generator(model, Schema1)
generator2 = Generator(model, Schema2)

result1 = generator1("prompt1")
result2 = generator2("prompt2")
result3 = generator1("prompt3")
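Until batch support exists, the repetition above can at least be reduced by caching one generator per schema, so repeated schemas reuse the already-built generator instead of rebuilding it. This is a hedged sketch, not part of the Outlines API; `make_batch_runner` and `generator_factory` are hypothetical names, where `generator_factory` stands in for something like `lambda schema: Generator(model, schema)`:

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical helper (not part of Outlines): caches one generator per schema
# so prompts that repeat a schema reuse the compiled generator.
def make_batch_runner(
    generator_factory: Callable[[Hashable], Callable[[str], str]],
) -> Callable[[List[Tuple[str, Hashable]]], List[str]]:
    cache: Dict[Hashable, Callable[[str], str]] = {}

    def run(pairs: List[Tuple[str, Hashable]]) -> List[str]:
        results = []
        for prompt, schema in pairs:
            if schema not in cache:
                cache[schema] = generator_factory(schema)
            results.append(cache[schema](prompt))
        return results

    return run
```

Note that the prompts still run one at a time, which is exactly the limitation this issue asks to lift.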

Another example, specifically for JSON function calling, looks like this. We have to create a separate generator for each schema, which is not ideal and doesn't use the multithreading or batch-processing capabilities of vLLM.

import json

import outlines
from outlines import models
from outlines.samplers import multinomial

model = models.vllm("model_name")

json_schema1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

json_schema2 = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"}
    }
}

generator1 = outlines.generate.json(
    model,
    json_schema1,
    sampler=multinomial(top_k=50, top_p=0.95, temperature=0.7)
)

generator2 = outlines.generate.json(
    model,
    json_schema2,
    sampler=multinomial(top_k=0, top_p=0.95, temperature=0.7)
)

result1 = json.dumps(generator1("Generate a person's details", max_tokens=100))
result2 = json.dumps(generator2("Generate a book's details", max_tokens=100))

How would you like it to behave?

We would like Outlines to support batch processing of multiple prompts with different schemas in a single call, especially when using vLLM. This would allow users to leverage vLLM's efficient batch processing capabilities and simplify the API for handling multiple prompts with varying schemas.

A potential API could look like this:

from outlines import Generator, models
from pydantic import BaseModel

class Schema1(BaseModel):
    field1: str

class Schema2(BaseModel):
    field2: int

model = models.vllm("model_name")

prompts = ["prompt1", "prompt2", "prompt3"]
schemas = [Schema1, Schema2, Schema1]

batch_generator = Generator(model)
results = batch_generator(prompts, schemas)
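One plausible way such a call could work internally (a sketch under assumptions, not actual Outlines code) is to group prompts by schema, run each group as a single vLLM batch, and scatter the outputs back into the original prompt order. `run_grouped` and `batch_generate` are hypothetical names; `batch_generate` stands in for whatever compiles one schema and generates for a list of prompts in one batched call:

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

# Sketch of order-preserving grouped dispatch: prompts sharing a schema are
# collected into one batch (so vLLM batching applies per schema), and each
# result is written back to its prompt's original position. Assumes schemas
# are hashable (Pydantic classes are; raw dict schemas would need a stable
# key such as their serialized form).
def run_grouped(
    prompts: Sequence[str],
    schemas: Sequence[Hashable],
    batch_generate: Callable[[Hashable, List[str]], List[str]],
) -> List[str]:
    groups: "defaultdict[Hashable, List[int]]" = defaultdict(list)
    for i, schema in enumerate(schemas):
        groups[schema].append(i)  # schema -> original prompt indices
    results: List[str] = [""] * len(prompts)
    for schema, indices in groups.items():
        outputs = batch_generate(schema, [prompts[i] for i in indices])
        for i, out in zip(indices, outputs):
            results[i] = out
    return results
```

With this shape, each distinct schema is compiled once and its prompts go through vLLM as one batch, while the caller still gets results aligned with the input order.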

A potential API for JSON might look like this:

import json
import outlines
from outlines import models
from outlines.samplers import multinomial

model = models.vllm("model_name")

json_schema1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

json_schema2 = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"}
    }
}

prompts = [
    "Generate a person's details",
    "Generate a book's details",
    "Generate another person's details"
]

schemas = [json_schema1, json_schema2, json_schema1]

sampler = multinomial(top_k=50, top_p=0.95, temperature=0.7)

batch_generator = Generator(model)
results = batch_generator.json(prompts, schemas, sampler=sampler, max_tokens=100)

# results would be a list of JSON strings
for result in results:
    print(json.loads(result))