
Add support for batch processing with multiple prompts and schemas, especially for vLLM integration #1373

Open
soumyasmruti opened this issue Jan 11, 2025 · 0 comments


What behavior of the library made you think about the improvement?

The Outlines library currently doesn't support efficient batch processing of multiple prompts with different schemas, especially when using the vLLM integration. Each prompt must be processed individually, which doesn't take advantage of vLLM's batch processing capabilities. The current API structure requires creating a Generator object for each unique schema, making it cumbersome to process a batch of prompts with varying schemas.

For example, with the current implementation:

from outlines import Generator, models
from pydantic import BaseModel

class Schema1(BaseModel):
    field1: str

class Schema2(BaseModel):
    field2: int

model = models.vllm("model_name")

generator1 = Generator(model, Schema1)
generator2 = Generator(model, Schema2)

result1 = generator1("prompt1")
result2 = generator2("prompt2")
result3 = generator1("prompt3")
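Until batch support exists, the repetition above can at least be reduced by caching one generator per schema, so repeated schemas reuse the already-built generator instead of rebuilding it. This is a hedged sketch, not part of the Outlines API; `make_batch_runner` and `generator_factory` are hypothetical names, where `generator_factory` stands in for something like `lambda schema: Generator(model, schema)`:

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical helper (not part of Outlines): caches one generator per schema
# so prompts that repeat a schema reuse the compiled generator.
def make_batch_runner(
    generator_factory: Callable[[Hashable], Callable[[str], str]],
) -> Callable[[List[Tuple[str, Hashable]]], List[str]]:
    cache: Dict[Hashable, Callable[[str], str]] = {}

    def run(pairs: List[Tuple[str, Hashable]]) -> List[str]:
        results = []
        for prompt, schema in pairs:
            if schema not in cache:
                cache[schema] = generator_factory(schema)
            results.append(cache[schema](prompt))
        return results

    return run
```

Note that the prompts still run one at a time, which is exactly the limitation this issue asks to lift.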

Another example, specifically for JSON function calling, looks like this. We have to create a separate generator for each schema, which is not ideal and doesn't use the multithreading or batch-processing capabilities of vLLM.

import json

import outlines
from outlines import models
from outlines.samplers import multinomial

model = models.vllm("model_name")

json_schema1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

json_schema2 = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"}
    }
}

generator1 = outlines.generate.json(
    model,
    json_schema1,
    sampler=multinomial(top_k=50, top_p=0.95, temperature=0.7)
)

generator2 = outlines.generate.json(
    model,
    json_schema2,
    sampler=multinomial(top_k=0, top_p=0.95, temperature=0.7)
)

result1 = json.dumps(generator1("Generate a person's details", max_tokens=100))
result2 = json.dumps(generator2("Generate a book's details", max_tokens=100))

How would you like it to behave?

We would like Outlines to support batch processing of multiple prompts with different schemas in a single call, especially when using vLLM. This would allow users to leverage vLLM's efficient batch processing capabilities and simplify the API for handling multiple prompts with varying schemas.

A potential API could look like this:

from outlines import Generator, models
from pydantic import BaseModel

class Schema1(BaseModel):
    field1: str

class Schema2(BaseModel):
    field2: int

model = models.vllm("model_name")

prompts = ["prompt1", "prompt2", "prompt3"]
schemas = [Schema1, Schema2, Schema1]

batch_generator = Generator(model)
results = batch_generator(prompts, schemas)
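One plausible way such a call could work internally (a sketch under assumptions, not actual Outlines code) is to group prompts by schema, run each group as a single vLLM batch, and scatter the outputs back into the original prompt order. `run_grouped` and `batch_generate` are hypothetical names; `batch_generate` stands in for whatever compiles one schema and generates for a list of prompts in one batched call:

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

# Sketch of order-preserving grouped dispatch: prompts sharing a schema are
# collected into one batch (so vLLM batching applies per schema), and each
# result is written back to its prompt's original position. Assumes schemas
# are hashable (Pydantic classes are; raw dict schemas would need a stable
# key such as their serialized form).
def run_grouped(
    prompts: Sequence[str],
    schemas: Sequence[Hashable],
    batch_generate: Callable[[Hashable, List[str]], List[str]],
) -> List[str]:
    groups: "defaultdict[Hashable, List[int]]" = defaultdict(list)
    for i, schema in enumerate(schemas):
        groups[schema].append(i)  # schema -> original prompt indices
    results: List[str] = [""] * len(prompts)
    for schema, indices in groups.items():
        outputs = batch_generate(schema, [prompts[i] for i in indices])
        for i, out in zip(indices, outputs):
            results[i] = out
    return results
```

With this shape, each distinct schema is compiled once and its prompts go through vLLM as one batch, while the caller still gets results aligned with the input order.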

A potential API for JSON might look like this:

import json
import outlines
from outlines import models
from outlines.samplers import multinomial

model = models.vllm("model_name")

json_schema1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    }
}

json_schema2 = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"}
    }
}

prompts = [
    "Generate a person's details",
    "Generate a book's details",
    "Generate another person's details"
]

schemas = [json_schema1, json_schema2, json_schema1]

sampler = multinomial(top_k=50, top_p=0.95, temperature=0.7)

batch_generator = Generator(model)
results = batch_generator.json(prompts, schemas, sampler=sampler, max_tokens=100)

# results would be a list of JSON strings
for result in results:
    print(json.loads(result))