
Document differences between applications, models, addons and assistants #108

adubovik opened this issue May 15, 2024 · 0 comments
Highlight the differences between the types of DIAL deployments: applications, models, addons and assistants.

Extend the section by adding advice on how to choose the right type of deployment for a given use case.

Alternatively, add a dedicated section to the FAQ (#4)

Models

A DIAL model is the right choice when one needs to integrate an existing model into the DIAL framework.
To make it work, the "model" component must adapt the native model API to the DIAL API. We call such components "adapters". DIAL includes adapters for OpenAI, Bedrock and VertexAI models.

Consider, for example, a self-hosted Llama 3 model. One needs to write a server that implements the DIAL API chat completions endpoint. The endpoint must convert the DIAL API request into a Llama 3 request, call the model, and convert the model response back into a DIAL API response.
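
The adapter described above can be sketched as a single request handler. This is a minimal illustration, not the actual DIAL adapter code: `call_llama3` is a hypothetical stand-in for the native model client, and the response only shows the `choices` part of the OpenAI-compatible schema.

```python
def call_llama3(prompt: str) -> str:
    # Hypothetical native model call (in practice, an HTTP request
    # to the self-hosted inference server).
    return f"echo: {prompt}"

def chat_completions(dial_request: dict) -> dict:
    """Convert a DIAL (OpenAI-compatible) request into a native call
    and wrap the model output back into a DIAL response."""
    # Flatten the chat history into a single prompt string.
    prompt = "\n".join(
        f"{m['role']}: {m['content']}" for m in dial_request["messages"]
    )
    answer = call_llama3(prompt)
    return {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer},
                "finish_reason": "stop",
            }
        ]
    }
```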

The underlying model is not limited to the text-to-text modality. The DIAL API supports attachments of arbitrary media types, so text-to-image or image-to-text models can easily be adapted to the DIAL API (see, for example, the adapters for the GPT-4V and DALL·E 3 models).

A model deployment represents a particular model with its own context limitations and per-token pricing. The core config lets you specify these attributes in the limits and pricing sections.
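
To make the two sections concrete, here is an illustrative shape of such a model entry, written as a Python dict. The field names are examples only; consult the Core config documentation for the real schema.

```python
# Illustrative model deployment entry (field names are examples,
# not the authoritative Core config schema).
model_config = {
    "llama3": {
        "limits": {"max_total_tokens": 8192},  # context limitation
        "pricing": {                           # price per single token
            "unit": "token",
            "prompt": "0.00001",
            "completion": "0.00003",
        },
    },
}
```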

Model token usage is reported in the usage field of the response.
DIAL Core computes the price of a request using the information from usage and pricing.
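
The price computation amounts to multiplying the token counts by the per-token rates. A minimal sketch (the `usage` shape follows the OpenAI-compatible `prompt_tokens`/`completion_tokens` fields; the exact accounting inside DIAL Core may differ):

```python
def request_cost(usage: dict, pricing: dict) -> float:
    """Price of a single request from its token usage and per-token pricing."""
    return (
        usage["prompt_tokens"] * float(pricing["prompt"])
        + usage["completion_tokens"] * float(pricing["completion"])
    )

cost = request_cost(
    {"prompt_tokens": 1000, "completion_tokens": 500},
    {"prompt": "0.00001", "completion": "0.00003"},
)
# 1000 * 0.00001 + 500 * 0.00003 = 0.025
```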

Moreover, the underlying model might be served by multiple instances. The core config lets you specify upstream endpoints for load balancing and failover.
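
The failover behavior can be illustrated with a small sketch: try each upstream in turn and return the first successful response. This is a simplification of what a load balancer does (no retries, weights, or health checks); `send` is a hypothetical function performing the actual HTTP call.

```python
def call_with_failover(upstreams: list[str], send):
    """Try each upstream endpoint in turn; return the first success.

    `send` is a hypothetical callable that performs the real request
    and raises ConnectionError when an instance is unavailable."""
    last_error = None
    for endpoint in upstreams:
        try:
            return send(endpoint)
        except ConnectionError as exc:
            last_error = exc  # this instance is down, try the next one
    raise last_error
```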

See the Core config documentation for more details.

Applications

A DIAL application is the right choice when one needs to implement custom, complex logic.

For example, within a single request a typical RAG application might process multiple documents, compute embeddings, save them in a vector store, query the store, and call a language model to generate the final response that is returned to the user.
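
The RAG flow above can be sketched schematically. Everything here is a toy stand-in: `embed` is a fake embedding function, the "vector store" is a list, and the final LLM call is replaced by string formatting; a real application would plug in actual models and a real store.

```python
def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model.
    return [float(ord(c)) for c in text[:4]]

def rag_answer(documents: list[str], question: str) -> str:
    """Schematic RAG pipeline: index, retrieve, then 'generate'."""
    # Index the documents into an in-memory "vector store".
    store = [(doc, embed(doc)) for doc in documents]
    q_vec = embed(question)
    # Retrieve the nearest document by squared distance.
    best, _ = min(
        store,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], q_vec)),
    )
    # Stand-in for the final LLM call that grounds the answer in `best`.
    return f"Answer based on: {best}"
```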

Since an application may call multiple different models, it has neither a pricing model nor context limitations the way regular models do.

Applications report the usage of upstream models in the usage_per_model response field, which maps model names to their corresponding token usage.
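
A consumer of that field might aggregate per-model usage into one total, for example for billing. The entry shape below (a list of records with `model`, `prompt_tokens`, `completion_tokens`) is an assumption for illustration; check the DIAL API reference for the exact structure.

```python
def total_usage(usage_per_model: list[dict]) -> dict:
    """Sum assumed per-model usage records into one grand total."""
    total = {"prompt_tokens": 0, "completion_tokens": 0}
    for entry in usage_per_model:
        total["prompt_tokens"] += entry["prompt_tokens"]
        total["completion_tokens"] += entry["completion_tokens"]
    return total
```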

Addons

DIAL addons can only be used by the DIAL Assistant.

The DIAL Assistant is configured with a list of addons it can use.
A DIAL addon is implemented by a server that follows a certain OpenAPI specification. The DIAL Assistant uses this OpenAPI specification to communicate with the addon.

A DIAL addon is analogous to, and in fact compatible with, OpenAI plugins.
Any OpenAI plugin (typically identified by its .well-known/ai-plugin.json config) can be used as a DIAL addon.
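
For reference, the relevant part of such a manifest looks like this (an abridged example in the public ai-plugin.json format; the URL and plugin name are made up). The `api.url` field points at the OpenAPI spec that the assistant reads to learn how to call the tool.

```python
import json

# Abridged ai-plugin.json manifest; values are illustrative.
manifest_text = """
{
  "schema_version": "v1",
  "name_for_model": "todo_list",
  "description_for_model": "Manage a user's TODO list.",
  "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"}
}
"""

manifest = json.loads(manifest_text)
# The assistant fetches this spec to discover the addon's operations.
openapi_url = manifest["api"]["url"]
```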

A DIAL addon is the right choice when one needs to extend the functionality of the DIAL Assistant with a new tool.

Assistants

A DIAL Assistant is an assistant with a predefined set of addons and a system prompt.

It's the right choice when one needs to define an assistant for a specific task or domain.
