Highlight the differences between the different types of DIAL deployments: applications, models, addons, and assistants.
Extend the section by adding advice on how to choose the right type of deployment for a given use case.
Alternatively, add a dedicated section to the FAQ (#4)
Models
A DIAL model is the right choice when one needs to integrate an existing model into the DIAL framework.
To make it work, the "model" component must adapt the native model API to the DIAL API. We call such components "adapters". DIAL includes adapters for OpenAI, Bedrock, and Vertex AI models.
Consider, for example, a self-hosted Llama 3 model. One needs to write a server that implements the DIAL API chat completions endpoint. The endpoint must convert the DIAL API request to a Llama 3 request, call the model, and convert the model response back to a DIAL API response.
The underlying model need not be limited to the text-to-text modality. The DIAL API supports attachments of any media type, so text-to-image or image-to-text models can easily be adapted to the DIAL API (see, for example, the adapters for GPT-4V and DALL·E 3).
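The adapter's job is mostly a mechanical translation between the two request/response shapes. Here is a minimal sketch, assuming an OpenAI-style DIAL chat completions request body and a hypothetical `llama_generate` callable standing in for the native model call (the prompt template is a simplified placeholder, not the real Llama 3 format):

```python
def dial_to_llama_prompt(dial_request: dict) -> str:
    """Flatten a DIAL (OpenAI-style) message list into a single prompt string.

    The exact prompt template is model-specific; this one is a stand-in.
    """
    parts = []
    for message in dial_request["messages"]:
        parts.append(f"{message['role']}: {message['content']}")
    parts.append("assistant:")
    return "\n".join(parts)


def llama_to_dial_response(completion: str) -> dict:
    """Wrap the raw model completion back into a DIAL-style response body."""
    return {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": completion},
                "finish_reason": "stop",
            }
        ]
    }


def handle_chat_completion(dial_request: dict, llama_generate) -> dict:
    """The adapter endpoint: convert the request, call the model, convert back."""
    prompt = dial_to_llama_prompt(dial_request)
    completion = llama_generate(prompt)  # the native model call
    return llama_to_dial_response(completion)
```

A real adapter would additionally handle streaming, attachments, token accounting, and error mapping.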
A model deployment implements a particular model with its own context limitations and pricing per token. The core config allows these attributes to be specified in the `limits` and `pricing` sections. Token usage is reported in the `usage` field of the response. DIAL Core computes the price of a request using the information from `usage` and `pricing`.
Moreover, the underlying model might be served by multiple instances. The core config allows `upstream` endpoints to be specified for load balancing and failover. See the Core config documentation for more details.
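A hypothetical fragment of such a config, illustrating `limits`, `pricing`, and upstream endpoints side by side. Field names, nesting, and values here are illustrative assumptions; consult the Core config documentation for the exact schema:

```json
{
  "models": {
    "llama-3": {
      "endpoint": "http://llama-adapter:5000/openai/deployments/llama-3/chat/completions",
      "limits": {
        "maxTotalTokens": 8192
      },
      "pricing": {
        "unit": "token",
        "prompt": "0.00001",
        "completion": "0.00003"
      },
      "upstreams": [
        {"endpoint": "http://llama-1:8080/chat/completions"},
        {"endpoint": "http://llama-2:8080/chat/completions"}
      ]
    }
  }
}
```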
Applications
A DIAL application is the right choice when one needs to implement custom, complex logic.
For example, within a single request, a typical RAG application might process multiple documents, compute embeddings, save them in a vector store, query the store, and call a language model to generate the final response that is returned to the user.
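The steps above can be sketched in plain Python. The embedding, vector store, and model call below are toy stand-ins for whatever components a real application would use:

```python
def embed(text: str) -> dict:
    """Toy embedding: a sparse bag-of-words frequency vector."""
    vector = {}
    for word in text.lower().split():
        vector[word] = vector.get(word, 0) + 1
    return vector


def similarity(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors."""
    return float(sum(a[w] * b.get(w, 0) for w in a))


def rag_request(question: str, documents: list, generate) -> str:
    """One RAG request: embed the documents, store, query, call the model."""
    store = [(embed(doc), doc) for doc in documents]            # index documents
    q = embed(question)
    best = max(store, key=lambda item: similarity(q, item[0]))  # query the store
    prompt = f"Context: {best[1]}\nQuestion: {question}"
    return generate(prompt)                                     # final model call
```

In a real application, `generate` would be a call to a language model deployed behind DIAL, and the store would persist across requests.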
Since an application may call multiple different models, applications have neither a pricing model nor context limitations of their own, unlike regular models.
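Given the per-model usage an application reports in `usage_per_model` and the per-token pricing configured for each model, the total cost of a request can be derived with a simple fold. The rates and the exact shape of the usage entries below are illustrative assumptions:

```python
def request_cost(usage_per_model: dict, pricing: dict) -> float:
    """Sum token costs across all models an application called.

    usage_per_model maps model name -> {"prompt_tokens": int, "completion_tokens": int}
    pricing maps model name -> {"prompt": rate, "completion": rate} (per token)
    """
    total = 0.0
    for model, usage in usage_per_model.items():
        rates = pricing[model]
        total += usage["prompt_tokens"] * rates["prompt"]
        total += usage["completion_tokens"] * rates["completion"]
    return total
```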
Applications report the usage of the upstream models in the `usage_per_model` response field, which maps model names to their corresponding token usages.
Addons
DIAL addons can only be used by the DIAL Assistant.
The DIAL Assistant is configured with a list of addons it can use.
A DIAL addon is implemented by a server that follows a certain OpenAPI specification, which the DIAL Assistant uses to communicate with the addon.
A DIAL addon is analogous to, and in fact compatible with, OpenAI plugins.
Any OpenAI plugin (typically identified by its `.well-known/ai-plugin.json` config) can be used as a DIAL addon.
A DIAL addon is the right choice when one needs to extend the functionality of the DIAL Assistant with a new tool.
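Checking whether a plugin is usable can be as simple as validating the manifest fields a consumer needs. The sketch below assumes the standard OpenAI plugin manifest layout (`name_for_model`, `description_for_model`, and an `api` section pointing at the OpenAPI spec):

```python
def validate_plugin_manifest(manifest: dict) -> list:
    """Return a list of problems found in an ai-plugin.json-style manifest."""
    problems = []
    for field in ("name_for_model", "description_for_model", "api"):
        if field not in manifest:
            problems.append(f"missing field: {field}")
    api = manifest.get("api", {})
    if api and api.get("type") != "openapi":
        problems.append("api.type must be 'openapi'")
    if api and not api.get("url"):
        problems.append("api.url is required")
    return problems
```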
Assistants
The DIAL Assistant is an assistant with a predefined set of addons and a system prompt.
It's the right choice when one needs to define an assistant for a specific task or domain.
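A hypothetical config fragment defining such an assistant. The section and field names here are illustrative assumptions; the exact schema is described in the Core config documentation:

```json
{
  "assistant": {
    "endpoint": "http://assistant:5000/openai/deployments/assistant/chat/completions",
    "assistants": {
      "search-assistant": {
        "prompt": "You are a helpful assistant. Use the configured addons to answer.",
        "addons": ["web-search", "calculator"]
      }
    }
  }
}
```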