Highlight the differences between the different types of DIAL deployments: applications, models, addons, and assistants.
Extend the section by adding advice on how to choose the right type of deployment for a given use case.
Alternatively, add a dedicated section to the FAQ (#4)
Models
A DIAL model is the right choice when one needs to integrate an existing model into the DIAL framework.
To make it work, the "model" component must adapt the native model API to the DIAL API. We call such components "adapters". DIAL includes adapters for OpenAI, Bedrock, and Vertex AI models.
Consider, for example, a self-hosted Llama 3 model. One needs to write a server that implements the DIAL API chat completions endpoint. The endpoint must convert the DIAL API request to a Llama 3 request, call the model, and convert the model response back to a DIAL API response.
The underlying model need not be limited to the text-to-text modality. The DIAL API supports attachments of any media type, so text-to-image or image-to-text models can easily be adapted to the DIAL API (see, for example, the adapters for GPT-4V and DALL·E 3).
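The adapter's job is mostly a mechanical translation between the two request/response shapes. Here is a minimal sketch, assuming an OpenAI-style DIAL chat completions request body and a hypothetical `llama_generate` callable standing in for the native model call (the prompt template is a simplified placeholder, not the real Llama 3 format):

```python
def dial_to_llama_prompt(dial_request: dict) -> str:
    """Flatten a DIAL (OpenAI-style) message list into a single prompt string.

    The exact prompt template is model-specific; this one is a stand-in.
    """
    parts = []
    for message in dial_request["messages"]:
        parts.append(f"{message['role']}: {message['content']}")
    parts.append("assistant:")
    return "\n".join(parts)


def llama_to_dial_response(completion: str) -> dict:
    """Wrap the raw model completion back into a DIAL-style response body."""
    return {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": completion},
                "finish_reason": "stop",
            }
        ]
    }


def handle_chat_completion(dial_request: dict, llama_generate) -> dict:
    """The adapter endpoint: convert the request, call the model, convert back."""
    prompt = dial_to_llama_prompt(dial_request)
    completion = llama_generate(prompt)  # the native model call
    return llama_to_dial_response(completion)
```

A real adapter would additionally handle streaming, attachments, token accounting, and error mapping.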
A model deployment implements a particular model with its own context limitations and pricing per token. The core config allows these attributes to be specified in the `limits` and `pricing` sections. Token usage is reported in the `usage` field of the response. DIAL Core computes the price of a request using the information from `usage` and `pricing`.
Moreover, the underlying model might be served by multiple instances. The core config allows `upstream` endpoints to be specified for load balancing and failover. See the Core config documentation for more details.
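A hypothetical fragment of such a config, illustrating `limits`, `pricing`, and upstream endpoints side by side. Field names, nesting, and values here are illustrative assumptions; consult the Core config documentation for the exact schema:

```json
{
  "models": {
    "llama-3": {
      "endpoint": "http://llama-adapter:5000/openai/deployments/llama-3/chat/completions",
      "limits": {
        "maxTotalTokens": 8192
      },
      "pricing": {
        "unit": "token",
        "prompt": "0.00001",
        "completion": "0.00003"
      },
      "upstreams": [
        {"endpoint": "http://llama-1:8080/chat/completions"},
        {"endpoint": "http://llama-2:8080/chat/completions"}
      ]
    }
  }
}
```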
Applications
A DIAL application is the right choice when one needs to implement custom, complex logic.
For example, within a single request, a typical RAG application might process multiple documents, compute embeddings, save them in a vector store, query the store, and call a language model to generate the final response that is returned to the user.
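The steps above can be sketched in plain Python. The embedding, vector store, and model call below are toy stand-ins for whatever components a real application would use:

```python
def embed(text: str) -> dict:
    """Toy embedding: a sparse bag-of-words frequency vector."""
    vector = {}
    for word in text.lower().split():
        vector[word] = vector.get(word, 0) + 1
    return vector


def similarity(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors."""
    return float(sum(a[w] * b.get(w, 0) for w in a))


def rag_request(question: str, documents: list, generate) -> str:
    """One RAG request: embed the documents, store, query, call the model."""
    store = [(embed(doc), doc) for doc in documents]            # index documents
    q = embed(question)
    best = max(store, key=lambda item: similarity(q, item[0]))  # query the store
    prompt = f"Context: {best[1]}\nQuestion: {question}"
    return generate(prompt)                                     # final model call
```

In a real application, `generate` would be a call to a language model deployed behind DIAL, and the store would persist across requests.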
Since an application may call multiple different models, applications have neither a pricing model nor context limitations of their own, unlike regular models.
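Given the per-model usage an application reports in `usage_per_model` and the per-token pricing configured for each model, the total cost of a request can be derived with a simple fold. The rates and the exact shape of the usage entries below are illustrative assumptions:

```python
def request_cost(usage_per_model: dict, pricing: dict) -> float:
    """Sum token costs across all models an application called.

    usage_per_model maps model name -> {"prompt_tokens": int, "completion_tokens": int}
    pricing maps model name -> {"prompt": rate, "completion": rate} (per token)
    """
    total = 0.0
    for model, usage in usage_per_model.items():
        rates = pricing[model]
        total += usage["prompt_tokens"] * rates["prompt"]
        total += usage["completion_tokens"] * rates["completion"]
    return total
```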
Applications report the usage of the upstream models in the `usage_per_model` response field, which maps model names to their corresponding token usages.
Addons
DIAL addons can only be used by the DIAL Assistant.
The DIAL Assistant is configured with a list of addons it can use.
A DIAL addon is implemented by a server that follows a certain OpenAPI specification, which the DIAL Assistant uses to communicate with the addon.
A DIAL addon is analogous to, and in fact compatible with, OpenAI plugins.
Any OpenAI plugin (typically identified by its `.well-known/ai-plugin.json` config) can be used as a DIAL addon.
A DIAL addon is the right choice when one needs to extend the functionality of the DIAL Assistant with a new tool.
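Checking whether a plugin is usable can be as simple as validating the manifest fields a consumer needs. The sketch below assumes the standard OpenAI plugin manifest layout (`name_for_model`, `description_for_model`, and an `api` section pointing at the OpenAPI spec):

```python
def validate_plugin_manifest(manifest: dict) -> list:
    """Return a list of problems found in an ai-plugin.json-style manifest."""
    problems = []
    for field in ("name_for_model", "description_for_model", "api"):
        if field not in manifest:
            problems.append(f"missing field: {field}")
    api = manifest.get("api", {})
    if api and api.get("type") != "openapi":
        problems.append("api.type must be 'openapi'")
    if api and not api.get("url"):
        problems.append("api.url is required")
    return problems
```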
Assistants
The DIAL Assistant is an assistant with a predefined set of addons and a system prompt.
It's the right choice when one needs to define an assistant for a specific task or domain.
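A hypothetical config fragment defining such an assistant. The section and field names here are illustrative assumptions; the exact schema is described in the Core config documentation:

```json
{
  "assistant": {
    "endpoint": "http://assistant:5000/openai/deployments/assistant/chat/completions",
    "assistants": {
      "search-assistant": {
        "prompt": "You are a helpful assistant. Use the configured addons to answer.",
        "addons": ["web-search", "calculator"]
      }
    }
  }
}
```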