Add _inference API #2329
Conversation
```
"inference.delete_model": {
  "request": [],
  "response": [
    "response definition inference.delete_model:Response / body / instance_of - Non-leaf type cannot be used here: '_types:AcknowledgedResponseBase'"
```
I see this same warning in other APIs where `AcknowledgedResponseBase` is used. Have I misused the class?
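For what it's worth, the validator appears to reject non-leaf types used directly as a body `instance_of`; assuming this repository follows its usual convention for acknowledged responses, the warning goes away when the response class extends the base instead of referencing it in the body. A sketch (import path assumed from the spec's path aliases):

```ts
// Sketch only: extend the base class rather than using it as the body type.
import { AcknowledgedResponseBase } from '@_types/Base'

export class Response extends AcknowledgedResponseBase {}
```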
```ts
export class Response {
  body: {
    predicted_value: InferenceResult
```
`predicted_value` will be of different types depending on which model is used; it could be a text embedding result or an ELSER sparse vector result. The `TaskType` URL parameter to the API tells the client what the expected result type is, e.g. if I call `POST _inference/text_embedding/model-foo` I know this field will contain a text embedding result.

An alternative option is to name the field by the result type rather than using `predicted_value` all the time:
```
{
  "text_embedding": [0....]
}
{
  "sparse_vector": { "foo": 1.0, ... }
}
```
From a client's perspective, which is better, or which is easier to parse?
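To compare the two options concretely, here is a runnable sketch of how a client could handle the result-type-named variant. The field names (`text_embedding`, `sparse_vector`) follow the suggestion in this thread and are illustrative, not the final API:

```typescript
// Two candidate response shapes, named by result type as suggested above.
type DenseResult = { text_embedding: number[] }
type SparseResult = { sparse_vector: Record<string, number> }
type InferenceResponse = DenseResult | SparseResult

// The presence of the key itself tells the client which variant it received,
// so no out-of-band knowledge of the task type is needed.
function describe(resp: InferenceResponse): string {
  if ('text_embedding' in resp) {
    return `dense embedding of ${resp.text_embedding.length} dimensions`
  }
  return `sparse vector with ${Object.keys(resp.sparse_vector).length} tokens`
}
```

With `predicted_value`, by contrast, the client has to remember the task type from the request URL (or sniff the JSON value's shape) before it can interpret the field.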
I wonder if we should include a field in the response that indicates what type it is. OpenAI does this using the `object` field. I wonder if it'd be easier for clients to key off of that field rather than remembering the task type in the request URL 🤷‍♂️

Something like:
```
{
  "type": "text_embedding",
  "<some field>": [
    { "embedding": [0.123, -0.456, ...] },
    { "embedding": [0.123, -0.456, ...] }
  ]
}
```
Currently `InferenceResult` is defined as being either an array (dense vector) or a dictionary (sparse vector). This is enough for clients to disambiguate between the two variants.

I'm wondering however if there may be, in the future, other inference types that return another kind of dictionary whose values are not all `float`. If that is the case, then we must have some disambiguation information. The best would be something like:
```ts
// Container is an aggregation of mutually exclusive variants
/** @variants container */
type InferenceResult {
  sparse_vector?: SparseVector,
  dense_vector?: DenseVector
}
```
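Whichever field names end up being chosen, a client consumes a container-style result by checking which variant key is present; exactly one should be set. A runnable sketch with illustrative names (using the `sparse_embedding`/`text_embedding` spellings discussed later in this thread):

```typescript
// Illustrative variant payload types; not the final spec types.
type SparseVector = Record<string, number>
type DenseVector = number[]

// Container: exactly one of these optional fields is expected to be set.
interface InferenceResult {
  sparse_embedding?: SparseVector
  text_embedding?: DenseVector
}

// Disambiguate by checking which variant key is present.
function resultKind(r: InferenceResult): 'sparse' | 'dense' {
  if (r.sparse_embedding !== undefined) return 'sparse'
  if (r.text_embedding !== undefined) return 'dense'
  throw new Error('unknown inference result variant')
}
```

This is also forward-compatible: a future variant with non-float values just becomes another optional key, and old clients fail loudly instead of misparsing it.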
@davidkyle for `SparseVector`, should we call it `SparseEmbedding` and have the field be `sparse_embedding`? The `TaskType` in the Java code is `SparseEmbedding`.
👍 thanks for reviewing @jonathan-buttner. I've renamed the field `sparse_embedding` but kept the type as `SparseVector`. The reasoning is that the mapping types in Elasticsearch are `sparse_vector` and `dense_vector`: a `text_embedding` goes in a `dense_vector` and a `sparse_embedding` in a `sparse_vector`.
Not sure if we need to include this now, but will we support an array for the `input` request field? If so, we might want to structure the responses something like this, so that we don't have a breaking change in the response format:
```
{
  "<some field>": [
    { ... },
    { .... }
  ]
}
```
For text embeddings, how do we feel about a structure like this?
```
{
  "<some field>": [
    { "embedding": [0.123, -0.456, ...] },
    { "embedding": [0.123, -0.456, ...] }
  ]
}
```
Having an array of objects in the response gives us more flexibility to add a field later if we need to. If they are just arrays it'll be harder to add a new field.
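The extensibility argument can be made concrete: with an array of objects, a new sibling field can be added to each entry later without changing the shape existing clients iterate over. A runnable sketch (all field names illustrative, including the hypothetical later addition `truncated`):

```typescript
// Each result is an object, so fields can be added next to `embedding`
// later without breaking clients that only read `embedding`.
interface TextEmbeddingEntry {
  embedding: number[]
  // Hypothetical field added in a later release; older clients ignore it.
  truncated?: boolean
}

const response: { text_embedding: TextEmbeddingEntry[] } = {
  text_embedding: [
    { embedding: [0.123, -0.456] },
    { embedding: [0.789, 0.012], truncated: true },
  ],
}

// An old client's parsing loop keeps working unchanged.
const vectors = response.text_embedding.map(e => e.embedding)
```

Had the response been a bare array of number arrays, there would be nowhere to attach `truncated` without a breaking shape change.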
I'm wondering if we actually need `task_type` in the URL path for all these APIs:

- Does the task type define an isolated namespace for `model_id`, i.e. can we have both `text_embedding/foobar` and `sparse_vector/foobar`?
- The `task_type` is also present in some of the response bodies (in `ModelConfigContainer`), and could equally be made part of `ModelConfig`, or maybe we don't even need it there since it is a characteristic of the chosen service?
The response is different depending on `task_type`. If we know the `task_type`, there is validation we can perform: that the model is of the correct type and that the input is properly constructed. Additionally, the input may be structured differently for certain task types. I can see cases where it would be preferable to elide `task_type`; one example is GET model by model Id without specifying the task type. That isn't implemented yet; when it is, the spec will be updated.
@swallez I've made the changes you requested, can you take another look please?
Sorry for the delay. I left a comment about the redundancy between `ModelConfig` and `ModelConfigContainer` and how we can address it.
```ts
/**
 * Represents a model as returned by the GET API
 */
export class ModelConfigContainer {
```
There's a lot of overlap between `ModelConfig` and `ModelConfigContainer`.

Do we need `task_type` and `model_id` here, considering that the user provided them in the request? If we remove them, then we can keep only `ModelConfig`.

If it's deemed necessary to repeat this information, then we should replace `service`, `service_settings` and `task_settings` with a single `config: ModelConfig` field.
Thanks for the feedback @swallez, your comments have shown that the GET models API is not modelled correctly.

In the future the GET inference model API will support wildcard expansion, for example to get all the text embedding models:

`GET _inference/text_embedding/*`

To support this, `GetModelResponse` should be an array of `ModelConfigContainer`, not a single value. Once we have an array of model configs it is clear why `model_id` and `task_type` are required in the response: `model_id` was not included in the request. The same applies to `task_type`; if we were to ask for all models regardless of task, the response should contain the task type.

> If it's deemed necessary to repeat this information, then we should replace service, service_settings and task_settings with a single config: ModelConfig field.

++ I like to reuse. How can I incorporate the fields from `ModelConfig` into `ModelConfigContainer` without having `ModelConfig` as a nested object? I want just the fields defined in `ModelConfig` to appear in `ModelConfigContainer`:
```ts
export class ModelConfigContainer {
  /**
   * The model Id
   */
  model_id: string
  /**
   * The model's task type
   */
  task_type: TaskType

  // ... include all the fields defined in ModelConfig
}
```
Thanks for the explanation regarding wildcards.

Regarding reuse, we can have `ModelConfigContainer extends ModelConfig` with additional fields to achieve what you want.

That said, there's nothing wrong with nesting (composition). Some of the target languages for client libraries (e.g. Go) don't support inheritance and will "flatten" inheritance hierarchies and produce unrelated types, so if we want to allow round-tripping (e.g. scenarios like get a `ModelConfig`, tweak it and POST it back), composition makes it easier.
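To make the two options side by side, here is a sketch of both shapes (assuming `ModelConfig` and `TaskType` as defined elsewhere in this spec; the second class name is made up for illustration):

```ts
// Option 1: inheritance — the ModelConfig fields appear flat in the JSON body.
export class ModelConfigContainer extends ModelConfig {
  model_id: string
  task_type: TaskType
}

// Option 2: composition — the config nests under one field, which survives
// code generation unchanged in languages without inheritance.
export class ModelConfigContainerNested {
  model_id: string
  task_type: TaskType
  config: ModelConfig
}
```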
👍 I pushed the `ModelConfigContainer extends ModelConfig` change.

The inability to round-trip config is a common problem in Elasticsearch, and it's quite wrong-headed that you can't GET then PUT. I will ask for some more opinions, as I'd prefer not to change the format and the nested config object makes it harder to read.
We looked at the round-tripping problem and realised there are other reasons that prevent round-tripping anyway. The configuration contains secret settings such as an API key. The API key is included on PUT to create the model, but because it is a secret it is never returned on GET. This means that you can't simply copy and paste the config, as you need to insert the API key.

Our preference is to have the GET response contain flat fields, as we think that is more readable than nesting the config part in another object.
Ping @swallez, could you take another look at this PR please?
Below you can find the validation results for the APIs you have changed. You can validate these APIs yourself by using the
I've removed the backport 8.11 label, as there have been many changes to the API since 8.11.
LGTM. Thanks!
Add _inference API (#2329)
(cherry picked from commit 4c51be9)
Co-authored-by: David Kyle <[email protected]>
Co-authored-by: Quentin Pradet <[email protected]>
Add the _inference CRUD APIs.

Each model has a `service_settings` and `task_settings` configuration which is related to the service used to evaluate the model. These options are specific to the service; for example, if using the Hugging Face service the options are different to using the ELSER service. The `ServiceSettings` and `TaskSettings` classes used to represent these options are currently empty as they are not explicitly modelled; I'm happy for these to be untyped at the moment, and they can be considered loose maps of `<string, string>`.

The APIs are not in serverless yet; the new APIs are tagged `visibility=private` for serverless availability.
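Treating the settings as loose string maps can be sketched as follows; the service name and setting keys below are illustrative examples, not the actual options of any service:

```typescript
// Untyped settings as described above: loose maps of <string, string>.
type ServiceSettings = Record<string, string>
type TaskSettings = Record<string, string>

// Minimal model shape for the sketch; not the final spec class.
interface ModelConfig {
  service: string
  service_settings: ServiceSettings
  task_settings: TaskSettings
}

// Hypothetical example: the keys are whatever the chosen service accepts.
const exampleModel: ModelConfig = {
  service: 'hugging_face',
  service_settings: { url: 'https://example.invalid/embed' },
  task_settings: {},
}
```

The trade-off is that clients get no compile-time help for per-service options; typed `ServiceSettings` variants could be introduced later without changing the wire format.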