
Add _inference API #2329

Merged
davidkyle merged 13 commits into main from inference-spec on Dec 7, 2023
Conversation

@davidkyle (Member) commented Nov 3, 2023

Add the _inference CRUD APIs.

Each model has a service_settings and a task_settings configuration, which relate to the service used to evaluate the model. These options are specific to the service; for example, if using the HuggingFace service the options are different from those for the ELSER service. The ServiceSettings and TaskSettings classes used to represent these options are currently empty as they are not explicitly modelled. I'm happy for these to be untyped at the moment; they can be considered loose maps of <string, string>.
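For illustration, a minimal sketch of how the untyped settings could be written in the specification, assuming the spec's Dictionary utility type; the loose <string, string> mapping is the assumption stated above, not the final model:

import { Dictionary } from '@spec_utils/Dictionary'

// Sketch only: both settings treated as loose string-to-string maps
// until the per-service options are explicitly modelled.
export type ServiceSettings = Dictionary<string, string>
export type TaskSettings = Dictionary<string, string>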

The APIs are not in serverless yet; the new APIs are tagged visibility=private for serverless availability.
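For reference, a hedged sketch of how that tagging looks on a request definition in this repository; the @rest_spec_name value matches one of the new APIs, while the since and stability values here are illustrative assumptions:

import { RequestBase } from '@_types/Base'

/**
 * @rest_spec_name inference.put_model
 * @availability stack since=8.12.0 stability=experimental
 * @availability serverless stability=experimental visibility=private
 */
export interface Request extends RequestBase {}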

"inference.delete_model": {
"request": [],
"response": [
"response definition inference.delete_model:Response / body / instance_of - Non-leaf type cannot be used here: '_types:AcknowledgedResponseBase'"
@davidkyle (Member Author) commented:

I see this same warning in other APIs where AcknowledgedResponseBase is used. Have I misused the class?


export class Response {
  body: {
    predicted_value: InferenceResult
  }
}
@davidkyle (Member Author) commented:

predicted_value will be of different types depending on which model is used; it could be a text embedding result or an ELSER sparse vector result. The TaskType URL parameter to the API tells the client what the expected result type is, e.g. if I call POST _inference/text_embedding/model-foo I know this field will contain a text embedding result.

An alternative option is to name the field after the result type rather than always using predicted_value:

{
  "text_embedding": [0.123, -0.456, ...]
}

{
  "sparse_vector": { "foo": 1.0, ... }
}

From a client's perspective, which is better, or which is easier to parse?
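To make the trade-off concrete, here is a hypothetical client-side sketch (not part of the spec): with a single predicted_value field, a client can only disambiguate by probing the JSON shape, an array for a dense embedding versus an object for a sparse vector:

// Hypothetical client code: disambiguate the untyped predicted_value
// by its JSON shape rather than by a named field.
type PredictedValue = number[] | Record<string, number>

function describeResult(value: PredictedValue): string {
  return Array.isArray(value)
    ? `dense embedding with ${value.length} dimensions`
    : `sparse vector with ${Object.keys(value).length} weighted tokens`
}

Named fields (or an explicit type discriminator, as suggested below) would let clients dispatch on a key instead.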

@jonathan-buttner (Contributor) commented Nov 8, 2023:

I wonder if we should include a field in the response that indicates what type it is. OpenAI does this using the object field. I wonder if it'd be easier for clients to key off of that field rather than remembering the task type in the request URL 🤷‍♂️

Something like

{
  "type": "text_embedding",
  "<some field>": [
    {
      "embedding": [0.123, -0.456, ...]
    },
    {
      "embedding": [0.123, -0.456, ...]
    }
  ]
}

@swallez (Member) commented Nov 13, 2023:

Currently InferenceResult is defined as being either an array (dense vector) or a dictionary (sparse vector). This is enough for clients to disambiguate between the two variants.

I'm wondering, however, whether there may in the future be other inference types that return another kind of dictionary whose values are not all floats. If that is the case, then we must have some disambiguation information.

The best would be something like:

// Container is an aggregation of mutually exclusive variants
/** @variants container */
export class InferenceResult {
  text_embedding?: DenseVector
  sparse_vector?: SparseVector
}

@jonathan-buttner (Contributor) commented:

@davidkyle for SparseVector, should we call it SparseEmbedding and have the field be sparse_embedding?

The TaskType in the Java code is SparseEmbedding.

@davidkyle (Member Author) commented:

👍 thanks for reviewing @jonathan-buttner. I've renamed the field to sparse_embedding but kept the type as SparseVector, the reasoning being that the mapping types in Elasticsearch are sparse_vector and dense_vector: a text_embedding goes in a dense_vector and a sparse_embedding in a sparse_vector.
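A minimal sketch of the renamed container under the variants structure proposed above; the exact types in the merged spec may differ:

/** @variants container */
export class InferenceResult {
  // Renamed field; the SparseVector type mirrors Elasticsearch's
  // sparse_vector mapping type, as explained above.
  sparse_embedding?: SparseVector
  text_embedding?: DenseVector
}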

@davidkyle davidkyle marked this pull request as ready for review November 6, 2023 11:19
@davidkyle davidkyle requested a review from a team as a code owner November 6, 2023 11:19
@jonathan-buttner (Contributor) left a comment:

Not sure if we need to include this now, but will we support an array for the input request field?

If so, we might want to structure the responses something like this, so that we don't have a breaking change in the response format later:

{
  "<some field>": [
    {
     ...
    },
    {
     .... 
    }
  ]
}

For text embeddings, how do we feel about a structure like this?

{
  "<some field>": [
    {
      "embedding": [0.123, -0.456, ...]
    },
    {
      "embedding": [0.123, -0.456, ...]
    }
  ]
}

Having an array of objects in the response gives us more flexibility to add a field later if we need to. If they are just arrays, it'll be harder to add a new field.

specification/inference/_types/Results.ts (outdated)

@swallez (Member) left a comment:

I'm wondering if we actually need task_type in the URL path for all these APIs:

  • does the task type define an isolated namespace for model_id, i.e. can we have both text_embedding/foobar and sparse_vector/foobar?
  • the task_type is also present in some of the response bodies (in ModelConfigContainer), and could equally be made part of ModelConfig, or maybe we don't even need it there since it is a characteristic of the chosen service?


specification/inference/_types/Services.ts (outdated)
@davidkyle (Member Author) commented:

> I'm wondering if we actually need task_type in the URL path for all these APIs:

The response is different depending on task_type. If we know the task_type, there is validation we can perform: that the model is of the correct type and that the input is properly constructed. Additionally, the input may be structured differently for certain task_types.

I can see cases where it would be preferable to elide task_type; one example is GET model by model ID without specifying the task type. That isn't implemented yet; when it is, the spec will be updated.

> does the task type define an isolated namespace for model_id, i.e. can we have both text_embedding/foobar and sparse_vector/foobar?

model_ids are globally unique; you cannot have the same model_id for two models of different task types.

> the task_type is also present in some of the response bodies (in ModelConfigContainer), and could equally be made part of ModelConfig, or maybe we don't even need it there since it is a characteristic of the chosen service?

ModelConfig is used at model creation, where the task type is read from the URL path. ModelConfigContainer is used in the GET model responses and is the same as ModelConfig with the extra fields model_id and task_type, both of which are read from the URL at model creation (PUT). Is there a preferred convention for modelling this type of pattern?

@davidkyle davidkyle requested a review from swallez November 15, 2023 14:44
@davidkyle (Member Author) commented:

@swallez I've made the changes you requested; can you take another look please?

@swallez (Member) left a comment:

Sorry for the delay. I left a comment about the redundancy between ModelConfig and ModelConfigContainer and how we can address it.

/** Represents a model as returned by the GET API */
export class ModelConfigContainer {
  model_id: string
  task_type: TaskType
  service: string
  service_settings: ServiceSettings
  task_settings: TaskSettings
}
@swallez (Member) commented:

There's a lot of overlap between ModelConfig and ModelConfigContainer.

Do we need task_type and model_id here, considering that the user provided them in the request? If we remove them, then we can keep only ModelConfig.

If it's deemed necessary to repeat this information, then we should replace service, service_settings and task_settings with a single config: ModelConfig field.
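A sketch of that nested alternative, for illustration only (this is not what was merged):

export class ModelConfigContainer {
  model_id: string
  task_type: TaskType
  // everything service-related nested under a single field
  config: ModelConfig
}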

@davidkyle (Member Author) commented:

Thanks for the feedback @swallez, your comments have shown that the GET models API is not modelled correctly.

In the future the GET inference model API will support wildcard expansion. For example, to get all the text embedding models:

GET _inference/text_embedding/*

To support this, GetModelResponse should be an array of ModelConfigContainer, not a single value. Once we have an array of model configs, it is clear why model_id and task_type are required in the response: model_id was not included in the request. The same applies to task_type; if we ask for all models regardless of task, the response should contain each model's task type.
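A sketch of the resulting response shape; the models field name here is an assumption for illustration:

export class Response {
  body: {
    // An array, so wildcard requests such as
    // GET _inference/text_embedding/* can return several configs.
    models: Array<ModelConfigContainer>
  }
}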

> If it's deemed necessary to repeat this information, then we should replace service, service_settings and task_settings with a single config: ModelConfig field.

++ I like the reuse. How can I incorporate the fields from ModelConfig into ModelConfigContainer without having ModelConfig as a nested object? I want just the fields defined in ModelConfig to appear in ModelConfigContainer:

export class ModelConfigContainer {
  /**
   * The model Id
   */
  model_id: string
  /**
   * The model's task type
   */
  task_type: TaskType

  // ... include all the fields defined in ModelConfig
}

@swallez (Member) commented:

Thanks for the explanation regarding wildcards.

Regarding reuse, we can have ModelConfigContainer extends ModelConfig with additional fields to achieve what you want.
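A minimal sketch of that inheritance approach:

// ModelConfigContainer reuses every ModelConfig field via inheritance
// and adds the identifying fields needed in GET responses.
export class ModelConfigContainer extends ModelConfig {
  model_id: string
  task_type: TaskType
}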

That said, there's nothing wrong with nesting (composition). Some of the target languages for client libraries (e.g. Go) don't support inheritance and will "flatten" inheritance hierarchies and produce unrelated types. So if we want to allow round-tripping (e.g. scenarios like get a ModelConfig, tweak it, and POST it back), composition makes it easier.

@davidkyle (Member Author) commented:

👍 I pushed the ModelConfigContainer extends ModelConfig change.

The inability to round-trip config is a common problem in Elasticsearch, and it's quite wrong-headed that you can't GET then PUT. I will ask for some more opinions, as I'd prefer not to change the format, and the nested config object makes it harder to read.

@davidkyle (Member Author) commented:

We looked at the round-tripping problem and realised there are other reasons that prevent round-tripping anyway. The configuration contains secret settings such as an API key. The API key is included on PUT to create the model but, because it is a secret, it is never returned on GET. This means that you can't simply copy and paste the config, as you need to insert the API key.

Our preference is to have the GET response contain flat fields as we think that is more readable than nesting the config part in another object.

@davidkyle (Member Author) commented:

Ping @swallez, could you take another look at this PR please

@davidkyle (Member Author) commented:

> Ping @swallez, could you take another look at this PR please

Sorry @swallez, I did not refresh the page and now I see your reply. Thanks, I'll look at your comments now.

A contributor commented:

Below you can find the validation results for the APIs you have changed.

API                       Request         Response
inference.delete_model    Missing test    Missing test
inference.get_model       Missing test    Missing test
inference.inference       Missing test    Missing test
inference.put_model       Missing test    Missing test

You can validate these APIs yourself by using the make validate target.

@davidkyle (Member Author) commented:

I've removed the backport 8.11 label, as there have been many changes to the API since 8.11.

@swallez (Member) left a comment:

LGTM. Thanks!

@davidkyle davidkyle merged commit 4c51be9 into main Dec 7, 2023
5 checks passed
@davidkyle davidkyle deleted the inference-spec branch December 7, 2023 08:15
github-actions bot pushed a commit that referenced this pull request Dec 7, 2023
(cherry picked from commit 4c51be9)
pquentin added a commit that referenced this pull request Dec 8, 2023
* Add _inference API (#2329)

(cherry picked from commit 4c51be9)

---------

Co-authored-by: David Kyle <[email protected]>
Co-authored-by: Quentin Pradet <[email protected]>