
Add _inference API #2329

Merged
davidkyle merged 13 commits into main from inference-spec on Dec 7, 2023
Conversation

@davidkyle (Member) commented Nov 3, 2023

Add the _inference CRUD APIs.

Each model has a service_settings and a task_settings configuration, which relate to the service used to evaluate the model. These options are specific to the service; for example, if using the HuggingFace service the options are different from those for the ELSER service. The ServiceSettings and TaskSettings classes used to represent these options are currently empty as they are not explicitly modelled. I'm happy for these to be untyped at the moment; they can be considered loose maps of <string, string>.
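For illustration, a minimal sketch of how the untyped settings could be written in the specification, assuming the spec's Dictionary utility type; the loose <string, string> mapping is the assumption stated above, not the final model:

import { Dictionary } from '@spec_utils/Dictionary'

// Sketch only: both settings treated as loose string-to-string maps
// until the per-service options are explicitly modelled.
export type ServiceSettings = Dictionary<string, string>
export type TaskSettings = Dictionary<string, string>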

The APIs are not in serverless yet; the new APIs are tagged visibility=private for serverless availability.
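For reference, a hedged sketch of how that tagging looks on a request definition in this repository; the @rest_spec_name value matches one of the new APIs, while the since and stability values here are illustrative assumptions:

import { RequestBase } from '@_types/Base'

/**
 * @rest_spec_name inference.put_model
 * @availability stack since=8.12.0 stability=experimental
 * @availability serverless stability=experimental visibility=private
 */
export interface Request extends RequestBase {}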

"inference.delete_model": {
"request": [],
"response": [
"response definition inference.delete_model:Response / body / instance_of - Non-leaf type cannot be used here: '_types:AcknowledgedResponseBase'"
@davidkyle (Member Author) commented:

I see this same warning in other APIs where AcknowledgedResponseBase is used. Have I misused the class?


export class Response {
  body: {
    predicted_value: InferenceResult
  }
}
@davidkyle (Member Author) commented:

predicted_value will be of different types depending on which model is used; it could be a text embedding result or an ELSER sparse vector result. The TaskType URL parameter to the API tells the client what the expected result type is, e.g. if I call POST _inference/text_embedding/model-foo I know this field will contain a text embedding result.

An alternative option is to name the field after the result type rather than always using predicted_value:

{
  "text_embedding": [0.123, -0.456, ...]
}

{
  "sparse_vector": { "foo": 1.0, ... }
}

From a client's perspective, which is better, or which is easier to parse?
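To make the trade-off concrete, here is a hypothetical client-side sketch (not part of the spec): with a single predicted_value field, a client can only disambiguate by probing the JSON shape, an array for a dense embedding versus an object for a sparse vector:

// Hypothetical client code: disambiguate the untyped predicted_value
// by its JSON shape rather than by a named field.
type PredictedValue = number[] | Record<string, number>

function describeResult(value: PredictedValue): string {
  return Array.isArray(value)
    ? `dense embedding with ${value.length} dimensions`
    : `sparse vector with ${Object.keys(value).length} weighted tokens`
}

Named fields (or an explicit type discriminator, as suggested below) would let clients dispatch on a key instead.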

@jonathan-buttner (Contributor) commented Nov 8, 2023:

I wonder if we should include a field in the response that indicates what type it is. OpenAI does this using the object field. I wonder if it'd be easier for clients to key off of that field rather than remembering the task type in the request URL 🤷‍♂️

Something like

{
  "type": "text_embedding",
  "<some field>": [
    {
      "embedding": [0.123, -0.456, ...]
    },
    {
      "embedding": [0.123, -0.456, ...]
    }
  ]
}

@swallez (Member) commented Nov 13, 2023:

Currently InferenceResult is defined as being either an array (dense vector) or a dictionary (sparse vector). This is enough for clients to disambiguate between the two variants.

I'm wondering, however, whether there may in the future be other inference types that return another kind of dictionary whose values are not all floats. If that is the case, then we must have some disambiguation information.

The best would be something like:

// Container is an aggregation of mutually exclusive variants
/** @variants container */
export class InferenceResult {
  text_embedding?: DenseVector
  sparse_vector?: SparseVector
}

@jonathan-buttner (Contributor) commented:

@davidkyle for SparseVector, should we call it SparseEmbedding and have the field be sparse_embedding?

The TaskType in the Java code is SparseEmbedding.

@davidkyle (Member Author) commented:

👍 thanks for reviewing @jonathan-buttner. I've renamed the field to sparse_embedding but kept the type as SparseVector, the reasoning being that the mapping types in Elasticsearch are sparse_vector and dense_vector: a text_embedding goes in a dense_vector and a sparse_embedding in a sparse_vector.
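A minimal sketch of the renamed container under the variants structure proposed above; the exact types in the merged spec may differ:

/** @variants container */
export class InferenceResult {
  // Renamed field; the SparseVector type mirrors Elasticsearch's
  // sparse_vector mapping type, as explained above.
  sparse_embedding?: SparseVector
  text_embedding?: DenseVector
}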

@davidkyle davidkyle marked this pull request as ready for review November 6, 2023 11:19
@davidkyle davidkyle requested a review from a team as a code owner November 6, 2023 11:19
@jonathan-buttner (Contributor) left a comment:

Not sure if we need to include this now, but will we support an array for the input request field?

If so, we might want to structure the responses something like this, so that we don't have a breaking change in the response format later:

{
  "<some field>": [
    {
     ...
    },
    {
     .... 
    }
  ]
}

For text embeddings, how do we feel about a structure like this?

{
  "<some field>": [
    {
      "embedding": [0.123, -0.456, ...]
    },
    {
      "embedding": [0.123, -0.456, ...]
    }
  ]
}

Having an array of objects in the response gives us more flexibility to add a field later if we need to. If they are just arrays, it'll be harder to add a new field.

specification/inference/_types/Results.ts (outdated)

@swallez (Member) left a comment:

I'm wondering if we actually need task_type in the URL path for all these APIs:

  • does the task type define an isolated namespace for model_id, i.e. can we have both text_embedding/foobar and sparse_vector/foobar?
  • the task_type is also present in some of the response bodies (in ModelConfigContainer), and could equally be made part of ModelConfig, or maybe we don't even need it there since it is a characteristic of the chosen service?


specification/inference/_types/Services.ts (outdated)
@davidkyle (Member Author) commented:

> I'm wondering if we actually need task_type in the URL path for all these APIs:

The response is different depending on task_type. If we know the task_type, there is validation we can perform: that the model is of the correct type and that the input is properly constructed. Additionally, the input may be structured differently for certain task_types.

I can see cases where it would be preferable to elide task_type; one example is GET model by model ID without specifying the task type. That isn't implemented yet; when it is, the spec will be updated.

> does the task type define an isolated namespace for model_id, i.e. can we have both text_embedding/foobar and sparse_vector/foobar?

model_ids are globally unique; you cannot have the same model_id for two models of different task types.

> the task_type is also present in some of the response bodies (in ModelConfigContainer), and could equally be made part of ModelConfig, or maybe we don't even need it there since it is a characteristic of the chosen service?

ModelConfig is used at model creation, where the task type is read from the URL path. ModelConfigContainer is used in the GET model responses and is the same as ModelConfig with the extra fields model_id and task_type, both of which are read from the URL at model creation (PUT). Is there a preferred convention for modelling this type of pattern?

@davidkyle davidkyle requested a review from swallez November 15, 2023 14:44
@davidkyle (Member Author) commented:

@swallez I've made the changes you requested; can you take another look please?

@swallez (Member) left a comment:

Sorry for the delay. I left a comment about the redundancy between ModelConfig and ModelConfigContainer and how we can address it.

/** Represents a model as returned by the GET API */
export class ModelConfigContainer {
  model_id: string
  task_type: TaskType
  service: string
  service_settings: ServiceSettings
  task_settings: TaskSettings
}
@swallez (Member) commented:

There's a lot of overlap between ModelConfig and ModelConfigContainer.

Do we need task_type and model_id here, considering that the user provided them in the request? If we remove them, then we can keep only ModelConfig.

If it's deemed necessary to repeat this information, then we should replace service, service_settings and task_settings with a single config: ModelConfig field.
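A sketch of that nested alternative, for illustration only (this is not what was merged):

export class ModelConfigContainer {
  model_id: string
  task_type: TaskType
  // everything service-related nested under a single field
  config: ModelConfig
}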

@davidkyle (Member Author) commented:

Thanks for the feedback @swallez, your comments have shown that the GET models API is not modelled correctly.

In the future the GET inference model API will support wildcard expansion. For example, to get all the text embedding models:

GET _inference/text_embedding/*

To support this, GetModelResponse should be an array of ModelConfigContainer, not a single value. Once we have an array of model configs, it is clear why model_id and task_type are required in the response: model_id was not included in the request. The same applies to task_type; if we ask for all models regardless of task, the response should contain each model's task type.
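A sketch of the resulting response shape; the models field name here is an assumption for illustration:

export class Response {
  body: {
    // An array, so wildcard requests such as
    // GET _inference/text_embedding/* can return several configs.
    models: Array<ModelConfigContainer>
  }
}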

> If it's deemed necessary to repeat this information, then we should replace service, service_settings and task_settings with a single config: ModelConfig field.

++ I like the reuse. How can I incorporate the fields from ModelConfig into ModelConfigContainer without having ModelConfig as a nested object? I want just the fields defined in ModelConfig to appear in ModelConfigContainer:

export class ModelConfigContainer {
  /**
   * The model Id
   */
  model_id: string
  /**
   * The model's task type
   */
  task_type: TaskType

  // ... include all the fields defined in ModelConfig
}

@swallez (Member) commented:

Thanks for the explanation regarding wildcards.

Regarding reuse, we can have ModelConfigContainer extends ModelConfig with additional fields to achieve what you want.
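A minimal sketch of that inheritance approach:

// ModelConfigContainer reuses every ModelConfig field via inheritance
// and adds the identifying fields needed in GET responses.
export class ModelConfigContainer extends ModelConfig {
  model_id: string
  task_type: TaskType
}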

That said, there's nothing wrong with nesting (composition). Some of the target languages for client libraries (e.g. Go) don't support inheritance and will "flatten" inheritance hierarchies and produce unrelated types. So if we want to allow round-tripping (e.g. scenarios like get a ModelConfig, tweak it, and POST it back), composition makes it easier.

@davidkyle (Member Author) commented:

👍 I pushed the ModelConfigContainer extends ModelConfig change.

The inability to round-trip config is a common problem in Elasticsearch, and it's quite wrong-headed that you can't GET then PUT. I will ask for some more opinions, as I'd prefer not to change the format, and the nested config object makes it harder to read.

@davidkyle (Member Author) commented:

We looked at the round-tripping problem and realised there are other reasons that prevent round-tripping anyway. The configuration contains secret settings such as an API key. The API key is included on PUT to create the model but, because it is a secret, it is never returned on GET. This means that you can't simply copy and paste the config, as you need to insert the API key.

Our preference is to have the GET response contain flat fields as we think that is more readable than nesting the config part in another object.

@davidkyle (Member Author) commented:

Ping @swallez, could you take another look at this PR please

@davidkyle (Member Author) commented:

> Ping @swallez, could you take another look at this PR please

Sorry @swallez, I did not refresh the page and now I see your reply. Thanks, I'll look at your comments now.

A contributor commented:

Below you can find the validation results for the APIs you have changed.

API                       Request         Response
inference.delete_model    Missing test    Missing test
inference.get_model       Missing test    Missing test
inference.inference       Missing test    Missing test
inference.put_model       Missing test    Missing test

You can validate these APIs yourself by using the make validate target.

@davidkyle (Member Author) commented:

I've removed the backport 8.11 label, as there have been many changes to the API since 8.11.

@swallez (Member) left a comment:

LGTM. Thanks!

@davidkyle davidkyle merged commit 4c51be9 into main Dec 7, 2023
5 checks passed
@davidkyle davidkyle deleted the inference-spec branch December 7, 2023 08:15
github-actions bot pushed a commit that referenced this pull request Dec 7, 2023
(cherry picked from commit 4c51be9)
pquentin added a commit that referenced this pull request Dec 8, 2023
* Add _inference API (#2329)

(cherry picked from commit 4c51be9)

---------

Co-authored-by: David Kyle <[email protected]>
Co-authored-by: Quentin Pradet <[email protected]>