[ML] Adding dynamic filtering for EIS configuration #120235
@@ -78,8 +78,8 @@ default void init(Client client) {}
 * Whether this service should be hidden from the API. Should be used for services
 * that are not ready to be used.
 */
-default Boolean hideFromConfigurationApi() {
-return Boolean.FALSE;
+default boolean hideFromConfigurationApi() {
Some refactoring: I think we can use a primitive here, since I don't believe we'll ever want to return null.
List<String> taskTypes = (ArrayList<String>) args[2];
return new InferenceServiceConfiguration.Builder().setService((String) args[0])
.setName((String) args[1])
.setTaskTypes(EnumSet.copyOf(taskTypes.stream().map(TaskType::fromString).collect(Collectors.toList())))
`copyOf` throws if it receives an empty collection, so I modified this to allow an empty set via the builder. An empty set should be unlikely in production, because we shouldn't be getting the configuration at all if no task types are supported, but it helps the tests.
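For reference, a minimal sketch of the behavior described above. `EnumSet.copyOf(Collection)` throws `IllegalArgumentException` when the argument is an empty collection that is not itself an `EnumSet`, whereas `EnumSet.noneOf` followed by `addAll` accepts an empty input. The `TaskType` enum and the `TaskTypeSets` class here are hypothetical stand-ins for illustration, not the Elasticsearch classes or the actual builder change.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

final class TaskTypeSets {
    // Hypothetical stand-in for the real TaskType enum.
    enum TaskType { COMPLETION, SPARSE_EMBEDDING }

    // Builds an EnumSet without EnumSet.copyOf, so an empty list is allowed.
    static EnumSet<TaskType> toEnumSet(List<TaskType> taskTypes) {
        EnumSet<TaskType> result = EnumSet.noneOf(TaskType.class);
        result.addAll(taskTypes);
        return result;
    }

    // Demonstrates that EnumSet.copyOf rejects an empty non-EnumSet collection.
    static boolean copyOfRejectsEmptyList() {
        try {
            EnumSet.copyOf(new ArrayList<TaskType>());
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```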
@@ -274,7 +277,12 @@ public Collection<?> createComponents(PluginServices services) {

 ElasticInferenceServiceSettings inferenceServiceSettings = new ElasticInferenceServiceSettings(settings);
 String elasticInferenceUrl = this.getElasticInferenceServiceUrl(inferenceServiceSettings);
-elasticInferenceServiceComponents.set(new ElasticInferenceServiceComponents(elasticInferenceUrl));
+elasticInferenceServiceComponents.set(
@brendan-jugan-elastic this is where we'll need the logic to retrieve the actual enabled models and task types from the EIS gateway.
var enabledStreamingTaskTypes = EnumSet.of(TaskType.COMPLETION);
enabledStreamingTaskTypes.retainAll(enabledTaskTypes);

if (enabledStreamingTaskTypes.isEmpty() == false) {
If there are no enabled task types, we won't add any, since we don't want to support anything.
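To illustrate the intersection in the hunk above, here is a small self-contained sketch. It starts from the set of task types that can stream (only `COMPLETION`, as in the diff) and keeps just those that are also enabled, so an empty result means no streaming support is registered. The `TaskType` enum and the `StreamingTaskTypes` class are hypothetical stand-ins.

```java
import java.util.EnumSet;

final class StreamingTaskTypes {
    // Hypothetical stand-in for the real TaskType enum.
    enum TaskType { COMPLETION, SPARSE_EMBEDDING, TEXT_EMBEDDING }

    // Returns the streaming-capable task types that are also enabled.
    static EnumSet<TaskType> enabledStreaming(EnumSet<TaskType> enabledTaskTypes) {
        EnumSet<TaskType> streaming = EnumSet.of(TaskType.COMPLETION);
        // retainAll mutates `streaming` in place, leaving the intersection.
        streaming.retainAll(enabledTaskTypes);
        return streaming;
    }
}
```

If the enabled set contains only `SPARSE_EMBEDDING`, the result is empty and the `isEmpty() == false` guard skips registration.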
}

-private static final LazyInitializable<InferenceServiceConfiguration, RuntimeException> configuration = new LazyInitializable<>(
-() -> {
+private LazyInitializable<InferenceServiceConfiguration, RuntimeException> initConfiguration() {
Removing static here because this depends on a field initialized in the constructor.
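A rough sketch of why the lazy value has to become an instance member: a static initializer runs once per class and cannot see per-instance state, while a lazy value built in the constructor can capture a field assigned there. The `Lazy` helper below is a hypothetical stand-in, not Elasticsearch's `LazyInitializable`, and the field names are illustrative only.

```java
import java.util.function.Supplier;

final class LazyConfigExample {
    // Hypothetical minimal lazy holder; computes the value on first access.
    static final class Lazy<T> {
        private final Supplier<T> supplier;
        private T value;

        Lazy(Supplier<T> supplier) {
            this.supplier = supplier;
        }

        synchronized T get() {
            if (value == null) {
                value = supplier.get();
            }
            return value;
        }
    }

    private final String enabledTaskTypes;
    // Instance field, not static: the supplier reads state set in the constructor.
    private final Lazy<String> configuration;

    LazyConfigExample(String enabledTaskTypes) {
        this.enabledTaskTypes = enabledTaskTypes;
        this.configuration = new Lazy<>(() -> "config for " + this.enabledTaskTypes);
    }

    String configuration() {
        return configuration.get();
    }
}
```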
Pinging @elastic/ml-core (Team:ML)
@@ -38,44 +30,13 @@ class RequestTask implements RejectableTask {
 ActionListener<InferenceServiceResults> listener
 ) {
 this.requestCreator = Objects.requireNonNull(requestCreator);
-this.listener = getListener(Objects.requireNonNull(listener), timeout, Objects.requireNonNull(threadPool));
+this.timedListener = new TimedListener<>(timeout, listener, threadPool);
I moved this into `TimedListener` so we could access it in the new send method of `HttpRequestSender`.
private static EnumSet<TaskType> toTaskTypes(List<String> stringTaskTypes) {
var taskTypes = EnumSet.noneOf(TaskType.class);
for (String taskType : stringTaskTypes) {
taskTypes.add(TaskType.fromStringOrStatusException(taskType));
TODO: If the task type is invalid we should ignore it. That could result in an empty `task_types` array; if that happens, we should remove the model entry.
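One possible shape for that TODO, as a sketch only. It assumes the gateway response has already been parsed into a map from model name to a list of task type strings. That shape, the `EnabledModelsFilter` class, and the `TaskType` enum are all hypothetical; the real code would use the existing `TaskType` parsing rather than `valueOf`.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class EnabledModelsFilter {
    // Hypothetical stand-in for the real TaskType enum.
    enum TaskType { COMPLETION, SPARSE_EMBEDDING }

    // Converts task type strings to an EnumSet, silently skipping unknown values.
    static EnumSet<TaskType> toTaskTypes(List<String> stringTaskTypes) {
        EnumSet<TaskType> taskTypes = EnumSet.noneOf(TaskType.class);
        for (String name : stringTaskTypes) {
            try {
                taskTypes.add(TaskType.valueOf(name.toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                // Unknown task type: ignore it instead of failing the whole response.
            }
        }
        return taskTypes;
    }

    // Drops any model whose task types are all invalid (empty set after parsing).
    static Map<String, EnumSet<TaskType>> filterModels(Map<String, List<String>> models) {
        Map<String, EnumSet<TaskType>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : models.entrySet()) {
            EnumSet<TaskType> taskTypes = toTaskTypes(entry.getValue());
            if (taskTypes.isEmpty() == false) {
                result.put(entry.getKey(), taskTypes);
            }
        }
        return result;
    }
}
```

With this approach a model listing `["completion", "bogus"]` keeps only `COMPLETION`, and a model listing only unknown task types is removed entirely.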
WIP
This PR adds the ability to determine, at node startup, which models and task types the cluster will support.
This is my suggestion for the format of the response from the gateway:
My reasoning for a list instead of a single entry is that OpenAI's gpt4-o supports completions and image generation, which I'm guessing would be two separate task types for us in the future. So it's best to allow multiple entries here.