# Overview

Seldon Core 2 is a source-available framework for deploying and managing machine learning systems at scale. The data-centric approach and modular architecture of Core 2 help you deploy, manage, and scale your ML - from simple models to complex ML applications. After the models are deployed, Core 2 enables monitoring of, and experimentation on, those systems in production. With support for a wide range of model types, and design patterns to build around those models, you can standardize ML deployment across a range of use cases on the cloud or on-premise serving infrastructure of your choice.

{% embed url="https://www.youtube.com/watch?v=ar5lSG_idh4" %}

## Model Deployment

Seldon Core 2 orchestrates and scales machine learning components running as production-grade microservices. These components can be deployed locally or in enterprise-scale Kubernetes clusters. The components of your ML system - such as models, processing steps, custom logic, or monitoring methods - are deployed as **Models**, leveraging serving solutions compatible with Core 2 such as MLServer, Alibi, the LLM Module, or Triton Inference Server. These serving solutions package the required dependencies and standardize inference using the Open Inference Protocol. This ensures that, regardless of your model types and use cases, all requests and responses follow a unified format. After models are deployed, they can process REST or gRPC requests for real-time inference.
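
For instance, deploying a component as a **Model** takes only a short Kubernetes manifest. The sketch below is illustrative: the model name, storage URI, and `sklearn` requirement are hypothetical placeholders for your own artifact.

```yaml
# Minimal Core 2 Model resource (illustrative sketch; the name,
# storage location, and runtime requirement are hypothetical).
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://my-bucket/models/iris-sklearn"  # where the artifact lives
  requirements:
  - sklearn        # matched against the capabilities of available servers
  memory: 100Ki    # declared footprint, used for scheduling and over-commit
```

Once the Model is ready, it accepts Open Inference Protocol requests over REST at `/v2/models/iris/infer`, or the gRPC equivalent.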

## Complex Applications & Orchestration

Machine learning applications are increasingly complex. They've evolved from individual models deployed as services to complex applications that can consist of multiple models, processing steps, custom logic, and asynchronous monitoring components. With Core 2 you can build Pipelines that connect any of these components to create data-centric applications, as sketched below. Core 2 handles orchestration and scaling of the underlying components of such an application, and exposes the data streamed through the application in real time using Kafka.
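
As an illustrative sketch, the hypothetical manifest below chains two already-deployed Models, `preprocess` and `classifier`, into a single Pipeline; the data flowing between steps is streamed through Kafka topics that Core 2 manages for you.

```yaml
# A two-step Pipeline (illustrative; the step names refer to
# hypothetical Models deployed as in the earlier example).
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: preprocess-and-predict
spec:
  steps:
  - name: preprocess       # first step: a transformation Model
  - name: classifier       # second step: a prediction Model
    inputs:
    - preprocess           # consumes the output of the previous step
  output:
    steps:
    - classifier           # the pipeline returns the classifier's output
```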

{% hint style="info" %}
Data-centricity is an approach that places the management, integrity, and flow of data at the core of the machine learning deployment framework.
{% endhint %}

This approach to MLOps, influenced by our position paper [Desiderata for next generation of ML model serving](https://arxiv.org/abs/2210.14665), enables real-time observability, insight, and control over the behavior and performance of your ML systems.

![Data-centric pipeline](images/pipeline-intro.png)

Lastly, Core 2 provides Experiments as part of its orchestration capabilities, enabling you to implement routing logic such as A/B tests or canary deployments for models or pipelines in production. After experiments are run, you can promote new models or pipelines, or launch new experiments, so that you can continuously improve the performance of your ML applications.
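
For example, a 50/50 A/B test between two deployed models can be declared with a manifest like the following sketch (the model names are hypothetical):

```yaml
# An A/B test splitting traffic between two hypothetical models.
apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: classifier-ab-test
spec:
  default: classifier-v1    # traffic falls back here when the experiment ends
  candidates:
  - name: classifier-v1
    weight: 50              # share of traffic routed to each candidate
  - name: classifier-v2
    weight: 50
```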

## Resource Management

In Seldon Core 2, your models are deployed on inference servers, which manage the packaging and execution of ML workloads. As part of its design, Core 2 separates **Servers** and **Models** into distinct resources. This approach enables flexible allocation of models to servers, aligned with the requirements of your models and with the underlying infrastructure that you want your servers to run on. Core 2 also provides functionality to autoscale your models and servers up and down as needed, based on your workload requirements or user-defined metrics.
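
Because Servers are their own resource, the capacity that models are scheduled onto can be declared and scaled independently of the models themselves. The sketch below is illustrative, assuming the `mlserver` server configuration that ships with Core 2; the pool name is hypothetical.

```yaml
# A pool of MLServer replicas for models to be scheduled onto.
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-pool
spec:
  serverConfig: mlserver   # base server configuration provided by Core 2
  replicas: 2              # scale the pool independently of the models on it
```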

With the modular design of Core 2, you can implement cutting-edge methods to minimize hardware costs:

- **Multi-model serving** consolidates multiple models onto shared inference servers to optimize resource utilization and decrease the number of servers required (see the sketch below).
- **Over-commit** allows you to provision more models than available memory would normally allow, by dynamically unloading models that are not in use from memory to disk and reloading them on demand.

![Example: Serving multiple model types across inference servers](images/models-servers.png)
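
As a sketch of how multi-model serving works in practice, the two hypothetical models below each declare a small memory footprint, so the Core 2 scheduler can place both on a single MLServer replica rather than provisioning a server per model:

```yaml
# Two lightweight Models (hypothetical names and URIs). Because their
# declared footprints fit on one replica, the scheduler can co-locate
# them on the same server process.
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentiment-en
spec:
  storageUri: "gs://my-bucket/models/sentiment-en"
  requirements: [sklearn]
  memory: 50Ki
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentiment-fr
spec:
  storageUri: "gs://my-bucket/models/sentiment-fr"
  requirements: [sklearn]
  memory: 50Ki
```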

## End-to-End MLOps with Core 2

Core 2 demonstrates the power of a standardized, data-centric approach to MLOps at scale, ensuring that data observability and management are prioritized across every layer of machine learning operations. Furthermore, Core 2 integrates seamlessly into end-to-end MLOps workflows, from CI/CD and traffic management with the service mesh of your choice to alerting, data visualization, and authentication and authorization.

This modular, flexible architecture not only supports diverse deployment patterns but also ensures compatibility with the latest AI innovations. By embedding data-centricity and adaptability into its foundation, Core 2 equips organizations to scale and improve their machine learning systems effectively and to capture value from increasingly complex AI systems.

## Next Steps

- [Install Seldon Core 2](./getting-started/README.md)
- Explore our [Tutorials](./examples/README.md)
- [Join our Slack Community](https://seldondev.slack.com/join/shared_invite/zt-vejg6ttd-ksZiQs3O_HOtPQsen_labg#/shared-invite/email) for updates or for answers to any questions