v0.4.0 - 2024-12-06
This release includes these major new features:
- Add support for Phi3.5 model.
- Add support for using vLLM runtime in inference service.
- Add support for using chat template.
- Bump accelerate to 1.0.0.
Changelog
Features 🌈
- e0f28f0 feat: Handle HF Remote API Call Format (#751)
- 0f9a11d feat: support vllm in controller (#635)
- 5269bd7 feat: bump accelerate to 1.0.0 (#739)
- f5d0958 feat: Update Llama Endpoint (#738)
- 2cb5710 feat: add tuning test to preset test (#741)
- f7e6d66 feat: [SKU modularization] AWS chart changes (#710)
- 391b398 feat: Add flag for running 1ES Public Models (#733)
- 0aea28e feat: Custom Dockerfile update BaseImage (#724)
- 0087e09 feat: add preset test for vllm (#694)
- c25e7e9 feat: RAG service health check (#704)
- f3ef4c8 feat: RAG engine validation (#691)
- 1c6eb2e feat: support adaptive max_model_len (#657)
- cafb947 feat: RAG engine deployment creation (#660)
- 2ecfdf1 feat: RAG engine controller revision (#682)
- 79494a2 feat: Dockerfile for Kaito RAG Service (#680)
- 9f5632a feat: Migrate E2E to Self-Hosted Runner (#641)
- 1676c0d feat: Runner Setup Script (#676)
- 71ddc55 feat: Introduce Abstract Class for Integration Testing (#674)
- ad0dde9 feat: Update VectorStore Base class (#673)
- 7bea782 feat: run e2e test in parallel (#667)
- 1709ba0 feat: package vllm runtime into image (#655)
- 6b216fc feat: Add delete and finalizer to RAGEngine (#646)
- 1d09da0 feat: implement inference server by using vllm (#624)
- 8906190 feat: Part 4 (Final) - Introduce Main RAG Service API and its tests (#603)
- 791c175 feat: add printcolumn to RAG Engine (#623)
- 544df3f feat: add Nodeclaim & Machine provision to RAG Engine controller (#622)
- 941170b feat: Part 3 - Introduce Vector Store Manager and Vector Store Class (#633)
- 65b844a Revert "feat: Migrate E2E pipeline to using Self-Hosted Runner" (#642)
- b6694c2 feat: Migrate E2E pipeline to using Self-Hosted Runner (#638)
- 314a80e feat: Revert the refactoring of RAGEngineStatus and WorkspaceStatus (#636)
- 870a93d feat: Part 2 - Add custom LLM inference class (#630)
- 1d99028 feat: Part 1 - Add RAG Embedding Interface (#628)
- 152e683 feat: refact updateStatusConditionIfNotMatch for both RAG and workspace (#626)
- 920ada5 feat: refactor updateObjStatus for both RAG and workspace (#625)
- 1818551 feat: Update RAG Status (#621)
- f613bb4 feat: update of functions related to nodeclaim and machine for RAG engine (#620)
- 38656dd feat: Clusterrole and Webhook update for RAG Engine (#619)
- cccb1cb feat: add WorkerNodes to RAGEngineStatus (#612)
- ba1a62d feat: Add ragengine controller scaffolding code and chart (#600)
- a06cf97 feat: [SKU modularization] remove sku_config from v1alpha1 and implement skuHandler interface (#602)
- 2cdc682 feat: Add RAGEngine CRD (#597)
- f3d6e09 feat: Options for Building and Running Private/Custom Models (#598)
Bug Fixes 🐞
- 6a75817 fix: set gorelease main pkg to workspace
- be067f6 fix: featuregate flag render problem
- 3882218 fix: disable tensor parallel for falcon7b (#755)
- fea2924 fix: Filepath in custom-deployment-template.yaml (#757)
- dba607b fix: Update Dockerfile Path
- ab1fd7e fix: preset tuning test workflow (#742)
- 1b440d5 fix: create the workspace service for custom models (#745)
- 21056a1 fix: secret update patch (#709)
- 3dbb660 fix: Validate workspace name (#726)
- 511dfa1 fix: Update custom-model-integration-guide.md
- 666c5fc fix: Update README.md (#697)
- 9d19e8f fix: Update Benchmark Pull Image Instructions
- 9d72066 fix: Update Runner Labels (#712)
- a146f3c fix: patch of RAGengine dockerfile (#707)
- 1517106 fix: binary search for best context length avoiding oom (#705)
- bac8d34 fix: skip e2e test for mcr publish pipeline (#699)
- 77ed191 fix: remove secret env when trigger e2e flow (#696)
- 5812927 fix: Update Makefile (#692)
- f1db127 fix: Remove empty var (#681)
- 001d148 fix: Add K8s Env Var (#679)
- 9408b12 fix: patch imagePullSecrets validation in e2e test (#666)
- d926c44 fix: Update Dockerfile.reference (#665)
- f64b35a fix: Populate ImagePullSecrets in Adapter Deployment and Add Corresponding Tests (#656)
- fc925de fix: polish liveness health threshold (#659)
- 0abe5d7 fix: NVML unknown error (#639)
- 7a9468c fix: update Dockerfile (#640)
- 2ec45ac fix: ignore instanceType when selecting preferred nodes (#618)
- 2b07c29 fix: Update kaito-e2e.yml (#614)
Code Refactoring 💎
Documentation 📘
- 692a7da docs: update for multi-runtime support (#754)
- 711c858 docs: [SKU modularization] Add AWS installation documentation (#711)
- 00056b5 docs: Update installation.md (#736)
- f889920 docs: Add guide for running Kaito on BYO GPU nodes (#732)
- 2139dfe docs: Update helm list command in installation guide to use new namespaces. (#730)
- 64c8ffb docs: update docs with 0.3.2 release (#700)
- 58894ba docs: fix terraform and update readme (#637)
- 6481b76 docs: quick deploy using terraform (#634)
- 6b8bc80 docs: Update README with the new release (#592)
Maintenance 🔧
- 0e80023 chore: switch buildkit image to mcr registry
- 1e4e699 chore: Mark ragengine as WIP for helm installation (#758)
- 3bc450b chore: bump actions/dependency-review-action from 4.3.4 to 4.5.0 (#714)
- 69986b0 chore: bump codecov/codecov-action from 4.6.0 to 5.0.7 (#716)
- 3a68fe8 chore: bump actions/setup-go from 5.0.2 to 5.1.0 (#687)
- ff41d1c chore: add zhuangqh to codeowners (#701)
- fcd5d1c chore: restruct workspace controller code - part 4 (#685)
- a057d70 chore: restruct workspace controller code - part 3 (#684)
- 3c873ec chore: restruct workspace controller code - part 2 (#683)
- e886346 chore: restruct workspace controller code - part 1 (#675)
- 38fae09 chore: bump step-security/harden-runner from 2.9.1 to 2.10.1 (#596)
- 79e425c chore: bump github.com/Azure/karpenter-provider-azure from 0.5.1 to 0.5.4 (#599)
- 1fb9989 chore: bump azure/CLI from 2.0.0 to 2.1.0 (#588)
- b97ab11 chore: refactor to move ragengine to a central package (#671)
- 2ca998e chore: removed Microsoft trademark, updated contributing guidelines, CoC in readme (#672)
- 40d1321 chore: Updated to CNCF CoC, Maintainers file (#670)
- 1248109 chore: clean up build cmds for workspace (#668)
- 8f894bb chore: bump codecov/codecov-action from 4.5.0 to 4.6.0 (#613)
- 00ad1f6 chore: bump azure/login from 2.1.1 to 2.2.0 (#627)
- bf12222 chore: bump actions/checkout from 4.1.7 to 4.2.2 (#647)
- 5f2f649 chore: Renaming to reflect updated repo (#663)
- f35ca31 chore: bump azure/login from 2.1.1 to 2.2.0 (#604)
- bcc0276 chore: Update Phi README.md (#593)