Remove references to text-generation-inference
sd109 committed Dec 13, 2023
1 parent 608f851 commit b7bb1c0
Showing 5 changed files with 8 additions and 7 deletions.
2 changes: 1 addition & 1 deletion templates/NOTES.txt
@@ -1 +1 @@
-The LLM app allows users to deploy machine learning models using [text-generation-inference](https://github.com/huggingface/text-generation-inference) as a model serving backend and [gradio](https://github.com/gradio-app/gradio) as a web interface.
+The LLM app allows users to deploy machine learning models using [vLLM](https://docs.vllm.ai/en/latest/) as a model serving backend and [gradio](https://github.com/gradio-app/gradio) as a web interface.
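For context on the change above: vLLM serves models behind an OpenAI-compatible HTTP API. A minimal sketch of building a completion request body for that API (the model name below is a hypothetical placeholder, not taken from this chart):

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> bytes:
    """JSON body for a POST to vLLM's OpenAI-compatible /v1/completions endpoint."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")

# Hypothetical model name for illustration; the chart configures the real one.
body = build_completion_request("some-org/some-model", "Hello")
```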
2 changes: 1 addition & 1 deletion templates/api/deployment.yml
@@ -58,7 +58,7 @@ spec:
         # TODO: Make this configurable (e.g. hostPath or PV)
         - name: data
           {{- .Values.api.cacheVolume | toYaml | nindent 10 }}
-        # Suggested in text-generation-inference docs
+        # Suggested in vLLM docs
         - name: shm
           emptyDir:
             medium: Memory
5 changes: 2 additions & 3 deletions values.yaml
@@ -33,16 +33,15 @@ api:
     version: "6876068"
   # Service config
   service:
-    name: text-generation-inference
+    name: llm-backend
     type: ClusterIP
     zenith:
       enabled: false
       skipAuth: false
       label: Inference API
       iconUrl:
       description: |
-        The raw inference API endpoints for the deployed LLM.
-        Public API docs are available [here](https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference)
+        The raw inference API endpoints for the deployed LLM.
   # Config for huggingface model cache volume
   # This is mounted at /root/.cache/huggingface in the api deployment
   cacheVolume:
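The service rename above matters because the web-app examples below reach the backend through Kubernetes cluster DNS, where a Service named `llm-backend` in the `default` namespace resolves as `llm-backend.default.svc`. A small illustrative helper showing that naming rule (not part of the chart):

```python
def service_dns(name: str, namespace: str = "default") -> str:
    """Cluster-internal URL for a Kubernetes Service: http://<name>.<namespace>.svc"""
    return f"http://{name}.{namespace}.svc"

# Matches the backend_url used by the example web apps below.
url = service_dns("llm-backend")
```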
3 changes: 2 additions & 1 deletion web-app-utils/example_app_playful.py
@@ -3,7 +3,8 @@
 from api_startup_check import wait_for_backend
 
 # NOTE: This url should match the chart's api service name & namespace
-backend_url = "http://text-generation-inference.default.svc"
+#TODO: Detect namespace automatically?
+backend_url = "http://llm-backend.default.svc"
 wait_for_backend(backend_url)
 
 prompt = """
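The `wait_for_backend` helper imported above lives in `api_startup_check`, which is not part of this diff. A plausible sketch of such a readiness poll, with the probe injected as a parameter so the loop can be exercised without a live server (an assumption about its behavior, not the chart's actual implementation):

```python
import time

def wait_for_backend(url: str, probe, interval: float = 1.0, max_attempts: int = 60) -> bool:
    """Poll probe(url) until it returns True or max_attempts is exhausted.

    The real api_startup_check presumably performs an HTTP health check;
    here the probe is injected so the retry logic is testable in isolation.
    """
    for attempt in range(max_attempts):
        if probe(url):
            return True
        if attempt < max_attempts - 1:
            time.sleep(interval)
    return False
```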
3 changes: 2 additions & 1 deletion web-app-utils/example_app_vanilla.py
@@ -3,7 +3,8 @@
 from api_startup_check import wait_for_backend
 
 # NOTE: This url should match the chart's api service name & namespace
-backend_url = "http://text-generation-inference.default.svc"
+#TODO: Detect namespace automatically?
+backend_url = "http://llm-backend.default.svc"
 wait_for_backend(backend_url)
 
 prompt = """
