add tpu jetstream reference
mfuntowicz committed Jan 13, 2025
1 parent 15a6a62 commit 00e5a7b
Showing 2 changed files with 4 additions and 1 deletion.
2 changes: 2 additions & 0 deletions _blog.yml
@@ -5284,5 +5284,7 @@
- tgi
- backends
- vllm
- neuron
- jetstream
- tensorrt-llm
- community
3 changes: 2 additions & 1 deletion tgi-multi-backend.md
@@ -40,7 +40,8 @@ The new multi-backend capabilities of TGI open up many impactful roadmap opportunities
* **NVIDIA TensorRT-LLM backend**: We are collaborating with the NVIDIA TensorRT-LLM team to bring the full performance of optimized NVIDIA GPUs + TensorRT to the community. This work will be covered more extensively in an upcoming blog post. It closely relates to our mission to empower AI builders through open source: `optimum-nvidia` to quantize, build, and evaluate TensorRT-compatible artifacts, alongside TGI+TRT-LLM to easily deploy, execute, and scale deployments on NVIDIA GPUs.
* **Llama.cpp backend**: We are collaborating with the llama.cpp team to extend support for production server use cases. The llama.cpp backend for TGI will provide a strong CPU-based option for anyone looking to deploy on Intel, AMD, or ARM CPU servers.
* **vLLM backend**: We are contributing to the vLLM project and are looking to integrate vLLM as a TGI backend in Q1 '25.
- * **Neuron backend**: we are working with the Neuron teams at AWS to enable Inferentia 2 and Trainium 2 support natively in TGI
+ * **AWS Neuron backend**: We are working with the Neuron teams at AWS to enable Inferentia 2 and Trainium 2 support natively in TGI.
+ * **Google TPU backend**: We are working with the Google Jetstream & TPU teams to provide the best performance through TGI.

We are confident TGI Backends will help simplify the deployment of LLMs, bringing versatility and performance to all TGI users.
You'll soon be able to use TGI Backends directly within [Inference Endpoints](https://huggingface.co/inference-endpoints/). Customers will be able to deploy models with TGI Backends on various hardware with top-tier performance and reliability out of the box.
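
Whichever backend ends up serving the model, clients keep talking to the same TGI HTTP API. As a minimal sketch (assuming a TGI server is already running locally on port 8080; the endpoint URL and generation parameters below are illustrative), querying it with `huggingface_hub.InferenceClient` could look like this:

```python
# Sketch: query a running TGI server. The client code does not change with
# the backend serving the model (TensorRT-LLM, llama.cpp, vLLM, Neuron, TPU, ...).
# Assumes a server is reachable at http://localhost:8080.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# Standard TGI text-generation call; parameter values are illustrative.
output = client.text_generation(
    "What is Text Generation Inference?",
    max_new_tokens=64,
    temperature=0.7,
)
print(output)
```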
