
Commit

addresses more suggestions
mfuntowicz committed Jan 13, 2025
1 parent 4785228 commit 15a6a62
Showing 2 changed files with 17 additions and 4 deletions.
14 changes: 14 additions & 0 deletions _blog.yml
@@ -5271,4 +5271,18 @@
     - open-source
     - nlp
     - tools
     - community
+
+- local: tgi-multi-backend
+  title: Introducing multi-backend (TRT-LLM, vLLM) support for Text-Generation-Inference
+  author:
+    - mfuntowicz
+    - hlarcher
+  thumbnail: TODO
+  date: January 15, 2025
+  tags:
+    - tgi
+    - backends
+    - vllm
+    - tensorrt-llm
+    - community
7 changes: 3 additions & 4 deletions tgi-multi-backend.md
@@ -1,21 +1,20 @@
 ---
-title: "Text-Generation-Inference empowering all the AI Builders Community"
+title: "Introducing multi-backend (TRT-LLM, vLLM) support for Text-Generation-Inference"
 thumbnail: TODO
 authors:
 - user: mfuntowicz
 - user: hlarcher
 ---

-# Text-Generation-Inference empowering all the AI Builders Community
-
+# Introducing multi-backend (TRT-LLM, vLLM) support for Text-Generation-Inference
 ## Introducing multi-backends support for TGI

 ## Introduction

 Since its initial release in 2022, Text-Generation-Inference (TGI) has provided Hugging Face and the AI community with a performance-focused solution to easily deploy large language models (LLMs). TGI initially offered an almost no-code solution to load models from the Hugging Face Hub and deploy them in production on NVIDIA GPUs. Over time, support expanded to include AMD Instinct GPUs, Intel GPUs, AWS Trainium/Inferentia, Google TPU, and Intel Gaudi.
 Over the years, multiple inference solutions have emerged, including vLLM, SGLang, llama.cpp, and TensorRT-LLM, fragmenting the overall ecosystem. Different models, hardware, and use cases may require a specific backend to achieve optimal performance. However, configuring each backend correctly, managing licenses, and integrating them into existing infrastructure can be challenging for users.

-To address this, we are excited to introduce the concept of TGI Backends. This new feature gives the flexibility to integrate with any of the solutions above through a single unified frontend layer: TGI. This change makes it easier for the community to get the best performance for their production workloads, switching backends according to their modeling, hardware, and performance requirements.
+To address this, we are excited to introduce the concept of TGI Backends. This new architecture gives the flexibility to integrate with any of the solutions above through TGI as a single unified frontend layer. This change makes it easier for the community to get the best performance for their production workloads, switching backends according to their modeling, hardware, and performance requirements.

 The Hugging Face team is excited to contribute to and collaborate with the teams that build vLLM, llama.cpp, and TensorRT-LLM, and with the teams at AWS, Google, NVIDIA, AMD, and Intel, to offer a robust and consistent user experience for TGI users, whichever backend and hardware they want to use.
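A practical consequence of the unified frontend described in the post: client code does not change when the backend does. Here is a minimal sketch, assuming a TGI server is already running locally on port 8080 (the endpoint, prompt, and parameters are illustrative, not from the commit):

```python
# Minimal sketch: the client-facing TGI API stays the same regardless of
# which backend (TRT-LLM, vLLM, ...) serves the model behind it.
# Assumes a TGI server is already listening on http://localhost:8080.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# Standard TGI text-generation call; the prompt and parameters are
# placeholders for illustration.
response = client.text_generation(
    "What is the capital of France?",
    max_new_tokens=64,
)
print(response)
```

Under this design, switching backends becomes an operational choice (which TGI build you deploy) rather than a change to application code.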
