- vLLM
- text-generation-inference
- TensorRT-LLM
- DeepSpeed-MII
- Token latency
  - Avg latency
  - Variance
- Pause time
  - Total pause time
  - Pause ratio: pause time / end-to-end inference time
- Time to first token
  - Prefilling time
  - Queuing time
- Memory
- Memory IO
- Compute
- Energy
- Llama 2
- 13B
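The latency metrics above can be computed from per-token emission timestamps. Below is a minimal sketch; the function name, the timestamp-list input format, and the pause threshold (a gap more than 2× the average inter-token latency) are illustrative assumptions, not any framework's API.

```python
from statistics import mean, pvariance

def token_latency_stats(token_timestamps, request_start):
    """Derive latency metrics from absolute per-token emission times.

    token_timestamps: times (seconds) at which each output token was emitted.
    request_start: time the request entered the serving system.
    (Illustrative helper; not part of vLLM/TGI/TensorRT-LLM/DeepSpeed-MII.)
    """
    # Inter-token latencies: gaps between consecutive token emissions.
    gaps = [b - a for a, b in zip(token_timestamps, token_timestamps[1:])]
    ttft = token_timestamps[0] - request_start        # time to first token
    end_to_end = token_timestamps[-1] - request_start  # end-to-end inference time
    avg = mean(gaps)
    # "Pause": an inter-token gap well above the average (2x is an arbitrary cutoff).
    pauses = [g for g in gaps if g > 2 * avg]
    total_pause = sum(pauses)
    return {
        "ttft": ttft,
        "avg_latency": avg,
        "variance": pvariance(gaps),
        "total_pause_time": total_pause,
        "pause_ratio": total_pause / end_to_end,  # pause time / end-to-end time
    }

# Example: tokens at 0.5s, 0.6s, 0.7s, 1.3s, 1.4s after a request at t=0.
m = token_latency_stats([0.5, 0.6, 0.7, 1.3, 1.4], 0.0)
# ttft = 0.5; avg gap = 0.225; one 0.6s pause out of 1.4s end-to-end.
```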