Issues: NVIDIA/TensorRT-LLM

fp8 quantization for CohereForCausalLM [Investigating] [Low Precision] [triaged]
#2666 opened Jan 7, 2025 by Alireza3242
Doc llm-api reference page empty? [bug] [Documentation] [Investigating] [triaged]
#2665 opened Jan 7, 2025 by BugFreeee
What are supported low-bit (int8/fp8/int4) data types in MLP and Attention layers? [Investigating] [Low Precision] [triaged]
#2664 opened Jan 6, 2025 by mirzadeh
QTIP Quantization Support? [Investigating] [Low Precision] [triaged]
#2663 opened Jan 6, 2025 by aikitoria
Qwen2 VL cannot be converted to a checkpoint on TensorRT-LLM [bug] [Investigating] [LLM API/Workflow] [triaged]
#2658 opened Jan 5, 2025 by xunuohope1107
PyTorch Nightly support in Dockerfile
#2657 opened Jan 3, 2025 by sbhavani
"No module named 'tensorrt_llm.bindings'" error message [triaged]
#2656 opened Jan 3, 2025 by maulikmadhavi
setuptools conflict [Investigating] [Low Precision] [triaged]
#2655 opened Jan 3, 2025 by kanebay
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable [bug] [triaged]
#2652 opened Jan 3, 2025 by Whisht
Throughput Measurements
#2648 opened Jan 2, 2025 by Alireza3242
gemma 2 convert_checkpoint uses more GPU RAM than needed [bug] [Investigating] [LLM API/Workflow] [triaged]
#2647 opened Jan 2, 2025 by Alireza3242
Failed to build engine with lookahead_decoding [bug] [Investigating] [Speculative Decoding] [triaged]
#2641 opened Dec 31, 2024 by aikitoria
Attention clarification
#2639 opened Dec 30, 2024 by Saeedmatt3r
Multi-Modal on TRT-LLM on aarch64 (Holoscan IGX Devkit) fails to convert VILA checkpoints [bug]
#2638 opened Dec 30, 2024 by MMelQin
Cpp runner outputs wrong results when using LoRA + tensor parallelism [bug] [Investigating] [Lora/P-tuning] [triaged]
#2634 opened Dec 28, 2024 by ShuaiShao93
Troubleshoot mistral model [bug]
#2632 opened Dec 26, 2024 by krishnanpooja
Qwen2.5-72B-Instruct YaRN BUG [bug]
#2630 opened Dec 26, 2024 by PaulX1029