Issues: NVIDIA/TensorRT-LLM
[Issue Template] Short one-line summary of the issue #270
#783, opened Jan 1, 2024 by juney-nvidia
fp8 quantization for CohereForCausalLM
Labels: Investigating; Low Precision (issues about lower-bit quantization, including int8, int4, fp8); triaged (issue has been triaged by maintainers)
#2666, opened Jan 7, 2025 by Alireza3242
Doc llm-api reference page empty?
Labels: bug (something isn't working); Documentation (improvements or additions to documentation); Investigating; triaged
#2665, opened Jan 7, 2025 by BugFreeee
What are the supported low-bit (int8/fp8/int4) data types in MLP and Attention layers?
Labels: Investigating; Low Precision; triaged
#2664, opened Jan 6, 2025 by mirzadeh
QTIP quantization support?
Labels: Investigating; Low Precision; triaged
#2663, opened Jan 6, 2025 by aikitoria
Segmentation fault: TensorRT-LLM crashes when using guided decoding (xgrammar) with KV cache reuse
Labels: bug
#2660, opened Jan 6, 2025 by Somasundaram-Palaniappan
[QST] Why does the f16xs8 mixed-GEMM implementation differ between TRT-LLM and the native CUTLASS mixed-GEMM example?
Labels: Investigating; Performance (issues about performance numbers); triaged
#2659, opened Jan 5, 2025 by danielhua23
Qwen2 VL cannot be converted to a checkpoint on TensorRT-LLM
Labels: bug; Investigating; LLM API/Workflow; triaged
#2658, opened Jan 5, 2025 by xunuohope1107
`No module named 'tensorrt_llm.bindings'` error message
Labels: triaged
#2656, opened Jan 3, 2025 by maulikmadhavi
setuptools conflict
Labels: Investigating; Low Precision; triaged
#2655, opened Jan 3, 2025 by kanebay
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable
Labels: bug; triaged
#2652, opened Jan 3, 2025 by Whisht
gemma 2 convert_checkpoint takes more GPU RAM than needed
Labels: bug; Investigating; LLM API/Workflow; triaged
#2647, opened Jan 2, 2025 by Alireza3242
Failed to build engine with lookahead_decoding
Labels: bug; Investigating; Speculative Decoding; triaged
#2641, opened Dec 31, 2024 by aikitoria
Multi-modal TRT-LLM on aarch64 (Holoscan IGX Devkit) fails to convert VILA checkpoints
Labels: bug
#2638, opened Dec 30, 2024 by MMelQin
How can I suppress INFO log output in executorExampleBasic.cpp?
#2637, opened Dec 28, 2024 by aaIce
C++ runner outputs wrong results when using LoRA + tensor parallelism
Labels: bug; Investigating; Lora/P-tuning; triaged
#2634, opened Dec 28, 2024 by ShuaiShao93
Troubleshoot Mistral model
Labels: bug
#2632, opened Dec 26, 2024 by krishnanpooja
Qwen2.5-72B-Instruct YaRN bug
Labels: bug
#2630, opened Dec 26, 2024 by PaulX1029
Error with LoRA weights data type in quantized TensorRT-LLM model execution
Labels: Investigating; Lora/P-tuning; triaged
#2628, opened Dec 25, 2024 by Alireza3242