I quantized a Gemma model with AWQ. Now I want to apply LoRA at runtime. However, when I send the LoRA weights with an inference request, I receive the following error:
[TensorRT-LLM][ERROR] Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
1 0x7f6918b7bc64 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2 0x7f6918b8b005 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x78a005) [0x7f6918b8b005]
3 0x7f691afad798 tensorrt_llm::batch_manager::PeftCacheManager::addRequestPeft(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, bool) + 184
4 0x7f691afd0242 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 82
5 0x7f691b0256bf tensorrt_llm::executor::Executor::Impl::fetchNewRequests[abi:cxx11](int, std::optional<float>) + 2543
6 0x7f691b027698 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1176
7 0x7f69e86b0253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f69e86b0253]
8 0x7f69e843fac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f69e843fac3]
9 0x7f69e84d0a04 clone + 68
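For context, the assertion at loraUtils.cpp:66 checks that the incoming LoRA weight tensors have the same data type as the base engine. An AWQ engine is typically built with float16 activations, while LoRA adapters are often serialized as float32, so the adapter tensors generally need to be cast before they are attached to a request. A minimal sketch, assuming hypothetical file names and a float16 engine:

```python
import torch

# Hypothetical files: flattened LoRA weights/config tensors produced by the
# LoRA conversion step; the real names depend on your pipeline.
lora_weights = torch.load("lora_weights.pt")
lora_config = torch.load("lora_config.pt")

# The runtime rejects the request unless the LoRA weights match the base
# model's dtype (float16 is assumed here for an AWQ engine with fp16 activations).
lora_weights = lora_weights.to(torch.float16).contiguous()
```

The dtype the engine was built with should be recorded in the config.json next to the engine files; that is the type the LoRA tensors need to match.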
Steps Taken:
1- Using TensorRT-LLM 0.15
2- In the file /usr/local/lib/python3.10/dist-packages/tensorrt_llm/top_model_mixin.py, I added the following:
And in the TopModelMixin class, I wrote:
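A minimal sketch of what such a hook typically looks like (assuming the `use_lora` helper from `tensorrt_llm.lora_manager`, following the pattern other decoder models use; the code actually added here may differ):

```python
from tensorrt_llm.lora_manager import LoraConfig, use_lora


class TopModelMixin:
    # ... existing mixin methods ...

    def use_lora(self, lora_config: LoraConfig):
        # Delegate to the shared lora_manager helper, which loads the adapter
        # checkpoint and registers the LoRA modules on this model.
        use_lora(self, lora_config)
```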
3- Running:
4- Running:
5- Running:
6- Starting Triton server:
7- Running:
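Independent of the serving path, the per-request LoRA tensors have to be in the engine's dtype by the time the request is enqueued. A rough sketch using the executor Python bindings (hypothetical engine path, prompt tokens, and file names; the Triton path passes the same lora_weights/lora_config tensors as request inputs):

```python
import torch
import tensorrt_llm.bindings.executor as trtllm

# Hypothetical artifacts: an AWQ-quantized Gemma engine built with the LoRA
# plugin enabled, plus converted LoRA weight/config tensors.
executor = trtllm.Executor("/engines/gemma_awq_lora",
                           trtllm.ModelType.DECODER_ONLY,
                           trtllm.ExecutorConfig(max_beam_width=1))

lora_weights = torch.load("lora_weights.pt").to(torch.float16)  # match engine dtype
lora_cfg = torch.load("lora_config.pt").to(torch.int32)         # module/rank table

request = trtllm.Request(
    input_token_ids=[2, 106, 1645],  # placeholder prompt tokens
    max_tokens=32,
    lora_config=trtllm.LoraConfig(task_id=0,
                                  weights=lora_weights,
                                  config=lora_cfg),
)
request_id = executor.enqueue_request(request)
```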