
"model is only supported on GPU" since v1.8.0 #468

Open

alexkramer98 opened this issue Dec 12, 2024 · 1 comment
alexkramer98 commented Dec 12, 2024

Since v1.8.0, models are not loading and I am seeing this in the logs:

-- 402 -- /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
-- 402 --   return torch._C._cuda_getDeviceCount() > 0
-- 402 -- /usr/local/lib/python3.10/dist-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
-- 402 --   def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq):
-- 402 -- /usr/local/lib/python3.10/dist-packages/auto_gptq/nn_modules/triton_utils/kernels.py:419: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
-- 402 --   def backward(ctx, grad_output):
-- 402 -- /usr/local/lib/python3.10/dist-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
-- 402 --   @custom_fwd(cast_inputs=torch.float16)
-- 402 -- 20241212 09:38:10 MODEL STATUS loading model
-- 402 -- Traceback (most recent call last):
-- 402 --   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
-- 402 --     return _run_code(code, main_globals, None,
-- 402 --   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
-- 402 --     exec(code, run_globals)
-- 402 --   File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/inference/inference_worker.py", line 154, in <module>
-- 402 --     worker_loop(args.model, models_mini_db, supported_models.config, compile=args.compile)
-- 402 --   File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/inference/inference_worker.py", line 51, in worker_loop
-- 402 --     inference_model = InferenceHF(
-- 402 --   File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/inference/inference_hf.py", line 148, in __init__
-- 402 --     assert torch.cuda.is_available(), "model is only supported on GPU"
-- 402 -- AssertionError: model is only supported on GPU
-- 294 -- 20241212 09:38:10 WEBUI 172.17.0.1:53830 - "GET /tab-host-have-gpus HTTP/1.1" 200
-- 294 -- 20241212 09:38:10 WEBUI 172.17.0.1:53846 - "GET /tab-finetune-config-and-runs HTTP/1.1" 200
20241212 09:38:11 402 finished python -m self_hosting_machinery.inference.inference_worker --model qwen2.5/coder/1.5b/base @:gpu00, retcode 1
/finished compiling -- failed, probably unrecoverable, will not retry

I have an RTX 4060 (Laptop), if that matters at all. v1.7.0 worked fine.
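
For reference, a minimal check like the one below (a sketch, not part of Refact; it only assumes `torch` is importable, which the traceback confirms) forces CUDA initialization so the underlying error is printed instead of the worker's bare "model is only supported on GPU" assertion:

```python
# Diagnostic sketch: run inside the container to surface the real CUDA
# init error hidden behind the AssertionError in inference_hf.py.
import torch

print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())  # False here, per the log

try:
    torch.cuda.init()  # raises with the underlying reason (e.g. error 804)
except RuntimeError as exc:
    print("CUDA initialization failed:", exc)
```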

mitya52 (Member) commented Dec 23, 2024

@alexkramer98 hi! It looks like you have a problem with CUDA, not with the model itself. Error 804 typically means the CUDA runtime inside the container is newer than your host driver supports, and forward compatibility is only available on data-center GPUs, not GeForce cards like yours. Try upgrading your driver to 525.147.05 or higher.
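
A quick way to compare the host driver against that minimum (a sketch; it assumes `nvidia-smi` is on PATH on the host, and the version threshold is the one suggested above):

```python
# Sketch: query the host NVIDIA driver version and compare it against the
# 525.147.05 minimum mentioned in this thread.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
driver = result.stdout.strip().splitlines()[0]
print("host driver:", driver)

needed = [525, 147, 5]
have = [int(part) for part in driver.split(".")]
print("new enough:", have >= needed)  # element-wise lexicographic comparison
```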
