For now, I'm using a single-node configuration. The details are listed below:
2xx CPU cores (I think this is irrelevant :))
800 GB memory
8 × A100 GPUs
I run DeepSpeed inference with two commands (please ignore the -t and -p arguments):
deepspeed --num_gpus 4 test_deepspeed.py -t 256 -p test1.txt
deepspeed --num_gpus 8 test_deepspeed.py -t 256 -p test1.txt
In test_deepspeed.py, the relevant code looks like this:
import torch
import deepspeed
from transformers import AutoModelForCausalLM

def load_model(path_175b, num_gpus):
    model = AutoModelForCausalLM.from_pretrained(path_175b, torch_dtype=torch.float16)
    model = deepspeed.init_inference(
        model=model,                                              # Transformers model
        tensor_parallel={'enabled': True, 'tp_size': num_gpus},   # tensor-parallel degree (matches --num_gpus)
        dtype=torch.float16,                                      # weight dtype (fp16)
        replace_method="auto",
        replace_with_kernel_inject=True,                          # boolean, not the string "True"
    )
    return model
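For context, the engine is then driven from the launcher-spawned processes roughly like the sketch below; the prompt, tokenizer call, and generation settings are only illustrative (not my exact script), and path_175b stands in for the checkpoint path:

import os
import torch
from transformers import AutoTokenizer

path_175b = "..."                                    # placeholder: actual checkpoint path not shown here
local_rank = int(os.getenv("LOCAL_RANK", "0"))       # set by the deepspeed launcher for each process
world_size = int(os.getenv("WORLD_SIZE", "1"))       # number of processes spawned by the launcher

tokenizer = AutoTokenizer.from_pretrained(path_175b) # assumes a tokenizer is saved with the checkpoint
model = load_model(path_175b, num_gpus=world_size)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(f"cuda:{local_rank}")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)  # generate() is forwarded to the wrapped model
print(tokenizer.decode(outputs[0], skip_special_tokens=True))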
DeepSpeed then tried to run my script with 4 or 8 processes (based on num_gpus), and eventually stopped abnormally due to lack of memory (each process consumed more than 200 GB).
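Here is my rough back-of-the-envelope estimate of why host memory runs out, assuming the model really has 175B parameters stored in fp16 (these numbers are estimates, not measurements):

# Rough estimate only; assumes 175e9 parameters at 2 bytes each (fp16).
params = 175e9
weights_gb = params * 2 / 1e9
print(f"full fp16 weights: ~{weights_gb:.0f} GB")                 # ~350 GB
print(f"per-GPU shard with tp_size=8: ~{weights_gb / 8:.0f} GB")  # ~44 GB, fits an 80 GB A100
# With the plain from_pretrained call above, every launcher process first builds the
# full model in host RAM, so 4 or 8 processes x ~350 GB far exceeds 800 GB.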
So I want to know: what machine configuration should I use? And for this script, what inference configuration should I use?
(Sorry for my poor written English.)