Does DeepSpeed Inference support offloading params to CPU? #3760
Replies: 3 comments
-
I am getting exactly the same error whenever I try to run inference with vicuna-33b.
-
It seems DS Inference only supports tensor parallelism. I tried training mode with ZeRO offload instead, but it runs very slowly. The ds config I used was a ZeRO stage-3 setup with parameter offload, along the lines sketched below.
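A minimal sketch of such a config (the batch size and pin_memory values here are assumptions, not the original attachment):

```python
# Minimal ZeRO stage-3 config with parameter offload to CPU.
# Passed as a dict to deepspeed.initialize(..., config=ds_config);
# the same content can live in a ds_config.json file instead.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",     # keep partitioned weights in host memory
            "pin_memory": True,  # pinned buffers speed up host-to-GPU copies
        },
    },
    "train_micro_batch_size_per_gpu": 1,  # assumption; DeepSpeed requires a batch size in the config
}
```

The slowness is expected with this setup: offload_param keeps the weights in host memory and streams each layer to the GPU on demand, so throughput is bounded by PCIe bandwidth rather than GPU compute.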
-
I just want to run LLaMA-30B inference on an A10, whose parameters exceed the GPU's memory capacity.
It seems InferenceEngine does not process ZeRO-related arguments.
I then tried deepspeed.initialize with ZeRO offload, but it always triggers CUDA OOM.
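The load-time OOM can usually be avoided by partitioning and offloading the weights while the checkpoint loads, instead of materializing the full model on the GPU first. With a Hugging Face model this is done by creating transformers' HfDeepSpeedConfig before from_pretrained (a minimal sketch; the checkpoint name is a placeholder):

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig  # transformers.integrations in newer versions

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created BEFORE from_pretrained and kept alive: it makes
# transformers build the model under deepspeed.zero.Init, so each
# weight is partitioned/offloaded as it is loaded rather than the
# whole 30B model landing on one GPU (the source of the CUDA OOM).
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",  # placeholder checkpoint name
    torch_dtype=torch.float16,
)

engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()
```

Generation then goes through engine.module.generate with inputs placed on engine.device; each forward pass streams the offloaded layers back to the GPU.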