How to perform inference MoE model with expert parallel #6891

Guodanding · 2024-12-18T13:13:52Z

Hello, I want to perform inference on the HuggingFace MoE model Qwen1.5-MoE-A2.7B with expert parallelism using DeepSpeed in a multi-GPU environment. However, the official tutorials are not comprehensive enough, and despite reviewing the documentation, I still don't know how to proceed.

Could you please help me refine this request?

Guodanding · 2024-12-20T02:03:40Z

hello

delock · 2025-01-03T02:54:01Z

I have the same question.
I came across this link:
https://www.deepspeed.ai/tutorials/mixture-of-experts-inference/?utm_source=chatgpt.com#initializing-for-inference which has this code snippet. However, it is not clear where get_model comes from.

import deepspeed
import torch.distributed as dist

# Set expert-parallel size
world_size = dist.get_world_size()
expert_parallel_size = min(world_size, args.num_experts)

# create the MoE model
moe_model = get_model(model, ep_size=expert_parallel_size)
...

# Initialize the DeepSpeed-Inference engine
ds_engine = deepspeed.init_inference(moe_model,
                                     mp_size=tensor_slicing_size,
                                     dtype=torch.half,
                                     moe_experts=args.num_experts,
                                     checkpoint=args.checkpoint_path,
                                     replace_with_kernel_inject=True,)
model = ds_engine.module
output = model('Input String')
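
For reference, the only part of the snippet above that does not depend on the undefined get_model is the expert-parallel size calculation. A minimal sketch of that arithmetic in plain Python follows. The helper name experts_per_rank is hypothetical and not part of DeepSpeed, and the even-split check is an assumption about how experts would be divided across ranks, not something the tutorial snippet states.

```python
from typing import Tuple


def experts_per_rank(world_size: int, num_experts: int) -> Tuple[int, int]:
    """Return (expert_parallel_size, experts hosted on each rank).

    Hypothetical helper for illustration only; not a DeepSpeed API.
    """
    # Same rule as the tutorial snippet: never use more expert-parallel
    # ranks than there are experts.
    ep_size = min(world_size, num_experts)

    # Assumption: experts are split evenly, so the expert count must be
    # divisible by the expert-parallel size.
    if num_experts % ep_size != 0:
        raise ValueError(
            f"num_experts={num_experts} is not divisible by ep_size={ep_size}"
        )
    return ep_size, num_experts // ep_size
```

Under these assumptions, 8 experts on 4 GPUs gives an expert-parallel size of 4 with 2 experts per rank, while 8 experts on 3 GPUs is rejected because the experts cannot be split evenly.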
