
Fix Errors in trainer_sgdmf.py and movielens.py #779

Open
czzhangheng opened this issue Aug 2, 2024 · 1 comment

Comments

@czzhangheng

I tried to run this example following the docs at https://federatedscope.io/docs/recommendation/ with the command:
python federatedscope/main.py --cfg federatedscope/mf/baseline/hfl-sgdmf_fedavg_standalone_on_movielens1m.yaml
However, it did not run and reported several errors. The errors occur in ./federatedscope/mf/trainer/trainer_sgdmf.py and seem to come from how torch's Embedding module is accessed: ctx.model.embed_user.grad is wrong, while ctx.model.embed_user.weight.grad is correct. There are also other errors, such as an "add(sparse, dense)" failure when a dense tensor is added to a sparse gradient.
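
To show what I mean, here is a tiny standalone sketch (my own illustration, not code from FederatedScope): an nn.Embedding keeps its parameters in .weight, so the gradient lives on .weight.grad, and a dense noise tensor cannot be added directly to a sparse gradient.

import torch
import torch.nn as nn

embed_user = nn.Embedding(10, 4, sparse=True)
loss = embed_user(torch.tensor([1, 2, 3])).sum()
loss.backward()

# embed_user.grad                          # AttributeError: the module itself has no .grad
print(embed_user.weight.grad.is_sparse)    # True: the gradient lives on .weight and is sparse here

noise = torch.randn_like(embed_user.weight)
# embed_user.weight.grad += noise          # RuntimeError: add(sparse, dense) is not supported
dense_grad = embed_user.weight.grad.to_dense()
dense_grad += noise                        # adding dense noise works on the dense copy
embed_user.weight.grad = dense_grad.to_sparse()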

I used ChatGPT to help fix the code, and the example now runs. From my Git history, these are the changes I made:

In federatedscope/mf/dataset/movielens.py, lines 160-161:

row = [mapping_user[mid] for _, mid in data["userId"].items()]
col = [mapping_item[mid] for _, mid in data["movieId"].items()]
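
For context, a tiny standalone sketch (my own illustration with made-up toy data, not code from the repo) of what these two lines do: they translate the raw userId/movieId values into contiguous indices via the mapping dicts, iterating over the pandas Series with .items().

import pandas as pd

# Hypothetical toy data and mappings, just to show the pattern
data = pd.DataFrame({"userId": [10, 42, 10], "movieId": [7, 7, 99]})
mapping_user = {10: 0, 42: 1}
mapping_item = {7: 0, 99: 1}

row = [mapping_user[mid] for _, mid in data["userId"].items()]
col = [mapping_item[mid] for _, mid in data["movieId"].items()]
print(row, col)  # [0, 1, 0] [0, 0, 1]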

In federatedscope/mf/trainer/trainer_sgdmf.py, line 70, replace the entire function def hook_on_batch_backward(ctx):

def hook_on_batch_backward(ctx):
    """Private local updates in SGDMF."""
    ctx.optimizer.zero_grad()
    ctx.loss_task.backward()

    # The embedding gradients may be sparse; densify them so the dense
    # noise tensors below can be added.
    if ctx.model.embed_user.weight.grad.is_sparse:
        dense_user_grad = ctx.model.embed_user.weight.grad.to_dense()
    else:
        dense_user_grad = ctx.model.embed_user.weight.grad

    if ctx.model.embed_item.weight.grad.is_sparse:
        dense_item_grad = ctx.model.embed_item.weight.grad.to_dense()
    else:
        dense_item_grad = ctx.model.embed_item.weight.grad

    # Inject Gaussian noise into the gradients
    dense_user_grad.data += get_random(
        "Normal",
        sample_shape=ctx.model.embed_user.weight.shape,
        params={
            "loc": 0,
            "scale": ctx.scale
        },
        device=ctx.model.embed_user.weight.device)
    dense_item_grad.data += get_random(
        "Normal",
        sample_shape=ctx.model.embed_item.weight.shape,
        params={
            "loc": 0,
            "scale": ctx.scale
        },
        device=ctx.model.embed_item.weight.device)

    # Write the noised gradients back in sparse form before the update
    ctx.model.embed_user.weight.grad = dense_user_grad.to_sparse()
    ctx.model.embed_item.weight.grad = dense_item_grad.to_sparse()
    ctx.optimizer.step()

    # Embedding clipping
    with torch.no_grad():
        embedding_clip(ctx.model.embed_user.weight, ctx.sgdmf_R)
        embedding_clip(ctx.model.embed_item.weight, ctx.sgdmf_R)

The code now runs, but I'm not sure whether there are other issues.
I rarely use GitHub, so I may need to learn how to open a pull request later.

Environment:
Python 3.9
torch 1.10.1
CUDA 11.3

Thank you for your work. Have a good day. :)

@czzhangheng (Author)

Also, I cannot scan the QR code for the DingTalk group on the official website; it appears to be out of date.
