Issue with federate.method set to global #771

krishnakanthnakkav2 · 2024-04-29T14:45:42Z

Hello,

I launched the experiment with the following command:

python federatedscope/main.py --cfg federatedscope/cv/baseline/fedavg_convnet2_on_cifar10.yaml federate.client_num 1 federate.sample_client_rate 1.0 federate.method global

However, it looks like the model used for evaluation is never updated. The test accuracy stays at 11% in every round, while the training accuracy improves.

2024-04-29 16:42:48,817 (client:357) INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_avg_loss': 1.446278, 'train_total': 50000, 'train_acc': 0.48616, 'train_correct': 24308.0, 'train_loss': 72313.922022}}
2024-04-29 16:42:48,820 (server:344) INFO: Server: Starting evaluation at the end of round 0.
2024-04-29 16:42:50,443 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:42:50,445 (server:960) INFO: {'Role': 'Server #', 'Round': 1, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:42:50,445 (server:350) INFO: ----------- Starting a new training round (Round #1) -------------
2024-04-29 16:42:59,204 (client:357) INFO: {'Role': 'Client #1', 'Round': 1, 'Results_raw': {'train_avg_loss': 1.096129, 'train_total': 50000, 'train_acc': 0.6146, 'train_correct': 30730.0, 'train_loss': 54806.453947}}
2024-04-29 16:42:59,206 (server:344) INFO: Server: Starting evaluation at the end of round 1.
2024-04-29 16:43:00,697 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:00,697 (server:960) INFO: {'Role': 'Server #', 'Round': 2, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:00,697 (server:350) INFO: ----------- Starting a new training round (Round #2) -------------
2024-04-29 16:43:09,473 (client:357) INFO: {'Role': 'Client #1', 'Round': 2, 'Results_raw': {'train_avg_loss': 0.959477, 'train_total': 50000, 'train_acc': 0.66432, 'train_correct': 33216.0, 'train_loss': 47973.853004}}
2024-04-29 16:43:09,474 (server:344) INFO: Server: Starting evaluation at the end of round 2.
2024-04-29 16:43:11,000 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:11,000 (server:960) INFO: {'Role': 'Server #', 'Round': 3, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:11,000 (server:350) INFO: ----------- Starting a new training round (Round #3) -------------
2024-04-29 16:43:19,756 (client:357) INFO: {'Role': 'Client #1', 'Round': 3, 'Results_raw': {'train_avg_loss': 0.867314, 'train_total': 50000, 'train_acc': 0.6992, 'train_correct': 34960.0, 'train_loss': 43365.681585}}
2024-04-29 16:43:19,757 (server:344) INFO: Server: Starting evaluation at the end of round 3.
2024-04-29 16:43:21,245 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:21,245 (server:960) INFO: {'Role': 'Server #', 'Round': 4, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:21,246 (server:350) INFO: ----------- Starting a new training round (Round #4) -------------
2024-04-29 16:43:29,954 (client:357) INFO: {'Role': 'Client #1', 'Round': 4, 'Results_raw': {'train_avg_loss': 0.794839, 'train_total': 50000, 'train_acc': 0.72466, 'train_correct': 36233.0, 'train_loss': 39741.947819}}
2024-04-29 16:43:29,956 (server:344) INFO: Server: Starting evaluation at the end of round 4.
2024-04-29 16:43:31,466 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:31,466 (server:960) INFO: {'Role': 'Server #', 'Round': 5, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:31,467 (server:350) INFO: ----------- Starting a new training round (Round #5) -------------
2024-04-29 16:43:40,151 (client:357) INFO: {'Role': 'Client #1', 'Round': 5, 'Results_raw': {'train_avg_loss': 0.730278, 'train_total': 50000, 'train_acc': 0.74824, 'train_correct': 37412.0, 'train_loss': 36513.923683}}
2024-04-29 16:43:40,153 (server:344) INFO: Server: Starting evaluation at the end of round 5.
2024-04-29 16:43:41,618 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:41,619 (server:960) INFO: {'Role': 'Server #', 'Round': 6, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
2024-04-29 16:43:41,619 (server:350) INFO: ----------- Starting a new training round (Round #6) -------------
2024-04-29 16:43:50,265 (client:357) INFO: {'Role': 'Client #1', 'Round': 6, 'Results_raw': {'train_avg_loss': 0.671751, 'train_total': 50000, 'train_acc': 0.77108, 'train_correct': 38554.0, 'train_loss': 33587.532097}}
2024-04-29 16:43:50,266 (server:344) INFO: Server: Starting evaluation at the end of round 6.
2024-04-29 16:43:51,771 (context:296) WARNING: No val_data or val_loader in the trainer, will skip evaluation.If this is not the case you want, please check whether there is typo for the name
2024-04-29 16:43:51,771 (server:960) INFO: {'Role': 'Server #', 'Round': 7, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}
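
To make the pattern in the log easier to see, here is a small sketch that pulls one metric per round out of lines in this format. The helper name `extract_metric` and the regular expression are my own, not part of FederatedScope, and it assumes each results line ends with a Python dict literal as in the output above.

```python
import ast
import re

# Matches the trailing dict literal of a results line such as
# "... INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {...}}".
RESULT_LINE = re.compile(r"INFO: (\{'Role'.*\})\s*$")


def extract_metric(lines, role_prefix, metric):
    """Collect `metric` from every results line whose Role starts with `role_prefix`."""
    values = []
    for line in lines:
        match = RESULT_LINE.search(line)
        if match is None:
            continue  # not a results line (e.g. a round banner or a warning)
        record = ast.literal_eval(match.group(1))
        results = record.get('Results_raw', {})
        if record['Role'].startswith(role_prefix) and metric in results:
            values.append(results[metric])
    return values


# Four lines copied from the log above (rounds 0 and 1).
sample_log = [
    "2024-04-29 16:42:48,817 (client:357) INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_avg_loss': 1.446278, 'train_total': 50000, 'train_acc': 0.48616, 'train_correct': 24308.0, 'train_loss': 72313.922022}}",
    "2024-04-29 16:42:50,445 (server:960) INFO: {'Role': 'Server #', 'Round': 1, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}",
    "2024-04-29 16:42:59,204 (client:357) INFO: {'Role': 'Client #1', 'Round': 1, 'Results_raw': {'train_avg_loss': 1.096129, 'train_total': 50000, 'train_acc': 0.6146, 'train_correct': 30730.0, 'train_loss': 54806.453947}}",
    "2024-04-29 16:43:00,697 (server:960) INFO: {'Role': 'Server #', 'Round': 2, 'Results_raw': {'test_avg_loss': 2.301025, 'test_total': 10000, 'test_acc': 0.11, 'test_correct': 1100.0, 'test_loss': 23010.249565}}",
]

train_acc = extract_metric(sample_log, 'Client', 'train_acc')
test_acc = extract_metric(sample_log, 'Server', 'test_acc')
print('train_acc per round:', train_acc)
print('test_acc per round: ', test_acc)
```

On the full log, the client's train_acc rises every round while the server's test_acc and test_avg_loss are identical in all seven evaluations.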


krishnakanthnakkav2 · 2024-04-29T15:22:54Z

I think I have found one reason for this behaviour.

When federate.method is set to global, no model_para is broadcast (see the workers/server.py file) to the single client (worker idx 1, I think) where the local training happens. Moreover, since merge_test_data and make_global_eval are both set to True, the evaluation happens on the server (worker idx 0), which has never received the updated model.

I think that when the method is set to global, we should probably not activate merge_test_data or make_global_eval. Please correct me if I am wrong. The same reasoning applies when federate.method is set to local, since there is no broadcast in that setting either.
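
To illustrate the suspected mechanism, here is a toy sketch in plain Python. It is not FederatedScope code: the names `train_one_round`, `mse` and `run`, the one-parameter model and the data are all made up for the example. It only shows that if the trained parameters are never copied back to the worker that runs evaluation, the reported test metric cannot change.

```python
import copy


def train_one_round(params, data, lr=0.1):
    """One gradient-descent step on mean squared error for the model y = w * x."""
    w = params['w']
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return {'w': w - lr * grad}


def mse(params, data):
    """Mean squared error of y = w * x on the given data."""
    return sum((params['w'] * x - y) ** 2 for x, y in data) / len(data)


def run(rounds, sync_to_server):
    """Train on the 'client' copy and evaluate on the 'server' copy each round."""
    train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    test_data = [(4.0, 8.0), (5.0, 10.0)]
    server_params = {'w': 0.0}                      # copy used for evaluation
    client_params = copy.deepcopy(server_params)    # copy that is actually trained
    server_test_losses = []
    for _ in range(rounds):
        client_params = train_one_round(client_params, train_data)
        if sync_to_server:
            # The step that appears to be missing: hand the trained
            # parameters to the worker that runs the evaluation.
            server_params = copy.deepcopy(client_params)
        server_test_losses.append(mse(server_params, test_data))
    return server_test_losses


stale_losses = run(rounds=3, sync_to_server=False)
synced_losses = run(rounds=3, sync_to_server=True)
print('without sync:', stale_losses)
print('with sync:   ', synced_losses)
```

Without the sync step the server-side loss is the same in every round, which matches the constant test_avg_loss in my log; with it, the loss drops as training progresses.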
