Commit

Merge branch 'main' into main
kathrynle20 authored Nov 6, 2024
2 parents 71dd9e8 + 481c367 commit 65fd527
Showing 26 changed files with 1,016 additions and 149 deletions.
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ notes.txt
removeme*.png

# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
*$py.class

Expand Down Expand Up @@ -165,6 +165,10 @@ cython_debug/
# .DS_Store for OSX
.DS_Store

# vscode
**/.vscode/
**/swap-pane

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
Expand Down
58 changes: 58 additions & 0 deletions docs/getting-started/experiments.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# Automating Experiments

In this tutorial, we will discuss how to automate running multiple experiments by customizing our experiment script. Note that we currently only support automation on one machine with the gRPC protocol. If you have not already read the [Getting Started](./getting-started.md) guide, we recommend you do so before proceeding.

## Running the Code
The `main_exp.py` file automates running experiments on one machine using gRPC. You can run this file with the command:
``` bash
python main_exp.py -host randomhost42.mit.edu
```

## Customizing the Experiments
To customize your experiment automation, make these changes in `main_exp.py`.

1. Specify your constant settings in `sys_config.py` and `algo_config.py`
2. Import the sys_config and algo_config setting objects you want to use for your experiments.
``` python
from configs.algo_config import traditional_fl
from configs.sys_config import grpc_system_config
```

3. Write the experiment object like the example `exp_dict`, mapping each new experiment ID to the set of keys that you want to change per experiment. Specify the `algo_config` and its specific customizations in `algo`, and likewise for `sys_config` and `sys`. *Note that every experiment must have a unique experiment path; we recommend guaranteeing this by giving every experiment a unique experiment ID.*
``` python
exp_dict = {
"test_automation_1": {
"algo_config": traditional_fl,
"sys_config": grpc_system_config,
"algo": {
"rounds": 3,
},
"sys": {
"seed": 3,
"num_users": 3,
},
},
"test_automation_2": {
"algo_config": traditional_fl,
"sys_config": grpc_system_config,
"algo": {
"rounds": 4,
},
"sys": {
"seed": 4,
"num_users": 4,
},
},
}
```
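
The overrides described in step 3 can be sketched as follows. This is a hypothetical illustration of how a script like `main_exp.py` might merge each experiment's `algo`/`sys` overrides into the base configs (the function name `build_configs` and the `exp_path` key are assumptions for illustration, not the actual implementation):
``` python
from copy import deepcopy

# Stand-ins for the imported config objects, shown here as plain dicts.
traditional_fl = {"algo": "fedavg", "rounds": 2}
grpc_system_config = {"seed": 1, "num_users": 2}

exp_dict = {
    "test_automation_1": {
        "algo_config": traditional_fl,
        "sys_config": grpc_system_config,
        "algo": {"rounds": 3},
        "sys": {"seed": 3, "num_users": 3},
    },
}

def build_configs(exp_dict):
    configs = {}
    for exp_id, spec in exp_dict.items():
        # Copy the base configs so experiments do not mutate each other.
        algo = deepcopy(spec["algo_config"])
        algo.update(spec.get("algo", {}))
        sys_cfg = deepcopy(spec["sys_config"])
        sys_cfg.update(spec.get("sys", {}))
        # A unique results path derived from the unique experiment ID.
        sys_cfg["exp_path"] = f"./expt_dump/{exp_id}"
        configs[exp_id] = (algo, sys_cfg)
    return configs
```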


4. (Optional) Specify whether to run post hoc metrics and plots by setting the boolean at the top of the file.
``` python
post_hoc_plot: bool = True
```

5. Start the experiments with the following command:
``` bash
python main_exp.py -host randomhost42.mit.edu
```
10 changes: 5 additions & 5 deletions docs/getting-started/grpc.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,16 @@ In this tutorial, we will discuss how to use gRPC for training models across mul
The main advantage of our abstract communication layer is that the same code runs regardless of whether you are using MPI or gRPC underneath. As long as the communication layer is implemented correctly, the rest of the code remains the same. This is a huge advantage for the framework, as it allows us to switch between different communication layers without changing the code.

## Running the code
Let's say you want to run the decentralized training with 80 users on 4 machines. Our implementation currently requires a coordinating node to manage the orchestration. Therefore, there will be 81 nodes in total. Make sure `sys_config.py` has `num_users: 80` in the config. You should run the following command on all 4 machines:
Let's say you want to run the decentralized training with 80 users on 4 machines. Our implementation currently requires a coordinating node to manage the orchestration. Therefore, there will be 81 nodes in total. In the `sys_config.py`, specify the hostname and port you want to run the coordinator node (i.e. `"comm": { "type": "GRPC", "peer_ids": ["randomhost41.mit.edu:5003"] # the coordinator port will be specified here }`), and set `num_users: 80`.
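
The inline `comm` snippet above can be laid out more readably. This is a hedged sketch of how the relevant fields of `grpc_system_config` in `sys_config.py` might look, shown as a plain dict; the exact structure of the config object may differ:
``` python
# Sketch of the gRPC-related fields from sys_config.py (structure assumed
# from the docs above; adapt to your actual grpc_system_config object).
grpc_system_config = {
    "num_users": 80,  # 80 user nodes; the coordinator makes 81 nodes total
    "comm": {
        "type": "GRPC",
        # hostname:port where the coordinator node will listen
        "peer_ids": ["randomhost41.mit.edu:5003"],
    },
}
```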

On the machine that you want to run the coordinator node on, start the coordinator by running the following command:
``` bash
python main_grpc.py -n 20 -host randomhost42.mit.edu
python main.py -super true
```

On **one** of the machines that you want to use as a coordinator node (let's say it is `randomhost43.mit.edu`), change the `peer_ids` with the hostname and the port you want to run the coordinator node and then run the following command:

Then, start the user threads by running the following command on all 4 machines (change the name of the host per machine you are using, and note that you may need to open a new terminal if you are using the same machine as the supernode):
``` bash
python main.py -super true
python main_grpc.py -n 20 -host randomhost42.mit.edu
```

> **_NOTE:_** Most of the algorithms right now do not use the new communication protocol, hence you can only use the old MPI version with them. We are working on updating the algorithms to use the new communication protocol.
Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ nav:
- Config File: getting-started/config.md
- Customizability: getting-started/customize.md
- Using GRPC: getting-started/grpc.md
- Automating Experiments: getting-started/experiments.md
- CollaBench:
- Main: collabench.md
- Feature Comparison: feature.md
Expand Down
