Commit

Merge branch 'main' into main
kathrynle20 authored Nov 6, 2024
2 parents 71dd9e8 + 481c367 commit 65fd527
Showing 26 changed files with 1,016 additions and 149 deletions.
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ notes.txt
removeme*.png

# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
*.py[cod]
*$py.class

Expand Down Expand Up @@ -165,6 +165,10 @@ cython_debug/
# .DS_Store for OSX
.DS_Store

# vscode
**/.vscode/
**/swap-pane

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
Expand Down
58 changes: 58 additions & 0 deletions docs/getting-started/experiments.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# Automating Experiments

In this tutorial, we will discuss how to automate running multiple experiments by customizing our experiment script. Note that we currently only support automation on one machine with the gRPC protocol. If you have not already read the [Getting Started](./getting-started.md) guide, we recommend you do so before proceeding.

## Running the Code
The `main_exp.py` file automates running experiments on one machine using gRPC. You can run this file with the command:
``` bash
python main_exp.py -host randomhost42.mit.edu
```

## Customizing the Experiments
To customize your experiment automation, make these changes in `main_exp.py`.

1. Specify your constant settings in `sys_config.py` and `algo_config.py`
2. Import the sys_config and algo_config setting objects you want to use for your experiments.
``` python
from configs.algo_config import traditional_fl
from configs.sys_config import grpc_system_config
```

3. Write the experiment object like the example `exp_dict`, mapping each new experiment ID to the set of keys that you want to change per experiment. Specify the `algo_config` and its specific customizations in `algo`, and likewise for `sys_config` and `sys`. *Note that every experiment must have a unique experiment path; we recommend guaranteeing this by giving every experiment a unique experiment ID.*
``` python
exp_dict = {
"test_automation_1": {
"algo_config": traditional_fl,
"sys_config": grpc_system_config,
"algo": {
"rounds": 3,
},
"sys": {
"seed": 3,
"num_users": 3,
},
},
"test_automation_2": {
"algo_config": traditional_fl,
"sys_config": grpc_system_config,
"algo": {
"rounds": 4,
},
"sys": {
"seed": 4,
"num_users": 4,
},
},
}
```
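
The overrides described in step 3 can be sketched as follows. This is a hypothetical illustration of how a script like `main_exp.py` might merge each experiment's `algo`/`sys` overrides into the base configs (the function name `build_configs` and the `exp_path` key are assumptions for illustration, not the actual implementation):
``` python
from copy import deepcopy

# Stand-ins for the imported config objects, shown here as plain dicts.
traditional_fl = {"algo": "fedavg", "rounds": 2}
grpc_system_config = {"seed": 1, "num_users": 2}

exp_dict = {
    "test_automation_1": {
        "algo_config": traditional_fl,
        "sys_config": grpc_system_config,
        "algo": {"rounds": 3},
        "sys": {"seed": 3, "num_users": 3},
    },
}

def build_configs(exp_dict):
    configs = {}
    for exp_id, spec in exp_dict.items():
        # Copy the base configs so experiments do not mutate each other.
        algo = deepcopy(spec["algo_config"])
        algo.update(spec.get("algo", {}))
        sys_cfg = deepcopy(spec["sys_config"])
        sys_cfg.update(spec.get("sys", {}))
        # A unique results path derived from the unique experiment ID.
        sys_cfg["exp_path"] = f"./expt_dump/{exp_id}"
        configs[exp_id] = (algo, sys_cfg)
    return configs
```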


4. (Optional) Specify whether to run post hoc metrics and plots by setting the boolean at the top of the file.
``` python
post_hoc_plot: bool = True
```

5. Start the experiments with the following command:
``` bash
python main_exp.py -host randomhost42.mit.edu
```
10 changes: 5 additions & 5 deletions docs/getting-started/grpc.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,16 @@ In this tutorial, we will discuss how to use gRPC for training models across mul
The main advantage of our abstract communication layer is that the same code runs regardless of whether you are using MPI or gRPC underneath. As long as the communication layer is implemented correctly, the rest of the code remains the same. This is a huge advantage for the framework, as it allows us to switch between different communication layers without changing the code.

## Running the code
Let's say you want to run the decentralized training with 80 users on 4 machines. Our implementation currently requires a coordinating node to manage the orchestration. Therefore, there will be 81 nodes in total. Make sure `sys_config.py` has `num_users: 80` in the config. You should run the following command on all 4 machines:
Let's say you want to run the decentralized training with 80 users on 4 machines. Our implementation currently requires a coordinating node to manage the orchestration. Therefore, there will be 81 nodes in total. In the `sys_config.py`, specify the hostname and port you want to run the coordinator node (i.e. `"comm": { "type": "GRPC", "peer_ids": ["randomhost41.mit.edu:5003"] # the coordinator port will be specified here }`), and set `num_users: 80`.
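
The inline `comm` snippet above can be laid out more readably. This is a hedged sketch of how the relevant fields of `grpc_system_config` in `sys_config.py` might look, shown as a plain dict; the exact structure of the config object may differ:
``` python
# Sketch of the gRPC-related fields from sys_config.py (structure assumed
# from the docs above; adapt to your actual grpc_system_config object).
grpc_system_config = {
    "num_users": 80,  # 80 user nodes; the coordinator makes 81 nodes total
    "comm": {
        "type": "GRPC",
        # hostname:port where the coordinator node will listen
        "peer_ids": ["randomhost41.mit.edu:5003"],
    },
}
```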

On the machine that you want to run the coordinator node on, start the coordinator by running the following command:
``` bash
python main_grpc.py -n 20 -host randomhost42.mit.edu
python main.py -super true
```

On **one** of the machines that you want to use as a coordinator node (let's say it is `randomhost43.mit.edu`), change the `peer_ids` with the hostname and the port you want to run the coordinator node and then run the following command:

Then, start the user threads by running the following command on all 4 machines (change the name of the host per machine you are using, and note that you may need to open a new terminal if you are using the same machine as the supernode):
``` bash
python main.py -super true
python main_grpc.py -n 20 -host randomhost42.mit.edu
```

> **_NOTE:_** Most of the algorithms right now do not use the new communication protocol, hence you can only use the old MPI version with them. We are working on updating the algorithms to use the new communication protocol.
Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ nav:
- Config File: getting-started/config.md
- Customizability: getting-started/customize.md
- Using GRPC: getting-started/grpc.md
- Automating Experiments: getting-started/experiments.md
- CollaBench:
- Main: collabench.md
- Feature Comparison: feature.md
Expand Down
