Add feature: dynamic load balancing #141

IshaanDesai · 2025-01-02T15:34:19Z

Major steps in implementing dynamic load balancing:

1. Each rank of the Micro Manager accesses the complete macro mesh. Even though the entire mesh is accessed, each rank creates only a part of the micro simulations.
2. Initially, the Micro Manager distributes the total number of micro simulations as evenly as possible among all available ranks.
3. When load balancing is triggered, an allgather is run to collect the global number of active simulations on each rank.
4. The global number of active simulations is divided by the number of ranks to determine how many active simulations each rank needs for a balanced load.
5. As with the allgather on the number of active simulations, another allgather is run to get the global IDs of the active simulations and the ranks on which they are located. The IDs and rank locations are necessary to determine a communication map to redistribute the load.
6. Using the above information, a communication map is created to decide which rank each active simulation is sent to. The logic for this is already implemented in the existing GlobalAdaptivityCalculator class.
7. If an active simulation is moved to a different rank, all the inactive simulations associated with it on its current rank are also moved to a different rank.
8. When a simulation is moved to a new rank, the old rank writes zero data as the result of the micro simulation that it no longer has. Every rank only writes results for the micro simulations that it is currently hosting.
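
Steps 2 to 6 can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the logic in GlobalAdaptivityCalculator: the function names `initial_distribution`, `balanced_targets` and `communication_map` are hypothetical, and the greedy pairing of overloaded and underloaded ranks is one possible choice. The input `active_ids_per_rank` stands in for what each rank would hold after the allgather (for example via mpi4py's `comm.allgather`).

```python
def initial_distribution(n_sims, n_ranks):
    """Split n_sims simulations as evenly as possible over n_ranks (step 2)."""
    base, extra = divmod(n_sims, n_ranks)
    # The first `extra` ranks get one simulation more than the rest.
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]


def balanced_targets(active_per_rank):
    """Target number of active simulations per rank (step 4)."""
    return initial_distribution(sum(active_per_rank), len(active_per_rank))


def communication_map(active_ids_per_rank):
    """Return {global_id: (source_rank, destination_rank)} (steps 5 and 6).

    active_ids_per_rank[r] lists the global IDs of the active simulations
    on rank r, i.e. the data every rank has after the allgather.
    Hypothetical greedy strategy, assumed here for illustration only.
    """
    targets = balanced_targets([len(ids) for ids in active_ids_per_rank])
    surplus = []  # (global_id, source_rank) of simulations above the target
    deficit = []  # receiving ranks, repeated once per free slot
    for rank, ids in enumerate(active_ids_per_rank):
        n_over = len(ids) - targets[rank]
        if n_over > 0:
            # Move the simulations with the highest global IDs away.
            surplus.extend((gid, rank) for gid in sorted(ids)[-n_over:])
        else:
            deficit.extend([rank] * (-n_over))
    # Targets sum to the global total, so surplus and deficit have equal length.
    return {gid: (src, dst) for (gid, src), dst in zip(surplus, deficit)}


if __name__ == "__main__":
    print(initial_distribution(10, 4))
    print(communication_map([[0, 1, 2, 3, 4], [5], []]))
```

Because the map is computed deterministically from the same gathered data, every rank would arrive at the same map in this sketch without further communication. Moving the associated inactive simulations (step 7) is not covered here.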

Checklist:

I made sure that the CI passed before I ask for a review.
I added a summary of the changes (compared to the last release) in the CHANGELOG.md.
If necessary, I made changes to the documentation and/or added new content.
I will remember to squash-and-merge, providing a useful summary of the changes of this PR.

IshaanDesai added 5 commits June 12, 2024 16:15

Add load imbalance data gathering

2bf1733

Remove rank from filename

f1d7442

Merge branch 'develop' into load_imbalance

b9b8a09

Merge branch 'develop' into load_imbalance

1138938

Add new class for global adaptivity with dynamic load balancing with …

8688f1c

…a config option

IshaanDesai added the new-feature Adding a new feature label Jan 2, 2025

IshaanDesai self-assigned this Jan 2, 2025

IshaanDesai linked an issue Jan 2, 2025 that may be closed by this pull request

The need for dynamic load balancing in global adaptivity #129

Open

IshaanDesai added 8 commits January 9, 2025 17:49

Add tools/misc.py and some initial class structure

19ce49e

Making the Micro Manager interface compliant to dynamic load balancing

79ccc91

First part of the implementation of redistribution of active and inac…

f2ec076

…tive simulations in load balancing

Merge branch 'develop' into load_balancing

239b8b3

Add first tests for dynamic load balancing

f678756

Remove incorrect underscore from Config class function call

6df9f3f

Add unit tests for dynamic load balancing

06175d2

Add code removed by mistake from GlobalAdaptivityCalculator

14371b0
