Add feature: dynamic load balancing #141

Draft
wants to merge 13 commits into
base: develop

Conversation

IshaanDesai
Member

@IshaanDesai IshaanDesai commented Jan 2, 2025

Major steps in implementing dynamic load balancing:

  • 1. Each rank of the Micro Manager accesses the complete macro mesh. Although the entire mesh is accessed, each rank creates only a subset of the micro simulations.
  • 2. Initially, the Micro Manager distributes the total number of micro simulations as evenly as possible among all available ranks.
  • 3. When load balancing is triggered, an allgather collects the global number of active simulations on every rank.
  • 4. The global number of active simulations is divided by the number of ranks to determine how many active simulations each rank needs for a balanced load.
  • 5. A second allgather collects the global IDs of the active simulations and the rank on which each one currently resides. The IDs and rank locations are needed to build a communication map for redistributing the load.
  • 6. Using this information, a communication map is created that determines the rank to which each active simulation is sent. The logic for this is already implemented in the existing GlobalAdaptivityCalculator class.
  • 7. If an active simulation is moved to a different rank, all inactive simulations associated with it on its current rank are moved as well.
  • 8. When a simulation is moved to a new rank, the old rank writes zero data as the result of the micro simulation that it no longer hosts. Every rank writes results only for the micro simulations that it currently hosts.
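
Steps 2 to 6 can be sketched in plain Python. This is a minimal illustration only, not the code in this PR or in GlobalAdaptivityCalculator: the function names `even_split` and `communication_map`, the greedy pairing of overloaded and underloaded ranks, and the rule for choosing which simulations leave a rank are all assumptions. The per-rank lists of active global IDs are passed in directly; in an MPI run they would be what the allgather in step 5 returns on every rank.

```python
def even_split(n_items, n_ranks):
    """Split n_items as evenly as possible over n_ranks (steps 2 and 4).

    The first (n_items % n_ranks) ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    return [base + (1 if rank < extra else 0) for rank in range(n_ranks)]


def communication_map(active_ids_per_rank):
    """Build {global_id: (source_rank, target_rank)} for simulations to move.

    active_ids_per_rank[r] holds the global IDs of the active simulations
    on rank r. Hypothetical helper: in an MPI run this list would be the
    gathered result of an allgather (steps 3 and 5).
    """
    n_ranks = len(active_ids_per_rank)
    total_active = sum(len(ids) for ids in active_ids_per_rank)
    targets = even_split(total_active, n_ranks)

    # Collect simulations that overloaded ranks have to give away.
    # Assumption: the highest global IDs on a rank are moved first.
    surplus = []
    for rank, ids in enumerate(active_ids_per_rank):
        excess = len(ids) - targets[rank]
        if excess > 0:
            surplus.extend((gid, rank) for gid in sorted(ids)[-excess:])

    # Hand the surplus to underloaded ranks in rank order. Total surplus
    # equals total deficit, so the iterator is never exhausted early.
    moves = {}
    surplus_iter = iter(surplus)
    for rank, ids in enumerate(active_ids_per_rank):
        deficit = targets[rank] - len(ids)
        for _ in range(max(deficit, 0)):
            gid, source = next(surplus_iter)
            moves[gid] = (source, rank)
    return moves


if __name__ == "__main__":
    # Four ranks, eight active simulations, most of them on rank 0.
    active = [[0, 1, 2, 3, 4], [5], [], [6, 7]]
    print(communication_map(active))
```

With the example input, each rank should end up with two active simulations, so three simulations leave rank 0. Because every rank would see the same gathered input, every rank could compute the same map independently without further communication.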

Checklist:

  • I made sure that the CI passed before I ask for a review.
  • I added a summary of the changes (compared to the last release) in the CHANGELOG.md.
  • If necessary, I made changes to the documentation and/or added new content.
  • I will remember to squash-and-merge, providing a useful summary of the changes of this PR.

@IshaanDesai IshaanDesai added the new-feature Adding a new feature label Jan 2, 2025
@IshaanDesai IshaanDesai self-assigned this Jan 2, 2025
@IshaanDesai IshaanDesai linked an issue Jan 2, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

The need for dynamic load balancing in global adaptivity