Documentation for Profiling: Hot Spots and Load Balance #3622

Docs/sphinx_documentation/source/AMReX_Profiling_Tools.rst: 41 additions, 0 deletions
@@ -93,6 +93,47 @@
it is also recommended to wrap any ``BL_PROFILE_TINY_FLUSH();`` calls in
informative ``amrex::Print()`` lines to ensure accurate identification of each
set of timers.
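
For example, a minimal sketch of this pattern (the label text is an
illustrative placeholder):

.. highlight:: c++

::

    // Label each flush so that successive sets of timers can be told
    // apart in the output.
    amrex::Print() << "Timers after the first solve:\n";
    BL_PROFILE_TINY_FLUSH();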

Hot Spots and Load Balance
~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of TinyProfiler can help us identify hot spots. For example, the
following output shows the top three hot spots of a linear solver test
running on 4 MPI processes.

.. highlight:: console

::

    --------------------------------------------------------------------------------------------
    Name                                      NCalls  Excl. Min  Excl. Avg  Excl. Max    Max %
    --------------------------------------------------------------------------------------------
    MLPoisson::Fsmooth()                         560     0.4775     0.4793     0.4815   34.97%
    MLPoisson::Fapply()                          114     0.1103      0.113     0.1167    8.48%
    FabArray::Xpay()                             109        0.1     0.1013     0.1038    7.54%

In this test, there are 16 boxes evenly distributed among 4 MPI processes. The
output above shows that the load is perfectly balanced. However, if the load
is not balanced, the results can be very different and sometimes
misleading. For example, if we put 2, 2, 6 and 6 boxes on processes 0, 1, 2
and 3, respectively, the top three hot spots now include two MPI
communication functions, ``FillBoundary`` and ``ParallelCopy``.

.. highlight:: console

::

    --------------------------------------------------------------------------------------------
    Name                                      NCalls  Excl. Min  Excl. Avg  Excl. Max    Max %
    --------------------------------------------------------------------------------------------
    FillBoundary_finish()                        607    0.01568     0.3367     0.6574   41.97%
    MLPoisson::Fsmooth()                         560     0.2133     0.4047     0.5973   38.13%
    FabArray::ParallelCopy_finish()              231   0.002977    0.09748     0.1895   12.10%

The reason the MPI communication appears slow is that the lightly loaded
processes must wait for messages sent by the heavily loaded processes. See
also :ref:`sec:profopts` for a diagnostic option that may provide more
insight into the load imbalance.
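
One way to reproduce such an imbalance for experimentation is to assign
boxes to ranks by hand. Below is a minimal sketch intended to run on 4 MPI
processes; the domain size, box size, and variable names are illustrative
assumptions rather than the setup of the test above:

.. highlight:: c++

::

    #include <AMReX_BoxArray.H>
    #include <AMReX_DistributionMapping.H>
    #include <AMReX_MultiFab.H>
    #include <AMReX_Vector.H>

    // A 256^2 domain chopped into 64^2 boxes yields 16 boxes (in a 2D build).
    amrex::Box domain(amrex::IntVect(0), amrex::IntVect(255));
    amrex::BoxArray ba(domain);
    ba.maxSize(64);

    // Map the 16 boxes to ranks by hand: 2 boxes each on ranks 0 and 1,
    // 6 boxes each on ranks 2 and 3, instead of the default even distribution.
    amrex::Vector<int> pmap = {0,0, 1,1, 2,2,2,2,2,2, 3,3,3,3,3,3};
    amrex::DistributionMapping dm(pmap);

    // Any MultiFab built with this mapping inherits the skewed load.
    amrex::MultiFab phi(ba, dm, 1, 1);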

.. _sec:full:profiling:

Full Profiling