
Using MPI and OpenMP


Domain Decomposition: MeshBlock

For parallel simulations with MPI, the computing domain is decomposed into small units. In Athena++, this decomposition unit is called a MeshBlock, and all MeshBlocks have the same logical size (i.e., the same number of cells). The MeshBlocks are stored in a tree structure and have unique integer IDs assigned by Z-ordering.

The MeshBlock size is specified by the <meshblock> parameters in an input file. The following example decomposes a 256^3 Mesh into MeshBlocks of 64^3 cells, resulting in 64 MeshBlocks. Note that the Mesh size must be divisible by the MeshBlock size in each direction.

<mesh>
nx1     =    256
...
nx2     =    256
...
nx3     =    256
...
<meshblock>
nx1     =    64
nx2     =    64
nx3     =    64
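Since 256 / 64 = 4 MeshBlocks fit along each direction, this gives 4^3 = 64 MeshBlocks in total. As a sketch, assuming your build accepts the usual block/parameter=value overrides on the command line (the input file name athinput.example is a placeholder here), the same decomposition can also be requested without editing the input file:

./athena -i athinput.example meshblock/nx1=64 meshblock/nx2=64 meshblock/nx3=64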

For non-parallel output formats (e.g., VTK), one file is generated per MeshBlock regardless of the actual number of processes. We recommend the HDF5 output because it combines all the MeshBlocks and produces only two files per output timestep. For details, see Outputs.
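For reference, a minimal sketch of an output block requesting HDF5 data (the block name <output1>, the variable choice, and the output interval below are illustrative; see the Outputs page for the full set of parameters):

<output1>
file_type   = hdf5
variable    = prim
dt          = 0.01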

MPI Parallelization
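With MPI, the MeshBlocks described above are distributed among the processes, and each process computes the MeshBlocks assigned to it. A minimal sketch of building and launching such a run, assuming the -mpi configure option and an mpiexec-style launcher (the problem name, input file, and exact commands are placeholders and depend on your system and code version):

python configure.py --prob=blast -mpi
make clean
make

mpiexec -n 64 ./athena -i athinput.example

With the 256^3 / 64^3 decomposition above, this assigns one 64^3 MeshBlock to each of the 64 processes.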

OpenMP Parallelization

OpenMP is a standard for shared-memory parallelization within a node. OpenMP parallelizes calculations within each MeshBlock. To enable this, configure the code with the -omp option and set num_threads in the <mesh> block of your input file. You also probably need to set an environment variable to specify the number of threads; generally this is OMP_NUM_THREADS, but please check the documentation of your system.
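For example, a sketch of the relevant settings for four threads per process (the export line assumes a bash-like shell and that OMP_NUM_THREADS is the right variable on your system):

<mesh>
num_threads = 4
...

export OMP_NUM_THREADS=4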

OpenMP parallelization is not very scalable; usually you will get the best performance with 2 or 4 threads per process. However, because the threads share some data, especially the MeshBlock tree, using them saves memory. This is helpful when you are running gigantic parallel simulations (see the sketch below).
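As a concrete illustration, the 64-MeshBlock example above could be run as a hybrid MPI+OpenMP job with 16 processes and 4 threads each, occupying 64 cores (assuming an mpiexec-style launcher and num_threads = 4 in the <mesh> block as shown above; how processes and threads are placed on nodes is system dependent):

export OMP_NUM_THREADS=4
mpiexec -n 16 ./athena -i athinput.example

Here each process handles 64 / 16 = 4 MeshBlocks, and each MeshBlock is computed with 4 threads.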
