ESMF 8.3.0
Overview
The 8.3.0 release of ESMF implements a number of incremental improvements and bug fixes across the library. Highlights of the 8.3.0 release are outlined in the following paragraphs. A detailed list of release notes is provided further down.
On the code management side, ESMF has aligned its tagging scheme with the standard convention used by many other packages on GitHub. Standard tags now start with the lowercase letter “v”, followed by the version triplet. For example, the tag for release 8.3.0 on the ESMF GitHub repository is “v8.3.0”. Beta snapshots leading to a future release have the same root, followed by the lowercase letter “b” and a two-digit snapshot number. For instance, v8.3.0b17 was the last 8.3.0 beta snapshot tag before the official release tag.
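The tagging convention described above can be sketched as a small parser. This is only an illustration in Python, not a tool shipped with ESMF: the names `TAG_RE`, `parse_tag`, and `sort_key` are hypothetical, and the tag `v8.3.0b09` is made up for the example (only `v8.3.0` and `v8.3.0b17` come from the text).

```python
import re

# "v" + version triplet, optionally followed by "b" + two-digit snapshot number.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:b(\d{2}))?$")

def parse_tag(tag):
    """Split a tag into (major, minor, patch, beta); beta is None for a release."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a recognized tag: {tag!r}")
    major, minor, patch, beta = m.groups()
    return (int(major), int(minor), int(patch),
            int(beta) if beta is not None else None)

def sort_key(tag):
    """Order beta snapshots before the official release with the same root."""
    major, minor, patch, beta = parse_tag(tag)
    return (major, minor, patch, float("inf") if beta is None else beta)

# v8.3.0b09 is a hypothetical snapshot tag used only for illustration.
tags = ["v8.3.0", "v8.3.0b17", "v8.3.0b09"]
print(sorted(tags, key=sort_key))
```

Sorting with `sort_key` places the beta snapshots in numeric order ahead of the release tag that shares their root.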
ESMF uses a library called ParallelIO (PIO) for its internal I/O operations, such as reading in mesh files and writing out fields. During this release, the version of PIO used internally was upgraded from a very outdated 1.x version to version 2.5. A new option was also added to the ESMF build system to allow linking to an external build of the PIO library.
Built on top of the PIO upgrade, the ESMF_MeshCreate() method that reads a mesh from file was re-implemented. It now reads the mesh coordinate information in a fully distributed way. This reduces the memory footprint dramatically, allowing the creation of much larger meshes from file than before.
Further progress was made toward the full adoption of MOAB as the internal mesh representation in ESMF. The internal MOAB library, included with ESMF, was updated to version 5.3, and combinatorial testing was added to the ESMF testing framework to ensure consistency and backward compatibility between the native mesh implementation and the MOAB-based implementation. Several consistency issues were resolved as a result of the new testing. By default, ESMF 8.3.0 still uses the native mesh implementation internally. As in previous releases, users can enable the MOAB-based implementation at run-time by calling ESMF_MeshSetMOAB().
Support for dynamically changing grid coordinates (e.g. storm following grids) was added to ESMF. The ESMF_GridCreate() method that creates a new Grid from an existing Grid with a new DistGrid was extended to optionally return a RouteHandle object. The RouteHandle allows subsequent calls into the new ESMF_GridRedist() method to efficiently redistribute the coordinate values from the original source grid to the new destination grid. NUOPC_Connector support for handling changing grid coordinates is not available in this release but will be added in a future release.
Two issues were encountered and addressed in the ESMF_XGrid implementation. First, element areas can now be set in a Field built on an XGrid by using the ESMF_FieldRegridGetArea() method. Previously, this capability was only supported for a Field built on a Grid or Mesh. Second, the algorithm used to generate interpolation weights was improved to guarantee that each exchange grid cell overlaps exactly one cell on each side. Prior to this change, small numerical errors prevented this property from holding and resulted in small remapping errors.
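The exchange grid property mentioned above can be illustrated with a one-dimensional toy model. The sketch below is not the ESMF_XGrid algorithm; it assumes two hypothetical 1D grids covering the same interval and builds exchange cells by merging their cell edges, so that each exchange cell has exactly one parent cell on each side. The function names and edge values are invented for the example.

```python
from bisect import bisect_right

def exchange_cells_1d(edges_a, edges_b):
    """Merge the sorted cell edges of two 1D grids into exchange cells.

    Returns a list of (lo, hi, index_a, index_b), where index_a and index_b
    identify the single parent cell on each side.
    """
    merged = sorted(set(edges_a) | set(edges_b))
    cells = []
    for lo, hi in zip(merged, merged[1:]):
        mid = 0.5 * (lo + hi)  # the midpoint lies strictly inside one cell per side
        ia = bisect_right(edges_a, mid) - 1
        ib = bisect_right(edges_b, mid) - 1
        cells.append((lo, hi, ia, ib))
    return cells

def summed_widths(cells, side, n_parent):
    """Sum exchange cell widths per parent cell on side 'a' or 'b'."""
    totals = [0.0] * n_parent
    for lo, hi, ia, ib in cells:
        totals[ia if side == "a" else ib] += hi - lo
    return totals

edges_a = [0.0, 1.0, 2.0, 3.0]   # three cells of width 1.0
edges_b = [0.0, 1.5, 3.0]        # two cells of width 1.5
cells = exchange_cells_1d(edges_a, edges_b)
print(cells)
print(summed_widths(cells, "a", 3), summed_widths(cells, "b", 2))
```

When every exchange cell has exactly one parent per side, the exchange cell widths sum back to the parent cell widths on both sides, which is the consistency that conservative remapping relies on in this toy setting.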
Release Notes
- This release is backward compatible with the last release ESMF 8.2.0, for all the interfaces that are marked as backward compatible in the Reference Manual. There were API changes to a few unmarked methods that may require minor modifications to user code that uses these methods. The entire list of API changes is summarized in a table showing interface changes since ESMF 8.2.0, including the rationale and impact for each change.
- No bit-for-bit changes were observed for this release compared to release ESMF v8.2.0 with Intel compilers using “-O2 -fp-model precise”.
- Tables summarizing the ESMF regridding status have been updated. These include supported grids and capabilities of the offline and integrated regridding.
- A new section was added to the NUOPC Reference Manual describing the use of NUOPC_AddNestedState() for the coupling of multiple nests or multiple data sets between components.
- The option to profile the execution time of each individual iteration through a NUOPC run sequence has been implemented in the Driver Component Metadata Profiling attribute. Setting the appropriate profiling bit results in a profile where the timing for each individual run sequence iteration is reported in the timing profile under a unique label. This information can be helpful for cases where the cost per iteration changes throughout the execution.
- An issue in the ESMF_StateReconcile() method used by ESMF and NUOPC to generate a consistent object view across multiple components was fixed. The optimization implemented in v8.1.0 introduced the unintended behavior of switching out geom objects (Grid, Mesh, etc.) for Fields contained in States that are used in multiple ESMF_StateReconcile() operations. The incorrect association of geom objects with Fields resulted in unexpected results during subsequent operations using those Fields, such as creating a RouteHandle for regridding.
- Progress was made in full adoption of MOAB as the internal mesh representation in ESMF. This includes updating the internal MOAB library included with ESMF to version 5.3.1 and the addition of combinatorial testing designed to ensure consistency and backward compatibility between the native mesh implementation and the MOAB-based implementation. Several consistency issues were resolved as a result of the new testing.
- The previous version of ESMF_MeshCreate() from a file used to read all the node coordinate information on every processor. For large mesh files this global read can lead to high memory consumption and prevent reading in certain large meshes entirely. To reduce the memory footprint, a fully distributed read of mesh coordinate information was implemented. This change allows the creation of much larger meshes from file.
- The nodeOwners argument for the methods ESMF_MeshCreate() and ESMF_MeshAddNodes() was made optional. This allows the user to defer specification of node ownership to ESMF in cases where a specific ownership assignment is not needed to match the application data distribution. When this argument is absent, ESMF generates a consistent assignment of node owners.
- The ESMF_GridCreate() method that creates a new Grid from an existing Grid with a new DistGrid was extended to optionally return a RouteHandle object. The RouteHandle allows subsequent calls into the new ESMF_GridRedist() method to redistribute the coordinate values from the original source grid to the new destination grid. This feature supports efficient handling of dynamically changing grids between components.
- The implementation of the exchange grid (ESMF_XGrid) class that supports efficient conservative regridding between multiple grids on source and destination sides has been improved:
- Element areas can now be set in a Field built on an XGrid by using the ESMF_FieldRegridGetArea() method. Previously, this capability was only supported for a Field built on a Grid or Mesh.
- The algorithm used to generate interpolation weights was improved to guarantee that each exchange grid cell overlaps exactly one cell on each side. Prior to this change, small numerical errors prevented this property from holding.
- Added the optional --checkFlag argument to the ESMF_RegridWeightGen application. This flag allows the user to turn on more expensive error checking that may not be appropriate for an operational run. Initially this flag turns on a check for grid self-intersection during conservative regridding.
- The VM Epoch implementation now provides an option to reduce the memory pressure on the sending side PETs. By default, internal send buffers, once allocated, are kept until the VM is destroyed. This can lead to high memory pressure for cases where the same sending PETs participate in communication with multiple sets of receiving PETs. Setting keepAlloc=.false. when calling ESMF_VMEpochEnter() instructs ESMF to immediately deallocate internal send buffers once the data has been transferred. This is analogous to the handling of internal receive buffers with keepAlloc=.false. when calling ESMF_VMEpochExit(). The default remains .true. for both sides for efficiency.
- Two internal fixed size buffers that caused issues when precomputing RouteHandles (e.g. via RegridStore()) for high-resolution, high PET count cases (~10,000 and above) were modified. The size of one of the buffers was doubled, while the other fixed size limitation was removed. The symptom of the first buffer size issue (now increased in size) was an error trace in the ESMF Log starting with "ESMCI_DELayout.C:9616 ESMCI::XXE::storeBufferInfo() Internal error: Bad condition - bufferInfoList overflow!!!". The second buffer size issue (now eliminated) was an error trace starting with "ESMCI_DELayout.C:8416 ESMCI::XXE::execReady() Internal error: Bad condition - sendnbCount out of range".
- ESMF uses a library called ParallelIO (PIO) for its internal I/O operations, such as reading in mesh files and writing out fields. During this release, the version of PIO used internally was upgraded from a very outdated 1.x version to version 2.5. As a result, the binary output option ESMF_IOFMT_BIN is no longer supported and has been removed. A new option was also added to the ESMF build system to allow linking to an external build of the PIO library, as long as the external build is at least version 2.5.8. The upgrade eliminates the need to pass the compiler flags "-fallow-argument-mismatch -fallow-invalid-boz" when building with GNU 10.x or newer compilers. If the internal build of PIO is used, CMake version 2.8.12 or newer must be available in the system path. See the User's Guide for information about the environment variables used to configure PIO build and linking options.
- An extra column was added to the ESMF profiler summary output, reporting the number of PEs (CPU cores) associated with the executing PETs. This information is helpful, for example, when profiling components that run with ESMF-managed threading. In the single-threaded case, each PET is associated with a single PE, and the number of PEs equals the number of PETs. However, in the multi-threaded case, where N threads (e.g. OpenMP) are spawned under each PET, the number of PEs is N times the number of PETs.
- The ESMF_COMM build setting for MPICH has been reworked to better align with the current state of the MPICH project, and other ESMF_COMM settings. ESMF_COMM=mpich now covers the current MPICH versions 3 and 4. ESMF_COMM=mpich3 is still supported for backward compatibility. The old MPICH2 continues to be supported via ESMF_COMM=mpich2.
- A problem with ESMF library installation linking for dylibs under Darwin was fixed. Previously the installed ESMF library remained dependent on files under the src directory of the ESMF build tree.
- The FindESMF.cmake file included with ESMF, which is provided as a convenience to users who use CMake in their projects, has been updated. The module now searches ESMF_ROOT if ESMFMKFILE is not provided by the environment. The option USE_ESMF_STATIC_LIBS has been added to use the static ESMF library when building executables. This module requires CMake v3.12 or above.
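The PE accounting described in the profiler summary note above (PEs equal PETs in the single-threaded case, and N times the number of PETs when N threads run under each PET) can be written out as simple arithmetic. The function name `pe_count` and the example sizes are hypothetical and not part of ESMF.

```python
def pe_count(num_pets, threads_per_pet=1):
    """Number of PEs (CPU cores) used by num_pets PETs with N threads each."""
    if num_pets < 1 or threads_per_pet < 1:
        raise ValueError("num_pets and threads_per_pet must be positive")
    return num_pets * threads_per_pet

print(pe_count(48))      # single-threaded: one PE per PET
print(pe_count(48, 4))   # four threads under each PET
```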
Known Issues
- Attempting to write weight files from the ESMPy Regrid object when using filemode=FileMode.WITHAUX currently crashes.
Platform-specific issues:
- The GNU and Intel compilers require GCC>=4.8 for C++11 support (Intel uses the GCC headers). By default, ESMF uses the C++11 standard and cannot be downgraded. If you run into build issues due to the C++11 dependency, make sure that GCC 4.8 or newer is loaded.
- On Darwin, with the GNU gfortran+gcc combination, when building MPICH3 from source, it is important to specify the "--enable-two-level-namespace" configure option. By default, i.e. without this option, on Darwin, the produced MPICH compiler wrappers include a linker flag (-flat_namespace) that causes issues with C++ exception handling. Building and linking ESMF applications with MPICH compiler wrappers that specify this linker option leads to “mysterious” application aborts during execution.
- On Darwin, with the Intel Fortran compiler, command line arguments cannot be accessed from ESMF applications when linked against the shared library version of libesmf. There is no issue when linked against the static libesmf.a version. Setting the environment variable ESMF_SHARED_LIB_BUILD=OFF during the ESMF build can be used as a workaround for this issue.
- There is an issue with intercepting the MPI calls for profiling on some of the supported platforms. This results in a single FAIL reported for ESMF_TraceMPIUTest.F90. The affected platforms are:
- Catania: Darwin+GNU+MPICH3
- Gaea: Unicos+GNU+cray-mpich
- There is an issue with loading the libesmftrace_preload.so library on some of the supported platforms. This results in a reported CRASH for ESMF_TraceIOUTest.F90 and ESMF_TraceMPIUTest.F90. The affected platforms are:
- Cori: Unicos+Intel+cray-mpich
- Cori: Unicos+Intel+mpiuni
- Discover: Linux+GNU+intelmpi
- Gaea: Unicos+Intel+cray-mpich
- Gaea: Unicos+Intel+mpiuni
- Hera: Linux+GNU+intelmpi
- Orion: Linux+GNU+mpiuni
Documentation
- ESMF Reference Manual for Fortran
- ESMF Reference Manual for C
- ESMF User Guide
- NUOPC Layer Reference
- Building a NUOPC Model
- ESMPy Doc