Overview
Starting with version 8.2.0, the ESMF team has moved to a more frequent release cadence with new releases anticipated approximately every six months. This approach helps to ensure that new features, bug fixes, and optimizations are available more frequently in official releases of ESMF.
Highlights of the 8.2.0 release are outlined below, followed by a detailed list of release notes.
The NUOPC run sequence feature has proven a viable formalism to capture and express the control- and data-flow among the components of a wide range of coupled applications. Recent application work has demonstrated the need for a more succinct way to specify conditional execution of run sequence elements. This release extends the NUOPC RunSequence syntax to include Alarm Blocks. Alarm Blocks allow the user to specify that certain run sequence elements are executed less frequently than the parent timestep.
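The timing behavior of an Alarm Block can be pictured with a small stand-alone sketch. It does not use NUOPC or the RunSequence syntax; the function name alarm_steps and its arguments are hypothetical, and it assumes the alarm rings at the end of every parent timestep at which the elapsed time reaches a whole multiple of the alarm interval.

```python
from datetime import timedelta


def alarm_steps(parent_step, alarm_interval, run_duration):
    """Return the 1-based parent timestep indices at which the alarm rings.

    Hypothetical helper for illustration only; not part of NUOPC.
    Assumes the alarm interval is a whole multiple of the parent timestep.
    """
    every = alarm_interval // parent_step  # parent steps per alarm period
    if every < 1 or alarm_interval % parent_step != timedelta(0):
        raise ValueError(
            "alarm interval must be a positive multiple of the parent timestep"
        )
    total_steps = run_duration // parent_step
    return [n for n in range(1, total_steps + 1) if n % every == 0]


# A 15-minute parent timestep with a 1-hour alarm over a 3-hour run:
# elements inside the block would run on 3 of the 12 parent steps.
steps = alarm_steps(timedelta(minutes=15), timedelta(hours=1), timedelta(hours=3))
print(steps)
```

Elements outside the block would still execute on every parent timestep; only the elements inside the block are restricted to the listed steps.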
Several groups have started implementing exchange grids in their modeling systems, including within NUOPC Mediators. To facilitate these efforts, the ESMF_XGrid support was extended in this release. It now includes the use of all ESMF regridding methods (bilinear, patch, etc.) and options (extrapolation, regridding status, etc.) when regridding to or from Fields built on exchange grids.
A number of issues were uncovered during the deployment of the ESMF-managed threading and resource control features. This release addresses these issues, and the NUOPC level support for resource control and handling of threaded components is now more robust and has been demonstrated in several large-scale applications. This feature allows model components to independently set OpenMP threading levels so that all components in a coupled system are best utilizing available HPC resources, based on their individual scaling profiles.
The VMEpoch feature is an important communication optimization used by the NUOPC_Connector, and by some applications directly. This release fixes a problem that was encountered when using VMEpoch with any of the Redist() methods. The release also addresses an out-of-memory issue that can be triggered when the sending side runs many iterations ahead of the receiving side, by introducing automatic message throttling. Finally, a new reference manual section is available that describes the use of VMEpoch for asynchronous RouteHandle communications.
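The effect of throttling can be illustrated with a toy model. This is a conceptual sketch, not the ESMF implementation: the function simulate_epochs, the tick-based scheduling, and the receiver_every parameter are all hypothetical. The default limit of ten outstanding send cycles is taken from the release notes.

```python
from collections import deque


def simulate_epochs(n_cycles, throttle=10, receiver_every=3):
    """Return the peak number of outstanding send cycles in a toy model.

    The sender tries to post one cycle per tick. The receiver completes one
    cycle only on every `receiver_every`-th tick, so it falls behind.
    With throttle=None the sender never waits for the receiver.
    """
    outstanding = deque()  # cycles posted by the sender, not yet received
    peak = 0
    sent = 0
    tick = 0
    while sent < n_cycles or outstanding:
        tick += 1
        # The slow receiver drains one cycle on its ticks.
        if outstanding and tick % receiver_every == 0:
            outstanding.popleft()
        # The sender posts a new cycle only while below the throttle limit.
        if sent < n_cycles and (throttle is None or len(outstanding) < throttle):
            outstanding.append(sent)
            sent += 1
        peak = max(peak, len(outstanding))
    return peak


throttled_peak = simulate_epochs(100, throttle=10)
unthrottled_peak = simulate_epochs(100, throttle=None)
print("peak outstanding cycles with throttle:", throttled_peak)
print("peak outstanding cycles without throttle:", unthrottled_peak)
```

In this model the unthrottled backlog grows with the length of the run, while the throttled backlog stays bounded by the limit. That bound is what keeps memory use on the receiving side in check.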
The process of replacing the native ESMF mesh implementation with the MOAB library, developed by the U.S. Department of Energy, is continuing. This release makes the MOAB mesh backend available to ESMPy users by calling Manager.set_moab(). This option allows users to test the impacts of using the MOAB mesh backend instead of the default native mesh through ESMPy.
Release Notes
- This release is backward compatible with the last major release update, ESMF 8.1.0 and patch release ESMF 8.1.1, for all the interfaces that are marked as backward compatible in the Reference Manual. There were API changes to a few unmarked methods that may require minor modifications to user code that uses these methods. The entire list of API changes is summarized in a table showing interface changes since ESMF_8_1_0, including the rationale and impact for each change.
- No bit-for-bit changes were observed for this release compared to release ESMF 8.1.0 and patch release ESMF 8.1.1, with Intel compilers using “-O2 -fp-model precise”. However, the release contains code changes to the regridding implementation that have the potential to lead to bit-for-bit changes in regridding weights. Any release item with the potential to introduce a bit-for-bit change is indicated in the respective release note.
- Tables summarizing the ESMF regridding status have been updated. These include supported grids and capabilities of the offline and integrated regridding.
- The NUOPC RunSequence syntax was extended to support Alarm Blocks. An alarm block specifies the time interval at which the elements within the block are executed. This adds additional flexibility to the RunSequence approach, e.g. to write restart files at certain intervals that are multiples of the parent timestep.
- Fields created on XGrids can now be used as either source, destination, or both when calling the general ESMF regrid methods (ESMF_FieldRegridStore(), ESMF_FieldRegrid(), ESMF_FieldBundleRegridStore(), ESMF_FieldBundleRegrid()). This enables the use of all ESMF regridding methods (bilinear, patch, etc.) and options (extrapolation, regridding status, etc.) when regridding to or from Fields on an XGrid. Prior to this release, regridding to or from Fields on an XGrid was only supported when going from one of the grids used to originally create the XGrid. Also, only conservative methods were supported.
- A change in the 3D spherical bilinear weight calculation to handle more complex cells led to a decrease in performance in releases 8.0.0, 8.1.0, and 8.1.1. The current release restores the performance to the level of ESMF 7.1.0r, and better, while retaining support for the complex cells. (Note that this change has the potential to introduce round-off level changes in weights calculated for the 3D spherical bilinear method compared to previous ESMF releases. However, bit-for-bit testing with the Intel compiler using “-O2 -fp-model precise” did not detect any changes.)
- A number of issues that were found with ESMF-managed threading under real application usage, as released with ESMF 8.1.0, have been addressed: (1) PETs that execute a threaded component are no longer instantiated as Pthreads by default but instead execute under the original MPI process. This resolves the issue of not being able to set an unlimited stack size. (2) Issues within the automatic garbage collection of ESMF objects, which led to memory corruption during ESMF_Finalize() when Grids or Meshes were transferred between threaded components, have been resolved. (3) Thread affinities and number of OpenMP threads are reset when exiting from a threaded component method, and global resource control can be turned on/off via the optional argument globalResourceControl during ESMF_Initialize().
- It is now possible to override the defaults of a number of global ESMF settings by specifying an ESMF_Config file during ESMF_Initialize(). This is particularly useful for adjusting log-specific settings, or to turn on/off resource control on the global VM.
- A new section was added to the ESMF Reference Manual that discusses use of VMEpoch for asynchronous RouteHandle communications.
- The VMEpoch feature allows sending PETs to fill the message queue up to the limit set by the MPI implementation. For message sizes where an MPI implementation chooses to use the EAGER protocol, this can lead to memory exhaustion on the receiving PETs. To prevent this issue, VMEpoch now limits the number of outstanding send cycles to ten by default. This default can be overridden by the user through the optional argument throttle to ESMF_VMEpochEnter().
- The process of replacing the native ESMF mesh implementation with the MOAB library is continuing. The MOAB mesh backend is now available to ESMPy by calling Manager.set_moab(). This allows the user to test ESMPy regridding features with the new MOAB backend in preparation for MOAB becoming the default. Manager.moab returns a boolean value to indicate if the MOAB backend is currently in use. The default is to use the native ESMF mesh backend.
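As a minimal sketch of the ESMPy usage described in the last note: Manager.set_moab() and Manager.moab are taken from the note itself, while the helper name use_moab_backend is hypothetical, and the example assumes that ESMPy is importable under the module name ESMF and that a Manager can be created without arguments.

```python
try:
    import ESMF  # assumption: ESMPy is installed under the module name "ESMF"
except ImportError:
    ESMF = None  # lets the sketch load on systems without ESMPy


def use_moab_backend(manager):
    """Switch the given ESMPy Manager to the MOAB mesh backend.

    Hypothetical helper; returns the value of Manager.moab after the switch
    so the caller can confirm which backend is in use.
    """
    manager.set_moab()
    return bool(manager.moab)


if ESMF is not None:
    mg = ESMF.Manager()  # assumption: default constructor arguments suffice
    print("MOAB backend in use:", use_moab_backend(mg))
```

Meshes and regridding objects created after the switch would then exercise the MOAB backend, which allows results to be compared against a run that keeps the default native backend.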
Known Bugs
- The ESMF_XGrid construction can lead to degenerate cells in cases where the source and destination grids have edges that are almost the same. Often these cells don't produce weights and are benign, but when weights are produced, they can lead to low accuracy results when transferring data to/from the XGrid.
- Attempting to write weight files from the ESMPy Regrid object when using filemode=FileMode.WITHAUX currently crashes.
Platform-specific bugs:
- The GNU and Intel compilers require GCC>=4.8 for C++11 support (Intel uses the GCC headers). By default, ESMF uses the C++11 standard and cannot be downgraded. If you run into build issues due to the C++11 dependency, make sure that GCC>=4.8 is loaded.
- For GNU compilers GCC>=10.x, the default Fortran argument mismatch checking has become stricter. This results in build failures in some of the code that comes with ESMF. Setting the environment variable ESMF_F90COMPILEOPTS="-fallow-argument-mismatch -fallow-invalid-boz" during the ESMF build can be used as a work-around for this issue.
- On Darwin, with the GNU gfortran+gcc combination, when building MPICH3 from source, it is important to specify the "--enable-two-level-namespace" configure option. By default, i.e. without this option, on Darwin, the produced MPICH compiler wrappers include a linker flag (-flat_namespace) that causes issues with C++ exception handling. Building and linking ESMF applications with MPICH compiler wrappers that specify this linker option leads to “mysterious” application aborts during execution.
- On Darwin, with the Intel Fortran compiler, command line arguments cannot be accessed from ESMF applications when linked against the shared library version of libesmf. There is no issue when linked against the static libesmf.a version. Setting the environment variable ESMF_SHARED_LIB_BUILD=OFF during the ESMF build can be used as a work-around for this issue.
- The ESMF_ArrayIOUTest unit test fails the binary read test on the S4 test system (Linux+Intel+IntelMPI).
- There is an issue with intercepting the MPI calls for profiling on some of the supported platforms. This results in a single FAIL reported for ESMF_TraceMPIUTest.F90. The affected platforms are:
  - Catania: Darwin+GNU+MPICH3
  - Gaea: Unicos+GNU+cray-mpich
- There is an issue with loading the libesmftrace_preload.so library on some of the supported platforms. This results in a reported CRASH for ESMF_TraceIOUTest.F90 and ESMF_TraceMPIUTest.F90. The affected platforms are:
  - Cori: Unicos+Intel+cray-mpich
  - Cori: Unicos+Intel+mpiuni
  - Discover: Linux+GNU+intelmpi
  - Gaea: Unicos+Intel+cray-mpich
  - Gaea: Unicos+Intel+mpiuni
  - Hera: Linux+GNU+intelmpi
  - Orion: Linux+GNU+mpiuni
Documentation
- ESMF Reference Manual for Fortran
- ESMF Reference Manual for C
- ESMF User Guide
- NUOPC Layer Reference
- Building a NUOPC Model
- ESMPy Doc