Container Versioning
This page outlines a proposal for extending the Open MPI versioning rules to better support container use cases.
As a reminder, Open MPI currently guarantees binary compatibility for the MPI interface across releases with the same major version number. We also make a strong attempt to maintain the behavior of utility application (mpirun, mpicc, etc.) command line options across releases with the same major version number. Releases with the same major and minor version number are expected to see bug fixes, but not new major features.
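The rule above can be written down as a small check. This is only a sketch: the function names are hypothetical, and it assumes plain `major.minor.patch` version strings with no release-candidate suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into integers (assumed format)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def mpi_abi_compatible(built_against: str, running: str) -> bool:
    """Binary compatibility of the MPI interface: same major version."""
    return parse_version(built_against)[0] == parse_version(running)[0]


def same_feature_series(a: str, b: str) -> bool:
    """Same major and minor: only bug fixes are expected between the two."""
    return parse_version(a)[:2] == parse_version(b)[:2]


if __name__ == "__main__":
    print(mpi_abi_compatible("3.0.0", "3.1.2"))   # same major series
    print(mpi_abi_compatible("2.1.0", "3.0.0"))   # different major series
    print(same_feature_series("3.0.1", "3.0.4"))  # bug-fix releases only
```

Note that this check only covers the application and libmpi; it says nothing yet about mpirun or orte.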
There are three pieces of software which all must coordinate to run an Open MPI application when not direct launching or using DVM:
- mpirun, which coordinates launching the run-time infrastructure and application.
- orte, the run-time component on each host which coordinates the MPI processes for that job/user/etc.
- libmpi, the MPI library which the application loads / uses during the application's execution.
In the traditional MPI environment, it is assumed that all three of these components are of the same version, which is why we talk about versioning solely between the application and libmpi. The container world, however, would like to break this assumption and use different versions of Open MPI across the three components. To see why, let's look at how people are using containers with MPI.
This is the mode most often used in Singularity, but it can also be useful for other container infrastructure. In this mode, mpirun and orte run outside of the container. Open MPI organizes starting the container and then the user application inside the container. Generally, this use case assumes there is some process management system outside of the container infrastructure (like Slurm or Torque), similar to what one might find on a traditional HPC system. The mpirun and orte versions are those installed on the host infrastructure, and so are likely the same version. The version of libmpi is that installed in the container, which may be (and likely is) a different version.
Unlike Singularity, Docker defaults to fairly strict container boundaries, and can also provide unique network addresses for each container. In this case, the user manages containers outside of Open MPI, and mpirun launches the orte runtime in the application container. The advantage of this case is that mpirun can launch containers of specific sizes and let the container runtime (such as Docker or ECS) allocate the physical resources to back those containers. In this scenario, orte and libmpi are both in the same container, so they are likely to be from the same release / build. However, mpirun is run from outside the container and may be (and likely is) from a different release.
In this mode, the user starts mpirun from a container, which either launches orte into existing containers or causes those containers to be created. This case is almost exactly like the traditional usage model; it can be expected that all three components (mpirun, orte, and libmpi) are from the same release.
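To make the three usage modes easier to compare, the sketch below records where each component runs and reports where the container boundary cuts the mpirun / orte / libmpi chain. The mode keys and the function name are hypothetical labels chosen for illustration. The third mode is simplified to a single "container" placement, since all three components are expected to come from the same release there.

```python
# Where each component runs in each usage mode described above.
# The mode keys are illustrative labels, not official Open MPI terms.
MODES = {
    "runtime_on_host": {"mpirun": "host", "orte": "host", "libmpi": "container"},
    "mpirun_on_host": {"mpirun": "host", "orte": "container", "libmpi": "container"},
    "all_in_container": {"mpirun": "container", "orte": "container", "libmpi": "container"},
}

# Order in which the components hand off to one another at launch time.
LAUNCH_CHAIN = ["mpirun", "orte", "libmpi"]


def version_boundaries(placement):
    """Return adjacent component pairs that sit on opposite sides of the
    container boundary, and so may come from different releases."""
    return [
        (upstream, downstream)
        for upstream, downstream in zip(LAUNCH_CHAIN, LAUNCH_CHAIN[1:])
        if placement[upstream] != placement[downstream]
    ]


if __name__ == "__main__":
    for mode, placement in MODES.items():
        print(mode, version_boundaries(placement))
```

Each pair reported is an interface that would need a cross-version compatibility guarantee for that mode to work with mismatched installs.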
Any change to our versioning policy will quickly run into the same problem as our current versioning policy: we don't test it. Amazon has been putting together a plan for testing the library versioning behavior by having an MTT run that builds against the X.0.0 release and saves all the test binaries (likely in S3). The nightly tests would build Open MPI and run against those pre-built binaries. If we get MTT to support that behavior, it shouldn't be hard to test the other interactions in additional MTT runs. We'd need the following sets of interactions:
- MPI application built against revision A; mpirun, orte, and libmpi from revision B.
- MPI application built against revision B; mpirun from revision A, orte and libmpi from revision B.
- MPI application built against revision B; mpirun and orte from revision A, libmpi from revision B.
- MPI application built against revision A; mpirun, orte, and libmpi from revision A.
Of course, the last item is how we run all our MTT tests today, so that wouldn't require any special infrastructure.
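The four interactions can be captured as a small table that a test driver could iterate over. This is a sketch only: the `Combo` type and the helper name are hypothetical and are not part of MTT.

```python
from typing import NamedTuple


class Combo(NamedTuple):
    """Which revision ('A' or 'B') each piece comes from in one test run."""
    app_built_against: str
    mpirun: str
    orte: str
    libmpi: str


# The four interactions listed above, in the same order.
TEST_MATRIX = [
    Combo(app_built_against="A", mpirun="B", orte="B", libmpi="B"),
    Combo(app_built_against="B", mpirun="A", orte="B", libmpi="B"),
    Combo(app_built_against="B", mpirun="A", orte="A", libmpi="B"),
    Combo(app_built_against="A", mpirun="A", orte="A", libmpi="A"),
]


def needs_mixed_revision_setup(combo: Combo) -> bool:
    """True when the run mixes revisions and so needs saved binaries or a
    second Open MPI install; False for the ordinary single-revision run."""
    return len(set(combo)) > 1


if __name__ == "__main__":
    for combo in TEST_MATRIX:
        print(combo, needs_mixed_revision_setup(combo))
```

Only the last entry uses a single revision throughout, which matches the observation that it needs no special infrastructure.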
Amazon believes it can implement testing of the version combinations above before the 3.1.0 release. We can likely handle version problems in the 3.0.x release series with code reviews. Therefore, given the customer benefit of more flexibility in versions, Amazon thinks it makes sense to guarantee version compatibility between runtime components in the same major release series as part of the 3.x release series (and continue that practice going forward).