OpenMPI doesn't work when docker is running #1

Open
G-Ragghianti opened this issue Jul 27, 2023 · 15 comments

@G-Ragghianti
Contributor

Problem: When a Docker container is running, simple OpenMPI jobs that use the TCP interface cannot complete. For example, a broadcast test hangs.

Steps to reproduce:

$ spack install osu-micro-benchmarks ^openmpi~rsh fabric=ucx
$ spack load osu-micro-benchmarks
$ mpirun -n 2 osu_bcast

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       4.32
2                       4.32
4                       4.36
8                       4.30
16                      4.30
32                      4.32
64                      4.33
128                     4.10
256                     4.30
512                     5.72
1024                    5.81
2048                    6.07
4096                    5.74
8192                    6.67
16384                   7.74
32768                  13.65
<hangs>

Expected result:

$ mpirun -n 2 --mca oob_base_verbose 100 osu_bcast

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       3.26
2                       4.05
4                       4.40
8                       7.55
16                      5.53
32                      5.53
64                      4.06
128                     4.49
256                     6.37
512                     7.11
1024                    5.92
2048                    7.26
4096                    6.74
8192                    8.74
16384                  10.93
32768                  14.40
65536                  33.09
131072                 48.18
262144                 70.30
524288                118.22
1048576               200.32

Verbose output:

[histamine0:1785348] mca: base: components_register: registering framework oob components
[histamine0:1785348] mca: base: components_register: found loaded component tcp
[histamine0:1785348] mca: base: components_register: component tcp register function successful
[histamine0:1785348] mca: base: components_open: opening oob components
[histamine0:1785348] mca: base: components_open: found loaded component tcp
[histamine0:1785348] mca: base: components_open: component tcp open function successful
[histamine0:1785348] mca:oob:select: checking available component tcp
[histamine0:1785348] mca:oob:select: Querying component [tcp]
[histamine0:1785348] oob:tcp: component_available called
[histamine0:1785348] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init rejecting loopback interface lo
[histamine0:1785348] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 10.0.0.49 to our list of V4 connections
[histamine0:1785348] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 172.17.0.1 to our list of V4 connections
[histamine0:1785348] [[3819,0],0] TCP STARTUP
[histamine0:1785348] [[3819,0],0] attempting to bind to IPv4 port 0
[histamine0:1785348] [[3819,0],0] assigned IPv4 port 36725
[histamine0:1785348] mca:oob:select: Adding component to end
[histamine0:1785348] mca:oob:select: Found 1 active transports
[histamine0:1785348] [[3819,0],0]: get transports
[histamine0:1785348] [[3819,0],0]:get transports for component tcp

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       4.45
2                       4.61
4                       4.66
8                       4.63
16                      4.02
32                      4.06
64                      4.07
128                     4.10
256                     4.13
512                     5.82
1024                    5.92
2048                    6.27
4096                    5.98
8192                    6.69
16384                   7.57
32768                  14.08
<hangs>
@G-Ragghianti
Contributor Author

G-Ragghianti commented Jul 27, 2023

It appears that this occurs because openmpi tries to use the virtual network interface that is set up for the docker container. This is the interface with IP 172.17.0.1 in the verbose log. It is not clear what we should do to avoid this.

@bosilca @abouteiller @mgates3

@bosilca

bosilca commented Jul 27, 2023

To prevent OMPI from using a specific IP interface you can do --mca btl_tcp_if_exclude 172.17.0.0/16 or use the explicit interface name --mca btl_tcp_if_exclude docker0.
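One caveat when applying this: a user-supplied btl_tcp_if_exclude value replaces Open MPI's default exclude list, which contains the loopback interface, so loopback has to be listed again. The following is a minimal sketch, assuming the bridge is named docker0 and loopback is lo, as in the verbose log above. The helper name build_mpirun_cmd is hypothetical, and it only prints the command line rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an mpirun command line that keeps
# Open MPI's TCP BTL off the Docker bridge.
# Assumption: the bridge is named docker0 and loopback is lo.
build_mpirun_cmd() {
    nprocs=$1
    shift
    # Loopback is listed explicitly because a user-supplied exclude list
    # replaces the default one.
    printf 'mpirun -n %s --mca btl_tcp_if_exclude lo,docker0 %s\n' "$nprocs" "$*"
}

build_mpirun_cmd 2 osu_bcast
```

On a host with a differently named bridge, the interface name would need to be adjusted, or the CIDR form 172.17.0.0/16 used instead.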

@G-Ragghianti
Contributor Author

Yes, but I'm assuming that you want openmpi to work without the users of our systems all having to know this and always run with it?

@abouteiller

abouteiller commented Jul 27, 2023 via email

@bosilca

bosilca commented Jul 27, 2023

Indeed, there is what I want and then there is what is possible. Is there a consistent way to identify the interfaces created by Docker, or interfaces that are virtual and cannot be used for data exchange? Unfortunately the answer is no. Thus either the users or sysadmins provide the correct configuration (a user-level or system-wide MCA parameter file), or we are reliant on the system timeout. By the way, the execution did not deadlock; it is just waiting for the timeout to signal that the interface cannot be used, and the default timeout is extremely long.
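To illustrate why no check is fully reliable, here is a heuristic sketch for Linux only. It treats every interface that sysfs reports as a software bridge as a candidate for exclusion. It assumes Docker's default network is a Linux bridge such as docker0; it would miss other virtual interfaces and would wrongly flag a bridge that carries real traffic. The function name bridge_exclude_list is hypothetical.

```shell
#!/bin/sh
# Heuristic sketch (Linux only): print a comma-separated list of loopback
# plus every interface that sysfs reports as a software bridge.
# Assumption: Docker's default network is a Linux bridge such as docker0.
# Other virtual interfaces (veth, macvlan, VPN tunnels) are not detected.
bridge_exclude_list() {
    sysfs_net=${1:-/sys/class/net}
    list=lo
    for path in "$sysfs_net"/*; do
        # Bridge devices expose a "bridge" subdirectory in sysfs.
        if [ -d "$path/bridge" ]; then
            list="$list,${path##*/}"
        fi
    done
    printf '%s\n' "$list"
}

bridge_exclude_list
```

The printed list could be passed as the value of --mca btl_tcp_if_exclude, but a sysadmin would still need to confirm that none of the listed bridges is the network the job is supposed to use.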

@G-Ragghianti
Contributor Author

G-Ragghianti commented Jul 28, 2023

Yes, disabling the docker0 interface avoids the problem. I will have to think about the best way to set this. It would not be very clean to set it manually within the spack openmpi install directory, but it looks like openmpi doesn't look anywhere else for the conf file.
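For reference, Open MPI also reads a per-user parameter file, $HOME/.openmpi/mca-params.conf, and environment variables prefixed with OMPI_MCA_, so the setting does not have to live in the install directory. A minimal sketch, assuming docker0 is the interface to avoid; the helper name write_mca_exclude is hypothetical, and whether a per-user file or a variable set by the module file is the better fit for these systems is left open.

```shell
#!/bin/sh
# Hypothetical helper: append the exclude setting to an MCA parameter file
# inside the given directory unless the parameter is already set there.
# Assumption: docker0 is the interface to avoid; lo keeps loopback excluded.
write_mca_exclude() {
    conf_dir=$1
    conf_file=$conf_dir/mca-params.conf
    mkdir -p "$conf_dir"
    if ! grep -q '^btl_tcp_if_exclude' "$conf_file" 2>/dev/null; then
        printf 'btl_tcp_if_exclude = lo,docker0\n' >> "$conf_file"
    fi
}

# Per-user location read by Open MPI (uncomment to apply):
# write_mca_exclude "$HOME/.openmpi"

# Alternative for a module file or login script:
# export OMPI_MCA_btl_tcp_if_exclude=lo,docker0
```

The function is safe to call repeatedly, since it leaves an existing btl_tcp_if_exclude line untouched.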

Also, I'm confused why openmpi isn't using vader/sm. Even if I set "--mca btl self,vader" it doesn't work correctly (doesn't run the osu_bcast):

[guyot:342029] mca: base: components_register: registering framework oob components
[guyot:342029] mca: base: components_register: found loaded component tcp
[guyot:342029] mca: base: components_register: component tcp register function successful
[guyot:342029] mca: base: components_open: opening oob components
[guyot:342029] mca: base: components_open: found loaded component tcp
[guyot:342029] mca: base: components_open: component tcp open function successful
[guyot:342029] mca:oob:select: checking available component tcp
[guyot:342029] mca:oob:select: Querying component [tcp]
[guyot:342029] oob:tcp: component_available called
[guyot:342029] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[guyot:342029] [[45588,0],0] oob:tcp:init rejecting loopback interface lo
[guyot:342029] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
[guyot:342029] [[45588,0],0] oob:tcp:init adding 10.0.0.151 to our list of V4 connections
[guyot:342029] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
[guyot:342029] [[45588,0],0] oob:tcp:init adding 160.36.131.188 to our list of V4 connections
[guyot:342029] WORKING INTERFACE 4 KERNEL INDEX 8 FAMILY: V4
[guyot:342029] [[45588,0],0] oob:tcp:init adding 172.17.0.1 to our list of V4 connections
[guyot:342029] [[45588,0],0] TCP STARTUP
[guyot:342029] [[45588,0],0] attempting to bind to IPv4 port 0
[guyot:342029] [[45588,0],0] assigned IPv4 port 59527
[guyot:342029] mca:oob:select: Adding component to end
[guyot:342029] mca:oob:select: Found 1 active transports
[guyot:342029] [[45588,0],0]: get transports
[guyot:342029] [[45588,0],0]:get transports for component tcp
[guyot:342029] [[45588,0],0] TCP SHUTDOWN
[guyot:342029] [[45588,0],0] TCP SHUTDOWN done
[guyot:342029] mca: base: close: component tcp closed
[guyot:342029] mca: base: close: unloading component tcp

@bosilca

bosilca commented Jul 28, 2023

All these output messages are from PMIX and not from OMPI, so based on them we cannot conclude whether vader/sm was used. Use --mca pml_base_verbose 10 to see which PML is used and what it loads.

@G-Ragghianti
Contributor Author

OK:

[guyot:738468] mca: base: components_register: registering framework pml components
[guyot:738468] mca: base: components_register: found loaded component cm
[guyot:738468] mca: base: components_register: component cm register function successful
[guyot:738468] mca: base: components_register: found loaded component ob1
[guyot:738467] mca: base: components_register: registering framework pml components
[guyot:738467] mca: base: components_register: found loaded component cm
[guyot:738467] mca: base: components_register: component cm register function successful
[guyot:738467] mca: base: components_register: found loaded component ob1
[guyot:738467] mca: base: components_register: component ob1 register function successful
[guyot:738467] mca: base: components_register: found loaded component ucx
[guyot:738468] mca: base: components_register: component ob1 register function successful
[guyot:738468] mca: base: components_register: found loaded component ucx
[guyot:738467] mca: base: components_register: component ucx register function successful
[guyot:738468] mca: base: components_register: component ucx register function successful
[guyot:738467] mca: base: components_register: found loaded component v
[guyot:738468] mca: base: components_register: found loaded component v
[guyot:738468] mca: base: components_register: component v register function successful
[guyot:738467] mca: base: components_register: component v register function successful
[guyot:738468] mca: base: components_open: opening pml components
[guyot:738468] mca: base: components_open: found loaded component cm
[guyot:738467] mca: base: components_open: opening pml components
[guyot:738467] mca: base: components_open: found loaded component cm
[guyot:738467] mca: base: close: component cm closed
[guyot:738467] mca: base: close: unloading component cm
[guyot:738468] mca: base: close: component cm closed
[guyot:738468] mca: base: close: unloading component cm
[guyot:738468] mca: base: components_open: found loaded component ob1
[guyot:738467] mca: base: components_open: found loaded component ob1
[guyot:738468] mca: base: components_open: component ob1 open function successful
[guyot:738467] mca: base: components_open: component ob1 open function successful
[guyot:738467] mca: base: components_open: found loaded component ucx
[guyot:738468] mca: base: components_open: found loaded component ucx
[guyot:738467] mca: base: components_open: component ucx open function successful
[guyot:738467] mca: base: components_open: found loaded component v
[guyot:738467] mca: base: components_open: component v open function successful
[guyot:738468] mca: base: components_open: component ucx open function successful
[guyot:738468] mca: base: components_open: found loaded component v
[guyot:738468] mca: base: components_open: component v open function successful
[guyot:738467] select: initializing pml component ob1
[guyot:738467] select: init returned priority 20
[guyot:738467] select: initializing pml component ucx
[guyot:738468] select: initializing pml component ob1
[guyot:738468] select: init returned priority 20
[guyot:738468] select: initializing pml component ucx
[guyot:738467] select: init returned failure for component ucx
[guyot:738467] select: component v not in the include list
[guyot:738467] selected ob1 best priority 20
[guyot:738467] select: component ob1 selected
[guyot:738468] select: init returned failure for component ucx
[guyot:738468] select: component v not in the include list
[guyot:738468] selected ob1 best priority 20
[guyot:738468] select: component ob1 selected
[guyot:738467] mca: base: close: component ucx closed
[guyot:738467] mca: base: close: unloading component ucx
[guyot:738467] mca: base: close: component v closed
[guyot:738467] mca: base: close: unloading component v
[guyot:738468] mca: base: close: component ucx closed
[guyot:738468] mca: base: close: unloading component ucx
[guyot:738468] mca: base: close: component v closed
[guyot:738468] mca: base: close: unloading component v
[guyot:738467] check:select: PML check not necessary on self
[guyot:738468] check:select: checking my pml ob1 against process [[52872,1],0] pml ob1

@bosilca

bosilca commented Jul 28, 2023

OB1 is selected, so all BTLs should be up and running, unless you specifically excluded them (with --mca btl ^something). If you want more details, you can use --mca btl_base_verbose 10 to see exactly which BTLs are loaded and what they do during initialization. However, a BTL being loaded does not mean it will be used; that depends on the application's communication pattern.
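Reading the selected PML out of interleaved verbose output can be tedious, so here is a small sketch that filters it. It assumes the log keeps the "select: component NAME selected" line format shown in the pml_base_verbose output above; the function name selected_pml is hypothetical, and the sample lines are copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: read verbose MCA output on stdin and print one
# "host:pid component" pair per process for the component that was selected.
# Assumption: lines look like "[host:pid] select: component NAME selected".
selected_pml() {
    sed -n 's/^\[\([^]]*\)\] select: component \([^ ]*\) selected$/\1 \2/p'
}

# Demo on lines taken from the log in this thread.
selected_pml <<'EOF'
[guyot:738467] select: component v not in the include list
[guyot:738467] selected ob1 best priority 20
[guyot:738467] select: component ob1 selected
[guyot:738468] select: component ob1 selected
EOF
# prints:
# guyot:738467 ob1
# guyot:738468 ob1
```

With a live run, the verbose output of mpirun could be piped into the function (redirecting stderr to stdout first, in case the messages are written there).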

@abouteiller

We should investigate an upgrade of UCX to latest and Open MPI to 5.0.2, that may have resolved these problems.

@G-Ragghianti
Contributor Author

G-Ragghianti commented Mar 1, 2024

I have scheduled a rebuild of the module that will be placed in a new location (date code 2024-03-01).

@G-Ragghianti
Contributor Author

G-Ragghianti commented Mar 6, 2024

I'm building a new software module set of the latest [email protected] and [email protected], but the changes in UCX are scheduled for 1.16.

@G-Ragghianti
Contributor Author

G-Ragghianti commented Mar 6, 2024

There is a problem with updating to openmpi@5 on our newer systems. The systems use [email protected] (required by slurm), but there is an incompatibility with this pmix version and openmpi version 5. It would be possible to use an "internal" pmix in openmpi, but I don't know if it will work with slurm then. Ideas?

@G-Ragghianti
Contributor Author

Using openmpi's internal pmix, this is available to test on login.icl.utk.edu:

export MODULEPATH=/apps/spacks/2024-03-05/share/spack/modules/linux-rocky9-x86_64

@abouteiller

Using that Open MPI works as expected, except for the following warning message:

52: A requested component was not found, or was unable to be opened.  This
52: means that this component is either not installed or is unable to be
52: used on your system (e.g., sometimes this means that shared libraries
52: that the component requires are unable to be found/loaded).  Note that
52: PMIx stopped checking at the first component that it did not find.
52:
52: Host:      leconte
52: Framework: psec
52: Component: munge
52: --------------------------------------------------------------------------

This can be resolved by installing the munge package from the slurm installation rpms. It doesn't get installed automatically in the client image when installing slurm, but it should.
