Hi guys,
I have a problem as a user of Oracle Cloud Infrastructure; let's see if anyone can help.
I have a binary compiled on the login node, a parallel code that uses MPI heavily, and a Slurm script that submits the job after first loading some modules. What is strange is that if I sbatch the script specifying a BM instance that is already up and running, I get an error at MPI init, i.e. at the very beginning. If I do the same on a VM, everything works fine. Everything also works fine if I log in directly to the BM, load the same modules, and run the binary with "mpirun -np ...".
It seems there is a problem with MPI through Slurm on the BM... any hint?
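To make the difference concrete, these are the two launch paths (same binary, same modules loaded; the variables and the task count are the ones used in the script below):

# Works when run interactively on the BM node:
mpirun -np 64 ${binalya} ${probname}

# Fails at MPI init when launched through Slurm on the same BM node:
srun --mpi=pmix ${binalya} ${probname}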
I attach the Slurm script here.
thanks!
#!/bin/bash
#SBATCH --job-name="combo"
#SBATCH --time=02:00:00
#SBATCH --ntasks=64
#SBATCH --threads-per-core=1
#SBATCH --output=/mnt/shared/ELEM/data/scars-darrel-test/slurm_outputs/output_%J.out
#SBATCH --error=/mnt/shared/ELEM/data/scars-darrel-test/slurm_outputs/output_%J.err
#### I use this only to test the case when starting the BM beforehand
#SBATCH --nodelist=bm-standard-e2-64-ad1-0003
module purge
module load hwloc
module load pmix
module load prun/1.3
module load gnu8/8.3.0
module load openmpi3/3.1.4
module load ohpc
module load Python/3.6.6-foss-2018b
set -eo pipefail -o nounset
source /etc/profile.d/lmod.sh
export folderdata=/mnt/shared/ELEM/data/scars-darrel-test
export foldertemplate=${folderdata}/data_in
export folderin=${folderdata}/data_in_${SLURM_JOB_ID}
export foldergeom=${folderdata}/geom_in
export folderout=${folderdata}/resu_${SLURM_JOB_ID}
export foldervtkgeom=${folderdata}/vtk-geom-definition
export probname=wedge_scars
export binalya=/mnt/shared/ELEM/bm-standard-e2-64-ad1-0001-cosas/mariano-exmedi-ohara-alya2/Executables/unix/Alya.g
mkdir -p ${folderin}
cp -r ${foldertemplate}/* ${folderin}/.
echo '--|JOB STARTING AT: '`date`
echo '--| ALYA: STARTING AT: '`date`
cd ${folderin}
#### I get the error after this:
time -p srun --mpi=pmix ${binalya} ${probname}
echo '--| ALYA: FINISHED AT: '`date`
echo '--|JOB FINISHED: '`date`
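In case it helps to narrow this down, here is the kind of minimal check I can run from the login node with the same modules (the file name mpi_hello.c and the task count of 4 are just illustrative). If MPI_Init already fails here, the problem is in the Slurm/PMIx/OpenMPI stack on the BM node rather than in the Alya binary:

module purge
module load gnu8/8.3.0 openmpi3/3.1.4 pmix

# 1) Does this OpenMPI build report PMIx support?
ompi_info | grep -i pmix

# 2) Which MPI plugin types does srun know about?
srun --mpi=list

# 3) Tiny MPI program launched through Slurm on the BM node.
cat > mpi_hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o mpi_hello mpi_hello.c

srun --nodelist=bm-standard-e2-64-ad1-0003 --ntasks=4 --mpi=pmix ./mpi_hello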