Add Urania machine and validate flag to scheduler #6184

Status: Open. Wants to merge 5 commits into base: develop.
69 changes: 69 additions & 0 deletions support/Environments/urania.sh
@@ -0,0 +1,69 @@
#!/bin/env sh

# Distributed under the MIT License.
# See LICENSE.txt for details.

spectre_setup_charm_paths() {
# Define Charm paths
export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp
export PATH=$PATH:/u/guilara/charm_impi_2/mpi-linux-x86_64-smp/bin
Member:
I suggest you create a module file like /u/guilara/modules/charm/7.0.0-impi-2 that sets these paths. Then you can module load it below and don't have to call this extra spectre_setup_charm_paths function.

Contributor Author:
I don't think I want to tackle this right now. Installing Charm++ was a bit tricky.

Member:
Then better set these paths in spectre_load_modules, or else you always have to call this extra function.
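
If the module-file route is deferred, a minimal sketch of the reviewer's fallback would be to set the paths directly inside spectre_load_modules so the separate function is not needed (paths taken from the diff above):

# Inside spectre_load_modules, instead of a separate function:
export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp
export PATH=$PATH:${CHARM_ROOT}/bin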

}

spectre_load_modules() {
module load gcc/11
module load impi/2021.7
module load boost/1.79
module load gsl/1.16
module load cmake/3.26
module load hdf5-serial/1.12.2
module load anaconda/3/2021.11
module load paraview/5.10
# Load Spack environment
source /u/guilara/repos/spack/share/spack/setup-env.sh
Member:
You can delete this line.

source /urania/u/guilara/repos/spack/var/spack/environments\
/env3_spectre_impi/loads
# Load python environment
source /u/guilara/envs/spectre_env
Contributor:
Missing "bin/activate"?

}

spectre_unload_modules() {
module unload gcc/11
module unload impi/2021.7
module unload boost/1.79
module unload gsl/1.16
module unload cmake/3.26
module unload hdf5-serial/1.12.2
module unload anaconda/3/2021.11
module unload paraview/5.10
# Unload Spack environment
spack env deactivate
# Unload python environment
deactivate
}

spectre_run_cmake() {
if [ -z ${SPECTRE_HOME} ]; then
echo "You must set SPECTRE_HOME to the cloned SpECTRE directory"
return 1
fi
spectre_load_modules

cmake -D CMAKE_C_COMPILER=gcc \
-D CMAKE_CXX_COMPILER=g++ \
-D CMAKE_Fortran_COMPILER=gfortran \
-D CHARM_ROOT=$CHARM_ROOT \
-D CMAKE_BUILD_TYPE=Release \
-D DEBUG_SYMBOLS=OFF \
-D BUILD_SHARED_LIBS=ON \
-D MEMORY_ALLOCATOR=JEMALLOC \
-D BUILD_PYTHON_BINDINGS=ON \
-D MACHINE=Urania \
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2
Member:
Backslash at end of line is missing.

-D MPI_C_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\
/impi/gcc_11-11.2.0/2021.7.1/bin/mpigcc \
Member:
So CMake doesn't find these without help, even though the impi/2021.7 module is loaded?

Contributor Author:
I just tried removing them. I must have added them because CMake couldn't find MPI without them:

-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Could NOT find MPI (missing: MPI_C_FOUND C)

Member:
Is MPI needed? If everything works without explicitly linking MPI, then just remove these lines.

Contributor Author:
I think we do need it, no?

Member:
No, we don't link to it explicitly; only Charm++ is built with it.

-D MPI_CXX_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\
/impi/gcc_11-11.2.0/2021.7.1/bin/mpig++ \
-D MPI_Fortran_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\
/impi/gcc_11-11.2.0/2021.7.1/bin/mpigfortran \
-Wno-dev "$@" $SPECTRE_HOME
}
11 changes: 11 additions & 0 deletions support/Machines/Urania.yaml
@@ -0,0 +1,11 @@
# Distributed under the MIT License.
# See LICENSE.txt for details.

Machine:
  Name: Urania
  Description: |
    Supercomputer at the Max Planck Computing & Data Facility.
  DefaultProcsPerNode: 72
  DefaultQueue: "p.urania"
  DefaultTimeLimit: "1-00:00:00"
  LaunchCommandSingleNode: ["srun", "-n", "1"]
16 changes: 13 additions & 3 deletions support/Python/Schedule.py
@@ -149,6 +149,7 @@ def schedule(
submit: Optional[bool] = None,
clean_output: bool = False,
force: bool = False,
validate: Optional[bool] = True,
extra_params: dict = {},
**kwargs,
) -> Optional[subprocess.CompletedProcess]:
@@ -319,6 +320,8 @@ def schedule(
files in the 'run_dir' before scheduling the run. (Default: 'False')
force: Optional. When 'True', overwrite input file and submit script
in the 'run_dir' instead of raising an error when they already exist.
validate: Optional. When 'True', validate that the input file is parsed
correctly. When 'False', skip this step.
extra_params: Optional. Dictionary of extra parameters passed to input
file and submit script templates. Parameters can also be passed as
keyword arguments to this function instead.
@@ -549,9 +552,11 @@ def schedule(
)

# Validate input file
-    validate_input_file(
-        input_file_path.resolve(), executable=executable, work_dir=run_dir
-    )
+    if validate:
+        validate_input_file(
+            input_file_path.resolve(), executable=executable, work_dir=run_dir
+        )

# - If the input file may request resubmissions, make sure we have a
# segments directory
metadata, input_file = yaml.safe_load_all(rendered_input_file)
@@ -861,6 +866,11 @@ def scheduler_options(f):
"You may also want to use '--clean-output'."
),
)
@click.option(
"--validate/--no-validate",
default=True,
help="Validate or skip the validation of the input file.",
)
# Scheduling options
@click.option(
"--scheduler",
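A hedged usage sketch of the new flag from the command line (only --validate/--no-validate comes from this PR; the positional input-file argument and any other details of the spectre schedule invocation are illustrative and depend on your setup):

# Skip the input-file validation step when scheduling, e.g. if the
# validation step cannot run on the login node:
spectre schedule --no-validate InputFile.yaml
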
39 changes: 39 additions & 0 deletions support/SubmitScripts/Urania.sh
@@ -0,0 +1,39 @@
{% extends "SubmitTemplateBase.sh" %}

# Distributed under the MIT License.
# See LICENSE.txt for details.

# Urania -- HPC cluster of the ACR division of the Max Planck Institute for
# Gravitational Physics, housed at the Max Planck Computing & Data Facility.
# https://docs.mpcdf.mpg.de/doc/computing/clusters/systems/Gravitational_Physics_ACR.html

{% block head %}
{{ super() -}}
#SBATCH --nodes {{ num_nodes | default(1) }}
#SBATCH --ntasks-per-node=1
Member:
We have found that 2 tasks per node work well on other clusters of this size, maybe even 3. I see you currently use 1 task but 2 comm threads. Have you tried 2 tasks with 1 comm thread each? Not sure if there's a difference. @knelli2, @nilsdeppe, do you know?

Contributor:
Typically each task = an MPI rank = a comm core.

Contributor Author:
I haven't really experimented with this, so I don't know.
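
For reference, an untested sketch of the 2-tasks-per-node layout discussed above (core counts assume the 72-core nodes from Urania.yaml; the thread-to-core mapping per process would still need to be worked out and benchmarked):

# Hypothetical alternative: two Charm++ processes per 72-core node,
# each with 35 worker threads and one communication thread.
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
# and in the charm_ppn block:
CHARM_PPN=$(expr ${SLURM_CPUS_PER_TASK} - 1)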

#SBATCH --ntasks-per-core=1
#SBATCH --cpus-per-task=72
#SBATCH -t {{ time_limit | default("1-00:00:00") }}
#SBATCH -p {{ queue | default("p.urania") }}
{% endblock %}

{% block charm_ppn %}
# Two threads for communication
CHARM_PPN=$(expr ${SLURM_CPUS_PER_TASK} - 2)
{% endblock %}

{% block list_modules %}
# Load compiler and MPI modules with explicit version specifications,
# consistently with the versions used to build the executable.
source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules
spectre_setup_charm_paths
Contributor (on lines +28 to +30):
[optional] Might also want to run "module list" after this so you can see what's loaded.
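
If adopted, the block could look like this sketch ("module list" just prints the currently loaded modules into the job output):

source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules
spectre_setup_charm_paths
# Record the loaded modules in the job output for reproducibility
module list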


{% endblock %}

{% block run_command %}
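# The +pemap/+commap flags below pin the Charm++ worker threads to cores
# 0-34 and 36-70 and the two communication threads to cores 35 and 71,
# matching CHARM_PPN = SLURM_CPUS_PER_TASK - 2 on the 72-core nodes.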
srun -n ${SLURM_NTASKS} ${SPECTRE_EXECUTABLE} \
--input-file ${SPECTRE_INPUT_FILE} \
++ppn ${CHARM_PPN} +pemap 0-34,36-70 +commap 35,71 \
${SPECTRE_CHECKPOINT:+ +restart "${SPECTRE_CHECKPOINT}"}
{% endblock %}
1 change: 1 addition & 0 deletions tests/support/Python/Test_Schedule.py
@@ -230,6 +230,7 @@ def test_schedule(self):
extra_option="TestOpt",
metadata_option="MetaOpt",
force=False,
validate=True,
input_file="InputFile.yaml",
input_file_name="InputFile.yaml",
input_file_template=str(self.test_dir / "InputFile.yaml"),