Slurm nodes installation under Ubuntu16.04 #2
These files can be used to install Slurm nodes on Ubuntu 16.04. The Slurm master node is biomedia03.
Couple of changes, most critical are:
- use Jinja templates
- use Pillar
files/etc/slurm-llnl/slurm.conf
Outdated
-{% set rocsList = [] %}
-{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
+NodeName=biomedia01 RealMemory=64000 CPUs=24 State=UNKNOWN
this should be part of the Jinja template
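A minimal sketch of how the node definitions might be generated instead of hardcoded. It assumes that each value in the `cpus` mapping iterated by the removed loop is that node's CPU count, and that a sibling `mem` mapping (hypothetical, not confirmed anywhere in this PR) holds each node's RealMemory. Nodes with extra attributes such as Gres would need additional pillar keys.

```jinja
{# Sketch only: 'mem' is an assumed pillar key; 'cpus' is the mapping the old loop iterated. #}
{% set batch = pillar['slurm']['nodes']['batch'] %}
{% for node, cpus in batch['cpus'] | dictsort %}
NodeName={{ node }} RealMemory={{ batch['mem'][node] }} CPUs={{ cpus }} State=UNKNOWN
{% endfor %}
```

Sorting with `dictsort` keeps the rendered file stable between runs, so Salt only reports a change when the pillar data actually changes.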
files/etc/slurm-llnl/slurm.conf
Outdated
+NodeName=monal01 RealMemory=80000 CPUs=12 Gres=gpu:8 State=UNKNOWN
+
+# Partitions
+PartitionName=long Nodes=biomedia01,biomedia02,biomedia05,biomedia06,biomedia07,biomedia08,biomedia09,biomedia10,roc01,roc02,roc03 Default=YES MaxTime=43200
All roc machines should be in the long partition as well. It's fine for two partitions to overlap.
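One way to keep the long partition in sync with the node list is to build it from the same pillar mapping. This is a sketch that assumes `pillar['slurm']['nodes']['batch']['cpus']` lists every batch node (biomedia and roc alike), which the removed `startswith('roc')` filter suggests but does not prove.

```jinja
{# Assumes the 'cpus' mapping is keyed by every batch node name, roc machines included. #}
{% set allNodes = pillar['slurm']['nodes']['batch']['cpus'].keys() | sort %}
PartitionName=long Nodes={{ allNodes | join(',') }} Default=YES MaxTime=43200
```

The roc-only partitions can keep using their own filtered list, since overlapping partitions are acceptable here.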
@@ -6,7 +6,7 @@ ArchiveSuspend=no
 #ArchiveScript=/usr/sbin/slurm.dbd.archive
 #AuthInfo=/var/run/munge/munge.socket.2
 AuthType=auth/munge
-DbdHost={{ pillar['slurm']['controller'] }}
+DbdHost=biomedia03
no hardcoded values please => use Pillar
init.sls
Outdated
-{% endif %}
+install slurm packages from local repo:
+  pkg.installed:
+    - sources:
nice 👍
@@ -9,4 +9,4 @@
 # who wants to be able to SSH in as root via public-key on Biomedia servers.
 # disable SSH for anybody but root
 +:root:ALL
--:ALL EXCEPT (csg) dr jpassera bglocker:ALL
+-:ALL EXCEPT (csg) (biomedia) dr jpassera bglocker jgao:ALL
regular users shouldn't have SSH access to the cluster nodes, hence the previous config
+ConstrainSwapSpace=yes
+AllowedSwapSpace=10.0
+# Not well supported until Slurm v14.11.4 https://groups.google.com/d/msg/slurm-devel/oKAUed7AETs/Eb6thh9Lc0YJ
+#ConstrainDevices=yes
Should that be enabled and not commented out then?
files/etc/slurm-llnl/slurm.conf
Outdated
@@ -3,7 +3,7 @@
 # See the slurm.conf man page for more information.
 #
 # Workaround because Slurm does not recognize full hostname...
-ControlMachine={{ pillar['slurm']['controller'] }}
+ControlMachine=biomedia03
Pillar
@@ -1,3 +0,0 @@
-#!/bin/bash
This script is useful for adding new members. It should stay.
@@ -1,13 +0,0 @@
-#!/bin/bash
Script used to bootstrap new minions. Any reason to remove it?
- * Slurm
- * Slurm Database
- * SSH
+To install Slurm nodes, you need to copy (on Slurm mater node)
Maybe move that lower in an "Instructions" subsection instead of replacing what the formula contains?
Hi Jonathan,
Thank you very much for the comments. I will modify and test on predicts cluster.
Best wishes,
Jianliang
________________________________
From: jopasserat <[email protected]>
Sent: 30 April 2018 17:02:34
To: BioMedIA/slurm-formula
Cc: Gao, Jianliang; Author
Subject: Re: [BioMedIA/slurm-formula] Slurm nodes installation under Ubuntu16.04 (#2)
@jopasserat requested changes on this pull request.
Couple of changes, most critical are:
* use Jinja templates
* use Pillar
________________________________
In files/etc/slurm-llnl/slurm.conf:
-{% set rocsList = [] %}
-{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
+NodeName=biomedia01 RealMemory=64000 CPUs=24 State=UNKNOWN
+
this should be part of the Jinja template
________________________________
In files/etc/slurm-llnl/slurm.conf:
+NodeName=roc11 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=roc12 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=roc13 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=roc14 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=roc15 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=roc16 RealMemory=257869 CPUs=32 State=UNKNOWN
+
+NodeName=monal01 RealMemory=80000 CPUs=12 Gres=gpu:8 State=UNKNOWN
+
+# Partitions
+PartitionName=long Nodes=biomedia01,biomedia02,biomedia05,biomedia06,biomedia07,biomedia08,biomedia09,biomedia10,roc01,roc02,roc03 Default=YES MaxTime=43200
all roc machines should be in long as well. it's fine to have two partitions overlapping
________________________________
In files/etc/slurm-llnl/slurmdbd.conf:
@@ -6,7 +6,7 @@ ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
#AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
-DbdHost={{ pillar['slurm']['controller'] }}
+DbdHost=biomedia03
no hardcoded values please => use Pillar
________________________________
In init.sls:
-/var/log/slurm-llnl/sched.log:
- file.managed:
- - group: slurm
- - user: slurm
- - require:
- - user: slurm
-{% endif %}
+install slurm packages from local repo:
+ pkg.installed:
+ - sources:
nice 👍
________________________________
In files/etc/security/access.conf:
@@ -9,4 +9,4 @@
# who wants to be able to SSH in as root via public-key on Biomedia servers.
# disable SSH for anybody but root
+:root:ALL
--:ALL EXCEPT (csg) dr jpassera bglocker:ALL
+-:ALL EXCEPT (csg) (biomedia) dr jpassera bglocker jgao:ALL
regular users shouldn't have SSH access to the cluster nodes, hence the previous config
________________________________
In files/etc/slurm-llnl/cgroup.conf:
@@ -16,7 +16,8 @@ CgroupReleaseAgentDir=/var/spool/slurm-llnl/cgroup
ConstrainCores=yes
TaskAffinity=yes
#ConstrainRAMSpace=no
-### not used yet
-#ConstrainDevices=no
-#AllowedDevicesFile=/etc/slurm-llnl/cgroup_allowed_devices_file.conf
-
+ConstrainSwapSpace=yes
+AllowedSwapSpace=10.0
+# Not well supported until Slurm v14.11.4 https://groups.google.com/d/msg/slurm-devel/oKAUed7AETs/Eb6thh9Lc0YJ
+#ConstrainDevices=yes
Should that be enabled and not commented out then?
________________________________
In files/etc/slurm-llnl/slurm.conf:
@@ -3,7 +3,7 @@
# See the slurm.conf man page for more information.
#
# Workaround because Slurm does not recognize full hostname...
-ControlMachine={{ pillar['slurm']['controller'] }}
+ControlMachine=biomedia03
Pillar
________________________________
In tools/add_users_to_slurm_groups.sh:
@@ -1,3 +0,0 @@
-#!/bin/bash
This script is useful to add new members. it should stay there
________________________________
In tools/install_salt-minion.sh:
@@ -1,13 +0,0 @@
-#!/bin/bash
Script used to bootstrap new minions. Any reason to remove it?
________________________________
In README.md:
@@ -3,9 +3,9 @@
Salt formula provisioning a Slurm cluster
…-Availables states:
- * Munge
- * Screen
- * Slurm
- * Slurm Database
- * SSH
+To install Slurm nodes, you need to copy (on Slurm mater node)
Maybe move that lower in an "Instructions" subsection instead of replacing what the formula contains?
files/etc/slurm-llnl/slurm.conf
Outdated
-PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
+#PartitionName=long Nodes=biomedia01,biomedia02,biomedia03,biomedia05 Default=YES MaxTime=43200
+#PartitionName=short Nodes=biomedia01,biomedia03,biomedia05 Default=NO MaxTime=60 Priority=5000
+PartitionName=gpus Nodes=monal01 Default=NO MaxTime=10080 MaxCPUsPerNode=4 MaxMemPerNode=30720
You can remove the MaxCPUsPerNode=4 MaxMemPerNode=30720 settings. Just legacy code here.
Hi Jonathan,
OK, thank you very much. I haven't done the modification yet. I'm now testing a shell script to submit docker run jobs via slurm.
Best wishes,
Jianliang
Sent from my Mobile
-------- Original Message --------
Subject: Re: [BioMedIA/slurm-formula] Slurm nodes installation under Ubuntu16.04 (#2)
From: jopasserat
To: BioMedIA/slurm-formula
CC: "Gao, Jianliang" ,Author
@jopasserat commented on this pull request.
________________________________
In files/etc/slurm-llnl/slurm.conf:
-PartitionName=rocsLong Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=43200
-PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
+#PartitionName=long Nodes=biomedia01,biomedia02,biomedia03,biomedia05 Default=YES MaxTime=43200
+#PartitionName=short Nodes=biomedia01,biomedia03,biomedia05 Default=NO MaxTime=60 Priority=5000
+PartitionName=gpus Nodes=monal01 Default=NO MaxTime=10080 MaxCPUsPerNode=4 MaxMemPerNode=30720
You can remove the MaxCPUsPerNode=4 MaxMemPerNode=30720 settings. Just legacy code here
Updated slurm.conf & pillar.example using jinja template
StoragePass={{ pillar['slurm']['db']['password'] }}
StorageLoc=slurmdb
StorageUser=slurm
StoragePass=1BUy4eVv7X
password in cleartext in the commit history...