Check and set the time for Gigabyte nodes.
If the console log indicates that the time on the compute nodes differs from the rest of the system by several hours, the spire-agent cannot get a valid certificate, which causes the node boot to drop into the dracut emergency shell.
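The "off by several hours" check can be sketched as a small shell helper. This is only an illustration: `clock_skew_exceeds` is a hypothetical function name, the one-hour default threshold is an assumption, and the node's timestamp still has to be read from the console log by hand and converted to epoch seconds.

```shell
# Hypothetical helper (not part of the product): succeeds when two epoch
# timestamps differ by more than a threshold.
# usage: clock_skew_exceeds <system_epoch> <node_epoch> [max_seconds]
clock_skew_exceeds() {
    system_epoch=$1
    node_epoch=$2
    max_seconds=${3:-3600}   # assumed default threshold: one hour

    diff=$(( system_epoch - node_epoch ))
    # Take the absolute value so the direction of the skew does not matter.
    if [ "${diff}" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "${diff}" -gt "${max_seconds}" ]
}

# Example with literal values: the two clocks are 20000 seconds apart.
if clock_skew_exceeds 1700000000 1699980000; then
    echo "clock skew exceeds threshold"
fi
```

In practice the first argument could be `$(date -u +%s)` on the management node, with the second taken from a timestamp seen in the node's console log.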
- (ncn-mw#) Retrieve the cray-console-operator pod ID.
    CONPOD=$(kubectl get pods -n services -o wide | grep cray-console-operator | awk '{print $1}'); echo ${CONPOD}
Example output:
cray-console-operator-79bf95964-qpcpp
The following steps should be repeated for each Gigabyte node which needs to have its BIOS time reset.
- (ncn-mw#) Set the XNAME variable to the component name (xname) of the node whose console you wish to open.
    XNAME=x3001c0s24b1n0
- (ncn-mw#) Find the cray-console-node pod that is connected to that node.
    NODEPOD=$(kubectl -n services exec "${CONPOD}" -c cray-console-operator -- \
        sh -c "/app/get-node ${XNAME}" | jq .podname | sed 's/"//g') ; echo ${NODEPOD}
Example output:
cray-console-node-1
- (ncn-mw#) Connect to the node's console using ConMan on the identified cray-console-node pod.
    kubectl exec -it -n services "${NODEPOD}" -- conman -j "${XNAME}"
Example output:
<ConMan> Connection to console [x3001c0s24b1] opened.
- (ncn-mw#) In another terminal, boot the node to BIOS.
  - Set the BMC variable to the component name (xname) of the BMC for the node. This value will be different for each node.
        BMC=x3001c0s24b1
  - Boot the node to BIOS. read -s is used to prevent the password from being written to the screen or the shell history.
        USERNAME=root
        read -r -s -p "$BMC ${USERNAME} password: " IPMI_PASSWORD
        export IPMI_PASSWORD
        ipmitool -I lanplus -U "${USERNAME}" -E -H "${BMC}" chassis bootdev bios
        ipmitool -I lanplus -U "${USERNAME}" -E -H "${BMC}" chassis power off
        sleep 10
        ipmitool -I lanplus -U "${USERNAME}" -E -H "${BMC}" chassis power on
- (ncn-mw#) Update the System Date field to match the time on the system. Use the terminal which is watching the console for this step. As the node powers on, it will complete POST (Power On Self Test) and then display the BIOS menu. The System Date field is located under the Main tab in the navigation bar.
- Press the F10 key followed by the Enter key to save the BIOS time.
- Exit the connection to the console by entering &. (an ampersand followed by a period).
- Repeat the above steps for other nodes which need their BIOS time reset.
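When repeating the procedure for several nodes, the BMC value can be derived from the node xname instead of typed by hand. The following is a minimal sketch, assuming the BMC xname is always the node xname with its trailing n<number> component removed, as in the example values above (x3001c0s24b1n0 and x3001c0s24b1). The function name `bmc_from_node_xname` and the second xname in the loop are hypothetical.

```shell
# Hypothetical helper: derive the BMC xname from a node xname by stripping
# the trailing node component (n followed by digits).
# Assumption: node xnames have the form <bmc-xname>n<number>.
bmc_from_node_xname() {
    printf '%s\n' "${1%n[0-9]*}"
}

# Example: list the XNAME and BMC pairs to use on each pass through the steps.
# x3001c0s24b2n0 is a made-up xname for illustration only.
for XNAME in x3001c0s24b1n0 x3001c0s24b2n0; do
    BMC=$(bmc_from_node_xname "${XNAME}")
    echo "XNAME=${XNAME} BMC=${BMC}"
done
```

The console connection and the BIOS menu remain interactive, so the loop only prepares the variables; each node still needs the steps above carried out by hand.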