Describe the request
Currently the liveness probe checks the health of the Splunk process by connecting to the management port, 8089. On a busy indexer running SmartStore downloads, the port may fail to respond within 30 seconds. This is normal behaviour, so I've made the failure thresholds very large.
I believe there is a safer way to check that the Splunk process is alive and healthy.
As an example this is a failed indexer:
Liveness probe failed: Mgmt. port is not reachable
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8089: Connection refused
(killed by the OOM killer)
This indexer is busy but will recover:
Liveness probe failed: command "/mnt/probes/livenessProbe.sh" timed out
Expected behavior
I'd suggest using a command: /opt/splunk/bin/splunk status
I checked inside the pod: the command exits with 0 when splunkd is running and with 3 when it is down:
$ /opt/splunk/bin/splunk status
splunkd is not running.
[splunk@001 /opt/splunk]
$ echo $?
3
Please select the type of request
Bug
Tell us more
A running Splunk instance returns an exit code of 0. The K8s docs (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) state that a non-zero exit code is treated as a failure.
Reproduction/Testing steps
Stop the splunk instance inside the pod
K8s environment
K8s 1.28
Proposed changes (optional)
As per expected behaviour, switch the liveness probe to:
/opt/splunk/bin/splunk status
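As a rough illustration of what I mean, the probe script could wrap that command in a small function. This is only a sketch: the function name check_splunk_alive and its optional path argument are made up for this example, and the exit codes (0 running, 3 not running) are just what I observed in the pod above.

```shell
#!/bin/sh
# Sketch of a liveness check based on `splunk status` (not the operator's actual script).
# check_splunk_alive is a hypothetical helper name; the optional first argument
# (path to the splunk binary) is an assumption added so the check can be tried
# against a different install path.
check_splunk_alive() {
    splunk_bin="${1:-/opt/splunk/bin/splunk}"
    status_rc=0
    # Observed in the pod: exit code 0 when splunkd is running, 3 when it is not.
    # Output is discarded so the probe event only carries the message below.
    "$splunk_bin" status >/dev/null 2>&1 || status_rc=$?
    if [ "$status_rc" -eq 0 ]; then
        return 0
    fi
    echo "Liveness probe failed: splunk status exited with code $status_rc" >&2
    return 1
}
```

The probe script would then end with a call to check_splunk_alive, so that the function's return value becomes the script's exit code. Any non-zero code counts as a probe failure, so the 3 does not need special handling.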