Exit code 4 from --log=error
introduced by #131
#152
The current exporter uses the command below. Can you please provide its output for the device in question? Obscure sensitive data.
    smartctl --json --info --health --attributes --tolerance=verypermissive --nocheck=standby --format=brief --log=error <device>
This was changed in e884420 on 2022-10-03, which was introduced in v0.9.0.
Can you please also run this command for the same device? Again, obscure sensitive data.
    smartctl --json --xall <device>
Running the earlier-specified command produces the following output:
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
4
],
"pre_release": false,
"svn_revision": "5530",
"platform_info": "Darwin 23.4.0 arm64",
"build_info": "(local build)",
"argv": [
"smartctl",
"--json",
"--info",
"--health",
"--attributes",
"--tolerance=verypermissive",
"--nocheck=standby",
"--format=brief",
"--log=error",
"/dev/disk0"
],
"messages": [
{
"string": "Read 1 entries from Error Information Log failed: GetLogPage failed: system=0x38, sub=0x0, code=745",
"severity": "error"
}
],
"exit_status": 4
},
"local_time": {
"time_t": 1710975545,
"asctime": "Wed Mar 20 18:59:05 2024 EDT"
},
"device": {
"name": "/dev/disk0",
"info_name": "/dev/disk0",
"type": "nvme",
"protocol": "NVMe"
},
"model_name": "APPLE SSD AP1024Q",
"serial_number": "xxx",
"firmware_version": "373.100.",
"nvme_pci_vendor": {
"id": 4203,
"subsystem_id": 4203
},
"nvme_ieee_oui_identifier": 0,
"nvme_controller_id": 0,
"nvme_version": {
"string": "<1.2",
"value": 0
},
"nvme_number_of_namespaces": 3,
"smart_support": {
"available": true,
"enabled": true
},
"smart_status": {
"passed": true,
"nvme": {
"value": 0
}
},
"nvme_smart_health_information_log": {
"critical_warning": 0,
"temperature": 31,
"available_spare": 100,
"available_spare_threshold": 99,
"percentage_used": 7,
"data_units_read": 363901154,
"data_units_written": 274998549,
"host_reads": 6290253662,
"host_writes": 3321280137,
"controller_busy_time": 0,
"power_cycles": 287,
"power_on_hours": 2778,
"unsafe_shutdowns": 20,
"media_errors": 0,
"num_err_log_entries": 0
},
"temperature": {
"current": 31
},
"power_cycle_count": 287,
"power_on_time": {
"hours": 2778
}
}
However, when calling the exporter's /metrics endpoint, only the following is returned:
Note the complete absence of any metrics. By removing Line 67 in d756b26
I am not a Go expert, but I believe this is occurring because of this area: Lines 155 to 157 in d756b26
...which, by returning false due to the present error message, causes the upstream Lines 100 to 109 in d756b26
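To illustrate the failure mode described above, here is a minimal sketch of treating the smartctl exit status as a bitmask rather than as a plain success/failure value. It is not the exporter's actual code: the names `smartctlResult` and `resultUsable` are hypothetical, and the choice to treat only bits 0 and 1 as fatal is an assumption. The bit meanings are taken from the EXIT STATUS section of the smartctl man page, where bit 2 (value 4) indicates that some SMART or other command to the device failed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Bit values from the smartctl(8) "EXIT STATUS" section.
const (
	exitCmdLineParse = 1 << 0 // command line did not parse
	exitDeviceOpen   = 1 << 1 // device open failed
	exitCmdFailed    = 1 << 2 // some SMART or other command failed (the "4" seen in this issue)
)

// smartctlResult is a hypothetical, minimal view of the smartctl JSON output.
type smartctlResult struct {
	Smartctl struct {
		Messages []struct {
			String   string `json:"string"`
			Severity string `json:"severity"`
		} `json:"messages"`
		ExitStatus int `json:"exit_status"`
	} `json:"smartctl"`
}

// resultUsable reports whether the JSON is still worth parsing for metrics.
// Assumption: only a bad command line or a failed device open are fatal;
// an error message alone (as with exit status 4) does not discard the data.
func resultUsable(raw []byte) (bool, error) {
	var r smartctlResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return false, err
	}
	fatal := exitCmdLineParse | exitDeviceOpen
	return r.Smartctl.ExitStatus&fatal == 0, nil
}

func main() {
	raw := []byte(`{"smartctl":{"messages":[{"string":"Read 1 entries from Error Information Log failed","severity":"error"}],"exit_status":4}}`)
	ok, err := resultUsable(raw)
	fmt.Println(ok, err) // prints: true <nil>
}
```

With a check like this, the output shown earlier (exit status 4, one message of severity "error") would still be parsed, while an exit status with bit 1 set would still be rejected.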
I can confirm this behaviour with a set of old SSDs (5+ years old), because they don't support the error log. Metrics are not collected because of the exit status value of 4. Running smartctl without '--log=error' or with '--quietmode=errorsonly' returns metrics with an exit status value of 0. Maybe adding a command-line option that disables the error log argument to smartctl would solve this and related issues?
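The option suggested above could look roughly like the following sketch. The function `buildSmartctlArgs` and its `includeErrorLog` parameter are hypothetical and not part of the exporter; the argument list itself is the one quoted earlier in this thread.

```go
package main

import "fmt"

// buildSmartctlArgs assembles the smartctl arguments quoted in this thread.
// includeErrorLog is a hypothetical switch: when false, "--log=error" is
// left out so devices without an error log do not trigger exit status 4.
func buildSmartctlArgs(device string, includeErrorLog bool) []string {
	args := []string{
		"--json", "--info", "--health", "--attributes",
		"--tolerance=verypermissive", "--nocheck=standby", "--format=brief",
	}
	if includeErrorLog {
		args = append(args, "--log=error")
	}
	return append(args, device)
}

func main() {
	fmt.Println(buildSmartctlArgs("/dev/disk0", false))
}
```

Whether such a switch should default to on or off is a design choice for the maintainers; defaulting to on would preserve the behaviour introduced in v0.9.0.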
https://github.com/prometheus-community/smartctl_exporter/pull/131/files#diff-c249026c4aeb110469ab01c3170ce12c5a292612584702fd378d42a1c868a686
This introduced a bug where smartctl's return code is 4 instead of 0 if any messages are present in the output.
We also need to use --quietmode=errorsonly in addition to --log=error: https://linux.die.net/man/8/smartctl
errorsonly - only print: For the '-l error' option, if nonzero, the number of errors recorded in the SMART error log and the power-on time when they occurred; For the '-l selftest' option, errors recorded in the device self-test log; For the '-H' option, SMART "disk failing" status or device Attributes (pre-failure or usage) which failed either now or in the past; For the '-A' option, device Attributes (pre-failure or usage) which failed either now or in the past.