- Fixes for AMD GPUs
- Added
-once
flag to gwagent, which is useful for testing (gathers metrics, dumps to stdout, and exits)
- Changes for Petrel v0.36
- Changes for Nvidia GPU data collection (because we have to scrape
nvidia-smi
rather than usesysfs
)
- Petrel dependency updated to 0.36.0
- CPU and GPU temps now exported to current, hourly, and daily files
jq
added to container image
- Nvidia cards too old for the installed driver now have their model
name collected from
lspci
- Host missing time calculation fixed
- Javascript split into its own source file
- DB table
daily
was accumulating values every 48h rather than every 24. This has been fixed - DB retention now honors config settings
- Removed
localhost
and port number from URL construction inside script; everything is now purely relative to the httpd root
- Metrics stash moved from /var/run to /var/lib
- Fixes bad dates in missing host notification
- Give page a title, finally
- Improved Intel CPU name handling
- Initial pass at info gathering for Intel iGPUs
- Further shortening of CPU descs
- Re-added CPU thread count
- Placeholders for null values from GPUs
gofmt
fixes
-
nodeStatus now begins with all hosts from all db tables
-
A crashing bug due to new nodes coming online was fixed
-
The
alerts
section of the config file has been removed, along with its struct. A new section,log
has been added -
gwgather now logs to a file, and log level is settable via config file
- CPU temps are now pulled from Intel processors
- Alerts are now set for nodes which have been seen, but have fallen out of the current system data
- Simplified data handling for adding metrics to DB
- Dead code removal
- Removed unneeded node update tracking code + vars
- Hovering over 'Last' now shows update time as a tooltip
- Enabled GPU data gathering for AMD GPUs
- Tweaked data formating for Nvidia GPUs; all GPUs will be have identical formatting from here on out
- Enabled GPU info reporting, beginning with Nvidia cards
- Metrics stashing dir is (re)created if it doesn't exist, and there is an attept to stash
- GPU data now shown in system stats
- Removed average clock speed from CPUdata struct
- Config file attribute
over_temp
changed totemp_warn
, and new attributetemp_crit
was added
- Changed JSON output filenames
- Generally simplified data handling
- Each node's time since last checkin is now shown
- Hovering over Clock displays a tooltip with per-core clocks
- CPU temp displays a warning status at 80C; critical at 90C
- Last seen displays a critical status if a node's checkin time is more than 210 seconds ago (90s plus the data refresh period of 120s)
- Finished removal of gwquery and its supporting code
- Removed JSON dump functionality from gwgather; it has been placed in a new utility named gwdump which runs periodicaly in gwgather's container
- Average clock now computed and stored in DB
- Loadavg is now a single value (1 minute)
- Improvements in avoiding unnecessary JSON (un)marshalling stemming from hasty BadgerDB -> SQLite rewrite
- Go sqlite lib now built with JSON1 extension
- Gnatwren is now fully discrete from Homefarm
- gwgather is containerized
- Small fixes to CPU temp reporting for k10 driver
- Machine archtecture is now gathered and reported
- Hostname is only checked once
- gwgather now exports CPU temp data to a JSON file every 5 minutes
- An HTML page which renders CPU temp data using the echarts lib has been added -- the beginning of data visualization efforts
- 'dbstatus' added to gwquery
- gwgather now makes DB GC calls on startup, and every 45 minutes subsequently
- Petrel handlers split out into petrel.go
-
gwgather's nodeStatus map now stores two timestamps: most recent checkin and newest metrics. When nodes are reporting stowed metrics, these will differ
-
gwquery updated to handle AgentStatus, a new type derived from AgentPayload, which includes the most recent checkin timestamp
- gwgather now stores the past 24h of reported metrics
- Crash-on-halt fixed in gwgather
- gwagent now stows unsendable metrics to a file, instead of dying. Stowed metrics are sent when connectivity to gwgather recovers
- Go 1.16 is now required, as ioutil dependencies have been removed
- gwgather now exists, and aggregates/reports data
- gwagent is now properly an agent (client reporting to a server), rather than a server
- gwquery now requests data from gwgather rather than polling all agents
- gwagent now takes a config file
- Changed memory to show available and freeable
- Added loadavg
- Pathfinder/demo implmentation; production design will work in a completely different way
- Agent: memory, uptime, cpu clocks and temp gathering on AMD Ryzen and Raspberry Pi. Uses only /proc and /sys data; no calls to external processes; no third-party golang deps
- Query: simple, static, "on my machine" only implementation. Can gather metrics from all machines in 0.14 seconds