Interact with a Linux watchdog driver using Python.

pywatchdog

What's a watchdog?

A watchdog (or watchdog timer, WDT) is a hardware component that reboots your system unless it is notified regularly from user space, usually by a daemon. If user space fails for any reason, the notifications stop, and once the timeout expires the watchdog reboots the system.

On the software side, once the kernel driver is loaded, communication with the hardware is done via the special device file /dev/watchdog.

A watchdog is useful, for example, on a remote system that you cannot physically reach: it ensures that a freeze does not leave the machine unavailable until someone can get to it.
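To make the device-file protocol concrete, here is a minimal sketch of what talking to /dev/watchdog looks like without any library. It is not pywatchdog's implementation; the helper names `ping`, `close_cleanly` and `demo` are made up for illustration. It relies on two behaviours of the Linux watchdog API: any write to the device counts as a ping, and writing the character `V` just before closing asks the driver to stop the timer. That "magic close" is only honoured by drivers that support it, and not when the driver was loaded with nowayout.

```python
MAGIC_CLOSE = b"V"  # Linux watchdog API: sent just before close to disarm


def ping(dev):
    """Feed the watchdog: any write to the device resets its countdown."""
    dev.write(b"\0")
    dev.flush()


def close_cleanly(dev):
    """Send the magic character, then close the device."""
    dev.write(MAGIC_CLOSE)
    dev.flush()
    dev.close()


def demo(path="/dev/watchdog"):
    # Needs root and a loaded driver. Opening the file arms the watchdog,
    # so do not call this casually on a machine you care about.
    dev = open(path, "wb", buffering=0)
    try:
        ping(dev)
    finally:
        close_cleanly(dev)
```

The helpers accept any file-like object, so they can be tried against a stand-in before pointing them at the real device.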

Requirements

Installation

Using pip:

sudo -H pip install pywatchdog

For a specific Python version if you have multiple:

sudo -H python3.x -m pip install pywatchdog

Directly from the repository for the latest changes:

sudo -H pip install git+https://github.com/AT0myks/pywatchdog
sudo -H python3.x -m pip install git+https://github.com/AT0myks/pywatchdog

Usage

API

The recommended usage is with a context manager:

from pywatchdog import Watchdog
with Watchdog() as wdt:
    ...

Watchdog takes the path of the device file as its argument. It defaults to /dev/watchdog, which should work in most cases.

You can also manually open and close the file:

wdt = Watchdog()
wdt.open()
...
wdt.close()

Once the file is open, if the watchdog is not pinged before the timeout occurs, the computer will reboot.

To ping the watchdog use the keep_alive method:

wdt.keep_alive()
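In practice keep_alive is called in a loop. The following is a small sketch of such a loop, assuming only the `keep_alive` method shown above. The `feed` helper and its `should_stop` and `sleep` parameters are hypothetical names introduced here so the loop can be stopped and tested. The interval should stay comfortably below the watchdog timeout.

```python
import time


def feed(wdt, interval=1.0, should_stop=lambda: False, sleep=time.sleep):
    """Ping `wdt` every `interval` seconds until `should_stop()` is true.

    Returns the number of pings sent.
    """
    pings = 0
    while not should_stop():
        wdt.keep_alive()
        pings += 1
        sleep(interval)
    return pings


def main():
    # Imported here so that `feed` itself has no dependency on the package.
    from pywatchdog import Watchdog

    with Watchdog() as wdt:
        feed(wdt)  # runs until the process is interrupted
```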

The timeout can be queried, and for some drivers it can also be modified on the fly:

print("The timeout is", wdt.timeout, "seconds")
wdt.timeout = 45  # Has no effect if unsupported.
print("The timeout is now", wdt.timeout, "seconds")

Note that some devices have a granularity of minutes for their timeout. In this case the example above will print The timeout is now 60 seconds.
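Because a driver may round the value or ignore it entirely, it can help to read the timeout back after setting it. A possible sketch, using only the `timeout` property described above (the `request_timeout` helper is a hypothetical name, not part of the package):

```python
def request_timeout(wdt, seconds):
    """Ask for a timeout and return the value the driver actually reports."""
    before = wdt.timeout
    wdt.timeout = seconds  # may be rounded, or ignored if unsupported
    effective = wdt.timeout
    if effective != seconds:
        print(f"requested {seconds}s, driver reports {effective}s (was {before}s)")
    return effective
```

A daemon can then derive its ping interval from the returned value rather than from the value it asked for.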

Some watchdog drivers have the ability to report the remaining time before the system will reboot:

print("Time left before reboot:", wdt.time_left, "seconds")  # Prints None if unsupported.

Here are the remaining properties. The only one you can set is pretimeout:

print(wdt.identity)  # A string identifying the watchdog driver. Example: 'iTCO_wdt'.
print(wdt.firmware_version)  # Shortcut for wdt.support.firmware_version.
print(wdt.options)  # A list of flags describing what the device supports.
print(wdt.pretimeout)  # Returns the pretimeout in seconds.
wdt.pretimeout = 15  # Seconds. Set to 0 to disable. Unsettable for some drivers.
print(wdt.support)  # Returns an instance of watchdog_info.
print(wdt.status)  # Current status, not always supported.
print(wdt.boot_status)  # Status at the last reboot, not always supported.
print(wdt.temperature)  # In Fahrenheit, not always supported.

If a property is not supported by a driver, None is returned.
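Since unsupported properties come back as None, a quick way to see what a given driver offers is to collect the properties and drop the None values. This is only a sketch: the property names are the ones listed in this section, while `supported_properties` is a hypothetical helper.

```python
PROPERTIES = (
    "identity", "firmware_version", "options", "timeout", "pretimeout",
    "time_left", "status", "boot_status", "temperature",
)


def supported_properties(wdt, names=PROPERTIES):
    """Return {name: value} for every property that is not None."""
    values = {name: getattr(wdt, name) for name in names}
    # Compare against None explicitly: 0 and [] are valid values.
    return {name: value for name, value in values.items() if value is not None}
```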

Some watchdogs can be enabled and disabled:

wdt.enable()  # Turn on the watchdog timer.
wdt.disable()  # Turn off the watchdog timer.

And a few drivers have the ability to cause a kernel panic when the system overheats:

wdt.temp_panic()

Once this option is enabled, it cannot be disabled without unloading the kernel module and loading it again. Also, its status cannot be queried.

See the docstrings for more information on the properties and methods.

CLI

This package also provides a pywatchdog command. It has two subcommands:

find-module

Get a list of potential kernel watchdog drivers for your system.

usage: pywatchdog find-module [-a]

optional arguments:
  -a, --all   only look for the presence of /dev/watchdog

This is done by trying every module in /lib/modules/$(uname -r)/kernel/drivers/watchdog and testing which ones, when inserted, result in the availability of both /dev/watchdog and /dev/watchdog0. When the --all option is specified, it only looks for /dev/watchdog.

Since I don't know the difference between having just /dev/watchdog and having both, the --all option is provided just in case.

The softdog module is always ignored because it is a software watchdog and should work for all systems anyway. You should always use a hardware watchdog when possible.
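For a rough idea of where the candidates come from, the sketch below only lists the module names in the directory mentioned above and skips softdog. It does not insert any module or check for device files, and it is not the command's actual implementation; `watchdog_module_dir` and `candidate_modules` are hypothetical names.

```python
import os
import platform


def watchdog_module_dir(release=None):
    """Directory in which the kernel's watchdog drivers are installed."""
    release = release or platform.release()  # same value as `uname -r`
    return os.path.join("/lib/modules", release, "kernel/drivers/watchdog")


def candidate_modules(directory, ignored=("softdog",)):
    """Return the sorted module names found in `directory`, minus ignored ones."""
    names = set()
    for entry in os.listdir(directory):
        # Modules may be compressed: foo.ko, foo.ko.xz, foo.ko.zst, ...
        name, sep, _ = entry.partition(".ko")
        if sep and name not in ignored:
            names.add(name)
    return sorted(names)
```

Calling `candidate_modules(watchdog_module_dir())` prints nothing and loads nothing; it only reads the directory listing.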

test-reboot

Test if a watchdog correctly reboots your computer.

usage: pywatchdog test-reboot [-t TIMEOUT] [-d DEVICE]

optional arguments:
  -t TIMEOUT, --timeout TIMEOUT    the timeout that will be set for the test
  -d DEVICE, --device DEVICE       default: /dev/watchdog

This will open the device file without notifying the watchdog, which means that when the timeout occurs, the computer should reboot.

A countdown before reboot will be shown. Make sure to save your work before using this command. Use Ctrl-C to abort.

Example

This example assumes you are on a distribution with systemd.

Two files are provided to create an example daemon: a script and a systemd service file. The script is a simple infinite loop that pings the watchdog every second. The service file runs the script as root and restarts it if it fails. It can also be used to load the watchdog module at boot.

In this example we are in the home directory of user ben and the watchdog module is iTCO_wdt.

Download the two files:

wget https://raw.githubusercontent.com/AT0myks/pywatchdog/main/example.py
wget https://raw.githubusercontent.com/AT0myks/pywatchdog/main/example.service

Let's rename them to something more meaningful:

mv example.py pywatchdog_daemon.py
mv example.service pywatchdog.service

Make the script executable:

chmod +x pywatchdog_daemon.py

Move the service file to /etc/systemd/system/:

sudo mv -i pywatchdog.service /etc/systemd/system/

Edit the [Service] section of the service file with

sudo systemctl edit --full pywatchdog

and specify the correct path to the script:

ExecStart=/home/ben/pywatchdog_daemon.py

If the watchdog module is not loaded at boot (on a Raspberry Pi it normally is), you can have the service file load it (see the note at the bottom of this section) by adding these two lines under [Service]:

ExecStartPre=/usr/sbin/modprobe iTCO_wdt
ExecStopPost=/usr/sbin/modprobe -r iTCO_wdt

You can find the absolute path of modprobe with which modprobe.

Now enable (start automatically at boot) and start the watchdog daemon:

sudo systemctl enable pywatchdog
sudo systemctl start pywatchdog

You can now see the status of the daemon:

systemctl status pywatchdog

And its logs:

journalctl -u pywatchdog

Note: because watchdog modules are hard-blacklisted (at least on Ubuntu), we use the service file to load the module at boot instead of the usual mechanisms (like /etc/modules).

Do I have a watchdog?

If you have a /dev/watchdog file, then yes. This should already be the case if for example you are working with a Raspberry Pi. You can also check by typing the command sudo wdctl. If you get wdctl: cannot open /dev/watchdog: No such file or directory, keep reading.
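The same check can be done from Python with the standard library alone. This is a minimal sketch; the function names are hypothetical, and the glob pattern also matches numbered devices such as /dev/watchdog0.

```python
import glob
import os


def has_watchdog(path="/dev/watchdog"):
    """True if the watchdog device file exists."""
    return os.path.exists(path)


def watchdog_devices(pattern="/dev/watchdog*"):
    """Return the watchdog device files currently present, sorted by name."""
    return sorted(glob.glob(pattern))
```

Neither function opens the device, so running them does not arm the watchdog.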

If you don't have a /dev/watchdog file, chances are you have a watchdog but need to find the appropriate kernel module to load for your specific hardware. The modules that work in my case are sp5100_tco for a computer with (pretty recent, if it matters) AMD hardware and iTCO_wdt for another with Intel hardware. Once you've found the module, load it with sudo modprobe <module>.

You can also use the pywatchdog find-module command.

You should avoid loading a watchdog module if you don't have a daemon feeding it, to prevent unexpected reboots.

Contributing

You are welcome to contribute to this project. Feel free to open an issue or a pull request if you encounter a bug or want to add or fix something.

Based on