Gateworks boards provide both a hardware boot watchdog timer, which power-cycles the board if the boot firmware fails to run, and SoC watchdogs.
The sections below detail the differences between the available hardware watchdogs and the software watchdog daemons.
Terminology used here:
This is the most bulletproof watchdog because it runs on the Gateworks System Controller and results in a power-cycle of the board's primary power supply when tripped. Note that this feature is Gateworks specific.
Deficiencies of CPU/SoC watchdogs:
In contrast the GSC boot watchdog benefits are:
For more info:
The Cavium CN80XX SoC uses the ARM SBSA watchdog. The CN80XX external watchdog reset output is also an input to the GSC so that the GSC can power-cycle the board if the SoC watchdog expires.
The Linux kernel driver (drivers/watchdog/sbsa_gwdt.c) defaults to a 10 second timeout.
The IMX6 SoC watchdog has an 8-bit timeout configuration ranging from 500ms to 128s in 500ms intervals and issues a chip-level SoC reset when it expires. On some boards an external output can also be present to reset other peripherals.
The Linux kernel driver (drivers/watchdog/imx2_wdt.c) defaults to 60 seconds and allows a timeout period between 1 and 128 seconds.
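The arithmetic behind that range can be sketched in shell. This is only an illustration: it assumes the 8-bit value maps to a timeout of (value + 1) x 500ms, which is what a 500ms to 128s range in 500ms steps implies, and the function names imx_wdt_count and imx_wdt_ms are made up for this example.

```shell
# Convert a timeout in whole seconds to the 8-bit count, assuming
# count 0 means 500ms and each further step adds another 500ms.
imx_wdt_count() {
    secs=$1
    # Reject empty or non-numeric input.
    case $secs in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
    esac
    # The driver range quoted above is 1..128 seconds.
    if [ "$secs" -lt 1 ] || [ "$secs" -gt 128 ]; then
        echo "timeout must be between 1 and 128 seconds" >&2
        return 1
    fi
    echo $(( secs * 2 - 1 ))
}

# Inverse mapping: 8-bit count to timeout in milliseconds.
imx_wdt_ms() {
    echo $(( ($1 + 1) * 500 ))
}
```

For example, `imx_wdt_count 60` prints 119 and `imx_wdt_ms 255` prints 128000 (the 128s maximum).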
Because of IMX6 chip errata that cause occasional boot failures when booting from NAND flash (the primary boot device on all Ventana boards), a GSC 'boot' watchdog is used in a special mode to protect against boot failures. In this mode, the bootloader disables the GSC 'boot' watchdog before launching the OS. If the GSC watchdog is enabled (not to be confused with the GSC 'boot' watchdog, which cannot be disabled), it remains enabled from power-up and must be serviced by software in the OS to avoid tripping.
The cns3xxx SoC has a 32-bit count-down timer watchdog, provided by the ARM11-MPCORE, that issues a chip-level reset when it expires. An output from the cns3xxx is also used to reset other board peripherals such as the NOR FLASH.
The Linux kernel driver (drivers/watchdog/mpcore_wdt.c) defaults to 60 seconds and allows a timeout period between 0 and 65536 seconds.
The software side of a watchdog involves the software that is responsible for periodically resetting the watchdog timer (aka tickling or petting) to avoid it triggering. This can be as simple as resetting it based on a timer (without any additional checks) or can be very complex based on a series of complicated system checks.
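A minimal sketch of both styles in shell follows. The function name pet_loop and its arguments are hypothetical, and the ROUNDS argument exists only to keep the example finite; a real daemon would loop forever. The final 'V' write relies on the Linux "magic close" convention, which only applies to drivers that implement it.

```shell
# pet_loop DEVICE INTERVAL ROUNDS [CHECK...]
# Holds DEVICE open and writes one byte per round. If an optional CHECK
# command is given, the watchdog is only tickled while CHECK succeeds.
pet_loop() {
    dev=$1 interval=$2 rounds=$3
    shift 3
    {
        i=0
        while [ "$i" -lt "$rounds" ]; do
            if [ $# -gt 0 ] && ! "$@"; then
                # Stop tickling; the device is closed without the magic
                # character, so an armed watchdog is left to expire.
                echo "health check failed; no longer tickling" >&2
                return 1
            fi
            printf '.' >&3          # any write resets the timer
            i=$(( i + 1 ))
            sleep "$interval"
        done
        # Magic close: ask the driver to stop the watchdog on close
        # (has no effect when nowayout is set).
        printf 'V' >&3
    } 3>"$dev"
}
```

A timer-only loop would be `pet_loop /dev/watchdog 5 1000`, while `pet_loop /dev/watchdog 5 1000 ping -c 1 -W 1 192.168.1.1` only tickles the watchdog while a (hypothetical) gateway address answers pings.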
This is not to be confused with the concept of a 'software watchdog' which is simply code that will perform checks and issue a soft reboot if they are not met. This is usually useful when using boards that have no hardware watchdog(s) available which is not the case for Gateworks products.
The rule of thumb is to tickle the watchdog at an interval of no more than half its timeout. You may want to tickle it more frequently if the system is heavily loaded and the watchdog process is not getting enough CPU time; this varies greatly with your CPU, application load, and kernel configuration.
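As a small illustration of that rule, the helper below derives a tickle interval from a timeout. The name wd_interval and the optional divisor argument are assumptions made for this sketch; a divisor larger than 2 leaves extra margin on a loaded system.

```shell
# wd_interval TIMEOUT [DIVISOR]
# Prints TIMEOUT / DIVISOR in whole seconds (DIVISOR defaults to 2).
wd_interval() {
    timeout=$1
    divisor=${2:-2}
    if [ "$divisor" -lt 2 ]; then
        echo "divisor must be at least 2" >&2
        return 1
    fi
    interval=$(( timeout / divisor ))
    # A whole-second interval cannot satisfy the rule for very short timeouts.
    if [ "$interval" -lt 1 ]; then
        echo "timeout too short for a whole-second interval" >&2
        return 1
    fi
    echo "$interval"
}
```

For example, `wd_interval 30` prints 15, and `wd_interval 30 6` prints 5.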
The Linux kernel has a watchdog driver API that can be implemented to provide a common userspace API to a hardware watchdog.
Most Linux watchdog drivers have a nowayout kernel parameter. Its default can be set at build time via the kernel config option CONFIG_WATCHDOG_NOWAYOUT, or it can be passed as a module parameter at load time or via bootargs. Drivers that support this should display the nowayout setting upon driver init. If nowayout=1, the driver does not allow the watchdog to be disabled (there is no way out of the situation). This is desirable in high-reliability cases because the normal API behavior is to start the watchdog when the userspace app opens /dev/watchdog and to stop/disable it when the device is closed (which can happen if the userspace watchdog process is killed or crashes).
Example trying to kill the watchdog:
root@ventana:~# ps
PID USER VSZ STAT COMMAND
1 root 1676 S init [5]
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
....
467 root 1720 S watchdog
root@ventana:~# kill -9 467
[ 49.320282] watchdog watchdog0: nowayout prevents watchdog being stopped!
[ 49.327081] watchdog watchdog0: watchdog did not stop!
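To check the setting from a script rather than from the kernel log, something like the following may work. It assumes the kernel was built with CONFIG_WATCHDOG_SYSFS so that a nowayout attribute exists under /sys/class/watchdog/watchdog0; the function name nowayout_state and its output words are made up for this sketch.

```shell
# nowayout_state [SYSFS_DIR]
# Prints "locked" if nowayout is 1, "stoppable" if it is 0, and "unknown"
# if the attribute is missing or holds anything else.
nowayout_state() {
    dir=${1:-/sys/class/watchdog/watchdog0}
    if [ ! -r "$dir/nowayout" ]; then
        echo unknown
        return 0
    fi
    case $(cat "$dir/nowayout") in
        1) echo locked ;;
        0) echo stoppable ;;
        *) echo unknown ;;
    esac
}
```

Running `nowayout_state` with no argument inspects the first watchdog device; pass a different directory to inspect another one.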
For more info:
The traditional Linux userspace watchdog daemon (such as http://watchdog.sourceforge.net/) is an example of a very full-featured watchdog daemon that can be configured to perform controlled shutdowns before tripping the watchdog and can add all kinds of system-level checks that must pass before tickling the watchdog, such as:
On the Gateworks Ubuntu based pre-built images the watchdog package is installed and configured to start on boot as below, however if you are using a different OS you may need to do these steps manually.
To configure the watchdog daemon to run on boot, setup the watchdog for a 30sec timeout and simply tickle the watchdog every 5 seconds:
cat << EOF > /etc/watchdog.conf
watchdog-device = /dev/watchdog
realtime = yes
priority = 1
interval = 5
watchdog-timeout = 30
EOF
cat << EOF > /etc/init.d/watchdog
#!/bin/sh
watchdog
EOF
chmod +x /etc/init.d/watchdog
update-rc.d watchdog defaults 1
sync
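If you edit /etc/watchdog.conf by hand, a quick sanity check such as the sketch below can confirm that the interval is no more than half the timeout. The helper names conf_value and check_watchdog_conf are hypothetical, the parser only understands simple "key = value" lines, and it does not apply the daemon's built-in defaults for missing keys.

```shell
# conf_value FILE KEY: print the value of the last "KEY = value" line in FILE.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$1" | tail -n 1
}

# check_watchdog_conf FILE
# Returns 0 if interval <= watchdog-timeout / 2, 1 if not, and 2 if
# either key is missing from FILE.
check_watchdog_conf() {
    interval=$(conf_value "$1" interval)
    timeout=$(conf_value "$1" watchdog-timeout)
    if [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "interval or watchdog-timeout missing in $1" >&2
        return 2
    fi
    [ $(( interval * 2 )) -le "$timeout" ]
}
```

Usage: `check_watchdog_conf /etc/watchdog.conf && echo "interval OK"`.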
For more details on configuring the traditional linux userspace watchdog see the man pages:
While older versions of OpenWrt used the watchdog daemon from busybox, newer versions (including the Gateworks BSPs from 13-06 onward) implement the watchdog daemon via procd, which is the init process (PID 1). Therefore, on modern OpenWrt you will never see a watchdog process in the output of ps.
Note that the procd watchdog functionality does not implement any specific system checks - if procd is simply running, it will tickle/reset the watchdog based on its configured period.
The procd watchdog code always uses the primary watchdog device /dev/watchdog. You can choose which watchdog that is (i.e., the GSC watchdog or the SoC watchdog) by disabling all but the desired watchdog in the kernel configuration.
You can see the current configuration of the watchdog service via ubus:
root@OpenWrt:/# ubus call system watchdog
{
        "status": "running",
        "timeout": 30,
        "frequency": 5
}
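The JSON reply can also be checked from a script. The sketch below is deliberately simple: it pulls numbers out with sed instead of a real JSON parser, it expects the "status": "running" spacing shown in the output above, and the function names wd_field and wd_status_ok are made up for this example.

```shell
# wd_field NAME: read JSON from stdin and print the number stored under NAME.
wd_field() {
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# wd_status_ok: succeed if the JSON on stdin reports a running service
# whose frequency is at most half of its timeout.
wd_status_ok() {
    json=$(cat)
    case $json in
        *'"status": "running"'*) ;;
        *) return 1 ;;
    esac
    timeout=$(printf '%s\n' "$json" | wd_field timeout)
    frequency=$(printf '%s\n' "$json" | wd_field frequency)
    [ -n "$timeout" ] && [ -n "$frequency" ] &&
        [ $(( frequency * 2 )) -le "$timeout" ]
}
```

Usage on the board: `ubus call system watchdog | wd_status_ok && echo "watchdog settings look sane"`.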
While there is no uci configuration available for these options you could change them in an rc script such as rc.local if you wish:
ubus call system watchdog '{ "timeout": 60 }' # change to 60s timeout
ubus call system watchdog '{ "frequency": 1 }' # change to 1s frequency
To stop the service:
ubus call system watchdog '{ "stop": true }' # watchdog will cause a reset after it expires
The Android OS watchdog daemon is /sbin/watchdog and is implemented in /system/core/init/watchdogd.c. It is kicked off by init and does not perform any specific checks.