Changes between Initial Version and Version 1 of watchdog

10/22/2017 05:28:45 AM (5 years ago)



= Watchdog Timer =
A hardware watchdog timer reboots the board unless software tickles (resets) it periodically. Gateworks boards provide a hardware watchdog: software tickles it every X seconds, and if it is not tickled within Y seconds the board reboots.
Please see the details below on the differences between the available hardware watchdogs and software watchdog daemons.
Terminology used here:
 * SoC (System on Chip) - the chip containing the CPU core as well as vendor peripherals. This is often referred to as the CPU, but technically the IXP4xx/CNS3xxx/IMX6 chips used on Gateworks products are SoCs, which marry ARM CPU cores with other peripherals inside the chip.
 * resetting the watchdog (aka tickling or petting) - restarting the 'countdown timer' in a watchdog.
 * watchdog reset, trigger or expiration - the event that occurs when the internal countdown timer of a watchdog expires, which usually results in a chip-level reset, a board-level reset, or a board power-cycle depending on the board and watchdog used.
 * timeout or timeout period - the time before the watchdog will trigger if it is not reset.
 * frequency - the interval at which software resets (tickles) the watchdog.
== Hardware ==
=== GSC Watchdog ===
This is the most bulletproof watchdog because it runs on the Gateworks System Controller and results in a power-cycle of the board's primary power supply when tripped. Note that this feature is Gateworks-specific.
Deficiencies of CPU/SoC watchdogs:
 * They are not enabled at power-up, and often are not enabled until fairly late, when the Linux kernel driver that controls them initializes. If the board hangs before that (because of software issues or even CPU chip errata), they do not help.
 * They issue a chip-level reset. Depending on the CPU and board design they may also assert an output signal from the chip, but often this does not or cannot reach every peripheral chip in the system. This can result in hangs following a chip-level reset.
In contrast, the GSC watchdog benefits are:
 * When it expires it momentarily disables the board's primary power supply, acting as a full board power cycle.
 * It is enabled when the board comes out of reset, so it can protect against any software or hardware issue from power-on until your software starts monitoring itself. Enabling the watchdog is still configurable via a GSC register, but because those registers are non-volatile (battery-backed by the GSC coin cell), once enabled it stays enabled.
For more info:
 * [wiki:gsc#HardwareWatchdog GSC Watchdog]
 * [wiki:gsc#GSCDrivers GSC Drivers]
=== Ventana (imx6) CPU watchdog ===
The IMX6 SoC watchdog has an 8-bit timeout configuration ranging from 500ms to 128s in 500ms steps and will issue a chip-level SoC reset. On some boards an external output is also available to reset other peripherals.
The Linux kernel driver ({{{drivers/watchdog/imx2_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 1 and 128 seconds.
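
The 8-bit timeout configuration maps to time in half-second steps. The following is a minimal sketch of that arithmetic, assuming a field value N selects a timeout of (N + 1) x 500ms, which matches the 500ms to 128s range quoted above. The function names are made up for illustration and are not part of the kernel driver.

```shell
# Convert a timeout in milliseconds to the 8-bit field value, assuming
# timeout = (field + 1) * 500ms. Illustrative helper, not a driver API.
imx6_wdt_field() {
    ms=$1
    # Reject values outside 500..128000 ms or not on a 500 ms boundary.
    if [ "$ms" -lt 500 ] || [ "$ms" -gt 128000 ] || [ $(( ms % 500 )) -ne 0 ]; then
        echo "timeout must be 500..128000 ms in 500 ms steps" >&2
        return 1
    fi
    echo $(( ms / 500 - 1 ))
}

# Convert a field value (0..255) back to milliseconds.
imx6_wdt_ms() {
    echo $(( ($1 + 1) * 500 ))
}
```

Under that assumption the driver default of 60 seconds corresponds to a field value of 119, and the maximum field value of 255 gives 128 seconds.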
Due to IMX6 chip errata that cause occasional boot failures when booting from NAND flash (the primary boot device on all Ventana boards), a GSC 'boot' watchdog is used in a special mode to protect against boot failures. In this mode, the GSC 'boot' watchdog is disabled by the bootloader before it launches the OS. If the regular GSC watchdog is enabled (not to be confused with the GSC 'boot' watchdog, which cannot be disabled), it remains enabled from power-up and must be handled by software in the OS to avoid tripping.
=== Laguna (cns3xxx) CPU watchdog ===
The cns3xxx SoC has a 32-bit count-down timer watchdog, provided by the ARM11-MPCORE, that issues a chip-level reset. An output from the cns3xxx is also used to reset other board peripherals such as the NOR flash.
The Linux kernel driver ({{{drivers/watchdog/mpcore_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 0 and 65536 seconds.
=== Cambria (ixp43x) CPU watchdog ===
The ixp43x SoC used on the Cambria products has a 32-bit watchdog counter with 66MHz ticks, so it can have a timeout value of 0 to 65 seconds.
The Linux kernel driver ({{{drivers/watchdog/ixp4xx_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 0 and 60 seconds.
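
The 65 second ceiling follows from the counter width and tick rate: a 32-bit counter holds 2^32 ticks, and at 66MHz that is about 65 seconds. A small sketch of the calculation, using a helper name invented here and assuming the shell has 64-bit arithmetic:

```shell
# Maximum timeout in whole seconds for a down-counter of the given
# width (bits) clocked at the given rate (Hz).
wdt_max_seconds() {
    bits=$1
    hz=$2
    echo $(( (1 << bits) / hz ))
}

wdt_max_seconds 32 66000000   # 32-bit counter at 66MHz, prints 65
```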
== Software ==
The software side of a watchdog is the code responsible for periodically resetting the watchdog timer (aka tickling or petting) so that it does not trigger. This can be as simple as resetting it from a timer (without any additional checks), or it can be very complex and based on a series of system checks.
This is not to be confused with the concept of a 'software watchdog', which is simply code that performs checks and issues a soft reboot if they are not met. That is mostly useful on boards that have no hardware watchdog available, which is not the case for Gateworks products.
The rule of thumb is to tickle the watchdog at least twice as fast as its timeout, that is, at an interval of no more than half the timeout period. You may want to tickle it more often if your system is heavily loaded and the watchdog process is not getting enough CPU time (this varies greatly with your CPU, application load, and kernel configuration).
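
The rule of thumb can be written down as a small helper. This is only a sketch; the function names are hypothetical, and the one second floor is a choice made here for illustration.

```shell
# Suggest a tickle interval (seconds) for a given timeout (seconds):
# half the timeout, but never less than 1 second.
tickle_interval() {
    timeout=$1
    interval=$(( timeout / 2 ))
    if [ "$interval" -lt 1 ]; then
        interval=1
    fi
    echo "$interval"
}

# Succeed if an interval satisfies the rule of thumb for a timeout.
interval_ok() {
    [ $(( $2 * 2 )) -le "$1" ]
}
```

For example, `tickle_interval 30` suggests 15 seconds, and `interval_ok 30 5` succeeds because a 5 second interval is well inside half of a 30 second timeout.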
=== Linux Kernel Drivers and nowayout ===
The Linux kernel has a watchdog driver API that drivers implement to provide a common userspace API to a hardware watchdog.
Most Linux watchdog drivers have a {{{nowayout}}} kernel parameter, which can be defaulted at build time via the kernel config {{{CONFIG_WATCHDOG_NOWAYOUT}}} or passed in as a parameter during module loading or via bootargs. Drivers that support this should display the nowayout setting upon driver init. If {{{nowayout=1}}}, the driver does not allow the watchdog to be disabled (there is no way out of the situation). This is desirable in high-reliability cases, because the normal API behavior is to start the watchdog when {{{/dev/watchdog}}} is opened by the userspace app and to stop/disable it when the device is closed (which can happen if the userspace watchdog process is killed or crashes).
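
The open/tickle/close behavior described above can be sketched from the shell. This is an illustration, not a replacement for a watchdog daemon: the function name is made up, and it relies on the kernel watchdog API's "magic close" convention, where drivers that support it only stop the watchdog on close if the character 'V' was written last.

```shell
# Tickle a watchdog device a fixed number of times, then close it cleanly.
# $1 = device path, $2 = seconds between tickles, $3 = number of tickles.
wdt_tickle() {
    dev=$1
    interval=$2
    count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3          # any write restarts the countdown
            sleep "$interval"
            i=$(( i + 1 ))
        done
        # Magic close: a final 'V' asks the driver to stop the watchdog
        # when the device is closed. It has no effect when nowayout=1.
        printf 'V' >&3
    } 3>"$dev"                       # opening the device starts the watchdog
}
```

Calling `wdt_tickle /dev/watchdog 5 12` would keep the board alive for about a minute and then release the watchdog. With {{{nowayout=1}}} the final close does not stop the timer, so the board resets once the timeout expires.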
 * GSC watchdog driver with nowayout disabled:
{{{
[    0.000000] Kernel command line: console=ttymxc1,115200 root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs
[    3.949752] gsc_wdt watchdog.39: registered watchdog (nowayout=0)
}}}
 * enabling nowayout for the GSC watchdog driver at boot time via bootargs:
{{{
Ventana> setenv extra 'gsc_wdt.nowayout=1'
Ventana> boot
[    0.000000] Kernel command line: console=ttymxc1,115200 root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs gsc_wdt.nowayout=1
[    3.949752] gsc_wdt watchdog.39: registered watchdog (nowayout=1)
}}}
Example of trying to kill the watchdog process when {{{nowayout=1}}}:
{{{
root@ventana:~# ps
    1 root      1676 S    init [5]
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
  467 root      1720 S    watchdog
root@ventana:~# kill -9 467
[   49.320282] watchdog watchdog0: nowayout prevents watchdog being stopped!
[   49.327081] watchdog watchdog0: watchdog did not stop!
}}}
For more info:
 * Linux Kernel Watchdog driver API
=== Yocto BSP Traditional Linux userspace watchdog daemon ===
The traditional Linux userspace watchdog daemon is an example of a very full-featured watchdog daemon. It can be configured to do controlled shutdowns before tripping the watchdog, and it can add all kinds of system-level checks which need to pass before tickling the watchdog, such as:
 * temperature checks
 * load checks
 * memory usage checks
 * process checks
 * network checks
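
As a rough illustration of how such checks are expressed, the sketch below appends a few of them to a config file. The option names ({{{max-load-1}}}, {{{min-memory}}}, {{{pidfile}}}, {{{ping}}}) are taken from the watchdog.conf man page as best recalled and should be verified against the installed version. The values, the pid file and the IP address are placeholders, and the function name is invented here. The temperature check is left out because its option names differ between daemon versions.

```shell
# Append example health checks to a watchdog.conf-style file.
# $1 = path of the config file to append to.
add_watchdog_checks() {
    cat >> "$1" << 'EOF'
# load check: act if the 1-minute load average exceeds 24
max-load-1 = 24
# memory check: minimum free virtual memory, in pages
min-memory = 1
# process check: the process named in this pid file must be running
pidfile = /var/run/sshd.pid
# network check: this address must answer pings
ping = 192.168.1.1
EOF
}
```

Writing to a path passed as an argument makes it easy to generate the file somewhere harmless first and review it before copying it to {{{/etc/watchdog.conf}}}.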
Note that the watchdog daemon is installed but not configured to run on boot. To configure it to run on boot with a 30 second timeout and a tickle every 5 seconds:
 1. Create a conf file:
{{{
cat << EOF > /etc/watchdog.conf
watchdog-device = /dev/watchdog
realtime = yes
priority = 1
interval = 5
watchdog-timeout = 30
EOF
}}}
 2. Create an executable init script:
{{{
cat << EOF > /etc/init.d/watchdog
chmod +x /etc/init.d/watchdog
}}}
 3. Configure it as a system service to start on boot, priority 1:
{{{
update-rc.d watchdog defaults 1
}}}
 * see [wiki:Yocto/services] for more info on services
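
The body of the init script in step 2 is not shown above. The following is one possible minimal version, offered as a sketch rather than the script from the BSP. It assumes the daemon is installed at {{{/usr/sbin/watchdog}}} and that {{{killall}}} is available; the function that writes it is made up for illustration.

```shell
# Write a minimal SysV-style init script for the watchdog daemon.
# $1 = destination path (for example /etc/init.d/watchdog).
write_watchdog_init() {
    cat > "$1" << 'EOF'
#!/bin/sh
# Minimal start/stop wrapper; /usr/sbin/watchdog is an assumed install path.
DAEMON=/usr/sbin/watchdog
case "$1" in
    start)
        "$DAEMON" -c /etc/watchdog.conf
        ;;
    stop)
        # With nowayout=1 the hardware watchdog keeps running after this.
        killall watchdog
        ;;
    restart)
        "$0" stop
        "$0" start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac
EOF
    chmod +x "$1"
}
```

Running `write_watchdog_init /etc/init.d/watchdog` would cover step 2 in one go, after which step 3 registers the service.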
For more details on configuring the traditional Linux userspace watchdog, see its man pages.
=== OpenWrt procd Watchdog (Modern OpenWrt used in 14-08 BSP and forward) ===
While older versions of OpenWrt used the watchdog daemon from busybox, newer versions (including the Gateworks BSPs from 14-08 forward) implement the watchdog daemon in procd, which is the init process (PID 1). Therefore on modern OpenWrt you will never see a watchdog process when running {{{ps}}}.
Note that the procd watchdog functionality does not implement any specific system checks: as long as procd is running, it will tickle/reset the watchdog based on its configured period.
The procd watchdog code always uses the primary watchdog device {{{/dev/watchdog}}}. You can choose which watchdog that is (i.e. the GSC watchdog or the SoC watchdog) by disabling all but the desired watchdog in the kernel configuration.
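
To check which driver ended up behind the primary device, the watchdog sysfs class can be inspected. This is a sketch that assumes the kernel was built with {{{CONFIG_WATCHDOG_SYSFS}}} and is recent enough to expose an {{{identity}}} attribute, which may not be true of older BSP kernels. The function name is invented here. {{{/dev/watchdog}}} normally corresponds to {{{watchdog0}}}.

```shell
# List registered watchdog devices with their driver identity strings.
# $1 = sysfs class directory (defaults to /sys/class/watchdog).
wdt_list() {
    root=${1:-/sys/class/watchdog}
    for d in "$root"/watchdog*; do
        # Skip entries without a readable identity attribute.
        [ -r "$d/identity" ] || continue
        echo "${d##*/}: $(cat "$d/identity")"
    done
}
```

If more than one device is listed, the kernel configuration still has more than one watchdog driver enabled.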
You can see the current configuration of the watchdog service via {{{ubus}}}:
{{{
root@OpenWrt:/# ubus call system watchdog
{
        "status": "running",
        "timeout": 30,
        "frequency": 5
}
}}}
While there is no uci configuration available for these options, you can change them in an rc script such as rc.local if you wish:
{{{
ubus call system watchdog '{ "timeout": 60 }'   # change to 60s timeout
ubus call system watchdog '{ "frequency": 1 }'   # change to 1s frequency
}}}
To stop the service:
{{{
ubus call system watchdog '{ "stop": true }'   # watchdog will cause a reset after it expires
}}}
=== OpenWrt busybox watchdog (Older OpenWrt used in 13-06 BSP and older) ===
Older OpenWrt versions, such as the one used in the Gateworks 13-06 BSP, use the busybox watchdog daemon. This daemon is similar to the procd-based watchdog in modern OpenWrt in that it does not do any specific system checks: as long as the service is running, it will tickle/reset the watchdog based on its configured period.
There is no uci configuration for the timeout or frequency; they are hard-coded in the {{{/etc/init.d/watchdog}}} init script, for a default timeout of 60 seconds and a tickle frequency of 5 seconds.
{{{
Usage: watchdog [-t N[ms]] [-T N[ms]] [-F] DEV

Periodically write to watchdog device DEV

        -T N    Reboot after N seconds if not reset (default 60)
        -t N    Reset every N seconds (default 30)
        -F      Run in foreground

Use 500ms to specify period in milliseconds
}}}
For example:
{{{
watchdog -t 30 -T 60 /dev/watchdog # reset the watchdog every 30 seconds with 60 second timeout
}}}
=== Android watchdog daemon ===
The Android OS watchdog daemon is {{{/sbin/watchdog}}} and is implemented in {{{/system/core/init/watchdogd.c}}}. It is started by init and does not perform any specific checks.
The Ventana Android BSP configures the watchdog for a 30 second timeout and resets it every 10 seconds. If the GSC watchdog is enabled it will be used; otherwise the IMX6 SoC watchdog will be used.