Changes between Initial Version and Version 1 of watchdog


Timestamp: 10/22/2017 05:28:45 AM
Author: trac
[[PageOutline]]


= Watchdog Timer =
Gateworks boards provide a hardware watchdog timer that software must tickle (reset) periodically, for example every X seconds. If the watchdog is not tickled within its timeout period of Y seconds, the board reboots.

The sections below describe the differences between the available hardware watchdogs and the software watchdog daemons that service them.

Terminology used here:
 * SoC (System on Chip) - refers to the chip containing the CPU core as well as vendor peripherals. This is often referred to as the CPU, but technically the IXP4xx/CNS3xxx/IMX6 chips used on Gateworks products are SoCs, which combine ARM CPU cores with other peripherals inside the chip.
 * resetting the watchdog (aka tickling or petting) - restarting the 'countdown timer' in a watchdog.
 * watchdog reset, trigger or expiration - the event that occurs when the internal countdown timer of a watchdog expires, which usually results in a chip-level reset, a board-level reset, or a board power-cycle depending on the board and watchdog used.
 * timeout or timeout period - the time before the watchdog will trigger.
 * frequency - the interval at which the watchdog is reset or tickled.


== Hardware ==

=== GSC Watchdog ===
This is the most bulletproof watchdog because it runs on the Gateworks System Controller (GSC) and power-cycles the board's primary power supply when tripped. Note that this feature is Gateworks specific.

Deficiencies of CPU/SoC watchdogs:
 * They are not enabled at power-up, and often not until fairly late, when the Linux kernel driver that controls them initializes. If the board hangs before that point (because of software issues or even CPU chip errata), they do not help.
 * They issue a chip-level reset. Depending on the CPU and board design they may also assert an output signal from the chip, but often this signal does not or cannot reach every peripheral chip in the system. This can result in hangs following a chip-level reset.

In contrast, the benefits of the GSC watchdog are:
 * When it expires it momentarily disables the board's primary power supply, acting as a full board power cycle.
 * It is enabled when the board comes out of reset, so it can protect against any software or hardware issue from power-on until your software starts monitoring itself. Enabling the watchdog is still configurable via a GSC register, but because the GSC registers are backed by the GSC coin-cell battery and are non-volatile, once enabled it stays enabled.

For more info:
 * [wiki:gsc#HardwareWatchdog GSC Watchdog]
 * [wiki:gsc#GSCDrivers GSC Drivers]


=== Ventana (imx6) CPU watchdog ===
The IMX6 SoC watchdog has an 8-bit timeout configuration ranging from 500ms to 128s in 500ms intervals and issues a chip-level SoC reset. On some boards an external output is also present to reset other peripherals.
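Since 8 bits give 256 steps of 500ms, the range above corresponds to a timeout of (WT + 1) x 500ms for a register field value WT from 0 to 255. The sketch below converts a timeout in milliseconds into that field value; the (WT + 1) x 500ms encoding is our reading of the range stated above, and the helper name {{{imx6_wt_field}}} is hypothetical, not part of any Gateworks or kernel tool.

{{{
#!bash
# Hypothetical helper: convert a timeout in milliseconds to the IMX6 8-bit
# WT field value, assuming timeout = (WT + 1) * 500ms (500ms..128s).
imx6_wt_field() {
    local ms=$1
    # reject values that are out of range or not a whole number of 500ms steps
    if [ "$ms" -lt 500 ] || [ "$ms" -gt 128000 ] || [ $(( ms % 500 )) -ne 0 ]; then
        echo "timeout must be 500..128000 ms in 500 ms steps" >&2
        return 1
    fi
    echo $(( ms / 500 - 1 ))
}

imx6_wt_field 60000    # kernel driver default of 60s, prints 119
imx6_wt_field 128000   # maximum timeout, prints 255
}}}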

The Linux kernel driver ({{{drivers/watchdog/imx2_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 1 and 128 seconds.

Due to IMX6 chip errata that cause occasional boot failures when booting from NAND flash (the primary boot device on all Ventana boards), the GSC 'boot' watchdog is used in a special mode to protect against boot failures. In this mode, the GSC 'boot' watchdog is disabled in the bootloader before the OS is launched. If the GSC watchdog is enabled (not to be confused with the GSC 'boot' watchdog, which cannot be disabled), then the watchdog remains enabled from power-up and must be serviced by software in the OS to avoid tripping.


=== Laguna (cns3xxx) CPU watchdog ===
The cns3xxx SoC has a 32-bit count-down timer watchdog, provided by the ARM11 MPCore, that issues a chip-level reset. An output from the cns3xxx is also used to reset other board peripherals such as the NOR flash.

The Linux kernel driver ({{{drivers/watchdog/mpcore_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 0 and 65536 seconds.


=== Cambria (ixp43x) CPU watchdog ===
The ixp43x SoC watchdog available on the Cambria products is a 32-bit counter clocked at 66MHz, so it can have a timeout value of 0 to 65 seconds.
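The 65 second upper bound follows from dividing the largest 32-bit count by the 66MHz tick rate. The bash arithmetic below reproduces that calculation; it uses integer division, so the result is truncated to whole seconds.

{{{
#!bash
# Maximum ixp43x watchdog timeout: largest 32-bit count / 66MHz tick rate
max_count=$(( 0xFFFFFFFF ))   # 2^32 - 1 = 4294967295 ticks
tick_hz=66000000              # 66MHz
echo "max timeout: $(( max_count / tick_hz )) seconds"   # prints: max timeout: 65 seconds
}}}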

The Linux kernel driver ({{{drivers/watchdog/ixp4xx_wdt.c}}}) defaults to 60 seconds and allows a timeout period between 0 and 60 seconds.


== Software ==
The software side of a watchdog is the software responsible for periodically resetting the watchdog timer (aka tickling or petting) so that it does not trigger. This can be as simple as resetting it on a timer (without any additional checks) or as complex as making each reset depend on a series of system health checks.
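As an illustration of the simplest, timer-only approach, the bash sketch below opens a watchdog device and writes to it at a fixed interval. The function name {{{tickle_watchdog}}} and its arguments are hypothetical. Writing the character 'V' before closing is the kernel watchdog API's 'magic close', which tells the driver the close is intentional; it only lets the watchdog stop if the driver supports magic close and {{{nowayout}}} is not set. A real daemon would add error handling and health checks.

{{{
#!bash
# Hypothetical timer-only watchdog servicer (no health checks).
# usage: tickle_watchdog DEVICE INTERVAL_SECONDS COUNT
tickle_watchdog() {
    local dev=$1 interval=$2 count=$3 i=0
    exec 3>"$dev" || return 1      # opening the device starts the watchdog
    while [ "$i" -lt "$count" ]; do
        printf '\0' >&3            # each write to the device resets the timer
        sleep "$interval"
        i=$(( i + 1 ))
    done
    printf 'V' >&3                 # magic close: mark the close as intentional
    exec 3>&-                      # close the device
}

# On a target board: tickle every 5 seconds, 12 times, then close cleanly
# tickle_watchdog /dev/watchdog 5 12
}}}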

This is not to be confused with the concept of a 'software watchdog', which is simply code that performs checks and issues a soft reboot if they are not met. A software watchdog is mainly useful on boards that have no hardware watchdog available, which is not the case for Gateworks products.

The rule of thumb is to tickle the watchdog at least twice per timeout period (an interval of at most half the timeout). You may want to tickle it more often if your system is heavily loaded and the watchdog process is not getting enough CPU time; how much margin you need varies greatly with your CPU, application load, and kernel configuration.
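A quick way to apply that rule is to count how many tickles fit into one timeout period. The hypothetical bash helper below prints that count (integer division) and returns success only when the interval is at most half the timeout.

{{{
#!bash
# Hypothetical helper: how many tickles fit in one timeout period, and does
# the interval follow the "at least twice per timeout" rule of thumb?
# usage: tickle_margin TIMEOUT_SECONDS INTERVAL_SECONDS
tickle_margin() {
    local timeout=$1 interval=$2
    echo "$(( timeout / interval )) tickles per ${timeout}s timeout"
    [ $(( interval * 2 )) -le "$timeout" ]   # exit status 0 only if the rule is met
}

tickle_margin 30 5  && echo ok   # 6 tickles per period, comfortably meets the rule
tickle_margin 60 30 && echo ok   # 2 tickles per period, just meets the rule
}}}

The second case matches the busybox example later on this page: it satisfies the rule but leaves no slack for a delayed tickle.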


=== Linux Kernel Drivers and nowayout ===
The Linux kernel has a [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/watchdog/watchdog-api.txt watchdog driver API] that drivers implement to provide a common userspace API to a hardware watchdog.

Most Linux watchdog drivers have a {{{nowayout}}} parameter whose default can be set at build time via the kernel config option {{{CONFIG_WATCHDOG_NOWAYOUT}}}, or passed in as a parameter when the module is loaded or via the kernel command line (bootargs). Drivers that support this should display the nowayout setting when they initialize. If {{{nowayout=1}}} the driver does not allow the watchdog to be disabled (there is no way out). This is desirable for high-reliability applications, because the normal API behavior is to start the watchdog when {{{/dev/watchdog}}} is opened by the userspace application and to stop/disable it when the device is closed (which can happen if the userspace watchdog process is killed or even crashes).
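On kernels built with {{{CONFIG_WATCHDOG_SYSFS}}}, the watchdog core also exposes per-device attributes under {{{/sys/class/watchdog/}}}, which can be a quicker way to check the active settings. The bash sketch below prints whichever of the {{{identity}}}, {{{timeout}}} and {{{nowayout}}} attributes exist; which attributes are present depends on the kernel version, so older BSP kernels may provide none of them. The function name is hypothetical.

{{{
#!bash
# Hypothetical helper: print selected sysfs attributes of a watchdog device.
# usage: wdt_sysfs_info [SYSFS_DIR]   (defaults to watchdog0)
wdt_sysfs_info() {
    local dir=${1:-/sys/class/watchdog/watchdog0} attr
    for attr in identity timeout nowayout; do
        # skip attributes this kernel does not provide
        [ -r "$dir/$attr" ] && echo "$attr: $(cat "$dir/$attr")"
    done
    return 0
}

wdt_sysfs_info
}}}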

Example:
 * GSC watchdog driver with nowayout disabled:
{{{
#!bash
[    0.000000] Kernel command line: console=ttymxc1,115200 root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs
[    3.949752] gsc_wdt watchdog.39: registered watchdog (nowayout=0)
}}}
 * enabling nowayout at boot for the GSC watchdog driver via the kernel command line:
{{{
#!bash
Ventana> setenv extra 'gsc_wdt.nowayout=1'
Ventana> boot
...
[    0.000000] Kernel command line: console=ttymxc1,115200 root=ubi0:rootfs ubi.mtd=2 rootfstype=ubifs gsc_wdt.nowayout=1
[    3.949752] gsc_wdt watchdog.39: registered watchdog (nowayout=1)
}}}
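To check the setting on a running system, you can pull the value out of the driver's registration message in the kernel log. The bash sketch below assumes the driver prints {{{(nowayout=N)}}} in the same format as the {{{gsc_wdt}}} messages above; other drivers may word their message differently, and the function name is hypothetical.

{{{
#!bash
# Hypothetical helper: extract nowayout value(s) from kernel log text on stdin,
# assuming messages like "registered watchdog (nowayout=1)".
nowayout_from_log() {
    sed -n 's/.*(nowayout=\([0-9]\)).*/\1/p'
}

dmesg | nowayout_from_log    # prints 0 or 1 for each matching driver message
}}}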

Example of trying to kill the watchdog daemon with nowayout enabled:
{{{
#!bash
root@ventana:~# ps
  PID USER       VSZ STAT COMMAND
    1 root      1676 S    init [5]
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
....
  467 root      1720 S    watchdog


root@ventana:~# kill -9 467
[   49.320282] watchdog watchdog0: nowayout prevents watchdog being stopped!
[   49.327081] watchdog watchdog0: watchdog did not stop!
}}}


For more info:
 * [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/watchdog/watchdog-api.txt Linux Kernel Watchdog driver API]


=== Yocto BSP Traditional Linux userspace watchdog daemon ===
The traditional Linux userspace watchdog daemon (http://watchdog.sourceforge.net/) is an example of a very full-featured watchdog daemon. It can be configured to perform a controlled shutdown before the watchdog trips, and to require all kinds of system-level checks to pass before it tickles the watchdog, such as:
 * temperature checks
 * load checks
 * memory usage checks
 * process checks
 * network checks

Note that the watchdog daemon is installed but not configured to run on boot. To run it on boot with a 30 second watchdog timeout, tickling the watchdog every 5 seconds:
 1. Create a conf file:
{{{
#!bash
cat << EOF > /etc/watchdog.conf
watchdog-device = /dev/watchdog
realtime = yes
priority = 1
interval = 5
watchdog-timeout = 30
EOF
}}}
 2. Create an executable init script:
{{{
#!bash
cat << 'EOF' > /etc/init.d/watchdog
#!/bin/sh
# start/stop the watchdog daemon, which reads /etc/watchdog.conf
case "$1" in
  start) watchdog ;;
  stop)  killall watchdog ;;
esac
EOF
chmod +x /etc/init.d/watchdog
}}}
 3. Configure it as a system service that starts on boot with priority 1:
{{{
#!bash
update-rc.d watchdog defaults 1
sync
}}}
 * See [wiki:Yocto/services] for more info on services.
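Before rebooting, the new configuration can be sanity-checked against the rule of thumb of tickling at least twice per timeout period. The bash sketch below reads the {{{interval}}} and {{{watchdog-timeout}}} values from a watchdog.conf-style file; it assumes the simple {{{key = value}}} layout used above (no leading whitespace, one entry per key), and the function name is hypothetical.

{{{
#!bash
# Hypothetical check: does watchdog.conf tickle at least twice per timeout?
# Assumes simple "key = value" lines as in the conf file created above.
check_watchdog_conf() {
    local conf=${1:-/etc/watchdog.conf} interval timeout
    interval=$(awk -F' *= *' '$1 == "interval" {print $2}' "$conf")
    timeout=$(awk -F' *= *' '$1 == "watchdog-timeout" {print $2}' "$conf")
    if [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "interval or watchdog-timeout not set in $conf" >&2
        return 1
    fi
    echo "interval=${interval}s timeout=${timeout}s"
    [ $(( interval * 2 )) -le "$timeout" ]   # success only if the rule is met
}

check_watchdog_conf && echo "config OK"
}}}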


For more details on configuring the traditional Linux userspace watchdog daemon, see the man pages:
 * http://linux.die.net/man/8/watchdog
 * http://linux.die.net/man/5/watchdog.conf


=== OpenWrt procd Watchdog (Modern OpenWrt used in 14-08 BSP and forward) ===
While older versions of OpenWrt used the watchdog daemon from busybox, newer versions (including the Gateworks BSPs from 14-08 and forward) implement the watchdog daemon in procd, which is the init process (PID 1). Therefore on modern OpenWrt you will not see a separate watchdog process when running {{{ps}}}.

Note that the procd watchdog functionality does not implement any specific system checks: as long as procd is running, it tickles the watchdog at its configured frequency.

The procd watchdog code always uses the primary watchdog device {{{/dev/watchdog}}}. You can choose which watchdog that is (i.e. the GSC watchdog or the SoC watchdog) by disabling all but the desired watchdog driver in the kernel configuration.

You can see the current configuration of the watchdog service via {{{ubus}}}:
{{{
#!bash
root@OpenWrt:/# ubus call system watchdog
{
        "status": "running",
        "timeout": 30,
        "frequency": 5
}
}}}

There is no UCI configuration available for these options, but you can change them from an rc script such as {{{/etc/rc.local}}}:
{{{
#!bash
ubus call system watchdog '{ "timeout": 60 }'   # change to 60s timeout
ubus call system watchdog '{ "frequency": 1 }'   # change to 1s frequency
}}}
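If a script sets both values, it can help to build the argument once and check it against the rule of thumb before handing it to {{{ubus}}}. The bash sketch below does that; the helper name is hypothetical, and passing {{{timeout}}} and {{{frequency}}} together in one request assumes procd accepts both fields in a single call (the two separate calls shown above also work).

{{{
#!bash
# Hypothetical helper: build the JSON argument for "ubus call system watchdog",
# refusing a frequency slower than half the timeout.
# usage: procd_wdt_args TIMEOUT_SECONDS FREQUENCY_SECONDS
procd_wdt_args() {
    local timeout=$1 frequency=$2
    if [ $(( frequency * 2 )) -gt "$timeout" ]; then
        echo "frequency ${frequency}s is too slow for a ${timeout}s timeout" >&2
        return 1
    fi
    printf '{ "timeout": %d, "frequency": %d }\n' "$timeout" "$frequency"
}

# e.g. in /etc/rc.local on the target:
# args=$(procd_wdt_args 60 10) && ubus call system watchdog "$args"
}}}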

To stop procd from tickling the watchdog:
{{{
#!bash
ubus call system watchdog '{ "stop": true }'   # the watchdog will cause a reset after it expires
}}}


=== OpenWrt busybox watchdog (Older OpenWrt used in 13-06 BSP and older) ===
Older OpenWrt versions, such as the one used in the Gateworks 13-06 BSP, use the busybox watchdog daemon. Like the procd-based watchdog in modern OpenWrt, this daemon does not do any specific system checks: as long as the service is running, it tickles the watchdog at its configured period.

There is no UCI configuration for the timeout or frequency; they are hard-coded in the {{{/etc/init.d/watchdog}}} init script, with a default timeout of 60 seconds and a tickle frequency of 5 seconds.

Usage:
{{{
#!bash
Usage: watchdog [-t N[ms]] [-T N[ms]] [-F] DEV

Periodically write to watchdog device DEV

Options:
        -T N    Reboot after N seconds if not reset (default 60)
        -t N    Reset every N seconds (default 30)
        -F      Run in foreground

Use 500ms to specify period in milliseconds
}}}
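Per the help text, a period is given in seconds unless it carries an {{{ms}}} suffix. The bash sketch below normalizes such an argument to milliseconds, which can be handy when comparing the {{{-t}}} and {{{-T}}} values in an init script; the function name is hypothetical and this is not busybox's own parsing code.

{{{
#!bash
# Hypothetical helper: convert a busybox watchdog period ("30" or "500ms") to ms.
to_ms() {
    case "$1" in
        *ms) echo $(( ${1%ms} )) ;;      # already milliseconds
        *)   echo $(( $1 * 1000 )) ;;    # plain number means seconds
    esac
}

to_ms 5       # prints 5000
to_ms 500ms   # prints 500
}}}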


Example:
{{{
#!bash
watchdog -t 30 -T 60 /dev/watchdog # reset the watchdog every 30 seconds with a 60 second timeout
}}}


=== Android watchdog daemon ===
The Android OS watchdog daemon is {{{/sbin/watchdogd}}} and is implemented in {{{/system/core/init/watchdogd.c}}}. It is started by init and does not perform any specific checks.

The Ventana Android BSP configures the watchdog for a 30 second timeout and resets it every 10 seconds. If the GSC watchdog is enabled it is used; otherwise the IMX6 SoC watchdog is used.