wiki:troubleshooting

Linux Troubleshooting

See also:

Kernel Debugging

ring buffer

The kernel ring buffer is where various messages are sent by the kernel and its modules (ie debug/info/warn) as long as CONFIG_PRINTK is enabled. You can also enable CONFIG_PRINTK_TIME to enable timestamps on these messages.

Typically an application such as syslog will pull messages from here and write them to a log, however the dmesg command will display them directly from the ring buffer:

dmesg # display all messages
dmesg -c  # display and clear all messages

printk and log levels

According to Linux Device Drivers, the most common debugging technique is monitoring which is done by calling printf at suitable points. In the kernel you accomplish the same with printk.

The kernel defines various log levels or severity levels which are strings that you can pre-pend to debug statements defined in ​linux/include/linux/kern_levels.h:

  • KERN_EMERG (0) - system is unusable
  • KERN_ALERT (1) - actino must be taken immediately
  • KERN_CRIT (2) - critical conditions
  • KERN_ERR (3) - error conditions
  • KERN_WARNING (4) - warning conditions
  • KERN_NOTICE (5) - normal but significant condition
  • KERN_INFO (6) - informational
  • KERN_DEBUG (7) - debug-level messages
  • KERN_DEFAULT (d) - the default kernel loglevel

To enable various severity levels you can:

  1. set DEFAULT_MESSAGE_LOGLEVEL in kernel at build time
  2. use /proc/sys/kernel/printk:
    $ cat /proc/sys/kernel/printk
            7       4       1       7
            current default minimum boot-time-default
    $ echo 8 > /proc/sys/kernel/printk  # enable highest level
    
  3. pass the debug kernel commandline option

References:

pr-debug() and dev_dbg()

Instead of using printk() directly it is recommended to use the pr_debug() and dev_dbg() macros for the following reasons:

  • they use dynamic debugging (if enabled in the kernel, otherwise those are just printk's) which avoids performance hits caused by rendering the printk's regardless of loglevel tuning
  • they can display device-specific details and/or can have easily formatted prefixes

dynamic debug

Regardless of the system log level each printk() must still being rendered in the kernel (formatted with locking buffers) which is very expensive performance-wise. Dynamic Debug (if enabled via CONFIG_DYNAMIC_DEBUG) will instead conditionally skip debugs before they are rendered. Dynamic Debug allows you to enable/disable various debug messages at runtime based on various levels of scope:

  • per source file
  • per function
  • per module
  • format string
  • line number

The CONFIG_DYNAMIC_DEBUG option implicitly compiles in all pr_debug() and dev_dbg() calls which can enlarge the kernel text size by about 2%. Note that it does not work with printk() and if not enabled in the kernel all pr_debug() dev_dbg() calls simply become static printk's.

If a source file is compiled with the DEBUG flag set (which can be turned on by many CONFIG_*DEBUG* options), any pr_debug() calls in it are enabled by default, but can be disabled at runtime.

Dynamic debugging is controlled by the 'dynamic_debug/control' file in the debugfs filesystem. The following examples assume that debugfs is enabled and mounted to /sys/kernel/debug.

The format of each line of the file is: filename:lineno [module]function flags format:

  • filename : source file of the debug statement
  • lineno : line number of the debug statement
  • module : module that contains the debug statement
  • function : function that contains the debug statement
  • flags : '+p' means the line is turned 'on' for printing
  • format : the format used for the debug statement

For details and examples please see the kernel ​Documentation/dynamic-debug-howto.txt

References:

debugging kernel hangs by enabling initcall_debug

If you are troubleshooting a kernel hang it is usually best to pass initcall_debug to the kernel cmdline which will show you driver init calls entering and returning. If you see a driver enter init but not return, you know what driver is causing the hang.

Early Serial Debug

Typically the serial drivers don't get initialized until after some of the more critical components. This can result in any kernel crash/hang not reporting anything from the serial console after 'Starting kernel ....'

For platforms that early serial support you can enable this in the kernel config. For example on a Ventana board (all ventana boards use ttymxc1 (UART2) for serial console):

  • make menuconfig
    • Kernel hacking -> Kernel low-level debugging functions -> Kernel low-level debugging port (i.MX6Q/DL Debug UART)
    • Kernel hacking -> Kernel low-level debugging functions -> i.MX Debug UART Port Selection : 2
  • results in the following additions to .config:
    CONFIG_DEBUG_LL=y
    CONFIG_DEBUG_IMX6Q_UART=y
    CONFIG_DEBUG_IMX_UART_PORT=2
    CONFIG_DEBUG_LL_INCLUDE="debug/imx.S"
    CONFIG_DEBUG_UNCOMPRESS=y
    CONFIG_EARLY_PRINTK=y
    CONFIG_EARLY_PRINTK_DIRECT=y
    
  • Finally, some kernels require a special parameter to be passed in through the bootloader. See below for instructions:
    setenv extra ${extra} earlyprintk=serial,${console},${baudrate}
    saveenv
    

For newport, Early console is compiled into the kernel and to enable it make sure you boot with:

console=ttyAMA0,115200n8 earlycon=pl011,0x87e028000000 cma=64M

For venice, Early console is compiled into the kernel and to enable it make sure you boot with:

console=ttymxc1,115200 earlycon=ec_imx6q,0x30890000,115200

Linux Magic System Request Key Hacks

There exists a magic key combo you can hit which the kernel will respond to regardless of whatever else it is doing (unless its completely locked up).

To enable this build with CONFIG_MAGIC_SYSRQ when configuring the kernel.

This will enable a /proc/sysrq-trigger file which you can use to send various commands to. Some very useful commands for debugging are:

  • 'b' - reboot immediately (without sync)
  • 'c' - perform a system crash via a NULL pointer dereference (which will show a crashdump if enabled)
  • 'd' - show all locks that are held
  • 'e' - send a SIGTERM to all processes except for PID1 (init)
  • 'f' - call oom_kill to kill a memory hog process
  • 'h' - display help
  • 'i' - send a SIGKILL to all processes except for PID1 (init)
  • 'l' - show a stack backtrace for all active CPU's

References:

Last modified 4 years ago Last modified on 01/27/2021 09:17:11 PM
Note: See TracWiki for help on using the wiki.