Linux Troubleshooting
See also:
Kernel Debugging
ring buffer
The kernel ring buffer is where various messages are sent by the kernel and its modules (ie debug/info/warn) as long as CONFIG_PRINTK
is enabled. You can also enable CONFIG_PRINTK_TIME
to enable timestamps on these messages.
Typically an application such as syslog
will pull messages from here and write them to a log, however the dmesg
command will display them directly from the ring buffer:
dmesg # display all messages dmesg -c # display and clear all messages
printk and log levels
According to Linux Device Drivers, the most common debugging technique is monitoring which is done by calling printf
at suitable points. In the kernel you accomplish the same with printk
.
The kernel defines various log levels or severity levels which are strings that you can pre-pend to debug statements defined in linux/include/linux/kern_levels.h
:
- KERN_EMERG (0) - system is unusable
- KERN_ALERT (1) - actino must be taken immediately
- KERN_CRIT (2) - critical conditions
- KERN_ERR (3) - error conditions
- KERN_WARNING (4) - warning conditions
- KERN_NOTICE (5) - normal but significant condition
- KERN_INFO (6) - informational
- KERN_DEBUG (7) - debug-level messages
- KERN_DEFAULT (d) - the default kernel loglevel
To enable various severity levels you can:
- set
DEFAULT_MESSAGE_LOGLEVEL
in kernel at build time - use
/proc/sys/kernel/printk
:$ cat /proc/sys/kernel/printk 7 4 1 7 current default minimum boot-time-default $ echo 8 > /proc/sys/kernel/printk # enable highest level
- pass the
debug
kernel commandline option
References:
pr-debug() and dev_dbg()
Instead of using printk()
directly it is recommended to use the pr_debug()
and dev_dbg()
macros for the following reasons:
- they use dynamic debugging (if enabled in the kernel, otherwise those are just printk's) which avoids performance hits caused by rendering the printk's regardless of loglevel tuning
- they can display device-specific details and/or can have easily formatted prefixes
dynamic debug
Regardless of the system log level each printk() must still being rendered in the kernel (formatted with locking buffers) which is very expensive performance-wise. Dynamic Debug (if enabled via CONFIG_DYNAMIC_DEBUG) will instead conditionally skip debugs before they are rendered. Dynamic Debug allows you to enable/disable various debug messages at runtime based on various levels of scope:
- per source file
- per function
- per module
- format string
- line number
The CONFIG_DYNAMIC_DEBUG
option implicitly compiles in all pr_debug()
and dev_dbg()
calls which can enlarge the kernel text size by about 2%. Note that it does not work with printk()
and if not enabled in the kernel all pr_debug()
dev_dbg()
calls simply become static printk's.
If a source file is compiled with the DEBUG flag set (which can be turned on by many CONFIG_*DEBUG* options), any pr_debug() calls in it are enabled by default, but can be disabled at runtime.
Dynamic debugging is controlled by the 'dynamic_debug/control' file in the debugfs filesystem. The following examples assume that debugfs is enabled and mounted to /sys/kernel/debug.
The format of each line of the file is: filename:lineno [module]function flags format
:
- filename : source file of the debug statement
- lineno : line number of the debug statement
- module : module that contains the debug statement
- function : function that contains the debug statement
- flags : '+p' means the line is turned 'on' for printing
- format : the format used for the debug statement
For details and examples please see the kernel Documentation/dynamic-debug-howto.txt
References:
debugging kernel hangs by enabling initcall_debug
If you are troubleshooting a kernel hang it is usually best to pass initcall_debug
to the kernel cmdline which will show you driver init calls entering and returning. If you see a driver enter init but not return, you know what driver is causing the hang.
Early Serial Debug
Typically the serial drivers don't get initialized until after some of the more critical components. This can result in any kernel crash/hang not reporting anything from the serial console after 'Starting kernel ....'
For platforms that early serial support you can enable this in the kernel config. For example on a Ventana board (all ventana boards use ttymxc1 (UART2) for serial console):
- make menuconfig
- Kernel hacking -> Kernel low-level debugging functions -> Kernel low-level debugging port (i.MX6Q/DL Debug UART)
- Kernel hacking -> Kernel low-level debugging functions -> i.MX Debug UART Port Selection : 2
- results in the following additions to .config:
CONFIG_DEBUG_LL=y CONFIG_DEBUG_IMX6Q_UART=y CONFIG_DEBUG_IMX_UART_PORT=2 CONFIG_DEBUG_LL_INCLUDE="debug/imx.S" CONFIG_DEBUG_UNCOMPRESS=y CONFIG_EARLY_PRINTK=y CONFIG_EARLY_PRINTK_DIRECT=y
- Finally, some kernels require a special parameter to be passed in through the bootloader. See below for instructions:
setenv extra ${extra} earlyprintk=serial,${console},${baudrate} saveenv
For newport, Early console is compiled into the kernel and to enable it make sure you boot with:
console=ttyAMA0,115200n8 earlycon=pl011,0x87e028000000 cma=64M
For venice, Early console is compiled into the kernel and to enable it make sure you boot with:
console=ttymxc1,115200 earlycon=ec_imx6q,0x30890000,115200
Linux Magic System Request Key Hacks
There exists a magic key combo you can hit which the kernel will respond to regardless of whatever else it is doing (unless its completely locked up).
To enable this build with CONFIG_MAGIC_SYSRQ when configuring the kernel.
This will enable a /proc/sysrq-trigger file which you can use to send various commands to. Some very useful commands for debugging are:
- 'b' - reboot immediately (without sync)
- 'c' - perform a system crash via a NULL pointer dereference (which will show a crashdump if enabled)
- 'd' - show all locks that are held
- 'e' - send a SIGTERM to all processes except for PID1 (init)
- 'f' - call oom_kill to kill a memory hog process
- 'h' - display help
- 'i' - send a SIGKILL to all processes except for PID1 (init)
- 'l' - show a stack backtrace for all active CPU's
References: