[[PageOutline]]

= Linux OS Code Profiling =
There are several options for code profiling on the Linux OS. The kernel itself has a profiling API which can be enabled:
 * CONFIG_PROFILING - General profiling
 * CONFIG_OPROFILE - OProfile system profiling (capable of profiling the whole system including kernel, kernel modules, libraries, and applications)

OProfile was the profiling tool of choice for Linux developers for nearly 10 years. A few years back, various kernel developers defined and implemented a new formal kernel API to access performance monitor counters (PMCs), which are hardware elements in most modern CPUs, to address the needs of performance tools. Prior to this new API, OProfile used a special OProfile-specific kernel module while other tools relied on patches (perfctr, perfmon). The developers of the new profiling API also developed an example tool called 'perf' that uses it. The perf tool has since matured greatly. OProfile is strictly a profiling tool.

There are other options that are not described here:
 * valgrind / cachegrind / dtrace
 * Google CPU profiler
 * gprof

Reference:
 * http://rhaas.blogspot.co.uk/2012/06/perf-good-bad-ugly.html
 * http://homepages.cwi.nl/~aeb/linux/profile.html

== Basic Kernel Profiling (CONFIG_PROFILING and readprofile) ==
There are several facilities to see where the kernel spends its resources. A simple one, which can be built in with CONFIG_PROFILING, stores the current EIP (instruction pointer) at each clock tick.

To use this, ensure the kernel is built with CONFIG_PROFILING and either boot the kernel with the command line option {{{profile=2}}} or enable it at runtime with {{{echo 2 > /sys/kernel/profiling}}}. This causes the file /proc/profile to be created. The number provided (2 in the example above) is the number of positions EIP is shifted right when profiling, so a large number gives a coarse profile. The counters are reset by writing to /proc/profile.
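
As a rough illustration of the shift value described above: shifting the instruction pointer right by N bits means that 2^N consecutive bytes of kernel text share one counter. The sketch below only does that arithmetic; {{{profile_bucket_bytes}}} is a hypothetical helper name, not part of the kernel or of readprofile.

```bash
# Hypothetical helper (not part of the kernel or util-linux):
# print how many bytes of kernel text share one profiling counter
# when the instruction pointer is shifted right by "shift" bits.
profile_bucket_bytes() {
    local shift="$1"
    echo $(( 1 << shift ))
}

# profile=2 groups addresses into small buckets (fine-grained profile);
# a larger shift such as 8 groups them into larger buckets (coarse profile).
for shift in 2 8; do
    echo "shift=$shift -> $(profile_bucket_bytes "$shift") bytes per counter"
done
```

A small shift gives finer resolution at the cost of a larger counter buffer, while a large shift does the opposite.
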
The utility readprofile will output statistics for you. It does not sort its output, so you have to invoke sort explicitly, but given a memory map it will translate addresses to kernel symbols.

Example:
 1. boot a kernel compiled with CONFIG_PROFILING
 2. enable profiling (either by placing {{{profile=2}}} on the kernel cmdline or dynamically with:
 {{{#!bash
echo 2 > /sys/kernel/profiling # enable profiling
 }}}
 3. (optional) clear counters
 {{{#!bash
echo > /proc/profile # reset counters
 }}}
 4. do some activity you wish to profile
 5. use readprofile to interpret the results:
 {{{#!bash
readprofile -m System.map | sort -nr | head -2
510502 total 0.1534
508548 default_idle 10594.7500
 }}}
  * The first column gives the number of timer ticks. The last column gives the number of ticks divided by the size of the function.
  * The command {{{readprofile -r}}} is equivalent to {{{echo > /proc/profile}}}.

References:
 * http://lxr.missinglinkelectronics.com/linux/Documentation/basic_profiling.txt
 * http://homepages.cwi.nl/~aeb/linux/profile.html
 * See [http://lxr.missinglinkelectronics.com/linux/kernel/profile.c kernel/profile.c] and [http://lxr.missinglinkelectronics.com/linux/fs/proc/proc_misc.c fs/proc/proc_misc.c] and [http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man1/readprofile.1.html readprofile(1)].

== OProfile ==
OProfile provides a profiler, an event counter, and post-processing tools for analyzing profile data. The profiling tool is called {{{operf}}}.

Some processors are not supported by the underlying new perf_events kernel API and are therefore not supported by operf. If you see **Your kernel's Performance Events Subsystem does not support your processor type** then you need to use opcontrol in legacy mode instead.
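
A script that wraps operf could look for that message to decide which mode to use. The following is only a sketch: {{{pick_oprofile_mode}}} is a hypothetical helper name, and it just inspects text handed to it (for example, captured operf output); it does not start operf or opcontrol itself.

```bash
# Hypothetical helper: given the text printed by operf, report whether
# standard mode (operf) can be used or legacy mode (opcontrol) is needed.
pick_oprofile_mode() {
    local operf_output="$1"
    local unsupported="Performance Events Subsystem does not support your processor type"
    # -F matches the message as a fixed string, -q suppresses grep's own output
    if printf '%s\n' "$operf_output" | grep -qF "$unsupported"; then
        echo "legacy"
    else
        echo "standard"
    fi
}

# Example with a captured message (no profiler is started here).
pick_oprofile_mode "Your kernel's Performance Events Subsystem does not support your processor type"
```
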
References:
 * http://oprofile.sourceforge.net/
 * http://oprofile.sourceforge.net/doc/index.html

=== OProfile Standard Mode (imx6) ===
Starting with v0.9.8, OProfile switched over to using the new perf_events kernel API with a new set of userspace tools (OProfile still supports the legacy opcontrol mode).

Standard mode tools:
 * operf
 * ocount - collect raw event counts on a per-app, per-process, per-cpu, or system-wide basis

Using the standard mode, post-processing of collected raw events is not necessary.

== Perf ==
In general, the {{{perf}}} tool is considered easier to install and run.

Example:
 1. (optional) copy your current kernel's vmlinux to /tmp
 2. capture 120 seconds worth of profiling data for a running program
 {{{#!bash
perf record -p $(pidof program) sleep 120
 }}}
 3. report data (using kernel symbols):
 {{{#!bash
perf report -k /tmp/vmlinux
 }}}
  * the {{{-k}}} option is optional and adds kernel symbol decoding

References:
 * https://perf.wiki.kernel.org/index.php/Tutorial

= OpenWrt =
OpenWrt has support for both OProfile and perf. Because perf depends on glibc (or at least is configured that way), we recommend OProfile when using OpenWrt.

To enable OProfile on OpenWrt, run {{{make menuconfig}}} and select:
 * Global build Settings -> Compile the kernel with profiling enabled
 * Development -> oprofile
 * Development -> oprofile-utils
   - Note that package/devel/oprofile/Makefile may need +librt added to DEPENDS

To enable perf (glibc required):
 * Global build Settings -> Compile the kernel with profiling enabled
 * Development -> perf

You likely want to run non-stripped binaries for anything you want to actually investigate. One way of doing this is to build them with CONFIG_DEBUG=y. For example, building compat-wireless:
{{{#!bash
make target/linux/mac80211/{clean,compile} V=99 CONFIG_DEBUG=y
}}}

References:
 * [http://false.ekta.is/2012/11/cpu-profiling-applications-on-openwrt-with-perf-or-oprofile/ Profiling on OpenWrt with perf or OProfile]