Changes between Version 2 and Version 3 of linux/profiling


Timestamp:
01/10/2018 11:57:12 AM (6 months ago)
Author:
Tim Harvey
Comment:

convert restored html to wiki markup

  • linux/profiling

    v2 v3  
    1 {{{#!html
     1[[PageOutline]]
    22
    3 <h1 id="LinuxOSCodeProfiling"><b style="color:#000;background:#ffcc99">Linux OS Code Profiling</b></h1>
    4 <p>
    5 There are several options for code <b style="color:#000;background:#66ffff">profiling</b> on the <b style="color:#000;background:#ffff66">Linux</b> OS. The kernel itself has a <b style="color:#000;background:#66ffff">profiling</b> API which can be enabled:
    6 </p>
    7 <ul><li>CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b> - General <b style="color:#000;background:#66ffff">profiling</b>
    8 </li><li>CONFIG_OPROFILE - OProfile system <b style="color:#000;background:#66ffff">profiling</b> (capable of <b style="color:#000;background:#66ffff">profiling</b> the whole system including kernel, kernel modules, libraries, and applications)
    9 </li></ul><p>
    10 OProfile was the <b style="color:#000;background:#66ffff">profiling</b> tool of choice for <b style="color:#000;background:#ffff66">Linux</b> developers for nearly 10 years. A few years back, various kernel developers defined and implemented a new formal kernel API to access performance monitor counters (PMCs), which are hardware elements in most modern CPUs, to address the needs of performance tools. Prior to this new API, OProfile used a special OProfile-specific kernel module while other tools relied on patches (perfctr, perfmon).
    11 </p>
    12 <p>
    13 The developers of the new <b style="color:#000;background:#66ffff">profiling</b> API also developed an example tool called 'perf' that uses the new API. The perf tool has matured greatly in the past few years. OProfile is strictly a <b style="color:#000;background:#66ffff">profiling</b> tool.
    14 </p>
    15 <p>
     3= Linux OS Code Profiling =
     4There are several options for code profiling on the Linux OS. The kernel itself has a profiling API which can be enabled:
     5 * CONFIG_PROFILING - General profiling
     6 * CONFIG_OPROFILE - OProfile system profiling (capable of profiling the whole system including kernel, kernel modules, libraries, and applications)
     7
     8OProfile was the profiling tool of choice for Linux developers for nearly 10 years. A few years back, various kernel developers defined and implemented a new formal kernel API to access performance monitor counters (PMCs), which are hardware elements in most modern CPUs, to address the needs of performance tools. Prior to this new API, OProfile used a special OProfile-specific kernel module while other tools relied on patches (perfctr, perfmon).
     9
     10The developers of the new profiling API also developed an example tool called 'perf' that uses the new API. The perf tool has matured greatly in the past few years. OProfile is strictly a profiling tool.
     11
    1612There are other options that are not described here:
    17 </p>
    18 <ul><li>valgrind / cachegrind / dtrace
    19 </li><li>Google CPU <b style="color:#000;background:#66ffff">profiler</b>
    20 </li><li>gprof
    21 </li></ul><p>
     13 * valgrind / cachegrind / dtrace
     14 * Google CPU profiler
     15 * gprof
     16
    2217Reference:
    23 </p>
    24 <ul><li><a class="ext-link" href="http://rhaas.blogspot.co.uk/2012/06/perf-good-bad-ugly.html"><span class="icon">​</span>http://rhaas.blogspot.co.uk/2012/06/perf-good-bad-ugly.html</a>
    25 </li><li><a class="ext-link" href="http://homepages.cwi.nl/~aeb/linux/profile.html"><span class="icon">​</span>http://homepages.cwi.nl/~aeb/<b style="color:#000;background:#ffff66">linux</b>/profile.html</a>
    26 </li></ul><h2 id="BasicKernelProfilingCONFIG_PROFILINGandreadprofile">Basic Kernel <b style="color:#000;background:#66ffff">Profiling</b> (CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b>  and readprofile)</h2>
    27 <p>
    28 There are several facilities to see where the kernel spends its resources. A simple one which can be built-in with (CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b>) will store the current EIP (instruction pointer) at each clock tick.
    29 </p>
    30 <p>
    31 To use this ensure the kernel is built with CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b> and either boot the kernel with command line option <strong>profile=2</strong> or enable at runtime with an <strong>echo 2 &gt; /sys/kernel/<b style="color:#000;background:#66ffff">profiling</b></strong>.
    32 </p>
    33 <p>
    34 This will cause a file /proc/profile to be created. The number provided (2 in the example above) is the number of positions EIP is shifted right when <b style="color:#000;background:#66ffff">profiling</b>. So a large number gives a coarse profile. The counters are reset by writing to /proc/profile.
    35 </p>
    36 <p>
     18 * http://rhaas.blogspot.co.uk/2012/06/perf-good-bad-ugly.html
     19 * http://homepages.cwi.nl/~aeb/linux/profile.html
     20
     21== Basic Kernel Profiling (CONFIG_PROFILING and readprofile) ==
     22There are several facilities to see where the kernel spends its resources. A simple one which can be built-in with (CONFIG_PROFILING) will store the current EIP (instruction pointer) at each clock tick.
     23
     24To use this ensure the kernel is built with CONFIG_PROFILING and either boot the kernel with command line option profile=2 or enable at runtime with an echo 2 > /sys/kernel/profiling.
     25
     26This will cause a file /proc/profile to be created. The number provided (2 in the example above) is the number of positions EIP is shifted right when profiling. So a large number gives a coarse profile. The counters are reset by writing to /proc/profile.
     27
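As a rough illustration of the shift value: each counter in /proc/profile covers 2^shift bytes of kernel text, so {{{profile=2}}} gives 4-byte buckets. The sketch below only does that arithmetic; the helper name {{{profile_bucket_bytes}}} is hypothetical and not part of any tool.

```shell
# Hypothetical helper: bytes of kernel text covered by one profiling
# counter for a given shift value (the number written to
# /sys/kernel/profiling or passed as profile=N on the command line).
profile_bucket_bytes() {
    shift_bits=$1
    echo $((1 << shift_bits))
}

profile_bucket_bytes 2   # the shift used in the example above
profile_bucket_bytes 8   # a larger shift, hence a coarser profile
```

A larger shift means fewer, wider buckets, which is why a large number gives a coarse profile.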
    3728The utility readprofile will output statistics for you. It does not sort so you have to invoke sort explicitly. But given a memory map it will translate addresses to kernel symbols.
    38 </p>
    39 <p>
     29
    4030Example:
    41 </p>
    42 <ol><li>boot kernel compiled with CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b>
    43 </li><li>enable profiling, either by placing <strong>profile=2</strong> on the kernel command line or dynamically with:
    44 <pre class="wiki">echo 2 &gt; /sys/kernel/<b style="color:#000;background:#66ffff">profiling</b> # enable <b style="color:#000;background:#66ffff">profiling</b>
    45 </pre></li><li>(optional) clear counters
    46 <pre class="wiki">echo &gt; /proc/profile # reset counters
    47 </pre></li><li>do some activity you wish to profile
    48 </li><li>use <strong>readprofile</strong> to interpret the results:
    49 <pre class="wiki">readprofile -m System.map | sort -nr | head -2
     31 1. boot kernel compiled with CONFIG_PROFILING
     32 2. enable profiling, either by placing {{{profile=2}}} on the kernel command line or dynamically with:
     33{{{#!bash
     34echo 2 > /sys/kernel/profiling # enable profiling
     35}}}
     36 3. (optional) clear counters
     37{{{#!bash
     38echo > /proc/profile # reset counters
     39}}}
     40 4. do some activity you wish to profile
     41 5. use readprofile to interpret the results:
     42{{{#!bash
     43readprofile -m System.map | sort -nr | head -2
    5044510502 total                                      0.1534
    5145508548 default_idle                           10594.7500
    52 </pre></li></ol><ul><li>The first column gives the number of timer ticks. The last column gives the number of ticks divided by the size of the function.
    53 </li><li>The command readprofile -r is equivalent to echo &gt; /proc/profile.
    54 </li></ul><p>
     46}}}
     47  * The first column gives the number of timer ticks. The last column gives the number of ticks divided by the size of the function.
     48  * The command readprofile -r is equivalent to echo > /proc/profile.
     49
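Because readprofile does not sort its output, the sort step is easy to wrap. A minimal sketch, assuming the three-column output shown above; the function name {{{top_ticks}}} is hypothetical, and it drops the {{{total}}} summary row so only real symbols are ranked.

```shell
# Hypothetical helper: read readprofile-style lines on stdin
# ("<ticks> <symbol> <load>"), drop the "total" summary row, and print
# the N busiest symbols, highest tick count first (default N is 10).
top_ticks() {
    n=${1:-10}
    awk '$2 != "total"' | sort -nr | head -n "$n"
}

# Typical use on a target with profiling enabled (needs root and a
# System.map that matches the running kernel):
#   readprofile -m System.map | top_ticks 5
```

Filtering out {{{total}}} is a design choice for ranking; omit the awk stage if the overall tick count should stay in the output.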
    5550References:
    56 </p>
    57 <ul><li><a class="ext-link" href="http://lxr.missinglinkelectronics.com/linux/Documentation/basic_profiling.txt"><span class="icon">​</span>http://lxr.missinglinkelectronics.com/<b style="color:#000;background:#ffff66">linux</b>/Documentation/basic_<b style="color:#000;background:#66ffff">profiling</b>.txt</a>
    58 </li><li><a class="ext-link" href="http://homepages.cwi.nl/~aeb/linux/profile.html"><span class="icon">​</span>http://homepages.cwi.nl/~aeb/<b style="color:#000;background:#ffff66">linux</b>/profile.html</a>
    59 </li><li>See <a class="ext-link" href="http://lxr.missinglinkelectronics.com/linux/kernel/profile.c"><span class="icon">​</span>kernel/profile.c</a> and <a class="ext-link" href="http://lxr.missinglinkelectronics.com/linux/fs/proc/proc_misc.c"><span class="icon">​</span>fs/proc/proc_misc.c</a> and <a class="ext-link" href="http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&amp;db=man&amp;fname=/usr/share/catman/man1/readprofile.1.html"><span class="icon">​</span>readprofile(1)</a>.
    60 </li></ul><h2 id="OProfile">OProfile</h2>
    61 <p>
    62 OProfile provides a <b style="color:#000;background:#66ffff">profiler</b>, an event counter, and post-processing tools for analyzing profile data.
    63 </p>
    64 <p>
    65 The tool used is called <strong>operf</strong>. Some processors are not supported by the underlying perf_events kernel API and are therefore not supported by operf. If you see <strong>Your kernel's Performance Events Subsystem does not support your processor type</strong>, use opcontrol in legacy mode instead.
    66 </p>
    67 <p>
     51 * http://lxr.missinglinkelectronics.com/linux/Documentation/basic_profiling.txt
     52 * http://homepages.cwi.nl/~aeb/linux/profile.html
     53 * See [http://lxr.missinglinkelectronics.com/linux/kernel/profile.c kernel/profile.c], [http://lxr.missinglinkelectronics.com/linux/fs/proc/proc_misc.c fs/proc/proc_misc.c], and [http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man1/readprofile.1.html readprofile(1)].
     54
     55== OProfile ==
     56
     57OProfile provides a profiler, an event counter, and post-processing tools for analyzing profile data.
     58
     59The tool used is called {{{operf}}}. Some processors are not supported by the underlying perf_events kernel API and are therefore not supported by operf. If you see **Your kernel's Performance Events Subsystem does not support your processor type**, use opcontrol in legacy mode instead.
     60
    6861References:
    69 </p>
    70 <ul><li><a class="ext-link" href="http://oprofile.sourceforge.net/"><span class="icon">​</span>http://oprofile.sourceforge.net/</a>
    71 </li><li><a class="ext-link" href="http://oprofile.sourceforge.net/doc/index.html"><span class="icon">​</span>http://oprofile.sourceforge.net/doc/index.html</a>
    72 </li></ul><h3 id="OProfileStandardModeimx6">OProfile Standard Mode (imx6)</h3>
    73 <p>
     62 * http://oprofile.sourceforge.net/
     63 * http://oprofile.sourceforge.net/doc/index.html
     64
     65=== OProfile Standard Mode (imx6) ===
    7466Starting with v0.9.8, OProfile switched over to using the new perf_events kernel API with a new set of userspace tools (however OProfile still supports the legacy mode - see below).
    75 </p>
    76 <p>
     67
    7768Standard mode tools:
    78 </p>
    79 <ul><li>operf -
    80 </li><li>ocount - collect raw event counts on a per-app, per-process, per-CPU, or system-wide basis
    81 </li></ul><p>
     69 * operf
     70 * ocount - collect raw event counts on a per-app, per-process, per-CPU, or system-wide basis
     71
    8272Using the standard mode, post-processing of collected raw events is not necessary.
    83 </p>
    84 <h3 id="OProfileLegacyModecns3xxx">OProfile Legacy Mode (cns3xxx)</h3>
    85 <p>
    86 The <strong>legacy mode</strong> is for CPUs that do not implement the new perf_events kernel <b style="color:#000;background:#66ffff">profiling</b> API. The Gateworks Laguna family, which uses the Cavium cns3xxx CPU, falls into this category.
    87 </p>
    88 <p>
     73
     74=== OProfile Legacy Mode (cns3xxx) ===
     75The legacy mode is for CPUs that do not implement the new perf_events kernel profiling API. The Gateworks Laguna family, which uses the Cavium cns3xxx CPU, falls into this category.
     76
    8977The legacy mode tools consist of:
    90 </p>
    91 <ul><li>oprofile kernel module (requires CONFIG_<b style="color:#000;background:#66ffff">PROFILING</b>=y and CONFIG_OPROFILE=m)
    92 </li><li>opcontrol - used to set up <b style="color:#000;background:#66ffff">profiling</b> (needs the vmlinux file)
    93 </li><li>oprofiled - the daemon (controlled via opcontrol)
    94 </li><li>opreport - report on collected samples
    95 </li></ul><p>
     78 * oprofile kernel module (requires CONFIG_PROFILING=y and CONFIG_OPROFILE=m)
     79 * opcontrol - used to set up profiling (needs the vmlinux file)
     80 * oprofiled - the daemon (controlled via opcontrol)
     81 * opreport - report on collected samples
    9682opcontrol parameters:
    97 </p>
    98 <ul><li>--session-dir specifies the location to store samples. It defaults to /var/lib/oprofile and you can use this (with both opcontrol and opreport) to use samples from alternate locations
    99 </li><li>--separate specifies how to separate samples. By default they are all stored in a single file (none), but you can choose to store by:
    100 <ul><li>none - no profile separation (default)
    101 </li><li>lib - per-application profiles for libraries
    102 </li><li>kernel - per-application profiles for the kernel and kernel modules
    103 </li><li>thread - profiles for each thread and each task
    104 </li><li>cpu - profiles for each CPU
    105 </li><li>all - all of the above
    106 </li></ul></li><li>Using <strong>profile specification parameters</strong> you can choose how to sample and report data:
    107 <ul><li>cpu:0 - report just cpu0 (assuming data was collected separately (see above))
    108 </li></ul></li><li>--vmlinux=file (both for opcontrol and opreport) specifies the vmlinux kernel image required for decoding kernel symbols
    109 </li><li>--setup will store the following list of parameters in /root/.oprofile/daemonrc to be used as default settings for opcontrol and opreport. Alternatively you can specify setup options to each program as needed
    110 </li></ul><p>
     83 * --session-dir specifies the location to store samples. It defaults to /var/lib/oprofile and you can use this (with both opcontrol and opreport) to use samples from alternate locations
     84 * --separate specifies how to separate samples. By default they are all stored in a single file (none), but you can choose to store by:
     85  - none - no profile separation (default)
     86  - lib - per-application profiles for libraries
     87  - kernel - per-application profiles for the kernel and kernel modules
     88  - thread - profiles for each thread and each task
     89  - cpu - profiles for each CPU
     90  - all - all of the above
     91 * Using profile specification parameters you can choose how to sample and report data:
     92  - cpu:0 - report just cpu0 (assuming data was collected separately (see above))
     93 * --vmlinux=file (both for opcontrol and opreport) specifies the vmlinux kernel image required for decoding kernel symbols
     94 * --setup will store the following list of parameters in /root/.oprofile/daemonrc to be used as default settings for opcontrol and opreport. Alternatively you can specify setup options to each program as needed
     95
    11196Example usage:
    112 </p>
    113 <ol><li>copy your current kernel's vmlinux to /tmp
    114 </li><li>(optional) set up the configuration for vmlinux symbol decoding, a specific session location, and separating events by CPU:
    115 <pre class="wiki">opcontrol --setup --vmlinux=/tmp/vmlinux --session-dir=/tmp/session1 --separate=cpu
    116 </pre></li><li>start capturing events:
    117 <pre class="wiki">opcontrol --start
    118 </pre><ul><li>you can force a flush of collected events via <strong>opcontrol --dump</strong> at any time
    119 </li><li>you can clear the currently collected events via <strong>opcontrol --reset</strong> at any time
    120 </li></ul></li><li>stop capturing events (and flush data):
    121 <pre class="wiki">opcontrol --shutdown
    122 </pre></li><li>report events:
    123 <pre class="wiki">opreport --vmlinux=/tmp/vmlinux --session-dir=/tmp/session1
    124 </pre><ul><li>if capturing events from individual CPUs separately (as shown above) you can show the info for just cpu0 via <strong>opreport cpu:0</strong>
    125 </li><li>Note that opreport doesn't make use of the conf file generated by opcontrol --setup
    126 </li></ul></li></ol><p>
     97 1. copy your current kernel's vmlinux to /tmp
     98 2. (optional) set up the configuration for vmlinux symbol decoding, a specific session location, and separating events by CPU:
     99{{{#!bash
     100opcontrol --setup --vmlinux=/tmp/vmlinux --session-dir=/tmp/session1 --separate=cpu
     101}}}
     102 3. start capturing events:
     103{{{#!bash
     104opcontrol --start
     105}}}
     106  * you can force a flush of collected events via opcontrol --dump at any time
     107  * you can clear the currently collected events via opcontrol --reset at any time
     108 4. stop capturing events (and flush data):
     109{{{#!bash
     110opcontrol --shutdown
     111}}}
     112 5. report events:
     113{{{#!bash
     114opreport --vmlinux=/tmp/vmlinux --session-dir=/tmp/session1
     115}}}
     116  * if capturing events from individual CPUs separately (as shown above) you can show the info for just cpu0 via opreport cpu:0
     117  * Note that opreport doesn't make use of the conf file generated by opcontrol --setup
     118
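The numbered steps above can be sketched as one shell function. This is a sketch under stated assumptions, not a tested recipe for any particular board: the names {{{run}}}, {{{profile_legacy}}} and {{{DRY_RUN}}} are hypothetical, and every opcontrol/opreport option is taken verbatim from the steps above. With {{{DRY_RUN=1}}} the commands are only printed, which is handy for checking the sequence on a host without OProfile installed.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, else run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the legacy-mode steps listed above.
profile_legacy() {
    vmlinux=$1      # kernel image with symbols, e.g. /tmp/vmlinux
    session=$2      # sample directory, e.g. /tmp/session1
    shift 2         # remaining arguments: the workload to profile

    run opcontrol --setup --vmlinux="$vmlinux" --session-dir="$session" --separate=cpu
    run opcontrol --start
    if [ "$#" -gt 0 ]; then
        run "$@"                # the activity being profiled
    fi
    run opcontrol --shutdown    # stop capturing and flush data
    run opreport --vmlinux="$vmlinux" --session-dir="$session"
}

# Example (prints the commands instead of running them):
#   DRY_RUN=1 profile_legacy /tmp/vmlinux /tmp/session1 sleep 10
```

Passing the session directory to both opcontrol and opreport matters because, as noted above, opreport does not read the configuration stored by {{{opcontrol --setup}}}.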
    127119Important notes:
    128 </p>
    129 <ul><li>The cns3xxx kernel and/or hardware does not support a performance counter, so <b style="color:#000;background:#66ffff">profiling</b> falls back to timer-based mode driven by the timer IRQ. In this mode the results are not useful for code that disables IRQs or runs in hardirq context
    130 </li></ul><p>
     120 * The cns3xxx kernel and/or hardware does not support a performance counter, so profiling falls back to timer-based mode driven by the timer IRQ. In this mode the results are not useful for code that disables IRQs or runs in hardirq context
     121
    131122References:
    132 </p>
    133 <ul><li><a class="ext-link" href="http://oprofile.sourceforge.net/doc/controlling-daemon.html"><span class="icon">​</span>http://oprofile.sourceforge.net/doc/controlling-daemon.html</a>
    134 </li><li><a class="ext-link" href="http://oprofile.sourceforge.net/doc/getting-started-with-legacy.html"><span class="icon">​</span>http://oprofile.sourceforge.net/doc/getting-started-with-legacy.html</a>
    135 </li></ul><h2 id="Perf">Perf</h2>
    136 <p>
    137 In general <b style="color:#000;background:#66ffff">profiling</b> with the <strong>perf</strong> tool is considered easier to install and run.
    138 </p>
    139 <p>
     123 * http://oprofile.sourceforge.net/doc/controlling-daemon.html
     124 * http://oprofile.sourceforge.net/doc/getting-started-with-legacy.html
     125
     126
     127== Perf ==
     128In general profiling with the {{{perf}}} tool is considered easier to install and run.
     129
    140130Example:
    141 </p>
    142 <ol><li>(optional) copy your current kernel's vmlinux to /tmp
    143 </li><li>capture 120 seconds worth of <b style="color:#000;background:#66ffff">profiling</b> data
    144 <pre class="wiki">perf record -p $(pidof program) sleep 120
    145 </pre></li><li>report data (using kernel symbols):
    146 <pre class="wiki">perf report -k /tmp/vmlinux
    147 </pre><ul><li>the -k is optional and adds kernel symbol decoding
    148 </li></ul></li></ol><p>
     131 1. (optional) copy your current kernel's vmlinux to /tmp
     132 2. capture 120 seconds worth of profiling data
     133{{{#!bash
     134perf record -p $(pidof program) sleep 120
     135}}}
     136 3. report data (using kernel symbols):
     137{{{#!bash
     138perf report -k /tmp/vmlinux
     139}}}
     140  * the -k is optional and adds kernel symbol decoding
     141
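The record and report steps above can be wrapped in a small function. A minimal sketch: {{{perf_profile}}} and {{{DRY_RUN}}} are hypothetical names, the 120-second default simply mirrors the example, and the perf options are the ones shown above. With {{{DRY_RUN=1}}} the perf commands are printed rather than executed.

```shell
# Hypothetical wrapper: sample one running process for a fixed time,
# then open the report. Usage: perf_profile PID [SECONDS] [VMLINUX]
perf_profile() {
    pid=$1                 # process to sample
    seconds=${2:-120}      # capture length; 120 matches the example
    vmlinux=${3:-}         # optional kernel image for symbol decoding

    # Reject anything that is not a plain numeric PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: perf_profile PID [SECONDS] [VMLINUX]" >&2
            return 1
            ;;
    esac

    # With DRY_RUN=1 each command is echoed instead of being run.
    if [ "${DRY_RUN:-0}" = "1" ]; then runner=echo; else runner=; fi

    $runner perf record -p "$pid" sleep "$seconds" || return 1
    if [ -n "$vmlinux" ]; then
        $runner perf report -k "$vmlinux"
    else
        $runner perf report
    fi
}

# Example, where "myprog" is a placeholder process name:
#   perf_profile "$(pidof -s myprog)" 120 /tmp/vmlinux
```

The PID check guards against an empty {{{pidof}}} result, which would otherwise hand perf a malformed {{{-p}}} argument.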
    149142References:
    150 </p>
    151 <ul><li><a class="ext-link" href="https://perf.wiki.kernel.org/index.php/Tutorial"><span class="icon">​</span>https://perf.wiki.kernel.org/index.php/Tutorial</a>
    152 </li></ul><h2 id="OpenWrt"><a class="wiki" href="/wiki/OpenWrt">OpenWrt</a></h2>
    153 <p>
    154 <a class="wiki" href="/wiki/OpenWrt">OpenWrt</a> has support for both OProfile and perf. Because perf depends on glibc (or at least is configured that way), we recommend OProfile when using <a class="wiki" href="/wiki/OpenWrt">OpenWrt</a>.
    155 </p>
    156 <p>
    157 To enable OProfile on <a class="wiki" href="/wiki/OpenWrt">OpenWrt</a>, run make menuconfig and select:
    158 </p>
    159 <ul><li>Global build Settings -&gt; Compile the kernel with <b style="color:#000;background:#66ffff">profiling</b> enabled
    160 </li><li>Development -&gt; oprofile
    161 </li><li>Development -&gt; oprofile-utils
    162 <ul><li>Note that package/devel/oprofile/Makefile may need +librt added to DEPENDS
    163 </li></ul></li></ul><p>
     143 * https://perf.wiki.kernel.org/index.php/Tutorial
     144
     145
     146= OpenWrt =
     147OpenWrt has support for both OProfile and perf. Because perf depends on glibc (or at least is configured that way), we recommend OProfile when using OpenWrt.
     148
     149To enable OProfile on OpenWrt, run make menuconfig and select:
     150 * Global build Settings -> Compile the kernel with profiling enabled
     151 * Development -> oprofile
     152 * Development -> oprofile-utils
     153  - Note that package/devel/oprofile/Makefile may need +librt added to DEPENDS
     154
    164155To enable perf (glibc required):
    165 </p>
    166 <ul><li>Global build Settings -&gt; Compile the kernel with <b style="color:#000;background:#66ffff">profiling</b> enabled
    167 </li><li>Development -&gt; perf
    168 </li></ul><p>
     156 * Global build Settings -> Compile the kernel with profiling enabled
     157 * Development -> perf
     158
    169159You likely want to run non-stripped binaries for anything you want to actually investigate. One way of doing this is to build them with CONFIG_DEBUG=y. For example building compat-wireless:
    170 </p>
    171 <pre class="wiki">make target/<b style="color:#000;background:#ffff66">linux</b>/mac80211/{clean,compile} V=99 CONFIG_DEBUG=y
    172 </pre><p>
     160{{{#!bash
     161make target/linux/mac80211/{clean,compile} V=99 CONFIG_DEBUG=y
     162}}}
     163
    173164References:
    174 </p>
    175 <ul><li><a class="ext-link" href="http://false.ekta.is/2012/11/cpu-profiling-applications-on-openwrt-with-perf-or-oprofile/"><span class="icon">​</span><b style="color:#000;background:#66ffff">Profiling</b> on OpenWrt with perf or OProfile</a>
    176 </li></ul></div>
    177          
    178           <div class="trac-modifiedby">
    179             <span><a href="/wiki/linux/profiling?action=diff&amp;version=3" title="Version 3 by tharvey: added note about cns3xxx timer based profiling limitations">Last modified</a> <a class="timeline" href="/timeline?from=2015-04-07T16%3A03%3A47-07%3A00&amp;precision=second" title="See timeline at 04/07/15 16:03:47">2 years ago</a></span>
    180             <span class="trac-print">Last modified on 04/07/15 16:03:47</span>
    181           </div>
    182        
    183        
    184       </div>
    185      
    186 
    187     </div>
    196     </div>
    203 }}}
     165 * [http://false.ekta.is/2012/11/cpu-profiling-applications-on-openwrt-with-perf-or-oprofile/ Profiling on OpenWrt with perf or OProfile]