Changes between Version 1 and Version 2 of ventana/memory


Timestamp:
01/09/2018 03:07:00 PM (10 months ago)
Author:
Tim Harvey
Comment:

convert restored html to wiki markup

  • ventana/memory

    v1 v2  
    1 {{{#!html
    2           <div id="wikipage" class="trac-content"><p>
    3 </p><div class="wiki-toc">
    4 <ol>
    5   <li>
    6     <a href="#VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></a>
    7     <ol>
    8       <li>
    9         <a href="#MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</a>
    10       </li>
    11       <li>
    12         <a href="#LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</a>
    13       </li>
    14       <li>
    15         <a href="#LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></a>
    16       </li>
    17     </ol>
    18   </li>
    19 </ol>
    20 </div><p>
    21 </p>
    22 <h1 id="VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></h1>
    23 <p>
    24 The Freescale IMX6 Multi-Mode DDR Controller (MMDC) is what interfaces the ARM cpu cores with the shared main <b style="color:#000;background:#66ffff">memory</b>.
    25 </p>
    26 <p>
    27 All <b style="color:#000;background:#ffff66">Ventana</b> products use DDR3 SDRAM and the Secondary Program Loader (SPL) (also built from U-Boot code) that pre-ceeds the actual U-Boot bootloader is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has 2 32bit channels that can be used together for a 64bit <b style="color:#000;background:#66ffff">memory</b> architecture, each <b style="color:#000;background:#ffff66">Ventana</b> model differs because
    28 </p>
    29 <table class="wiki">
    30 <tr><th> Baseboard           </th><th> width </th><th> chip arrangement </th><th> Max Addressible<sup></sup><sup>1</sup><sup></sup>
    31 </th></tr><tr><td> GW54xx/GW53xx         </td><td> 64bit   </td><td> 4x 16bit chips     </td><td> 4GB                   
    32 </td></tr><tr><td> GW51xx/GW52xx/GW552x/GW553x </td><td> 32bit   </td><td> 2x 16bit chips     </td><td> 2GB                   
    33 </td></tr><tr><td> GW551x                </td><td> 16bit   </td><td> 1x 16bit chips     </td><td> 1GB                   
    34 </td></tr></table>
    35 <ol><li>Max Addressible is the maximum possible <b style="color:#000;background:#66ffff">memory</b> assuming today's DDR3 density - contact sales@… for information on available board models.
    36 </li></ol><h2 id="MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</h2>
    37 <p>
    38 The Freescale MMDC has some profiling support built in that can allow you to examine <b style="color:#000;background:#66ffff">memory</b> utilization at a per hardware-block level. A simple user application exists called mmdc2 that can be used to gather and analyze the counters and provide some feedback on current <b style="color:#000;background:#66ffff">memory</b> utilization.
    39 </p>
    40 <p>
     1[[PageOutline]]
     2
     3= Ventana Memory =
     4The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main memory.
     5
     6All Ventana products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and precedes the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit memory architecture, each Ventana model differs in memory width and chip arrangement:
     7
     8||= Baseboard                 =||= width =||= chip arrangement =||= Max Addressable ^1^ =||
     9|| GW54xx/GW53xx               || 64bit   || 4x 16bit chips || 4GB ||
     10|| GW51xx/GW52xx/GW552x/GW553x || 32bit   || 2x 16bit chips || 2GB ||
     11|| GW551x                      || 16bit   || 1x 16bit chip  || 1GB ||
     12 1. Max Addressable is the maximum possible memory assuming today's DDR3 density - contact sales@… for information on available board models.
     13
     14
     15== Memory Performance ==
     16The Freescale MMDC has built-in profiling support that lets you examine memory utilization per hardware block. A simple user application called mmdc2 gathers and analyzes the counters and reports current memory utilization.
     17
    4118By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2.
    42 </p>
    43 <p>
     19
    4420Example usage:
    45 </p>
    46 <ul><li>show usage:
    47 <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 -h
     21 * show usage:
     22{{{#!bash
     23root@ventana:~# /unit_tests/mmdc2 -h
    4824MMDC DOES NOT KNOW -h
    4925======================MMDC v1.3===========================
     
    5430Note1: More than 1 master can be inputed. They will be profiled one by one.
    5531Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.
    56 </pre></li><li>show total utilization:
    57 <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2
     32}}}
     33 * show total utilization:
     34{{{#!bash
     35root@ventana:~# /unit_tests/mmdc2
    5836MMDC SUM
    5937
     
    7351Overall Bus Load: 5%
    7452Bytes Access: 59
    75 </pre><ul><li>notice the overall bandwidth used is 20MB/s. To find out 'what' specifically is using it, look at the other hardware blocks using the MMDC
    76 </li></ul></li><li>show ARM CPU utilization:
    77 <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 ARM
     53}}}
     54  - notice the overall bandwidth used is 20MB/s. To find out 'what' specifically is using it, look at the other hardware blocks using the MMDC
     55 * show ARM CPU utilization:
     56{{{#!bash
     57root@ventana:~# /unit_tests/mmdc2 ARM
    7858MMDC ARM
    7959
     
    9373Overall Bus Load: 5%
    9474Bytes Access: 29
    95 </pre></li><li>show DSP2 utilization (display output)
    96 <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 DSP2
     75}}}
     76 * show DSP2 utilization (display output)
     77{{{#!bash
     78root@ventana:~# /unit_tests/mmdc2 DSP2
    9779MMDC DSP2
    9880
     
    11294Overall Bus Load: 5%
    11395Bytes Access: 60
    114 </pre><ul><li>above you can see the the majority of the 20MB/s is from the DSP2 (display output) block. The above is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you 'blank' the display via <tt>cat 1 &gt; /sys/class/graphics/fb0/blank</tt> you will notice that the 20MB/s from DSP2 drops to 0.
    115 </li></ul></li></ul><p>
     96}}}
     97  - above you can see that the majority of the 20MB/s is from the DSP2 (display output) block. The above is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you 'blank' the display via {{{echo 1 > /sys/class/graphics/fb0/blank}}} you will notice that the 20MB/s from DSP2 drops to 0.
     98
    11699The meaning of some of the results is as follows:
    117 </p>
    118 <ul><li>Read, Write, Total: Number of MB/s during the configured window of time.
    119 </li><li>Utilization: percentage of data transfered compared to the data that could be transferred if all the busy cycles are used to transfer data. It is calculated as: <tt>(read_bytes + write_bytes) / (busy_cycles * 16) * 100</tt>
    120 </li><li>Overall Bus Load: number of busy cycles compared to the total number of cycles in the time window. It is calculated as: <tt>busy_cycles / total_cycles * 100</tt>
    121 </li></ul><p>
     100* Read, Write, Total: Number of MB/s during the configured window of time.
     101* Utilization: percentage of data transferred compared to the data that could be transferred if all the busy cycles were used to transfer data. It is calculated as: (read_bytes + write_bytes) / (busy_cycles * 16) * 100
     102* Overall Bus Load: number of busy cycles compared to the total number of cycles in the time window. It is calculated as: busy_cycles / total_cycles * 100
     103
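As a rough illustration of the two formulas above, the following shell sketch computes both percentages from raw counter values. The function names and the sample counter values are made up for this example; they are not part of mmdc2, which prints these figures itself.

```sh
#!/bin/sh
# Hypothetical helpers (not part of mmdc2) that apply the formulas above.

# Utilization (%) = (read_bytes + write_bytes) / (busy_cycles * 16) * 100
# usage: mmdc_utilization READ_BYTES WRITE_BYTES BUSY_CYCLES
mmdc_utilization() {
    awk -v r="$1" -v w="$2" -v b="$3" 'BEGIN {
        if (b == 0) { print "0.0"; exit }      # avoid division by zero
        printf "%.1f\n", (r + w) / (b * 16) * 100
    }'
}

# Overall Bus Load (%) = busy_cycles / total_cycles * 100
# usage: mmdc_bus_load BUSY_CYCLES TOTAL_CYCLES
mmdc_bus_load() {
    awk -v b="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print "0.0"; exit }      # avoid division by zero
        printf "%.1f\n", b / t * 100
    }'
}

# Made-up sample counters, for illustration only:
mmdc_utilization 600 200 100   # prints 50.0
mmdc_bus_load 50 1000          # prints 5.0
```

A low Utilization together with a high Overall Bus Load would suggest that the bus is busy but moving little data per busy cycle.
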
    122104For more information see also:
    123 </p>
    124 <ul><li><a class="ext-link" href="http://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage"><span class="icon">​</span>http://developer.ridgerun.com/wiki/index.php?title=IMX6_<b style="color:#000;background:#66ffff">Memory</b>_Bandwidth_usage</a>
    125 </li><li>IMX6DQRM - IMX6Dual/Quad reference manual
    126 </li><li>IMX6SDLRM - IMX6Solor/Dual-lite reference manual
    127 </li></ul><p>
    128 <span class="wikianchor" id="cma"></span>
    129 </p>
    130 <h2 id="LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</h2>
    131 <p>
    132 Some devices and device-drivers require big chunks of physically contiguous <b style="color:#000;background:#66ffff">memory</b>. A perfect example is the IMX6 GPU which needs CMA for certain applications. The kernel must reserve CMA <b style="color:#000;background:#66ffff">memory</b> and thus it is not available from the general pool for other applications. The amount of CMA <b style="color:#000;background:#66ffff">memory</b> reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified by the 'cma' kernel cmdline argument.
    133 </p>
    134 <p>
     105* [http://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage IMX6 Memory Bandwidth usage]
     106* [http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf IMX6DQRM - IMX6Dual/Quad reference manual]
     107* [http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6SDLRM.pdf IMX6SDLRM - IMX6Solo/DualLite reference manual]
     108
     109[=#cma]
     110== Linux Contiguous Memory Allocator (CMA) ==
     111
     112Some devices and device-drivers require big chunks of physically contiguous memory. A perfect example is the IMX6 GPU which needs CMA for certain applications. The kernel must reserve CMA memory and thus it is not available from the general pool for other applications. The amount of CMA memory reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified by the 'cma' kernel cmdline argument.
     113
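To see what was actually requested on a given boot, one option is to pull the 'cma' argument out of the kernel command line. The sketch below is a minimal example; the helper name cmdline_arg and the sample command line are invented for illustration, and the /proc paths in the comments assume a running Linux system.

```sh
#!/bin/sh
# Hypothetical helper: print the value of a name=value kernel argument
# found in a command line string. If the argument is repeated, the last
# occurrence is printed; if it is absent, nothing is printed.
# usage: cmdline_arg NAME "CMDLINE STRING"
cmdline_arg() {
    name=$1
    shift
    for word in $*; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done | tail -n 1
}

# On a running board one could use:
#   cmdline_arg cma "$(cat /proc/cmdline)"
#   grep -i '^Cma' /proc/meminfo   # only on kernels that report CMA totals
# Made-up command line, for illustration only:
cmdline_arg cma "console=ttymxc1,115200 root=/dev/mmcblk0p1 cma=128M"   # prints 128M
```
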
    135114An example of devices that require CMA would be video display devices/drivers, video capture devices/drivers, or GPU devices/drivers.
    136 </p>
    137 <p>
    138 The Yocto and Android BSP's have a bootscript that among other things comes up with a default cma allocation by looking at the total board <b style="color:#000;background:#66ffff">memory</b> available. If you find you need to alter this number (ie you do not want 'any' allocated) you can set the <tt>mem</tt> bootloader paramater to disable the auto-configuration performed by the bootscript.
    139 </p>
    140 <p>
     115
     116The Yocto and Android BSPs have a bootscript that, among other things, picks a default CMA allocation based on the total board memory available. If you need to alter this number (for example, you do not want any CMA allocated), you can set the mem bootloader parameter to disable the auto-configuration performed by the bootscript.
     117
    141118For more information see also:
    142 </p>
    143 <ul><li><a class="ext-link" href="https://lwn.net/Articles/486301/"><span class="icon">​</span>Linux CMA article</a>
    144 </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt"><span class="icon">​</span>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt</a> DMA-API.txt]
    145 </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt"><span class="icon">​</span>http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt</a> DMA-API-HOWTO.txt]
    146 </li></ul><p>
    147 <span class="wikianchor" id="coherent"></span>
    148 </p>
    149 <h2 id="LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></h2>
    150 <p>
    151 Similar to <a class="wiki" href="/wiki/ventana/memory#cma">CMA</a> a special pool of coherent <b style="color:#000;background:#66ffff">memory</b> for atomic dma allocations is made available by the kernel. By default this is set to 256K but can be changed by setting the 'coheremet_pool' kernel parameter. This is typically used for DMA capable devices such as PCI radio or video capture devices.
    152 </p>
    153 }}}
     119 * [https://lwn.net/Articles/486301/ Linux CMA article]
     120 * [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt DMA-API.txt]
     121 * [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt DMA-API-HOWTO.txt]
     122
     123
     124[=#coherent]
     125== Linux Coherent memory ==
     126Similar to CMA, a special pool of coherent memory for atomic DMA allocations is made available by the kernel. By default this is set to 256K but can be changed by setting the 'coherent_pool' kernel parameter. This is typically used for DMA-capable devices such as PCI radio or video capture devices.
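
A minimal sketch of building a kernel command line with a different pool size is shown below. The helper name set_kernel_arg, the 1M value, and the sample command line are assumptions made for this example; how the resulting string reaches the kernel (for instance through the bootloader's bootargs) depends on your BSP.

```sh
#!/bin/sh
# Hypothetical helper: return a command line string with NAME=VALUE set,
# dropping any earlier NAME=... occurrence so the argument appears once.
# usage: set_kernel_arg NAME VALUE "CMDLINE STRING"
set_kernel_arg() {
    arg_name=$1
    arg_value=$2
    shift 2
    out=
    for word in $*; do
        case $word in
            "$arg_name"=*) ;;                    # drop the existing setting
            *) out="${out:+$out }$word" ;;       # keep everything else
        esac
    done
    printf '%s\n' "${out:+$out }$arg_name=$arg_value"
}

# Made-up command line, for illustration only:
set_kernel_arg coherent_pool 1M "console=ttymxc1,115200 coherent_pool=256K root=/dev/mmcblk0p1"
# prints: console=ttymxc1,115200 root=/dev/mmcblk0p1 coherent_pool=1M
```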