Changes between Initial Version and Version 1 of ventana/memory

10/24/2017 04:37:41 AM (5 years ago)



     2          <div id="wikipage" class="trac-content"><p>
     3</p><div class="wiki-toc">
     5  <li>
     6    <a href="#VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></a>
     7    <ol>
     8      <li>
     9        <a href="#MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</a>
     10      </li>
     11      <li>
     12        <a href="#LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</a>
     13      </li>
     14      <li>
     15        <a href="#LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></a>
     16      </li>
     17    </ol>
     18  </li>
     22<h1 id="VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></h1>
     24The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main <b style="color:#000;background:#66ffff">memory</b>.
     27All <b style="color:#000;background:#ffff66">Ventana</b> products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and precedes the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit <b style="color:#000;background:#66ffff">memory</b> architecture, the bus width and chip arrangement differ between <b style="color:#000;background:#ffff66">Ventana</b> models:
     29<table class="wiki">
     30<tr><th> Baseboard           </th><th> Width </th><th> Chip arrangement </th><th> Max Addressable<sup>1</sup>
     31</th></tr><tr><td> GW54xx/GW53xx         </td><td> 64bit   </td><td> 4x 16bit chips     </td><td> 4GB                   
     32</td></tr><tr><td> GW51xx/GW52xx/GW552x/GW553x </td><td> 32bit   </td><td> 2x 16bit chips     </td><td> 2GB                   
     33</td></tr><tr><td> GW551x                </td><td> 16bit   </td><td> 1x 16bit chip      </td><td> 1GB                   
     35<ol><li>Max Addressable is the maximum possible <b style="color:#000;background:#66ffff">memory</b> assuming today's DDR3 density. Contact sales@… for information on available board models.
     36</li></ol><h2 id="MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</h2>
     38The Freescale MMDC has built-in profiling support that lets you examine <b style="color:#000;background:#66ffff">memory</b> utilization per hardware block. A simple user application called mmdc2 gathers and analyzes the profiling counters and reports current <b style="color:#000;background:#66ffff">memory</b> utilization.
     41By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2.
     44Example usage:
     46<ul><li>show usage:
     47<pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 -h
     49======================MMDC v1.3===========================
     50Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...]
     51export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s
     52export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop.
     53export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master
     54Note1: More than 1 master can be inputed. They will be profiled one by one.
     55Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.
     56</pre></li><li>show total utilization:
     57<pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2
     58MMDC SUM
     60MMDC new Profiling results:
     62Measure time: 1001ms
     63Total cycles count: 528054912
     64Busy cycles count: 27694059
     65Read accesses count: 349427
     66Write accesses count: 3281
     67Read bytes count: 20971268
     68Write bytes count: 99828
     69Avg. Read burst size: 60
     70Avg. Write burst size: 30
     71Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s
     72Utilization: 4%
     73Overall Bus Load: 5%
     74Bytes Access: 59
     75</pre><ul><li>Notice that the overall bandwidth used is about 20MB/s. To find out which hardware block is using it, profile the individual masters that access the MMDC.
     76</li></ul></li><li>show ARM CPU utilization:
     77<pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 ARM
     78MMDC ARM
     80MMDC new Profiling results:
     82Measure time: 1000ms
     83Total cycles count: 528049328
     84Busy cycles count: 27791413
     85Read accesses count: 14119
     86Write accesses count: 2974
     87Read bytes count: 416840
     88Write bytes count: 92288
     89Avg. Read burst size: 29
     90Avg. Write burst size: 31
     91Read: 0.40 MB/s /  Write: 0.09 MB/s  Total: 0.49 MB/s
     92Utilization: 0%
     93Overall Bus Load: 5%
     94Bytes Access: 29
     95</pre></li><li>show DSP2 utilization (display output):
     96<pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 DSP2
     97MMDC DSP2
     99MMDC new Profiling results:
     101Measure time: 1000ms
     102Total cycles count: 528049384
     103Busy cycles count: 27658698
     104Read accesses count: 340772
     105Write accesses count: 0
     106Read bytes count: 20715488
     107Write bytes count: 0
     108Avg. Read burst size: 60
     109Avg. Write burst size: 0
     110Read: 19.76 MB/s /  Write: 0.00 MB/s  Total: 19.76 MB/s
     111Utilization: 4%
     112Overall Bus Load: 5%
     113Bytes Access: 60
     114</pre><ul><li>Above you can see that the majority of the 20MB/s comes from the DSP2 (display output) block. This output is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you blank the display via <tt>echo 1 &gt; /sys/class/graphics/fb0/blank</tt> you will notice that the 20MB/s from DSP2 drops to 0.
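The comparison between the SUM and DSP2 runs can also be done programmatically. The following is a minimal sketch, not part of the imx-test package: the helper names <tt>parse_mmdc</tt> and <tt>bytes_moved</tt> are hypothetical, and the sample strings are abbreviated copies of the outputs shown above. It assumes the counter lines keep the "Name: integer" layout seen in those samples.

```python
import re

# Abbreviated copies of the SUM and DSP2 samples shown above.
SUM_OUTPUT = """\
MMDC SUM
MMDC new Profiling results:
Measure time: 1001ms
Read bytes count: 20971268
Write bytes count: 99828
Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s
Utilization: 4%
"""

DSP2_OUTPUT = """\
MMDC DSP2
MMDC new Profiling results:
Measure time: 1000ms
Read bytes count: 20715488
Write bytes count: 0
Read: 19.76 MB/s /  Write: 0.00 MB/s  Total: 19.76 MB/s
Utilization: 4%
"""


def parse_mmdc(text):
    """Collect the 'Name: integer' lines of mmdc2 output into a dict.

    Lines without a single integer value (titles, the MB/s summary line)
    are skipped. Hypothetical helper, assuming the layout shown above.
    """
    counters = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*([^:]+):\s*(\d+)\s*(ms|%)?\s*", line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters


def bytes_moved(counters):
    """Total bytes read and written during the profiling window."""
    return counters["Read bytes count"] + counters["Write bytes count"]


total = parse_mmdc(SUM_OUTPUT)
dsp2 = parse_mmdc(DSP2_OUTPUT)
share = bytes_moved(dsp2) / bytes_moved(total) * 100
print(f"DSP2 share of MMDC traffic: {share:.1f}%")
```

With the sample numbers the DSP2 share comes out at roughly 98%. The two runs were taken in separate one-second windows, so the figure is only approximate.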
     116The meaning of some of the results is as follows:
     118<ul><li>Read, Write, Total: Number of MB/s during the configured window of time.
     119</li><li>Utilization: percentage of data transferred compared to the data that could have been transferred if all the busy cycles had been used to transfer data. It is calculated as: <tt>(read_bytes + write_bytes) / (busy_cycles * 16) * 100</tt>
     120</li><li>Overall Bus Load: number of busy cycles compared to the total number of cycles in the time window. It is calculated as: <tt>busy_cycles / total_cycles * 100</tt>
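The two formulas above can be checked against the raw counters from the SUM sample. This is a minimal sketch: the function name <tt>mmdc_metrics</tt> is hypothetical, and two details are assumptions inferred from the sample numbers rather than taken from the mmdc2 source: that 1 MB means 1024*1024 bytes, and that the tool truncates percentages to whole numbers.

```python
def mmdc_metrics(total_cycles, busy_cycles, read_bytes, write_bytes, measure_ms):
    """Derive the summary figures from raw MMDC profiling counters.

    Utilization and bus load follow the formulas given in the text.
    The MB/s figures assume 1 MB = 1024 * 1024 bytes (an assumption).
    """
    total_bytes = read_bytes + write_bytes
    seconds = measure_ms / 1000.0
    mib = 1024 * 1024
    # Guard against a window with no busy cycles.
    utilization = total_bytes / (busy_cycles * 16) * 100 if busy_cycles else 0.0
    return {
        "utilization_pct": utilization,
        "bus_load_pct": busy_cycles / total_cycles * 100,
        "read_mbps": read_bytes / seconds / mib,
        "write_mbps": write_bytes / seconds / mib,
        "total_mbps": total_bytes / seconds / mib,
    }


# Counters copied from the "MMDC SUM" sample above.
m = mmdc_metrics(
    total_cycles=528054912,
    busy_cycles=27694059,
    read_bytes=20971268,
    write_bytes=99828,
    measure_ms=1001,
)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Truncated to whole numbers, the utilization and bus load values are 4 and 5, which agrees with the "Utilization: 4%" and "Overall Bus Load: 5%" lines in the sample.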
     122For more information see also:
     124<ul><li><a class="ext-link" href=""><span class="icon">​</span><b style="color:#000;background:#66ffff">Memory</b>_Bandwidth_usage</a>
     125</li><li>IMX6DQRM - IMX6Dual/Quad reference manual
     126</li><li>IMX6SDLRM - IMX6Solo/DualLite reference manual
     128<span class="wikianchor" id="cma"></span>
     130<h2 id="LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</h2>
     132Some devices and device-drivers require big chunks of physically contiguous <b style="color:#000;background:#66ffff">memory</b>. A perfect example is the IMX6 GPU which needs CMA for certain applications. The kernel must reserve CMA <b style="color:#000;background:#66ffff">memory</b> and thus it is not available from the general pool for other applications. The amount of CMA <b style="color:#000;background:#66ffff">memory</b> reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified by the 'cma' kernel cmdline argument.
     135Examples of devices that require CMA include video display devices/drivers, video capture devices/drivers, and GPU devices/drivers.
     138The Yocto and Android BSPs have a bootscript that, among other things, computes a default CMA allocation from the total board <b style="color:#000;background:#66ffff">memory</b> available. If you need to alter this number (i.e. you do not want any allocated) you can set the <tt>mem</tt> bootloader parameter to disable the auto-configuration performed by the bootscript.
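To see how much CMA memory a running kernel actually reserved, the <tt>CmaTotal</tt> and <tt>CmaFree</tt> fields of <tt>/proc/meminfo</tt> can be inspected on kernels that report them. The sketch below is an illustration only: the helper name <tt>cma_info</tt> is hypothetical and the sample values are made up, not taken from a Ventana board.

```python
def cma_info(meminfo_text):
    """Return (total_kb, free_kb) from the CmaTotal/CmaFree lines.

    Returns None when the kernel does not report CMA figures.
    Hypothetical helper for illustration.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("CmaTotal", "CmaFree"):
            # Values look like "  262144 kB"; keep the number only.
            fields[name] = int(rest.split()[0])
    if "CmaTotal" not in fields:
        return None
    return fields["CmaTotal"], fields.get("CmaFree", 0)


# Made-up sample in /proc/meminfo layout (values are hypothetical).
SAMPLE_MEMINFO = """\
MemTotal:        1024000 kB
MemFree:          512000 kB
CmaTotal:         262144 kB
CmaFree:          200000 kB
"""

print(cma_info(SAMPLE_MEMINFO))
```

On a target board the same function could be given the contents of <tt>/proc/meminfo</tt> instead of the sample string.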
     141For more information see also:
     143<ul><li><a class="ext-link" href=""><span class="icon">​</span>Linux CMA article</a>
     144</li><li><a class="ext-link" href=""><span class="icon">​</span>DMA-API.txt</a>
     145</li><li><a class="ext-link" href=""><span class="icon">​</span>DMA-API-HOWTO.txt</a>
     147<span class="wikianchor" id="coherent"></span>
     149<h2 id="LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></h2>
     151Similar to <a class="wiki" href="/wiki/ventana/memory#cma">CMA</a>, a special pool of coherent <b style="color:#000;background:#66ffff">memory</b> for atomic DMA allocations is made available by the kernel. By default this is set to 256K but can be changed by setting the 'coherent_pool' kernel parameter. This is typically used for DMA-capable devices such as PCI radios or video capture devices.
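Both 'cma' and 'coherent_pool' are size-valued kernel command-line parameters, so the effective settings can be read back from the command line. The sketch below is a simplified illustration: the helper name <tt>cmdline_size</tt> and the example command line are hypothetical, only the K, M and G suffixes are handled, and any placement part after <tt>@</tt> in a <tt>cma=</tt> value is ignored.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def cmdline_size(cmdline, name, default=None):
    """Return the size in bytes given by 'name=<size>' on a kernel cmdline.

    Hypothetical helper. Keeps scanning so that a later occurrence
    overrides an earlier one (an assumption about the desired behaviour).
    """
    result = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or key != name:
            continue
        value = value.split("@", 1)[0]  # drop an optional placement part
        suffix = value[-1:].upper()
        if suffix in _UNITS:
            result = int(value[:-1]) * _UNITS[suffix]
        else:
            result = int(value)
    return result


# Hypothetical command line for illustration only.
cmdline = "console=ttymxc1,115200 root=/dev/mmcblk0p1 cma=64M coherent_pool=1M"

pool = cmdline_size(cmdline, "coherent_pool", default=256 * 1024)
cma = cmdline_size(cmdline, "cma", default=0)
print(f"coherent_pool: {pool} bytes, cma: {cma} bytes")
```

The defaults passed in mirror the values stated in the text: 256K for the coherent pool and 0 for CMA in the Gateworks kernel. On a target board the string would come from <tt>/proc/cmdline</tt>.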