| 1 | {{{#!html |
| 2 | <div id="wikipage" class="trac-content"><p> |
| 3 | </p><div class="wiki-toc"> |
| 4 | <ol> |
| 5 | <li> |
| 6 | <a href="#VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></a> |
| 7 | <ol> |
| 8 | <li> |
| 9 | <a href="#MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</a> |
| 10 | </li> |
| 11 | <li> |
| 12 | <a href="#LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</a> |
| 13 | </li> |
| 14 | <li> |
| 15 | <a href="#LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></a> |
| 16 | </li> |
| 17 | </ol> |
| 18 | </li> |
| 19 | </ol> |
| 20 | </div><p> |
| 21 | </p> |
| 22 | <h1 id="VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></h1> |
| 23 | <p> |
| 24 | The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main <b style="color:#000;background:#66ffff">memory</b>. |
| 25 | </p> |
| 26 | <p> |
| 27 | All <b style="color:#000;background:#ffff66">Ventana</b> products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and precedes the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32-bit channels that can be used together for a 64-bit <b style="color:#000;background:#66ffff">memory</b> architecture, the bus width and chip arrangement differ between <b style="color:#000;background:#ffff66">Ventana</b> models: |
| 28 | </p> |
| 29 | <table class="wiki"> |
| 30 | <tr><th> Baseboard </th><th> Width </th><th> Chip arrangement </th><th> Max Addressable<sup>1</sup> |
| 31 | </th></tr><tr><td> GW54xx/GW53xx </td><td> 64bit </td><td> 4x 16bit chips </td><td> 4GB |
| 32 | </td></tr><tr><td> GW51xx/GW52xx/GW552x/GW553x </td><td> 32bit </td><td> 2x 16bit chips </td><td> 2GB |
| 33 | </td></tr><tr><td> GW551x </td><td> 16bit </td><td> 1x 16bit chip </td><td> 1GB |
| 34 | </td></tr></table> |
| 35 | <ol><li>Max Addressable is the maximum possible <b style="color:#000;background:#66ffff">memory</b> assuming today's DDR3 density. Contact sales@… for information on available board models. |
| 36 | </li></ol><h2 id="MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</h2> |
| 37 | <p> |
| 38 | The Freescale MMDC has built-in profiling support that lets you examine <b style="color:#000;background:#66ffff">memory</b> utilization per hardware block. A simple userspace application called mmdc2 gathers and analyzes the profiling counters and reports the current <b style="color:#000;background:#66ffff">memory</b> utilization. |
| 39 | </p> |
| 40 | <p> |
| 41 | By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2. |
| 42 | </p> |
| 43 | <p> |
| 44 | Example usage: |
| 45 | </p> |
| 46 | <ul><li>show usage: |
| 47 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 -h |
| 48 | MMDC DOES NOT KNOW -h |
| 49 | ======================MMDC v1.3=========================== |
| 50 | Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...] |
| 51 | export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s |
| 52 | export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop. |
| 53 | export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master |
| 54 | Note1: More than 1 master can be inputed. They will be profiled one by one. |
| 55 | Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead. |
| 56 | </pre></li><li>show total utilization: |
| 57 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 |
| 58 | MMDC SUM |
| 59 | |
| 60 | MMDC new Profiling results: |
| 61 | *********************** |
| 62 | Measure time: 1001ms |
| 63 | Total cycles count: 528054912 |
| 64 | Busy cycles count: 27694059 |
| 65 | Read accesses count: 349427 |
| 66 | Write accesses count: 3281 |
| 67 | Read bytes count: 20971268 |
| 68 | Write bytes count: 99828 |
| 69 | Avg. Read burst size: 60 |
| 70 | Avg. Write burst size: 30 |
| 71 | Read: 19.98 MB/s / Write: 0.10 MB/s Total: 20.07 MB/s |
| 72 | Utilization: 4% |
| 73 | Overall Bus Load: 5% |
| 74 | Bytes Access: 59 |
| 75 | </pre><ul><li>Notice that the overall bandwidth used is about 20MB/s. To find out which hardware block is responsible, profile the individual MMDC masters as in the following examples. |
| 76 | </li></ul></li><li>show ARM CPU utilization: |
| 77 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 ARM |
| 78 | MMDC ARM |
| 79 | |
| 80 | MMDC new Profiling results: |
| 81 | *********************** |
| 82 | Measure time: 1000ms |
| 83 | Total cycles count: 528049328 |
| 84 | Busy cycles count: 27791413 |
| 85 | Read accesses count: 14119 |
| 86 | Write accesses count: 2974 |
| 87 | Read bytes count: 416840 |
| 88 | Write bytes count: 92288 |
| 89 | Avg. Read burst size: 29 |
| 90 | Avg. Write burst size: 31 |
| 91 | Read: 0.40 MB/s / Write: 0.09 MB/s Total: 0.49 MB/s |
| 92 | Utilization: 0% |
| 93 | Overall Bus Load: 5% |
| 94 | Bytes Access: 29 |
| 95 | </pre></li><li>show DSP2 utilization (display output): |
| 96 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 DSP2 |
| 97 | MMDC DSP2 |
| 98 | |
| 99 | MMDC new Profiling results: |
| 100 | *********************** |
| 101 | Measure time: 1000ms |
| 102 | Total cycles count: 528049384 |
| 103 | Busy cycles count: 27658698 |
| 104 | Read accesses count: 340772 |
| 105 | Write accesses count: 0 |
| 106 | Read bytes count: 20715488 |
| 107 | Write bytes count: 0 |
| 108 | Avg. Read burst size: 60 |
| 109 | Avg. Write burst size: 0 |
| 110 | Read: 19.76 MB/s / Write: 0.00 MB/s Total: 19.76 MB/s |
| 111 | Utilization: 4% |
| 112 | Overall Bus Load: 5% |
| 113 | Bytes Access: 60 |
| 114 | </pre><ul><li>Above you can see that the majority of the 20MB/s is from the DSP2 (display output) block. This output is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you blank the display via <tt>echo 1 > /sys/class/graphics/fb0/blank</tt>, the 20MB/s from DSP2 drops to 0. |
| 115 | </li></ul></li></ul><p> |
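The per-master runs above can also be scripted. The following Python sketch is one possible approach and is not part of the BSP: it assumes the output format shown in the examples above, and the helper names (<tt>parse_bandwidth</tt>, <tt>profile_master</tt>, <tt>rank_masters</tt>) are hypothetical. <tt>MMDC_SLEEPTIME</tt> is the environment variable described in the usage text.
</p>

```python
import os
import re
import subprocess

# Matches the summary line, e.g.
# "Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s"
BANDWIDTH_RE = re.compile(
    r"Read:\s*([\d.]+) MB/s\s*/\s*Write:\s*([\d.]+) MB/s\s*Total:\s*([\d.]+) MB/s"
)


def parse_bandwidth(output):
    """Return (read, write, total) in MB/s from mmdc2 output text."""
    match = BANDWIDTH_RE.search(output)
    if match is None:
        raise ValueError("no bandwidth summary line found")
    return tuple(float(value) for value in match.groups())


def profile_master(master, seconds=1, tool="/unit_tests/mmdc2"):
    """Run mmdc2 for one master (e.g. "ARM", "DSP2") and parse the result."""
    # MMDC_SLEEPTIME sets the profiling duration in seconds.
    env = dict(os.environ, MMDC_SLEEPTIME=str(seconds))
    result = subprocess.run([tool, master], env=env, capture_output=True, text=True)
    return parse_bandwidth(result.stdout)


def rank_masters(masters, seconds=1):
    """Return (master, total MB/s) pairs, busiest first."""
    totals = {name: profile_master(name, seconds)[2] for name in masters}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

<p>
On a board, calling <tt>rank_masters(["ARM", "DSP2", "VPU"])</tt> would profile each listed master in turn and return them ordered by total bandwidth.
</p>
<p>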
| 116 | The meaning of some of the results is as follows: |
| 117 | </p> |
| 118 | <ul><li>Read, Write, Total: data throughput in MB/s over the configured profiling window. |
| 119 | </li><li>Utilization: percentage of data transferred compared to the data that could have been transferred if all the busy cycles had been used to transfer data. It is calculated as: <tt>(read_bytes + write_bytes) / (busy_cycles * 16) * 100</tt> |
| 120 | </li><li>Overall Bus Load: percentage of busy cycles relative to the total number of cycles in the time window. It is calculated as: <tt>busy_cycles / total_cycles * 100</tt> |
| 121 | </li></ul><p> |
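As a rough check, the two formulas above can be applied to the counters from the "show total utilization" sample output. The Python sketch below is illustrative only: the function names are made up for this example, and truncating to a whole percent is an assumption inferred from the sample outputs, not something taken from the mmdc2 source.
</p>

```python
def utilization_percent(read_bytes, write_bytes, busy_cycles):
    """Bytes moved relative to what the busy cycles could have moved."""
    return (read_bytes + write_bytes) / (busy_cycles * 16) * 100


def bus_load_percent(busy_cycles, total_cycles):
    """Share of all cycles in the window during which the bus was busy."""
    return busy_cycles / total_cycles * 100


# Counters copied from the "show total utilization" example.
util = utilization_percent(20971268, 99828, 27694059)
load = bus_load_percent(27694059, 528054912)

# Assumption: the tool reports whole percents, dropping the fraction.
print(f"Utilization: {int(util)}%")
print(f"Overall Bus Load: {int(load)}%")
```

<p>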
| 122 | For more information see also: |
| 123 | </p> |
| 124 | <ul><li><a class="ext-link" href="http://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage"><span class="icon"></span>http://developer.ridgerun.com/wiki/index.php?title=IMX6_<b style="color:#000;background:#66ffff">Memory</b>_Bandwidth_usage</a> |
| 125 | </li><li>IMX6DQRM - IMX6Dual/Quad reference manual |
| 126 | </li><li>IMX6SDLRM - IMX6Solo/DualLite reference manual |
| 127 | </li></ul><p> |
| 128 | <span class="wikianchor" id="cma"></span> |
| 129 | </p> |
| 130 | <h2 id="LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</h2> |
| 131 | <p> |
| 132 | Some devices and device-drivers require big chunks of physically contiguous <b style="color:#000;background:#66ffff">memory</b>. A perfect example is the IMX6 GPU which needs CMA for certain applications. The kernel must reserve CMA <b style="color:#000;background:#66ffff">memory</b> and thus it is not available from the general pool for other applications. The amount of CMA <b style="color:#000;background:#66ffff">memory</b> reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified by the 'cma' kernel cmdline argument. |
| 133 | </p> |
| 134 | <p> |
| 135 | Examples of devices that require CMA are video display, video capture, and GPU devices and their drivers. |
| 136 | </p> |
| 137 | <p> |
| 138 | The Yocto and Android BSPs have a bootscript that, among other things, computes a default CMA allocation based on the total board <b style="color:#000;background:#66ffff">memory</b> available. If you need to alter this number (i.e. you do not want any CMA memory allocated), you can set the <tt>mem</tt> bootloader parameter to disable the auto-configuration performed by the bootscript. |
| 139 | </p> |
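<p>
To confirm how much CMA memory the running kernel actually reserved, <tt>/proc/meminfo</tt> reports <tt>CmaTotal</tt> and <tt>CmaFree</tt> on kernels built with CMA support. The following is a minimal Python sketch that assumes those two fields are present; the function names <tt>cma_usage_kb</tt> and <tt>read_cma_usage</tt> are hypothetical.
</p>

```python
def cma_usage_kb(meminfo_text):
    """Return (total_kb, free_kb) from /proc/meminfo text, or None if absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("CmaTotal", "CmaFree"):
            # Values look like "  262144 kB"; keep only the number.
            fields[name] = int(rest.split()[0])
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return fields["CmaTotal"], fields["CmaFree"]


def read_cma_usage(path="/proc/meminfo"):
    """Read the live values on the target board."""
    with open(path) as handle:
        return cma_usage_kb(handle.read())
```

<p>
A result of <tt>None</tt> means the fields were not found, which would suggest a kernel built without CMA support; a total of 0 would suggest that no CMA memory was reserved.
</p>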
| 140 | <p> |
| 141 | For more information see also: |
| 142 | </p> |
| 143 | <ul><li><a class="ext-link" href="https://lwn.net/Articles/486301/"><span class="icon"></span>Linux CMA article</a> |
| 144 | </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt"><span class="icon"></span>DMA-API.txt</a> |
| 145 | </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt"><span class="icon"></span>DMA-API-HOWTO.txt</a> |
| 146 | </li></ul><p> |
| 147 | <span class="wikianchor" id="coherent"></span> |
| 148 | </p> |
| 149 | <h2 id="LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></h2> |
| 150 | <p> |
| 151 | Similar to <a class="wiki" href="/wiki/ventana/memory#cma">CMA</a>, the kernel makes a special pool of coherent <b style="color:#000;background:#66ffff">memory</b> available for atomic DMA allocations. By default this pool is 256K, but its size can be changed by setting the 'coherent_pool' kernel parameter. It is typically used by DMA-capable devices such as PCI radios or video capture devices. |
| 152 | </p> |
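<p>
Both <tt>cma</tt> and <tt>coherent_pool</tt> take a size with an optional <tt>K</tt>, <tt>M</tt> or <tt>G</tt> suffix. The Python sketch below shows one way to check which values were passed on the running kernel's command line (<tt>/proc/cmdline</tt>). The helper names are hypothetical, the example command line is made up for illustration, and only the plain <tt>name=size</tt> form is handled.
</p>

```python
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as "256K" or "2M" into bytes."""
    suffix = text[-1:].upper()
    if suffix in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[suffix]
    return int(text)


def memory_params(cmdline, names=("cma", "coherent_pool")):
    """Return {name: bytes} for the requested parameters found in cmdline."""
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name in names:
            found[name] = parse_size(value)
    return found


# Illustrative command line, not taken from a real board.
example = "console=ttymxc1,115200 root=/dev/mmcblk0p1 cma=256M coherent_pool=4M"
print(memory_params(example))
```

<p>
On a target board the same function can be fed the contents of <tt>/proc/cmdline</tt> instead of the example string.
</p>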
| 153 | }}} |