| | 1 | {{{#!html |
| | 2 | <div id="wikipage" class="trac-content"><p> |
| | 3 | </p><div class="wiki-toc"> |
| | 4 | <ol> |
| | 5 | <li> |
| | 6 | <a href="#VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></a> |
| | 7 | <ol> |
| | 8 | <li> |
| | 9 | <a href="#MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</a> |
| | 10 | </li> |
| | 11 | <li> |
| | 12 | <a href="#LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</a> |
| | 13 | </li> |
| | 14 | <li> |
| | 15 | <a href="#LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></a> |
| | 16 | </li> |
| | 17 | </ol> |
| | 18 | </li> |
| | 19 | </ol> |
| | 20 | </div><p> |
| | 21 | </p> |
| | 22 | <h1 id="VentanaMemory"><b style="color:#000;background:#ffcc99">Ventana Memory</b></h1> |
| | 23 | <p> |
| | 24 | The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main <b style="color:#000;background:#66ffff">memory</b>. |
| | 25 | </p> |
| | 26 | <p> |
| | 27 | All <b style="color:#000;background:#ffff66">Ventana</b> products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and runs before the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit <b style="color:#000;background:#66ffff">memory</b> architecture, each <b style="color:#000;background:#ffff66">Ventana</b> model differs in bus width and chip arrangement: |
| | 28 | </p> |
| | 29 | <table class="wiki"> |
| | 30 | <tr><th> Baseboard </th><th> Width </th><th> Chip arrangement </th><th> Max Addressable<sup>1</sup> |
| | 31 | </th></tr><tr><td> GW54xx/GW53xx </td><td> 64bit </td><td> 4x 16bit chips </td><td> 4GB |
| | 32 | </td></tr><tr><td> GW51xx/GW52xx/GW552x/GW553x </td><td> 32bit </td><td> 2x 16bit chips </td><td> 2GB |
| | 33 | </td></tr><tr><td> GW551x </td><td> 16bit </td><td> 1x 16bit chip </td><td> 1GB |
| | 34 | </td></tr></table> |
| | 35 | <ol><li>Max Addressable is the maximum possible <b style="color:#000;background:#66ffff">memory</b> assuming today's DDR3 density. Contact sales@… for information on available board models. |
| | 36 | </li></ol><h2 id="MemoryPerformance"><b style="color:#000;background:#66ffff">Memory</b> Performance</h2> |
| | 37 | <p> |
| | 38 | The Freescale MMDC has some profiling support built in that can allow you to examine <b style="color:#000;background:#66ffff">memory</b> utilization at a per hardware-block level. A simple user application exists called mmdc2 that can be used to gather and analyze the counters and provide some feedback on current <b style="color:#000;background:#66ffff">memory</b> utilization. |
| | 39 | </p> |
| | 40 | <p> |
| | 41 | By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2. |
| | 42 | </p> |
| | 43 | <p> |
| | 44 | Example usage: |
| | 45 | </p> |
| | 46 | <ul><li>show usage: |
| | 47 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 -h |
| | 48 | MMDC DOES NOT KNOW -h |
| | 49 | ======================MMDC v1.3=========================== |
| | 50 | Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...] |
| | 51 | export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s |
| | 52 | export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop. |
| | 53 | export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master |
| | 54 | Note1: More than 1 master can be inputed. They will be profiled one by one. |
| | 55 | Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead. |
| | 56 | </pre></li><li>show total utilization: |
| | 57 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 |
| | 58 | MMDC SUM |
| | 59 | |
| | 60 | MMDC new Profiling results: |
| | 61 | *********************** |
| | 62 | Measure time: 1001ms |
| | 63 | Total cycles count: 528054912 |
| | 64 | Busy cycles count: 27694059 |
| | 65 | Read accesses count: 349427 |
| | 66 | Write accesses count: 3281 |
| | 67 | Read bytes count: 20971268 |
| | 68 | Write bytes count: 99828 |
| | 69 | Avg. Read burst size: 60 |
| | 70 | Avg. Write burst size: 30 |
| | 71 | Read: 19.98 MB/s / Write: 0.10 MB/s Total: 20.07 MB/s |
| | 72 | Utilization: 4% |
| | 73 | Overall Bus Load: 5% |
| | 74 | Bytes Access: 59 |
| | 75 | </pre><ul><li>Notice that the overall bandwidth used is about 20MB/s. To find out what specifically is using it, profile the individual hardware blocks that use the MMDC. |
| | 76 | </li></ul></li><li>show ARM CPU utilization: |
| | 77 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 ARM |
| | 78 | MMDC ARM |
| | 79 | |
| | 80 | MMDC new Profiling results: |
| | 81 | *********************** |
| | 82 | Measure time: 1000ms |
| | 83 | Total cycles count: 528049328 |
| | 84 | Busy cycles count: 27791413 |
| | 85 | Read accesses count: 14119 |
| | 86 | Write accesses count: 2974 |
| | 87 | Read bytes count: 416840 |
| | 88 | Write bytes count: 92288 |
| | 89 | Avg. Read burst size: 29 |
| | 90 | Avg. Write burst size: 31 |
| | 91 | Read: 0.40 MB/s / Write: 0.09 MB/s Total: 0.49 MB/s |
| | 92 | Utilization: 0% |
| | 93 | Overall Bus Load: 5% |
| | 94 | Bytes Access: 29 |
| | 95 | </pre></li><li>show DSP2 utilization (display output) |
| | 96 | <pre class="wiki">root@<b style="color:#000;background:#ffff66">ventana</b>:~# /unit_tests/mmdc2 DSP2 |
| | 97 | MMDC DSP2 |
| | 98 | |
| | 99 | MMDC new Profiling results: |
| | 100 | *********************** |
| | 101 | Measure time: 1000ms |
| | 102 | Total cycles count: 528049384 |
| | 103 | Busy cycles count: 27658698 |
| | 104 | Read accesses count: 340772 |
| | 105 | Write accesses count: 0 |
| | 106 | Read bytes count: 20715488 |
| | 107 | Write bytes count: 0 |
| | 108 | Avg. Read burst size: 60 |
| | 109 | Avg. Write burst size: 0 |
| | 110 | Read: 19.76 MB/s / Write: 0.00 MB/s Total: 19.76 MB/s |
| | 111 | Utilization: 4% |
| | 112 | Overall Bus Load: 5% |
| | 113 | Bytes Access: 60 |
| | 114 | </pre><ul><li>Above you can see that the majority of the 20MB/s is from the DSP2 (display output) block. The above is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you 'blank' the display via <tt>echo 1 > /sys/class/graphics/fb0/blank</tt> you will notice that the 20MB/s from DSP2 drops to 0. |
| | 115 | </li></ul></li></ul><p> |
| | 116 | The meaning of some of the results is as follows: |
| | 117 | </p> |
| | 118 | <ul><li>Read, Write, Total: Number of MB/s during the configured window of time. |
| | 119 | </li><li>Utilization: percentage of data transferred compared to the data that could be transferred if all the busy cycles were used to transfer data. It is calculated as: <tt>(read_bytes + write_bytes) / (busy_cycles * 16) * 100</tt> |
| | 120 | </li><li>Overall Bus Load: number of busy cycles compared to the total number of cycles in the time window. It is calculated as: <tt>busy_cycles / total_cycles * 100</tt> |
| | 121 | </li></ul><p> |
| | 122 | For more information see also: |
| | 123 | </p> |
| | 124 | <ul><li><a class="ext-link" href="http://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage"><span class="icon"></span>http://developer.ridgerun.com/wiki/index.php?title=IMX6_<b style="color:#000;background:#66ffff">Memory</b>_Bandwidth_usage</a> |
| | 125 | </li><li>IMX6DQRM - IMX6Dual/Quad reference manual |
| | 126 | </li><li>IMX6SDLRM - IMX6Solo/Dual-lite reference manual |
| | 127 | </li></ul><p> |
| | 128 | <span class="wikianchor" id="cma"></span> |
| | 129 | </p> |
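<p>
The Utilization and Overall Bus Load formulas given earlier in this section can be checked against the raw counters that mmdc2 prints. The following is a minimal Python sketch, not part of the imx-test package: the function names <tt>parse_mmdc_counters</tt> and <tt>mmdc_metrics</tt> are hypothetical, the sample text is copied from the total utilization example, and the assumption that the tool reports MB/s in units of 2^20 bytes is inferred from that sample rather than from the tool's source.
</p>

```python
import re

# Counter lines copied from the "show total utilization" example on this page.
SAMPLE = """\
Measure time: 1001ms
Total cycles count: 528054912
Busy cycles count: 27694059
Read accesses count: 349427
Write accesses count: 3281
Read bytes count: 20971268
Write bytes count: 99828
"""


def parse_mmdc_counters(text):
    """Collect 'name: integer' lines (optionally suffixed with 'ms') into a dict."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z. ]+?):\s*(\d+)(ms)?\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def mmdc_metrics(c):
    """Recompute bandwidth, utilization and bus load from the raw counters."""
    seconds = c["Measure time"] / 1000.0
    total_bytes = c["Read bytes count"] + c["Write bytes count"]
    mib = 2 ** 20  # assumption: the tool's "MB" is 2^20 bytes
    return {
        "read_mb_s": c["Read bytes count"] / seconds / mib,
        "write_mb_s": c["Write bytes count"] / seconds / mib,
        "total_mb_s": total_bytes / seconds / mib,
        # (read_bytes + write_bytes) / (busy_cycles * 16) * 100
        "utilization_pct": total_bytes / (c["Busy cycles count"] * 16) * 100,
        # busy_cycles / total_cycles * 100
        "bus_load_pct": c["Busy cycles count"] / c["Total cycles count"] * 100,
    }


if __name__ == "__main__":
    for name, value in mmdc_metrics(parse_mmdc_counters(SAMPLE)).items():
        print(f"{name}: {value:.2f}")
```

<p>
For the sample counters this gives a read bandwidth that rounds to 19.98 MB/s, and utilization and bus load values that truncate to the 4% and 5% shown in the tool output, which suggests mmdc2 drops the fractional part of the percentages.
</p>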
| | 130 | <h2 id="LinuxContiguousMemoryAllocatorCMA">Linux Contiguous <b style="color:#000;background:#66ffff">Memory</b> Allocator (CMA)</h2> |
| | 131 | <p> |
| | 132 | Some devices and device-drivers require big chunks of physically contiguous <b style="color:#000;background:#66ffff">memory</b>. A perfect example is the IMX6 GPU which needs CMA for certain applications. The kernel must reserve CMA <b style="color:#000;background:#66ffff">memory</b> and thus it is not available from the general pool for other applications. The amount of CMA <b style="color:#000;background:#66ffff">memory</b> reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified by the 'cma' kernel cmdline argument. |
| | 133 | </p> |
| | 134 | <p> |
| | 135 | Examples of devices that require CMA are video display devices/drivers, video capture devices/drivers, and GPU devices/drivers. |
| | 136 | </p> |
| | 137 | <p> |
| | 138 | The Yocto and Android BSPs have a bootscript that, among other things, computes a default CMA allocation from the total board <b style="color:#000;background:#66ffff">memory</b> available. If you need to alter this number (e.g. you do not want any CMA allocated), you can set the <tt>mem</tt> bootloader parameter to disable the auto-configuration performed by the bootscript. |
| | 139 | </p> |
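<p>
To see how much CMA the running kernel actually reserved, one option is to read the <tt>CmaTotal</tt> and <tt>CmaFree</tt> fields from <tt>/proc/meminfo</tt>. The following is a minimal Python sketch under the assumption that the kernel in use exposes those fields (older kernels may not); the function name <tt>cma_from_meminfo</tt> and the example values are hypothetical and are only used when <tt>/proc/meminfo</tt> cannot be read.
</p>

```python
# Hypothetical excerpt, used only as a fallback and for illustration.
EXAMPLE = """\
MemTotal:        1027520 kB
CmaTotal:         393216 kB
CmaFree:          385024 kB
"""


def cma_from_meminfo(meminfo_text):
    """Return (total_kib, free_kib) from CmaTotal/CmaFree, or None if absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() in ("CmaTotal", "CmaFree"):
            # The value is the first token after the colon, in kB.
            fields[name.strip()] = int(rest.split()[0])
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return fields["CmaTotal"], fields["CmaFree"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = EXAMPLE
    result = cma_from_meminfo(text)
    if result is None:
        print("kernel does not report CMA fields")
    else:
        print("CMA total: %d kB, free: %d kB" % result)
```

<p>
A total of 0 kB, or no CMA fields at all, would indicate that no CMA region was reserved for this boot.
</p>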
| | 140 | <p> |
| | 141 | For more information see also: |
| | 142 | </p> |
| | 143 | <ul><li><a class="ext-link" href="https://lwn.net/Articles/486301/"><span class="icon"></span>Linux CMA article</a> |
| | 144 | </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt"><span class="icon"></span>DMA-API.txt</a> |
| | 145 | </li><li><a class="ext-link" href="http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt"><span class="icon"></span>DMA-API-HOWTO.txt</a> |
| | 146 | </li></ul><p> |
| | 147 | <span class="wikianchor" id="coherent"></span> |
| | 148 | </p> |
| | 149 | <h2 id="LinuxCoherentmemory">Linux Coherent <b style="color:#000;background:#66ffff">memory</b></h2> |
| | 150 | <p> |
| | 151 | Similar to <a class="wiki" href="/wiki/ventana/memory#cma">CMA</a>, a special pool of coherent <b style="color:#000;background:#66ffff">memory</b> for atomic DMA allocations is made available by the kernel. By default this is set to 256K but can be changed by setting the 'coherent_pool' kernel parameter. This is typically used for DMA-capable devices such as PCI radios or video capture devices. |
| | 152 | </p> |
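<p>
Both the 'cma' and 'coherent_pool' parameters take a size with an optional K, M or G suffix on the kernel command line. The following is a minimal Python sketch for checking what a given command line requests; the function name <tt>size_param</tt> and the example command line are hypothetical, and only decimal sizes are handled (any <tt>@address</tt> part after a <tt>cma=</tt> size is ignored).
</p>

```python
import re

_UNITS = {"": 1, "K": 2 ** 10, "M": 2 ** 20, "G": 2 ** 30}

# Hypothetical command line, for illustration only.
EXAMPLE_CMDLINE = "console=ttymxc1,115200 root=/dev/mmcblk0p1 cma=384M coherent_pool=4M"


def size_param(cmdline, name):
    """Return the size in bytes given by 'name=<n>[KMG]', or None if not present."""
    pattern = r"(?:^|\s)" + re.escape(name) + r"=(\d+)([KMGkmg]?)"
    match = re.search(pattern, cmdline)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = EXAMPLE_CMDLINE
    for name in ("cma", "coherent_pool"):
        print(name, size_param(cmdline, name))
```

<p>
A result of <tt>None</tt> for <tt>coherent_pool</tt> means the parameter was not passed, in which case the 256K default described above applies.
</p>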
| | 153 | }}} |