     Ventana Memory
     The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main memory.
     All Ventana products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and runs before the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit memory architecture, each Ventana model differs in memory width and chip arrangement:
     Baseboard                   | Width | Chip arrangement | Max Addressable
     GW54xx/GW53xx               | 64bit | 4x 16bit chips   | 4GB
     GW51xx/GW52xx/GW552x/GW553x | 32bit | 2x 16bit chips   | 2GB
     GW551x                      | 16bit | 1x 16bit chip    | 1GB
     Max Addressable is the maximum possible memory assuming today's DDR3 density. Contact sales@… for information on available board models.
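
The relationship between the table columns can be sketched with a little arithmetic. This is only an illustration: the helper name ddr_layout is hypothetical, and the figure of 1 GB per 16bit chip is an assumption inferred from the table rows, not a value taken from a datasheet.

```shell
# Assumptions inferred from the table: every DDR3 chip is 16 bits wide and
# holds at most 1 GB at the density the table is based on.
# Hypothetical helper: print bus width and maximum memory for a chip count.
ddr_layout() {
    chips=$1
    echo "$((chips * 16))bit $((chips * 1))GB"
}

# One line per row of the table (4, 2 and 1 chips)
for n in 4 2 1; do
    ddr_layout "$n"
done
```

Under these assumptions the bus width and the maximum addressable memory both scale linearly with the number of chips fitted.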
     Memory Performance
     The Freescale MMDC has built-in profiling support that lets you examine memory utilization per hardware block. A simple user application called mmdc2 gathers and analyzes the profiling counters and reports current memory utilization.
     By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2.
     Example usage:
     show usage:
     root@ventana:~# /unit_tests/mmdc2 -h
     ======================MMDC v1.3===========================
     Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...]
     export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s
     export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop.
     export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master
     Note1: More than 1 master can be inputed. They will be profiled one by one.
     Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.
     show total utilization:
     root@ventana:~# /unit_tests/mmdc2
     MMDC SUM
     MMDC new Profiling results:
     Measure time: 1001ms
     Total cycles count: 528054912
     Busy cycles count: 27694059
     Read accesses count: 349427
     Write accesses count: 3281
     Read bytes count: 20971268
     Write bytes count: 99828
     Avg. Read burst size: 60
     Avg. Write burst size: 30
     Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s
     Utilization: 4%
     Overall Bus Load: 5%
     Bytes Access: 59
     Notice that the overall bandwidth used is 20MB/s. To find out what specifically is using it, look at the other hardware blocks that use the MMDC.
     show ARM CPU utilization:
     root@ventana:~# /unit_tests/mmdc2 ARM
     MMDC ARM
     MMDC new Profiling results:
     Measure time: 1000ms
     Total cycles count: 528049328
     Busy cycles count: 27791413
     Read accesses count: 14119
     Write accesses count: 2974
     Read bytes count: 416840
     Write bytes count: 92288
     Avg. Read burst size: 29
     Avg. Write burst size: 31
     Read: 0.40 MB/s /  Write: 0.09 MB/s  Total: 0.49 MB/s
     Utilization: 0%
     Overall Bus Load: 5%
     Bytes Access: 29
     show DSP2 utilization (display output)
     root@ventana:~# /unit_tests/mmdc2 DSP2
     MMDC DSP2
     MMDC new Profiling results:
     Measure time: 1000ms
     Total cycles count: 528049384
     Busy cycles count: 27658698
     Read accesses count: 340772
     Write accesses count: 0
     Read bytes count: 20715488
     Write bytes count: 0
     Avg. Read burst size: 60
     Avg. Write burst size: 0
     Read: 19.76 MB/s /  Write: 0.00 MB/s  Total: 19.76 MB/s
     Utilization: 4%
     Overall Bus Load: 5%
     Bytes Access: 60
     Above you can see that the majority of the 20MB/s comes from the DSP2 (display output) block. This output is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you 'blank' the display via echo 1 > /sys/class/graphics/fb0/blank you will notice that the 20MB/s from DSP2 drops to 0.
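
Comparing masters by hand gets tedious, so the following is a minimal sketch of a script that prints one total per master. The helper name mmdc_totals is hypothetical. The sketch assumes that every per-master block printed by mmdc2 has the same layout as the sample output above, and that MMDC_SLEEPTIME is given in seconds as the usage text suggests.

```shell
#!/bin/sh
# Sketch: summarize total MMDC bandwidth per master.
# Assumes each block starts with an "MMDC <master>" line and contains a
# line ending in "Total: <value> MB/s", as in the sample output.

# Hypothetical helper: read mmdc2 output on stdin, print "<master> <total MB/s>"
mmdc_totals() {
    awk '$1 == "MMDC" && NF == 2 { master = $2 }
         /Total:/                 { print master, $(NF - 1) }'
}

MMDC2=/unit_tests/mmdc2
if [ -x "$MMDC2" ]; then
    # Profile each master for 5 seconds instead of the default 1 second
    MMDC_SLEEPTIME=5 "$MMDC2" ARM DSP2 SUM | mmdc_totals
fi
```

The masters named on the command line come from the list in the usage text. Profiling them one after another means the totals are not measured over the same time window, so treat the comparison as approximate.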

The meaning of some of the results is as follows:
     Read, Write, Total: data rate in MB/s over the configured profiling window.
     Utilization: percentage of data transferred compared to the data that could have been transferred if all the busy cycles were used to transfer data. It is calculated as: (read_bytes + write_bytes) / (busy_cycles * 16) * 100
     Overall Bus Load: percentage of busy cycles compared to the total number of cycles in the profiling window. It is calculated as: busy_cycles / total_cycles * 100
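
As a worked example, the formulas above can be applied to the raw counters of the "MMDC SUM" sample output. This is a sketch only: the helper name mmdc_derive is hypothetical, and it assumes that "MB" means 2^20 bytes, which is consistent with the sample numbers but is not stated by the tool.

```shell
# Hypothetical helper: recompute the derived mmdc2 figures from raw counters.
# Arguments: read_bytes write_bytes busy_cycles total_cycles measure_time_ms
# Assumption: "MB" means 2^20 bytes, which is consistent with the sample output.
mmdc_derive() {
    awk -v rb="$1" -v wb="$2" -v busy="$3" -v total="$4" -v ms="$5" 'BEGIN {
        mb = 1024 * 1024
        sec = ms / 1000
        printf "Read: %.2f MB/s  Write: %.2f MB/s  Total: %.2f MB/s\n",
               rb / sec / mb, wb / sec / mb, (rb + wb) / sec / mb
        printf "Utilization: %.2f%%\n", (rb + wb) / (busy * 16) * 100
        printf "Overall Bus Load: %.2f%%\n", busy / total * 100
    }'
}

# Counters from the "MMDC SUM" sample output
mmdc_derive 20971268 99828 27694059 528054912 1001
```

With these counters the data rates match the sample output (19.98, 0.10 and 20.07 MB/s), while Utilization and Overall Bus Load come out at roughly 4.8% and 5.2%. The tool reports 4% and 5%, which suggests it truncates the percentages to whole numbers.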

For more information see also:
     Memory_Bandwidth_usage
     IMX6DQRM - IMX6Dual/Quad reference manual
     IMX6SDLRM - IMX6Solo/Dual-lite reference manual
     Linux Contiguous Memory Allocator (CMA)
     Some devices and device drivers require large chunks of physically contiguous memory. A good example is the IMX6 GPU, which needs CMA for certain applications. The kernel must reserve CMA memory, and thus it is not available from the general pool for other applications. The amount of CMA memory reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified with the 'cma' kernel cmdline argument.
     Examples of devices that require CMA are video display devices/drivers, video capture devices/drivers, and GPU devices/drivers.
     The Yocto and Android BSPs have a bootscript that, among other things, computes a default CMA allocation from the total board memory available. If you need to alter this number (e.g. you do not want any CMA allocated) you can set the mem bootloader parameter to disable the auto-configuration performed by the bootscript.
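
To check which CMA size was actually requested on a running board, the kernel command line can be inspected. The following is a minimal sketch: the helper name cmdline_value is hypothetical, and the example only reports what was passed on the command line, not what the kernel ended up reserving.

```shell
# Hypothetical helper: print the value of a kernel parameter found in a
# command line string. Arguments: parameter_name cmdline_string
cmdline_value() {
    name=$1
    for arg in $2; do
        case "$arg" in
            "$name"=*) echo "${arg#"$name"=}" ;;
        esac
    done
}

if [ -r /proc/cmdline ]; then
    cma=$(cmdline_value cma "$(cat /proc/cmdline)")
    echo "cma requested on kernel cmdline: ${cma:-none}"
fi
```

If the bootscript or a manual setting added something like cma=256M (an illustrative value), the script prints 256M. If no cma argument is present, it prints none.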

For more information see also:
     Linux CMA article
     DMA-API.txt
     DMA-API-HOWTO.txt
     Linux Coherent memory
     Similar to CMA, the kernel makes a special pool of coherent memory available for atomic DMA allocations. By default this pool is 256K, but the size can be changed by setting the 'coherent_pool' kernel parameter. It is typically used by DMA capable devices such as PCI radios or video capture devices.