[[PageOutline]]

= Ventana Memory =
The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main memory. All Ventana products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and runs before the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3.

While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit memory architecture, Ventana models differ in memory width and chip arrangement:
||= Baseboard =||= Width =||= Chip Arrangement =||= Max Addressable ^^^1^^^ =||
|| GW54xx/GW53xx || 64bit || 4x 16bit chips || 4GB ||
|| GW51xx/GW52xx/GW552x/GW553x || 32bit || 2x 16bit chips || 2GB ||
|| GW551x || 16bit || 1x 16bit chip || 1GB ||

1. Max Addressable is the maximum possible memory assuming today's DDR3 density. Contact sales@… for information on available board models.

== Memory Performance ==
The Freescale MMDC has built-in profiling support that lets you examine memory utilization per hardware block. A simple user application called mmdc2 can gather and analyze these counters and report current memory utilization.

The mmdc2 application is installed by default in the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is provided by the imx-test package and located at /unit_tests/mmdc2.

Example usage:
 * show usage:
{{{#!bash
root@ventana:~# /unit_tests/mmdc2 -h
MMDC DOES NOT KNOW -h
======================MMDC v1.3===========================
Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...]
export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s
export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop.
export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master
Note1: More than 1 master can be inputed. They will be profiled one by one.
Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.
}}}
 * show total utilization:
{{{#!bash
root@ventana:~# /unit_tests/mmdc2
MMDC SUM
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 528054912
Busy cycles count: 27694059
Read accesses count: 349427
Write accesses count: 3281
Read bytes count: 20971268
Write bytes count: 99828
Avg. Read burst size: 60
Avg. Write burst size: 30
Read: 19.98 MB/s / Write: 0.10 MB/s Total: 20.07 MB/s
Utilization: 4%
Overall Bus Load: 5%
Bytes Access: 59
}}}
Notice that the overall bandwidth used is about 20MB/s. To find out what specifically is using it, profile the individual hardware blocks (masters) that access the MMDC.
 * show ARM CPU utilization:
{{{#!bash
root@ventana:~# /unit_tests/mmdc2 ARM
MMDC ARM
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528049328
Busy cycles count: 27791413
Read accesses count: 14119
Write accesses count: 2974
Read bytes count: 416840
Write bytes count: 92288
Avg. Read burst size: 29
Avg. Write burst size: 31
Read: 0.40 MB/s / Write: 0.09 MB/s Total: 0.49 MB/s
Utilization: 0%
Overall Bus Load: 5%
Bytes Access: 29
}}}
 * show DSP2 utilization (display output):
{{{#!bash
root@ventana:~# /unit_tests/mmdc2 DSP2
MMDC DSP2
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 528049384
Busy cycles count: 27658698
Read accesses count: 340772
Write accesses count: 0
Read bytes count: 20715488
Write bytes count: 0
Avg. Read burst size: 60
Avg. Write burst size: 0
Read: 19.76 MB/s / Write: 0.00 MB/s Total: 19.76 MB/s
Utilization: 4%
Overall Bus Load: 5%
Bytes Access: 60
}}}
Above you can see that the majority of the 20MB/s comes from the DSP2 (display output) block. This output is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you blank the display via {{{echo 1 > /sys/class/graphics/fb0/blank}}}, the 20MB/s from DSP2 drops to 0.
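
As a worked example, the whole-number figures at the end of the mmdc2 output can be recomputed from the raw counters. The shell function below is a minimal sketch: {{{mmdc_metrics}}} is a hypothetical helper, not part of mmdc2 or imx-test. Utilization and Overall Bus Load follow the formulas given below; treating Bytes Access as total bytes divided by total accesses, and truncating rather than rounding, are assumptions inferred from the sample output, not from documentation.

{{{#!bash
#!/bin/bash
# mmdc_metrics (hypothetical helper): recompute mmdc2 summary figures.
# Args: total_cycles busy_cycles read_bytes write_bytes read_accesses write_accesses
# Bash integer division truncates, which matches the sample output (4.75% -> 4%).
mmdc_metrics() {
	local total=$1 busy=$2 rbytes=$3 wbytes=$4 racc=$5 wacc=$6
	local bytes=$((rbytes + wbytes))
	local accesses=$((racc + wacc))

	# Utilization = (read_bytes + write_bytes) / (busy_cycles * 16) * 100
	if [ "$busy" -gt 0 ]; then
		echo "Utilization: $((bytes * 100 / (busy * 16)))%"
	else
		echo "Utilization: 0%"
	fi

	# Overall Bus Load = busy_cycles / total_cycles * 100
	if [ "$total" -gt 0 ]; then
		echo "Overall Bus Load: $((busy * 100 / total))%"
	else
		echo "Overall Bus Load: 0%"
	fi

	# Bytes Access (assumed definition): average bytes per read or write access
	if [ "$accesses" -gt 0 ]; then
		echo "Bytes Access: $((bytes / accesses))"
	else
		echo "Bytes Access: 0"
	fi
}

# Counters copied from the 'mmdc2' (MMDC SUM) output above
mmdc_metrics 528054912 27694059 20971268 99828 349427 3281
}}}

With the SUM counters this prints Utilization: 4%, Overall Bus Load: 5% and Bytes Access: 59, the same values mmdc2 reported; the ARM counters give 0%, 5% and 29.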
The meaning of some of the results is as follows:
 * Read, Write, Total: throughput in MB/s during the measurement window (set by MMDC_SLEEPTIME, 1s by default).
 * Utilization: percentage of data transferred compared to the data that could have been transferred if all busy cycles were used to transfer data. It is calculated as: (read_bytes + write_bytes) / (busy_cycles * 16) * 100
 * Overall Bus Load: number of busy cycles compared to the total number of cycles in the measurement window. It is calculated as: busy_cycles / total_cycles * 100

For more information see also:
 * [http://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage IMX6 Memory Bandwidth usage]
 * [http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf IMX6DQRM - IMX6Dual/Quad reference manual]
 * [http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6SDLRM.pdf IMX6SDLRM - IMX6Solo/DualLite reference manual]

[=#cma]
== Linux Contiguous Memory Allocator (CMA) ==
Some devices and device drivers require large chunks of physically contiguous memory. A perfect example is the IMX6 GPU, which needs CMA for certain applications. Other examples are video display, video capture, and GPU devices and their drivers. The kernel must reserve CMA memory, so it is not available from the general pool for other applications. The amount of CMA memory reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified with the 'cma' kernel cmdline argument.

The Yocto and Android BSPs have a bootscript that, among other things, chooses a default CMA allocation based on the total board memory. If you need to alter this amount (for example, if you do not want any CMA allocated), set the mem bootloader parameter; this disables the auto-configuration performed by the bootscript.
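
Before changing the allocation, it can help to check what the running kernel actually reserved. The sketch below assumes a kernel that reports CmaTotal and CmaFree (in kB) in /proc/meminfo; kernels built without CONFIG_CMA, and some older kernels, do not report these fields. {{{cma_report}}} is a hypothetical helper name.

{{{#!bash
#!/bin/bash
# cma_report (hypothetical helper): summarize CMA from /proc/meminfo-format
# text on stdin. Assumes CmaTotal/CmaFree lines with values in kB.
cma_report() {
	awk '
		$1 == "CmaTotal:" { total = $2 }
		$1 == "CmaFree:"  { free = $2 }
		END {
			# Fail if the kernel does not expose CMA counters
			if (total == "") { print "CMA: not reported by this kernel"; exit 1 }
			printf "CMA: %d MiB total, %d MiB free\n", total / 1024, free / 1024
		}'
}
}}}

On the board, {{{grep -o 'cma=[^ ]*' /proc/cmdline}}} shows the cma argument the kernel was booted with, and {{{cma_report < /proc/meminfo}}} shows how much of it is reserved and still free.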
To force a certain amount of CMA on Ventana, use the following command in the bootloader, adjusting the value (e.g. 96M) as needed:
{{{
setenv mem 'cma=96M'
}}}

For more information see also:
 * [https://lwn.net/Articles/486301/ Linux CMA article]
 * [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API.txt DMA-API.txt]
 * [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/DMA-API-HOWTO.txt DMA-API-HOWTO.txt]

[=#coherent]
== Linux Coherent memory ==
Similar to CMA, the kernel makes a special pool of coherent memory available for atomic DMA allocations. By default this pool is 256K, but its size can be changed with the 'coherent_pool' kernel parameter. It is typically used by DMA-capable devices such as PCI radios or video capture devices.
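
To confirm the size of the atomic coherent pool on a running board, the kernel boot log can be checked. On 32-bit ARM kernels, arch/arm/mm/dma-mapping.c logs a line of the form "DMA: preallocated 256 KiB pool for atomic coherent allocations"; the wording differs in other kernel versions, so treat the pattern below as an assumption. {{{atomic_pool_kib}}} is a hypothetical helper name.

{{{#!bash
#!/bin/bash
# atomic_pool_kib (hypothetical helper): print the atomic coherent pool size
# in KiB from kernel log text on stdin. Assumes the ARM kernel message
# "DMA: preallocated <N> KiB pool for atomic coherent allocations".
atomic_pool_kib() {
	sed -n 's/.*DMA: preallocated \([0-9][0-9]*\) KiB pool for atomic.*/\1/p'
}
}}}

On the board, run {{{dmesg | atomic_pool_kib}}}. Assuming the bootscript passes the mem variable through to the kernel command line (as the cma example above implies), a larger pool could be requested together with CMA, for example {{{setenv mem 'cma=96M coherent_pool=1M'}}}. Because setting mem disables the bootscript's CMA auto-configuration, include the cma value you want as well.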