Version 3 (modified by Ryan Erbstoesser, 6 years ago)


Ventana Memory

The Freescale IMX6 Multi-Mode DDR Controller (MMDC) interfaces the ARM CPU cores with the shared main memory.

All Ventana products use DDR3 SDRAM. The Secondary Program Loader (SPL), which is also built from U-Boot code and precedes the actual U-Boot bootloader, is in charge of configuring the MMDC and DDR3. While the IMX6 MMDC has two 32bit channels that can be used together for a 64bit memory architecture, Ventana models differ in memory width and chip arrangement:

Baseboard                     Width   Chip arrangement   Max Addressable (1)
GW54xx/GW53xx                 64bit   4x 16bit chips     4GB
GW51xx/GW52xx/GW552x/GW553x   32bit   2x 16bit chips     2GB
GW551x                        16bit   1x 16bit chip      1GB
  1. Max Addressable is the maximum possible memory assuming today's DDR3 density - contact sales@… for information on available board models.

Memory Performance

The Freescale MMDC has built-in profiling support that lets you examine memory utilization per hardware block. A simple user-space application called mmdc2 gathers and analyzes the profiling counters and reports the current memory utilization.

By default the mmdc2 application is installed on the Gateworks Yocto BSP gateworks-image-multimedia and gateworks-image-gui images. It is available in the imx-test package and located in /unit_tests/mmdc2.

Example usage:

  • show usage:
    root@ventana:~# /unit_tests/mmdc2 -h
    MMDC DOES NOT KNOW -h 
    ======================MMDC v1.3===========================
    Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PXP:USB:SUM] [...]
    export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s
    export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop.
    export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master
    Note1: More than 1 master can be inputed. They will be profiled one by one.
    Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.
    
  • show total utilization:
    root@ventana:~# /unit_tests/mmdc2 
    MMDC SUM 
    
    MMDC new Profiling results:
    ***********************
    Measure time: 1001ms 
    Total cycles count: 528054912
    Busy cycles count: 27694059
    Read accesses count: 349427
    Write accesses count: 3281
    Read bytes count: 20971268
    Write bytes count: 99828
    Avg. Read burst size: 60
    Avg. Write burst size: 30
    Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s 
    Utilization: 4%
    Overall Bus Load: 5%
    Bytes Access: 59
    
    • Notice that the overall bandwidth used is about 20MB/s. To find out which hardware block is using it, profile the individual hardware blocks that use the MMDC.
  • show ARM CPU utilization:
    root@ventana:~# /unit_tests/mmdc2 ARM 
    MMDC ARM 
    
    MMDC new Profiling results:
    ***********************
    Measure time: 1000ms 
    Total cycles count: 528049328
    Busy cycles count: 27791413
    Read accesses count: 14119
    Write accesses count: 2974
    Read bytes count: 416840
    Write bytes count: 92288
    Avg. Read burst size: 29
    Avg. Write burst size: 31
    Read: 0.40 MB/s /  Write: 0.09 MB/s  Total: 0.49 MB/s 
    Utilization: 0%
    Overall Bus Load: 5%
    Bytes Access: 29
    
  • show DSP2 utilization (display output)
    root@ventana:~# /unit_tests/mmdc2 DSP2
    MMDC DSP2 
    
    MMDC new Profiling results:
    ***********************
    Measure time: 1000ms 
    Total cycles count: 528049384
    Busy cycles count: 27658698
    Read accesses count: 340772
    Write accesses count: 0
    Read bytes count: 20715488
    Write bytes count: 0
    Avg. Read burst size: 60
    Avg. Write burst size: 0
    Read: 19.76 MB/s /  Write: 0.00 MB/s  Total: 19.76 MB/s 
    Utilization: 4%
    Overall Bus Load: 5%
    Bytes Access: 60
    
    • Above you can see that the majority of the 20MB/s is from the DSP2 (display output) block. The above is from a GW5400 with analog video out enabled, which uses IPU2 and thus DSP2. If you blank the display via echo 1 > /sys/class/graphics/fb0/blank you will notice that the 20MB/s from DSP2 drops to 0.
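Comparing masters by eye gets tedious when several blocks are active. The following is a minimal sketch, not part of the imx-test package, that parses the text printed by mmdc2 into a dictionary so results for different masters can be compared. The function name parse_mmdc2 and the dictionary keys are hypothetical; the parsing assumes the output format shown in the examples above.

```python
import re

def parse_mmdc2(text):
    """Parse mmdc2 output (format as shown above) into a dict of values."""
    results = {}
    for line in text.splitlines():
        # Counter lines, e.g. "Read bytes count: 20971268" or "Measure time: 1001ms"
        m = re.match(r"\s*([A-Za-z. ]+?):\s*(\d+)(ms|%)?\s*$", line)
        if m:
            results[m.group(1)] = int(m.group(2))
            continue
        # Summary line, e.g. "Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s"
        for name, value in re.findall(r"(Read|Write|Total):\s*([\d.]+) MB/s", line):
            results[name + " MB/s"] = float(value)
    return results

# Abbreviated copies of the SUM and DSP2 outputs shown above.
sum_out = """MMDC SUM
Measure time: 1001ms
Busy cycles count: 27694059
Read: 19.98 MB/s /  Write: 0.10 MB/s  Total: 20.07 MB/s
Utilization: 4%
"""
dsp2_out = """MMDC DSP2
Measure time: 1000ms
Read: 19.76 MB/s /  Write: 0.00 MB/s  Total: 19.76 MB/s
"""

total = parse_mmdc2(sum_out)
dsp2 = parse_mmdc2(dsp2_out)
# Fraction of the total bandwidth attributable to the DSP2 master.
dsp2_share = dsp2["Total MB/s"] / total["Total MB/s"]
print("DSP2 share of total bandwidth: %.0f%%" % (dsp2_share * 100))
```

Because the SUM and DSP2 measurements are taken in separate one-second windows, the share is only an approximation.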

The meaning of some of the results is as follows:

  • Read, Write, Total: throughput in MB/s during the configured window of time.
  • Utilization: percentage of data transferred compared to the data that could be transferred if all the busy cycles were used to transfer data. It is calculated as: (read_bytes + write_bytes) / (busy_cycles * 16) * 100
  • Overall Bus Load: number of busy cycles compared to the total number of cycles in the time window. It is calculated as: busy_cycles / total_cycles * 100
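
The two formulas above can be checked against the raw counters. The sketch below is an illustration only: the function name mmdc_metrics is hypothetical, and the use of 1 MB = 1048576 bytes for the throughput figures is an assumption inferred from the sample output, not something stated by the tool.

```python
def mmdc_metrics(total_cycles, busy_cycles, read_bytes, write_bytes, measure_ms):
    """Recompute the mmdc2 summary values from its raw counters."""
    seconds = measure_ms / 1000.0
    mb = 1024 * 1024  # assumption: MB/s figures use 1 MB = 1048576 bytes
    data_bytes = read_bytes + write_bytes
    return {
        # (read_bytes + write_bytes) / (busy_cycles * 16) * 100
        "utilization_pct": data_bytes / (busy_cycles * 16) * 100 if busy_cycles else 0.0,
        # busy_cycles / total_cycles * 100
        "bus_load_pct": busy_cycles / total_cycles * 100,
        "read_mb_s": read_bytes / seconds / mb,
        "write_mb_s": write_bytes / seconds / mb,
    }

# Counters taken from the "show total utilization" example above.
m = mmdc_metrics(
    total_cycles=528054912,
    busy_cycles=27694059,
    read_bytes=20971268,
    write_bytes=99828,
    measure_ms=1001,
)
for name, value in m.items():
    print("%s: %.2f" % (name, value))
```

With the counters from the total utilization example, the computed percentages are fractional values whose integer parts match the 4% and 5% reported by the tool, which suggests the tool truncates rather than rounds.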

Linux Contiguous Memory Allocator (CMA)

Some devices and device drivers require large chunks of physically contiguous memory. A good example is the IMX6 GPU, which needs CMA for certain applications. The kernel must reserve CMA memory, and thus it is not available from the general pool for other applications. The amount of CMA memory reserved by the kernel defaults to 0 (in the Gateworks kernel) and can be specified with the 'cma' kernel cmdline argument.

Examples of devices that require CMA include video display devices/drivers, video capture devices/drivers, and GPU devices/drivers.

The Yocto and Android BSPs have a bootscript that, among other things, computes a default CMA allocation based on the total board memory available. If you need to alter this number (for example, you do not want any CMA allocated), you can set the mem bootloader parameter, which disables the auto-configuration performed by the bootscript.

To force a certain amount of CMA on Ventana, use the following command in the bootloader, adjusting the value (e.g. 96M) as needed:

setenv mem 'cma=96M'
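
After booting, it can be useful to confirm which 'cma' value actually reached the kernel command line. The following is a minimal sketch under stated assumptions: the function name cma_bytes is hypothetical, the sample command line (console and root arguments) is made up for illustration, and only a plain size with an optional K, M or G suffix is handled, which is a simplification of what the kernel accepts.

```python
import re

# Binary multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def cma_bytes(cmdline):
    """Return the size in bytes requested by 'cma=' in a kernel cmdline, or None."""
    size = None
    for arg in cmdline.split():
        # Accept e.g. "cma=96M"; anything after an optional '@' is ignored here.
        m = re.fullmatch(r"cma=(\d+)([KMG]?)(@.*)?", arg, re.IGNORECASE)
        if m:
            # If the argument appears more than once, keep the last one.
            size = int(m.group(1)) * _UNITS[m.group(2).upper()]
    return size

# Hypothetical command line for illustration; on a running board you would
# pass the contents of /proc/cmdline instead.
sample = "console=ttymxc1,115200 root=/dev/mmcblk0p1 cma=96M"
requested = cma_bytes(sample)
print("cma requested: %s bytes" % requested)
```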

Linux Coherent memory

Similar to CMA, the kernel makes a special pool of coherent memory available for atomic DMA allocations. By default this is set to 256K but can be changed by setting the 'coherent_pool' kernel parameter. This is typically used for DMA-capable devices such as PCI radios or video capture devices.
