Multi-Core Processing

Gateworks has single board computers with single, dual, and quad core processors.

This page applies only to boards that use dual- or quad-core processors.

We encourage customers with a Ventana board to leverage the IMX community at Freescale https://community.freescale.com/community/imx

References: PLEASE UTILIZE:

Sample top output showing processor usage:

root@OpenWrt:/# top

Mem: 37212K used, 728856K free, 0K shrd, 1212K buff, 8100K cached
CPU:   0% usr   0% sys   0% nic 100% idle   0% io   0% irq   0% sirq
Load average: 0.01 0.02 0.05 1/78 23125
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
23125  1174 root     R     1216   0%   1   0% top
 2575     1 root     S     8228   1%   0   0% /usr/sbin/collectd
 2617     1 root     S     2880   0%   2   0% batmand ath0
 1423     1 root     S     1224   0%   3   0% /sbin/syslogd -C16
 1174     1 root     S     1224   0%   3   0% /bin/ash --login
    1     0 root     S     1220   0%   0   0% init
 2630     1 root     S     1216   0%   0   0% /usr/sbin/ntpd -n -p 0.openwrt.po
 2491     1 root     S     1216   0%   3   0% /sbin/watchdog -t 5 /dev/watchdog
 2468     1 root     S     1208   0%   1   0% /usr/sbin/telnetd -l /bin/login.s
 1425     1 root     S     1204   0%   2   0% /sbin/klogd
 1441     1 root     S     1120   0%   2   0% /sbin/netifd
 1434     1 root     S      944   0%   3   0% /sbin/procd
 2335  1434 root     S      908   0%   1   0% /usr/sbin/dropbear -F -P /var/run
 2476     1 root     S      868   0%   2   0% /usr/sbin/uhttpd -f -h /www -r Op
 1427     1 root     S      836   0%   2   0% /sbin/hotplug2 --override --persi
 2558     1 nobody   S      768   0%   1   0% /usr/sbin/dnsmasq -C /var/etc/dns
 2640     1 root     S      748   0%   3   0% /usr/sbin/vnstatd -d
 1437  1434 root     S <    668   0%   2   0% ubusd
  528     2 root     SW       0   0%   0   0% [kworker/0:1]
  620     2 root     SW       0   0%   0   0% [kworker/u:3]

SMP Affinity (interrupt steering)

Symmetric multiprocessing (SMP)

The 'affinity' of an interrupt handler can be read or set via /proc/irq/<interrupt>/smp_affinity, which is a hexadecimal bitmask of the CPU cores the interrupt handler can run on. By default the affinity for each handler allows all available cores (i.e. on a dual-core system a value of 3 means bit0 (CPU0) and bit1 (CPU1) are both set). If you want a particular interrupt handler to always run on a specific CPU, you can change that bitmask. To see which interrupt handlers are configured and which interrupt numbers they use, look at /proc/interrupts.

For example to set the MMC interrupt handler on a dual-core Laguna (cns3xxx) board to only run on CPU1 (and never CPU0):

cat /proc/interrupts ;# see interrupt mapping
echo 2 > /proc/irq/33/smp_affinity ;# set MMC irq to CPU1
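
The bitmask arithmetic can be scripted rather than worked out by hand. The following is a minimal sketch, not part of any Gateworks tooling: cpu_mask is a hypothetical helper name that turns a list of CPU numbers into the hexadecimal mask that smp_affinity expects.

```shell
#!/bin/sh
# cpu_mask (hypothetical helper): print the hex affinity bitmask
# for the CPU numbers given as arguments.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
    done
    printf '%x\n' "$mask"
}

cpu_mask 1      # CPU1 only: prints 2
cpu_mask 0 1    # CPU0 and CPU1: prints 3
```

With this helper, the Laguna example above could be written as echo "$(cpu_mask 1)" > /proc/irq/33/smp_affinity.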

References PLEASE UTILIZE:

Sample output of cat /proc/interrupts:

root@OpenWrt:/# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
 29:       6498       6713       4601      11306       GIC  twd
 34:          1          0          0          0       GIC  sdma
 45:       6173          0          0          0       GIC  mxs-dma
 47:          1          0          0          0       GIC  bch
 56:       3070          0          0          0       GIC  mmc0
 59:         80          0          0          0       GIC  IMX-uart
 68:         32          0          0          0       GIC  21a0000.i2c
 69:          0          0          0          0       GIC  21a4000.i2c
 70:         91          0          0          0       GIC  21a8000.i2c
 72:         29          0          0          0       GIC  ci13xxx_imx
 78:          0          0          0          0       GIC  ssi@02028000
 87:          9          0          0          0       GIC  i.MX Timer Tick
150:       3435          0          0          0       GIC  2188000.ethernet
151:          0          0          0          0       GIC  2188000.ethernet
153:          0          0          0          0       GIC  ath9k
154:          0          0          0          0       GIC  ath9k, ath9k
155:          0          0          0          0       GIC  ath9k
352:          0          0          0          0  gpio-mxc  mmc0
407:          0          0          0          0       IPU  imx_drm
412:          0          0          0          0       IPU  imx_drm
567:          0          0          0          0       IPU  imx_drm
572:          0          0          0          0       IPU  imx_drm
IPI0:          0          0          0          0  CPU wakeup interrupts
IPI1:          0          0          0          0  Timer broadcast interrupts
IPI2:       5027       5875       5953       5026  Rescheduling interrupts
IPI3:          5          4          4          5  Function call interrupts
IPI4:          3          6          6          3  Single function call interrupts
IPI5:          0          0          0          0  CPU stop interrupts
Err:          0
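
To find the interrupt number for a given device without reading the table by eye, the listing can be filtered with awk. This is a rough sketch; irqs_for is a hypothetical helper name, and it matches the name as a plain substring of the line, so a short name may match more than one handler.

```shell
#!/bin/sh
# irqs_for (hypothetical helper): print the IRQ numbers whose line in
# /proc/interrupts-style input (read from stdin) mentions the given name.
irqs_for() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+:$/ && index($0, name) {
            sub(/:/, "", $1)    # strip the trailing colon from the IRQ number
            print $1
        }'
}

# Usage on a live system:
#   irqs_for mmc0 < /proc/interrupts
```

Run against the sample above, irqs_for mmc0 would print 56 and 352, since both lines carry the mmc0 name.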

PCI Interrupt steering

The PCI specification calls out 4 interrupts (INTA/INTB/INTC/INTD) that are routed to PCI slots. Each slot gets two interrupts, and they are shared with other slots depending on board layout (a technique called swizzling or barber-poling). This means that on a board with 4 PCI slots each slot can have a unique interrupt; with 5 or more slots, the extra slots share an interrupt with another slot. If you can populate your slots such that each device has a unique interrupt, you can use SMP affinity (above) to assign the interrupt handlers of those slots to different CPU cores. This can greatly help performance if the bottleneck is interrupt processing (which the Linux 'top' command will usually help determine).

Note that performance gains are difficult to quantify as there are many factors at play. In general, you can 'tune' your system by running 'top', which shows CPU utilization (per core if you hit the '1' key while it is running), and moving things around to better balance the load. If one core is underutilized, try shifting work onto it.
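
As a complement to top, the per-core interrupt counts in /proc/interrupts give a quick view of how interrupt load is distributed. The sketch below sums the numbered IRQ lines per CPU column; irq_totals is a hypothetical helper name, and the IPI and Err lines are deliberately skipped.

```shell
#!/bin/sh
# irq_totals (hypothetical helper): sum the per-CPU counts of all numbered
# IRQ lines in /proc/interrupts-style input read from stdin.
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }     # header line: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {              # numbered IRQ lines only
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }'
}

# Usage on a live system:
#   irq_totals < /proc/interrupts
```

A core whose total is far above the others is a candidate for having some of its interrupts moved elsewhere with smp_affinity.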

Ventana

The IMX6 SoC used on the Gateworks Ventana product family has 4 'legacy' interrupts to support the PCI INTA/INTB/INTC/INTD interrupts:

  • 152 pin1/INTD (also used as the MSI int)
  • 153 pin2/INTC
  • 154 pin3/INTB
  • 155 pin4/INTA

Which slot is routed to each interrupt depends on the baseboard and expansion mezzanine board stackup. The best way to determine the mapping for your particular stackup is to populate one slot at a time and check /proc/interrupts.

Depending on your interrupt routing (board stackup), device slot placement (what device is in what slot), and CPU (number of cores) you can then choose to spread interrupts according to your application needs.
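
One simple policy is to hand out the legacy interrupts to cores round-robin. The sketch below only prints the IRQ/mask pairs it would use, so it can be reviewed before anything is written; spread_irqs is a hypothetical helper name, and round-robin is just one possible choice, not a Gateworks recommendation.

```shell
#!/bin/sh
# spread_irqs (hypothetical helper): assign IRQs to CPUs round-robin.
# Usage: spread_irqs NCPUS IRQ...
# Prints one "IRQ MASK" pair per line; it does not write anything itself.
spread_irqs() {
    ncpus=$1
    shift
    i=0
    for irq in "$@"; do
        printf '%s %x\n' "$irq" $(( 1 << (i % ncpus) ))
        i=$(( i + 1 ))
    done
}

# Dry run for the four Ventana legacy PCI interrupts on a quad-core board:
spread_irqs 4 152 153 154 155

# To apply the result (as root), feed the pairs into smp_affinity:
#   spread_irqs 4 152 153 154 155 | while read irq mask; do
#       echo "$mask" > /proc/irq/$irq/smp_affinity
#   done
```

The dry run prints masks 1, 2, 4 and 8 for IRQs 152 to 155. A /proc/irq/<n> directory is expected to exist only for interrupts the kernel has set up, so list only the interrupts that appear in /proc/interrupts for your stackup.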

Laguna

The cns3xxx uses irq61 for pcie0_intr which in the case of a PCIe-to-PCI bridge ends up combining INTA/B/C/D on a single ARM CPU interrupt. This is not optimal when you have multiple cores. To overcome this limitation an enhancement was made on the GW2388-4-H (the model of your GW2388 is displayed by the bootloader on bootup) by additionally routing the INTA/B/C/D signals to unique external ARM CPU interrupts:

  • J5: irq95
  • J7: irq94
  • J4: irq93
  • J6: irq154

To determine if you have a Laguna board supporting isolated PCI interrupts, check the PCB part number and revision under the 6-digit bar-code label on the top of the board:

  • GW2388-4 RevH will have PCB 02210082-07. The 07 indicates the PCB revision; revision 07 and later support isolated interrupts

A linux kernel patch is necessary to detect boards that support the isolated PCI interrupts and configure them to be used for the PCI host controller's interrupts. This is supported in:

Specifying and determining CPU for a userspace process

The default is for userspace processes to be able to be scheduled on all CPU cores.

The 'taskset' application can be used to specify the CPU affinity of a specific task. As above, the affinity is a bitmask specifying which CPUs the task can run on (i.e. 0x3 for CPU0|CPU1, 0x1 for CPU0, 0x2 for CPU1):

  • set the affinity for an existing PID (ie PID 1):
    taskset -p 0x1 1 ;# set PID1 to only run on CPU0
    taskset -p 0x2 1 ;# set PID1 to only run on CPU1
    taskset -p 0x3 1 ;# set PID1 to run on either CPU0 or CPU1
    
  • launch a process with a specific affinity:
    taskset 0x1 top ;# run top on CPU0
    

You can obtain details about which CPUs a process is allowed to run on via /proc or tools like top:

  • using /proc/<pid>/status to see which CPUs PID1 is allowed on:
    # grep Cpus /proc/1/status ;# see current affinity
    Cpus_allowed:   3
    Cpus_allowed_list:      0-1
    # taskset -p 0x1 1 ;# set affinity to CPU0
    pid 1's current affinity mask: 3
    pid 1's new affinity mask: 1
    # grep Cpus /proc/1/status ;# confirm new affinity
    Cpus_allowed:   1
    Cpus_allowed_list:      0
    
  • using top (see the CPU column)
    # top -n1 ;# 1 iteration of top
    Mem: 19704K used, 236320K free, 0K shrd, 2892K buff, 4460K cached
    CPU:   0% usr   6% sys   0% nic  11% idle   0% io   0% irq  81% sirq
    Load average: 0.05 0.11 0.13 1/44 3590
      PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     3588  1423 root     R     1204   0%   1  25% top -n1
     3073     2 root     SW       0   0%   1   5% [kworker/u:2]
      695     1 root     S     1212   0%   1   0% /sbin/syslogd -C16
        1     0 root     S     1208   0%   0   0% init
     1423  1345 root     S     1208   0%   1   0% /bin/ash --login
     1493  1345 root     S     1208   0%   0   0% /bin/ash --login
      627     1 root     S     1208   0%   0   0% /bin/ash --login
     1345     1 root     S     1204   0%   0   0% /usr/sbin/telnetd -l /bin/login.s
     1816   627 root     S     1200   0%   0   0% {doup} /bin/sh ./doup
      697     1 root     S     1192   0%   0   0% /sbin/klogd
     3590  1816 root     S     1192   0%   0   0% sleep 1
     1126     1 root     S     1140   0%   0   0% hostapd -P /var/run/wifi-phy0.pid
      719     1 root     S     1116   0%   1   0% /sbin/netifd
      706     1 root     S      940   0%   0   0% /sbin/procd
      699     1 root     S      700   0%   0   0% /sbin/hotplug2 --override --persi
      712   706 root     S <    664   0%   0   0% ubusd
        6     2 root     SW       0   0%   0   0% [kworker/u:0]
       16     2 root     SW       0   0%   1   0% [kworker/u:1]
        8     2 root     SW       0   0%   0   0% [migration/0]
      390     2 root     SW       0   0%   1   0% [kworker/1:1]
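
The Cpus_allowed value is the same kind of hexadecimal bitmask that taskset takes. As an illustration, the sketch below decodes such a mask back into CPU numbers; mask_to_cpus is a hypothetical helper name, and it assumes a single hex word with no comma-separated groups, which holds for the dual- and quad-core boards discussed here.

```shell
#!/bin/sh
# mask_to_cpus (hypothetical helper): print the CPU numbers whose bits are
# set in a hex affinity mask given without the 0x prefix.
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    list=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="${list:+$list }$cpu"  # append, space-separated
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$list"
}

mask_to_cpus 3      # prints: 0 1
mask_to_cpus 1      # prints: 0

# On a live system, for PID 1:
#   mask_to_cpus "$(awk '/^Cpus_allowed:/ { print $2 }' /proc/1/status)"
```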
    

Network Packet Steering

Receive Packet Steering (RPS) hashes the IP address and port of each received packet to generate a hash index, then uses a hash table that maps the hash index to a CPU number. As a result, the same CPU is always used for a given IP address/port combination. This is by design, so that the number of cache hits increases when processing packets of the same network stream. To realize the benefits of RPS you therefore need multiple streams; you can use the '-P' option of iperf to run such a test.
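
This page does not show how RPS is switched on. In mainline Linux kernels built with CONFIG_RPS it is configured per receive queue by writing a hexadecimal CPU bitmask to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus, and a mask of 0 (the default) leaves it off. The sketch below rests on several assumptions that are not from this page: the interface name eth0, the helper names, and the optional sysfs-root argument, which exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# all_cpus_mask N (hypothetical helper): hex mask with the low N bits set.
all_cpus_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# set_rps DEV MASK [SYSFS_ROOT] (hypothetical helper): write MASK to the
# rps_cpus file of every receive queue of DEV.
set_rps() {
    dev=$1
    mask=$2
    sysfs=${3:-/sys}
    for q in "$sysfs/class/net/$dev/queues"/rx-*/rps_cpus; do
        [ -e "$q" ] || continue     # glob did not match: no such queue
        echo "$mask" > "$q"
    done
}

# Example (as root, on a quad-core board; eth0 is an assumed name):
#   set_rps eth0 "$(all_cpus_mask 4)"     # writes "f" to each RX queue
#   iperf -c <server> -P 4                # four parallel streams
```

With a single stream the hash always selects the same core, so any gain should only become visible in the multi-stream iperf run.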

Limiting Number of CPU cores

There may be times when a developer wants to test performance or functionality with a limited number of CPU cores on a multi-core system. Because the Linux network driver system does not allow very granular sharing of packet processing over multiple cores, in the case of the CNS3xxx dual-core processor used on the Laguna product family you may even see a slight performance boost by limiting the system to a single core.

The Linux Symmetric Multiprocessing (SMP) support allows some kernel command-line parameters to override CPU core detection and either disable SMP completely or specify the number of cores to use:

  • To disable SMP completely you can use the nosmp kernel command-line parameter.
    • Note that the Cavium CNS3xxx cpu used in the Laguna product family does not support 'nosmp' in the Gateworks kernels, but if you want to limit SMP to a single core you can do so with 'maxcpus=1'
  • To specify the number of cores you can add the maxcpus kernel command-line parameter.

You can alter the bootargs env variable the bootloader passes to the kernel. For Laguna you can set the bootargs var directly and for Ventana's bootloader you can more easily add things to the bootargs by setting the extra variable (which gets added to the bootargs by the boot scripts).

Examples:

  • Laguna:
    • Limit to single CPU:
      Laguna > setenv bootargs console=ttyS0,115200 root=/dev/mtdblock3 rootfstype=squashfs,jffs2 maxcpus=1
      Laguna > saveenv # optionally save this env
      Laguna > boot
      
  • Ventana:
    • Disable SMP:
      Ventana > setenv extra nosmp
      Ventana > saveenv # optionally save this env
      Ventana > boot
      
    • Limit CPUs to 2 (i.e. for boards with an IMX6Q with 4 cores)
      Ventana > setenv extra maxcpus=2
      Ventana > saveenv # optionally save this env
      Ventana > boot
      

You can check the number of CPUs used by Linux SMP by looking at /proc/cpuinfo:

  • Examples:
    • Ventana with SMP disabled or maxcpus=1 showing 1 core
      # cat /proc/cmdline 
      console=ttymxc1,115200 root=/dev/sda1 rootfstype=ext4 rootwait rw video=mxcfb0:off video=mxcfb1:off video=mxcfb2:off video=mxcfb3:off nosmp cma=384M galcore.initgpu3DMinClock=3
      # cat /proc/cpuinfo 
      processor       : 0
      model name      : ARMv7 Processor rev 10 (v7l)
      BogoMIPS        : 3.00
      Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32 
      CPU implementer : 0x41
      CPU architecture: 7
      CPU variant     : 0x2
      CPU part        : 0xc09
      CPU revision    : 10
      
      Hardware        : Freescale i.MX6 Quad/DualLite (Device Tree)
      Revision        : 0000
      Serial          : 668487
      
    • Ventana with maxcpus=2
      # cat /proc/cmdline 
      console=ttymxc1,115200 root=/dev/sda1 rootfstype=ext4 rootwait rw video=mxcfb0:off video=mxcfb1:off video=mxcfb2:off video=mxcfb3:off maxcpus=2 cma=384M galcore.initgpu3DMinClock=3
      # cat /proc/cpuinfo 
      processor       : 0
      model name      : ARMv7 Processor rev 10 (v7l)
      BogoMIPS        : 3.00
      Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32 
      CPU implementer : 0x41
      CPU architecture: 7
      CPU variant     : 0x2
      CPU part        : 0xc09
      CPU revision    : 10
      
      processor       : 1
      model name      : ARMv7 Processor rev 10 (v7l)
      BogoMIPS        : 3.00
      Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32 
      CPU implementer : 0x41
      CPU architecture: 7
      CPU variant     : 0x2
      CPU part        : 0xc09
      CPU revision    : 10
      
      Hardware        : Freescale i.MX6 Quad/DualLite (Device Tree)
      Revision        : 0000
      Serial          : 668487
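
Rather than reading the whole listing, the active core count can be obtained by counting the "processor" entries. A minimal sketch follows; count_cpus is a hypothetical helper name, and the optional file argument is there only so it can be tried on a saved copy of the output.

```shell
#!/bin/sh
# count_cpus (hypothetical helper): count the "processor" entries in a
# cpuinfo-format file (defaults to /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# On a live system:
#   count_cpus
```

For the two listings above this would report 1 and 2 respectively.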
      

Single Core Power Consumption (Laguna Family Only)

Some quick measurements on a GW2388 with VIN=10V:

  • At the prompt, the current difference was only 2mA (396mA single core vs 398mA dual core)
  • Under stress, the current difference was fairly small: 44mA (438mA single core vs 482mA dual core)

Last modified on 04/06/17 15:57:44