Changes between Initial Version and Version 1 of multicoreprocessing


Timestamp:
10/22/2017 05:28:45 AM (7 years ago)
Author:
trac
     1[[PageOutline]]
     2
     3= Multi-Core Processing =
     4Gateworks has single board computers with single, dual, and quad core processors.
     5
This page applies only to boards that use dual- and quad-core processors.
     7
     8We encourage customers with a Ventana board to leverage the IMX community at Freescale [https://community.freescale.com/community/imx]
     9
     10References: '''PLEASE UTILIZE''':
     11 * [https://github.com/torvalds/linux/blob/master/Documentation/networking/scaling.txt]
     12 * [http://www.embedded.com/design/embedded/4236957/2/Multicore-networking-in-L]
     13 * See also our wiki page for [wiki:performance_tuning Performance Tuning]
     14
     15Sample top command '''Shows Processor usage'''
     16{{{
     17root@OpenWrt:/#top
     18
     19Mem: 37212K used, 728856K free, 0K shrd, 1212K buff, 8100K cached
     20CPU:   0% usr   0% sys   0% nic 100% idle   0% io   0% irq   '''0% sirq'''
     21Load average: 0.01 0.02 0.05 1/78 23125
     22  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     2323125  1174 root     R     1216   0%   1   0% top
     24 2575     1 root     S     8228   1%   0   0% /usr/sbin/collectd
     25 2617     1 root     S     2880   0%   2   0% batmand ath0
     26 1423     1 root     S     1224   0%   3   0% /sbin/syslogd -C16
     27 1174     1 root     S     1224   0%   3   0% /bin/ash --login
     28    1     0 root     S     1220   0%   0   0% init
     29 2630     1 root     S     1216   0%   0   0% /usr/sbin/ntpd -n -p 0.openwrt.po
     30 2491     1 root     S     1216   0%   3   0% /sbin/watchdog -t 5 /dev/watchdog
     31 2468     1 root     S     1208   0%   1   0% /usr/sbin/telnetd -l /bin/login.s
     32 1425     1 root     S     1204   0%   2   0% /sbin/klogd
     33 1441     1 root     S     1120   0%   2   0% /sbin/netifd
     34 1434     1 root     S      944   0%   3   0% /sbin/procd
     35 2335  1434 root     S      908   0%   1   0% /usr/sbin/dropbear -F -P /var/run
     36 2476     1 root     S      868   0%   2   0% /usr/sbin/uhttpd -f -h /www -r Op
     37 1427     1 root     S      836   0%   2   0% /sbin/hotplug2 --override --persi
     38 2558     1 nobody   S      768   0%   1   0% /usr/sbin/dnsmasq -C /var/etc/dns
     39 2640     1 root     S      748   0%   3   0% /usr/sbin/vnstatd -d
     40 1437  1434 root     S <    668   0%   2   0% ubusd
     41  528     2 root     SW       0   0%   0   0% [kworker/0:1]
     42  620     2 root     SW       0   0%   0   0% [kworker/u:3]
     43}}}
     44
     45== SMP Affinity (interrupt steering) ==
     46Symmetric multiprocessing (SMP)
     47
The 'affinity' of an interrupt handler can be read or set via /proc/irq/<interrupt>/smp_affinity, which is a bitmask of the CPU cores the interrupt handler can run on.  By default the affinity for each handler allows all available cores (i.e. on a dual-core system a value of 3 means bit0 (CPU0) and bit1 (CPU1) are both set).  If you want a particular interrupt handler to always run on a specific CPU you can change that bitmask.  To see which interrupt handlers are configured and which interrupt each one is on, look at /proc/interrupts.
     49
     50For example to set the MMC interrupt handler on a dual-core Laguna (cns3xxx) board to only run on CPU1 (and never CPU0):
{{{
cat /proc/interrupts ;# see interrupt mapping
echo 2 > /proc/irq/33/smp_affinity ;# set MMC irq to CPU1
}}}
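When reading a mask back, it can help to expand it into a list of CPU numbers. The helper below is a minimal sketch and not part of any Gateworks BSP: `mask_to_cpus` is a hypothetical name, and it assumes the mask is a single hex word without the comma separators the kernel uses on systems with more than 32 CPUs.

```shell
# Expand a hex affinity mask (as found in smp_affinity) into CPU numbers.
# mask_to_cpus is a hypothetical helper name used for illustration only.
mask_to_cpus() {
    mask=$((0x$1))   # interpret the argument as hexadecimal
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # if the lowest bit is set, this CPU is allowed
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

# Example (run on the target):
#   mask_to_cpus "$(cat /proc/irq/33/smp_affinity)"
```

A mask of 3 expands to CPUs 0 and 1, matching the dual-core default described above.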
     55
     56References '''PLEASE UTILIZE''':
     57 * http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
     58 * http://elinux.org/images/4/43/Understanding_And_Using_SMP_Multicore_Processors_Anderson.pdf
     59
Sample output of the command cat /proc/interrupts
     61{{{
     62root@OpenWrt:/# cat /proc/interrupts
     63           CPU0       CPU1       CPU2       CPU3       
     64 29:       6498       6713       4601      11306       GIC  twd
     65 34:          1          0          0          0       GIC  sdma
     66 45:       6173          0          0          0       GIC  mxs-dma
     67 47:          1          0          0          0       GIC  bch
     68 56:       3070          0          0          0       GIC  mmc0
     69 59:         80          0          0          0       GIC  IMX-uart
     70 68:         32          0          0          0       GIC  21a0000.i2c
     71 69:          0          0          0          0       GIC  21a4000.i2c
     72 70:         91          0          0          0       GIC  21a8000.i2c
     73 72:         29          0          0          0       GIC  ci13xxx_imx
     74 78:          0          0          0          0       GIC  ssi@02028000
     75 87:          9          0          0          0       GIC  i.MX Timer Tick
     76150:       3435          0          0          0       GIC  2188000.ethernet
     77151:          0          0          0          0       GIC  2188000.ethernet
     78153:          0          0          0          0       GIC  ath9k
     79154:          0          0          0          0       GIC  ath9k, ath9k
     80155:          0          0          0          0       GIC  ath9k
     81352:          0          0          0          0  gpio-mxc  mmc0
     82407:          0          0          0          0       IPU  imx_drm
     83412:          0          0          0          0       IPU  imx_drm
     84567:          0          0          0          0       IPU  imx_drm
     85572:          0          0          0          0       IPU  imx_drm
     86IPI0:          0          0          0          0  CPU wakeup interrupts
     87IPI1:          0          0          0          0  Timer broadcast interrupts
     88IPI2:       5027       5875       5953       5026  Rescheduling interrupts
     89IPI3:          5          4          4          5  Function call interrupts
     90IPI4:          3          6          6          3  Single function call interrupts
     91IPI5:          0          0          0          0  CPU stop interrupts
     92Err:          0
     93}}}
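Because IRQ numbers differ between boards and kernels, it is often more robust to look up an IRQ by handler name than to hard-code it. The following is a rough sketch under stated assumptions: `irq_for` is a hypothetical helper, and it only matches the last whitespace-separated field of each line, so names containing spaces (such as "i.MX Timer Tick") are not handled.

```shell
# Print the IRQ number(s) whose last field in /proc/interrupts matches a name.
# irq_for is a hypothetical helper; $2 optionally overrides the file to read.
irq_for() {
    awk -v name="$1" '
        $NF == name {
            sub(":", "", $1)   # strip the trailing colon from "56:"
            print $1
        }
    ' "${2:-/proc/interrupts}"
}

# Example (run as root on the target): steer every mmc0 interrupt to CPU1
#   for irq in $(irq_for mmc0); do echo 2 > /proc/irq/$irq/smp_affinity; done
```

A name can match more than one line (mmc0 appears twice in the sample above), so the helper prints one IRQ number per line.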
     94
     95
     96=== PCI Interrupt steering ===
     97
The PCI specification calls out 4 interrupts (INTA/INTB/INTC/INTD) that are routed to PCI slots.  Each slot gets two interrupts, and they are shared with other slots depending on board layout (a technique called swizzling or barber-poling).  This means that a board with 4 PCI slots can have a single unique interrupt for each slot; with 5 or more slots, the extra slots share an interrupt with another slot.  If you can populate your slots such that each device has a unique interrupt, you can use SMP affinity (above) to assign different CPU cores to the interrupt handlers of those slots. This can greatly help performance if the bottleneck is interrupt processing (which the Linux 'top' command will usually help determine).
     99
Note that performance gains are difficult to quantify as there are many factors at play. In general, you can 'tune' your system by using 'top', which shows CPU utilization (per core if you press the '1' key while it is running), and moving interrupt handlers and tasks between cores to better balance the load. If one core is underutilized, try to shift some of the load onto it.
     101
     102==== Ventana ====
The IMX6 SoC used on the Gateworks Ventana product family has 4 'legacy' interrupts to support the PCI INTA/INTB/INTC/INTD interrupts:
     104 * 152 pin1/INTD (also used as the MSI int)
     105 * 153 pin2/INTC
     106 * 154 pin3/INTB
     107 * 155 pin4/INTA
     108
Which slot is routed to each interrupt depends on the baseboard and expansion mezzanine board stackup. The best way to determine the mapping for your particular stackup is to populate one slot at a time and check /proc/interrupts.
     110
     111Depending on your interrupt routing (board stackup), device slot placement (what device is in what slot), and CPU (number of cores) you can then choose to spread interrupts according to your application needs.
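One simple policy is to hand out the legacy interrupts to cores round-robin. The sketch below illustrates that idea only; `spread_irqs` is a hypothetical helper, the IRQ numbers in the example are the Ventana values listed above, and whether a round-robin split suits your workload is an assumption you should verify with 'top'.

```shell
# Assign each listed IRQ to one core, cycling through the available cores.
# spread_irqs is a hypothetical helper.
#   $1 = number of cores, $2 = IRQ directory (normally /proc/irq), rest = IRQs
spread_irqs() {
    ncpus=$1
    base=$2
    shift 2
    i=0
    for irq in "$@"; do
        # one-bit mask selecting core (i modulo ncpus), written in hex
        mask=$(printf '%x' $((1 << (i % ncpus))))
        echo "$mask" > "$base/$irq/smp_affinity"
        i=$((i + 1))
    done
}

# Example (run as root on a quad-core Ventana):
#   spread_irqs 4 /proc/irq 152 153 154 155
```

On a dual-core board the same call with a core count of 2 alternates the four interrupts between CPU0 and CPU1.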
     112
     113==== Laguna ====
     114The cns3xxx uses irq61 for pcie0_intr which in the case of a PCIe-to-PCI bridge ends up combining INTA/B/C/D on a single ARM CPU interrupt. This is not optimal when you have multiple cores. To overcome this limitation an enhancement was made on the GW2388-4-H (the model of your GW2388 is displayed by the bootloader on bootup) by additionally routing the INTA/B/C/D signals to unique external ARM CPU interrupts:
     115 * J5: irq95
     116 * J7: irq94
     117 * J4: irq93
     118 * J6: irq154
     119
To determine if you have a Laguna board supporting isolated PCI interrupts, check the PCB part number and revision under the 6-digit bar-code label on the top of the board:
     121 * GW2388-4 RevH will have PCB 02210082-07. The 07 indicates the PCB revision and anything above rev 07 supports isolated interrupts
     122
A Linux kernel patch is necessary to detect boards that support the isolated PCI interrupts and configure them to be used for the PCI host controller's interrupts. This is supported in:
     124 * OpenWrt trunk BSP r553
     125 * OpenWrt 13.06 BSP branch r554
     126
     127== Specifying and determining CPU for a userspace process ==
By default, userspace processes can be scheduled on any CPU core.
     129
The 'taskset' application can be used to specify the CPU affinity of a specific task. As above, the affinity is a bitmask specifying which CPUs the task can run on (i.e. 0x3 for CPU0|CPU1, 0x1 for CPU0, 0x2 for CPU1):
 * set the affinity for an existing PID (e.g. PID 1):
{{{
taskset -p 0x1 1 ;# set PID1 to only run on CPU0
taskset -p 0x2 1 ;# set PID1 to only run on CPU1
taskset -p 0x3 1 ;# set PID1 to run on either CPU0 or CPU1
}}}
 * launch a process with a specific affinity:
{{{
taskset 0x1 top ;# run top on CPU0
}}}
     141
You can obtain details about which CPUs a process is allowed to run on via /proc or tools like top:
 * using /proc/<pid>/status to see which CPUs PID1 is allowed on:
{{{
# grep Cpus /proc/1/status ;# see current affinity
Cpus_allowed:   3
Cpus_allowed_list:      0-1
# taskset -p 0x1 1 ;# set affinity to CPU0
pid 1's current affinity mask: 3
pid 1's new affinity mask: 1
# grep Cpus /proc/1/status ;# confirm the new affinity
Cpus_allowed:   1
Cpus_allowed_list:      0
}}}
     155 * using top (see the CPU column)
     156{{{
     157# top -n1 ;# 1 iteration of top
     158Mem: 19704K used, 236320K free, 0K shrd, 2892K buff, 4460K cached
     159CPU:   0% usr   6% sys   0% nic  11% idle   0% io   0% irq  81% sirq
     160Load average: 0.05 0.11 0.13 1/44 3590
     161  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
     162 3588  1423 root     R     1204   0%   1  25% top -n1
     163 3073     2 root     SW       0   0%   1   5% [kworker/u:2]
     164  695     1 root     S     1212   0%   1   0% /sbin/syslogd -C16
     165    1     0 root     S     1208   0%   0   0% init
     166 1423  1345 root     S     1208   0%   1   0% /bin/ash --login
     167 1493  1345 root     S     1208   0%   0   0% /bin/ash --login
     168  627     1 root     S     1208   0%   0   0% /bin/ash --login
     169 1345     1 root     S     1204   0%   0   0% /usr/sbin/telnetd -l /bin/login.s
     170 1816   627 root     S     1200   0%   0   0% {doup} /bin/sh ./doup
     171  697     1 root     S     1192   0%   0   0% /sbin/klogd
     172 3590  1816 root     S     1192   0%   0   0% sleep 1
     173 1126     1 root     S     1140   0%   0   0% hostapd -P /var/run/wifi-phy0.pid
     174  719     1 root     S     1116   0%   1   0% /sbin/netifd
     175  706     1 root     S      940   0%   0   0% /sbin/procd
     176  699     1 root     S      700   0%   0   0% /sbin/hotplug2 --override --persi
     177  712   706 root     S <    664   0%   0   0% ubusd
     178    6     2 root     SW       0   0%   0   0% [kworker/u:0]
     179   16     2 root     SW       0   0%   1   0% [kworker/u:1]
     180    8     2 root     SW       0   0%   0   0% [migration/0]
     181  390     2 root     SW       0   0%   1   0% [kworker/1:1]
     182}}}
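The /proc lookup shown above is easy to script when you want to check several processes at once. This is a minimal sketch rather than an established tool: `allowed_cpus` is a hypothetical helper that simply extracts the Cpus_allowed_list field shown in the earlier grep output.

```shell
# Print the list of CPUs a process may run on, e.g. "0-1".
# allowed_cpus is a hypothetical helper.
#   $1 = PID, $2 = optional proc root (defaults to /proc)
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${2:-/proc}/$1/status"
}

# Example (run on the target): show the allowed CPUs of every process
#   for d in /proc/[0-9]*; do
#       pid=${d#/proc/}
#       echo "$pid: $(allowed_cpus "$pid" 2>/dev/null)"
#   done
```

The field reports where a task is allowed to run; the CPU column in top reports where it last ran.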
     183
     184== Network Packet Steering ==
Receive Packet Steering (RPS) hashes the IP address and port of each received packet to produce a hash index, then looks that index up in a table that maps it to a CPU number. As a result, a given IP address/port combination is always handled by the same CPU. This is by design: it increases the number of cache hits when processing packets from the same network stream. To realize the benefits of RPS you therefore need multiple streams. You can use the '-P' option of iperf to run such a test.
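In mainline Linux, RPS is enabled per receive queue by writing a CPU bitmask to the rps_cpus file under /sys/class/net/<dev>/queues/rx-<n>/, as described in the kernel scaling document referenced at the top of this page. The sketch below assumes your kernel was built with RPS support; `set_rps` is a hypothetical helper and the interface name in the example is only an illustration.

```shell
# Write the same RPS CPU mask to every receive queue of a network device.
# set_rps is a hypothetical helper.
#   $1 = device, $2 = hex CPU mask, $3 = optional sysfs root (defaults to /sys)
set_rps() {
    dev=$1
    mask=$2
    root=${3:-/sys}
    for q in "$root/class/net/$dev/queues"/rx-*; do
        # skip queues (or an unmatched glob) without an rps_cpus file
        [ -e "$q/rps_cpus" ] || continue
        echo "$mask" > "$q/rps_cpus"
    done
}

# Example (run as root on the target): allow CPU0 and CPU1 for eth0
#   set_rps eth0 3
```

Because each stream hashes to a single CPU, test with several parallel streams (the iperf '-P' option mentioned above) to see any effect.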
     186
     187= Single Core Processing =
     188
There may be times when a developer needs to use only one CPU, for example because of driver issues. We have found that many workloads, especially wireless applications, are interrupt intensive: the CPU runs at very low utilization while the interrupt controller is saturated. Since the wireless drivers can only operate on a single core, any reduction in overall interrupt traffic helps performance. In dual-core mode, FIQs are used for inter-processor communication, so running in single-core mode reduces the load on the interrupt controller, which we have seen provide a 5-10% performance improvement in some cases. If you are running many user applications, the second core can definitely provide a benefit; it really depends on your application.  '''NOTE: This mainly applies to the Laguna family of ARM11 processors. The Cortex-A9 Ventana has better interrupt capabilities.'''
     190
     191To do this, we will modify the bootargs in the bootloader.
     192
     193For CNS3xxx Laguna boards, break into the bootloader by pressing a key at bootup:
     194
     195Modify the variable bootargs to include maxcpus=1 at the end of the line as shown below:
     196{{{
     197Laguna > setenv bootargs console=ttyS0,115200 root=/dev/mtdblock3 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit maxcpus=1
     198}}}
     199
     200Then save the bootargs with:
     201{{{
     202Laguna > saveenv
     203}}}
     204
Note /proc/interrupts with two cores:
     206{{{
     207root@OpenWrt:/# cat /proc/interrupts
     208           CPU0       CPU1       
     209 29:       7231       7367       GIC  twd
     210 33:        884          0       GIC  mmc0
     211 39:        206          0       GIC  cns3xxx-i2c
     212 45:        168          0       GIC  serial
     213 49:          3          0       GIC  gig_stat
     214 51:         82        168       GIC  gig_switch
     215 63:          0          0       GIC  dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
     216 64:          0          0       GIC  ehci_hcd:usb2
     217 89:         28          0       GIC  timer
     218 91:          1          0       GIC  ohci_hcd:usb3
     219FIQ:        353        441       cns3xxx-fiq
     220IPI0:          0          0  CPU wakeup interrupts
     221IPI1:          0          1  Timer broadcast interrupts
     222IPI2:       2102       2420  Rescheduling interrupts
     223IPI3:          0          0  Function call interrupts
     224IPI4:       1775       1722  Single function call interrupts
     225IPI5:          0          0  CPU stop interrupts
     226Err:          0
     227}}}
     228
     229Note /proc/interrupts with one core:
     230{{{
     231root@OpenWrt:/# cat /proc/interrupts
     232           CPU0       
     233 29:      12577       GIC  twd
     234 33:       1469       GIC  mmc0
     235 39:        398       GIC  cns3xxx-i2c
     236 45:        216       GIC  serial
     237 49:          5       GIC  gig_stat
     238 51:        489       GIC  gig_switch
     239 63:          0       GIC  dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
     240 64:          0       GIC  ehci_hcd:usb2
     241 89:         28       GIC  timer
     242 91:          1       GIC  ohci_hcd:usb3
     243FIQ:          0          0       cns3xxx-fiq
     244IPI0:          0  CPU wakeup interrupts
     245IPI1:          0  Timer broadcast interrupts
     246IPI2:          0  Rescheduling interrupts
     247IPI3:          0  Function call interrupts
     248IPI4:          0  Single function call interrupts
     249IPI5:          0  CPU stop interrupts
     250Err:          0
     251}}}
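The two listings above differ in the number of CPU columns in the header line, which gives a quick way to confirm that maxcpus=1 took effect after a reboot. This is a minimal sketch: `online_cpus` is a hypothetical helper that counts the fields on the first line of /proc/interrupts.

```shell
# Count the CPU columns in the header line of /proc/interrupts.
# online_cpus is a hypothetical helper; $1 optionally overrides the file.
online_cpus() {
    awk 'NR == 1 { print NF; exit }' "${1:-/proc/interrupts}"
}

# Example (run on the target after rebooting with maxcpus=1):
#   [ "$(online_cpus)" -eq 1 ] && echo "running single core"
```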
     252Note the top command shows all processes on CPU 0:[[BR]]
     253
     254[[Image(cpu0.png)]]
     255
== Single Core Power Consumption (Laguna Family Only) ==
     257Some quick measurements on a GW2388 with VIN=10V:
 * At the prompt, the current difference was only 2 mA (396 mA single core vs 398 mA dual core)
 * Under stress the current difference was fairly small, 44 mA (438 mA single core vs 482 mA dual core)
     260