
Version 1 (modified by trac, 7 years ago)

--

Performance Tuning

Various performance aspects rely heavily on the configuration of your Linux operating system. Here are some things to keep in mind:

  • L2 Cache - enabling L2 cache can greatly increase the performance of some things, but greatly hurt others - see here
  • Kernel modules - kernel modules that are not needed can bog down certain paths, such as network routing (iptables/ebtables) (see below)
  • Userland services - Various services and daemons that are not needed can chew up system resources (CPU cycles, memory footprint) (see below)

See also Multicore Processing Page

Routing Performance

If you are trying to optimize network routing you can try the following:

  • GigE:
    • make sure you have a GigE link (where appropriate) in every network segment between your test endpoints (switch segments, target endpoint computer etc)
    • if using PoE make sure you have a PoE injector capable of GigE
  • General:
    • eliminate unnecessary kernel modules which may be present for packet filtering (such as ipt and ebtables related modules). To see an example on removing kernel modules see the OpenWrt/kernelconfig page. If you are using hardware that requires some modules be sure to leave them in place.
      • Caution: Please make informed decisions when removing kernel modules as removing hardware related modules may have unintended effects.
    • minimize hardware in-between in case it is problematic (direct connection between endpoints)
    • eliminate unnecessary userspace applications which may be present. To disable virtually all of them (you may need to configure the network by hand afterwards) you can use 'for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done; /etc/init.d/boot enable; /etc/init.d/done enable'
    • when using iperf as a network test tool, pay attention to the TCP window size, which can greatly affect throughput (make sure you understand what it means)
    • be aware that generating traffic on an embedded node creates a performance hit on that node vs sending traffic 'through' the node
    • run 'top' while testing to see where the bottlenecks may be. If virtually 100% of CPU time is spent in sirq (soft IRQ) and irq (hard IRQ), you have maxed out performance due to raw interrupts and low-level packet handling. Note that the nic column in top reports time spent in niced (low-priority) processes, not in the network driver.
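
The interrupt share that top reports can also be computed directly from /proc/stat. The following is a minimal sketch, assuming the standard Linux layout of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal). The helper name irq_share and the 5 second sampling interval are illustrative and not part of any existing tool.

```shell
#!/bin/sh
# irq_share BEFORE AFTER
# BEFORE and AFTER are two samples of the aggregate "cpu" line from /proc/stat.
# Prints the percentage of CPU time spent in hard + soft IRQ between them.
irq_share() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF && i <= 9; i++) a[i] = $i; next }
        {
            total = 0
            # fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal
            for (i = 2; i <= NF && i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            # field 7 = irq, field 8 = softirq (field 1 is the "cpu" label)
            if (total > 0) printf "%.1f\n", 100 * (d[7] + d[8]) / total
            else print "0.0"
        }'
}

# Example use on a live system (run while iperf is generating traffic):
#   before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
#   irq_share "$before" "$after"
```

A result close to 100 suggests the board is limited by interrupt and packet handling rather than by userspace load.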

iperf

Use iperf to test throughput.

Please read up on iperf before testing, for example in this tutorial: http://openmaniak.com/iperf.php

Basic Setup:

iperf runs as either a server or a client. The command-line flag (-s for server, -c for client) selects the role.

Server:

iperf -s

Client:

iperf -c 192.168.1.1

Note: For UDP, a bandwidth limit is needed. Use the flag -b followed by the bandwidth limit desired (1m, 10m, 100m, 200m, 300m, 500m, 1g, etc). Because iperf is processor intensive, there is no need to generate more traffic than the processor can handle. Therefore, incrementally increase the bandwidth limit until the results are slightly below the limit, thus not creating larger amounts of overhead.

For example, a bandwidth limit of 10m will easily be reached on a GbE link. At a 200m limit, however, throughput may only reach about 187 Mbits/sec. A limit just above the achievable rate adds very little processor overhead.

VERY IMPORTANT: ORDER MATTERS. The iperf manual (please read it) states that the bandwidth flag must be placed at the end of the command to work.

From the client type:

iperf -u -c 192.168.4.1 -b 10m

From the server type:

iperf -s -u



Increasing Bandwidth Limit Example:

root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 10m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  160 KByte (default)
------------------------------------------------------------
[  3] local 192.168.4.2 port 56615 connected with 192.168.4.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.9 MBytes  10.0 Mbits/sec
[  3] Sent 8505 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  11.9 MBytes  10.0 Mbits/sec   0.028 ms    1/ 8506 (0.012%)
root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 100m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  160 KByte (default)
------------------------------------------------------------
[  3] local 192.168.4.2 port 33153 connected with 192.168.4.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   120 MBytes   100 Mbits/sec
[  3] Sent 85304 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec   119 MBytes  99.8 Mbits/sec   0.077 ms  514/85305 (0.6%)
root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 200m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  160 KByte (default)
------------------------------------------------------------
[  3] local 192.168.4.2 port 56998 connected with 192.168.4.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   222 MBytes   186 Mbits/sec
[  3] Sent 158150 datagrams
[  3] Server Report:
[  3]  0.0-10.2 sec  59.3 MBytes  48.6 Mbits/sec  15.794 ms 115820/158151 (73%)
root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 300m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  160 KByte (default)
------------------------------------------------------------
[  3] local 192.168.4.2 port 52109 connected with 192.168.4.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   223 MBytes   187 Mbits/sec
[  3] Sent 158814 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  58.0 MBytes  48.7 Mbits/sec   0.240 ms 117423/158815 (74%)
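
The incremental procedure above can be scripted. The following is a rough sketch, assuming iperf version 2 output in the format shown above. The helper names udp_loss and sweep are hypothetical, and the parsing relies only on the "Server Report:" line and the line that follows it.

```shell
#!/bin/sh
# udp_loss: read iperf (version 2) client output on stdin and print the
# server-reported bandwidth and loss percentage, e.g. "48.6 Mbits/sec 73".
udp_loss() {
    awk '
        /Server Report:/ { want = 1; next }
        want && /bits\/sec/ {
            bw = ""; loss = ""
            # bandwidth: the number and unit that end in "bits/sec"
            if (match($0, /[0-9.]+ [KMG]?bits\/sec/))
                bw = substr($0, RSTART, RLENGTH)
            # loss: the percentage inside the trailing parentheses
            if (match($0, /\([0-9.]+%\)/))
                loss = substr($0, RSTART + 1, RLENGTH - 3)
            print bw, loss
            want = 0; found = 1
        }
        END { if (!found) print "no server report" }'
}

# sweep HOST RATE...: run one UDP test per rate and summarize each result.
# HOST must already be running "iperf -s -u".
sweep() {
    host=$1; shift
    for rate in "$@"; do
        printf '%s: ' "$rate"
        iperf -u -c "$host" -b "$rate" | udp_loss
    done
}

# Example: sweep 192.168.4.1 10m 100m 200m 300m
```

Stop increasing the rate once the loss jumps. In the output above, 100m loses 0.6% while 200m loses 73%, so the usable limit for this link lies between the two.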

Examples

Here are some examples showing various tunings of a GW2388 (dual core 600MHz ARM with dual GigE ports) with iperf network bandwidth test between GW2388 and a PC through a Netgear GigE switch:

  • pre-built firmware (12-10 release), unmodified (186/419 Mbps tx/rx)
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.83 port 56088 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   223 MBytes   186 Mbits/sec
    root@OpenWrt:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.83 port 5001 connected with 192.168.1.146 port 55929
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   500 MBytes   419 Mbits/sec
    
  • disabling all kernel modules (306/732 Mbps tx/rx) (iptables/ebtables is a big performance hit)
    root@OpenWrt:/# mv /etc/modules.d /etc/modules.old
    root@OpenWrt:/# reboot
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.83 port 58570 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   365 MBytes   306 Mbits/sec
    root@gw2388-test:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.83 port 5001 connected with 192.168.1.146 port 56554
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   873 MBytes   732 Mbits/sec
    
  • disabling all init scripts as well (378/754 Mbps tx/rx)
    root@OpenWrt:/# for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done
    root@OpenWrt:/# /etc/init.d/boot enable
    root@OpenWrt:/# reboot
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.93 port 52295 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   451 MBytes   378 Mbits/sec
    root@OpenWrt:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.93 port 5001 connected with 192.168.1.146 port 37005
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   900 MBytes   754 Mbits/sec
    

Here are some examples using a GW2380 (single core 300MHz ARM with single GigE port) looking at transmit and receive performance:

  • pre-built firmware (12-10 release), unmodified (153/138 Mbps tx/rx)
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.87 port 33327 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   183 MBytes   153 Mbits/sec
    root@OpenWrt:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53363
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.1 sec   166 MBytes   138 Mbits/sec
    
  • disabling all kernel modules (207/214 Mbps tx/rx) (iptables/ebtables is a big performance hit)
    root@OpenWrt:/# mv /etc/modules.d /etc/modules.old
    root@OpenWrt:/# reboot
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.97 port 60819 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   248 MBytes   207 Mbits/sec
    root@OpenWrt:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53387
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   255 MBytes   214 Mbits/sec
    
  • disabling all init scripts as well (222/214 Mbps tx/rx)
    root@OpenWrt:/# for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done
    root@OpenWrt:/# /etc/init.d/boot enable
    root@OpenWrt:/# /etc/init.d/done enable
    root@OpenWrt:/# reboot
    root@OpenWrt:/# iperf -c 192.168.1.146
    ------------------------------------------------------------
    Client connecting to 192.168.1.146, TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.1.97 port 34382 connected with 192.168.1.146 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   265 MBytes   222 Mbits/sec
    root@OpenWrt:/# iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53387
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   255 MBytes   214 Mbits/sec
    

Notes:

  • the above tests use 'iperf -s' on an Ubuntu Linux GigE PC host through a GigE Netgear switch. The client uses the default 16 KByte TCP window size; you can achieve higher tx throughput by using larger TCP windows.
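
The effect of the window size can be estimated with simple arithmetic: a single TCP stream can have at most one window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below assumes a hypothetical round-trip time of 0.5 ms (measure the real value with ping). The helper name max_mbps is illustrative.

```shell
#!/bin/sh
# max_mbps WINDOW_KBYTES RTT_MS
# Upper bound on single-stream TCP throughput in Mbits/sec:
# one full window delivered per round trip.
max_mbps() {
    awk -v w="$1" -v rtt="$2" 'BEGIN {
        bits = w * 1024 * 8          # window size in bits
        printf "%.1f\n", bits / (rtt / 1000) / 1000000
    }'
}

max_mbps 16 0.5     # default 16 KByte window, assumed 0.5 ms RTT
max_mbps 256 0.5    # larger window, requested with: iperf -c <server> -w 256k
```

With these assumed numbers the bound is 262.1 Mbits/sec for a 16 KByte window and 4194.3 Mbits/sec for a 256 KByte window, so on a link with that round-trip time the default window, not the link, would cap transmit throughput.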

Wireless Tuning

Tuning wireless performance is a finicky process.

Wireless throughput depends on many factors. Wireless N speeds of 300 Mbps are not typically achieved in real life, and an ARM processor on a Gateworks board is not comparable to a 3.4 GHz x86 machine.

Here are a few tips for wireless:

  1. Wireless encryption protocols can affect performance because of computation. Choose wisely.
  2. Antenna type, orientation and distance are VERY important. Be sure to hook up all antennas to the wireless card. Testing 2 boards only 3 feet apart may not achieve the best results.
  3. Wireless N will sometimes run faster at 40MHz vs 20 MHz.
  4. Try Channel 153 for Wireless N.
  5. Use an uncluttered channel that other devices are not using.
  6. On OpenWrt, the hardware mode (hwmode) for the N radios would typically be 11na.
  7. 5.8GHz will typically run faster than 2.4GHz.
  8. Optimize the software as noted below. The more items running on the board, the less processor time is available for the wireless traffic.

We have removed some modules at times to reduce the software load and try to obtain higher throughput.

These are the commands that have been used to tune performance for wireless testing on both the Cambria and Laguna platforms. This is not a fully supported configuration and will disable certain features. Please use at your own risk.

WARNING: Run these commands on the Gateworks board once it is booted up. Please only do so over serial, as telnet and SSH (dropbear) are disabled below.

/etc/init.d/batmand disable
/etc/init.d/collectd disable
/etc/init.d/cron disable
/etc/init.d/dnsmasq disable
/etc/init.d/dropbear disable
/etc/init.d/firewall disable
/etc/init.d/gpsd disable
/etc/init.d/gscd disable
/etc/init.d/led disable
/etc/init.d/luci_bwc disable
/etc/init.d/luci_dhcp_migrate disable
/etc/init.d/luci_fixtime disable
/etc/init.d/luci_statistics disable
/etc/init.d/miniupnpd disable
/etc/init.d/openvpn disable
/etc/init.d/qos disable
/etc/init.d/rcS disable
/etc/init.d/relayd disable
/etc/init.d/serialoverip disable
/etc/init.d/sysntpd disable
/etc/init.d/ntpd disable
/etc/init.d/telnet disable
/etc/init.d/tinyproxy disable
/etc/init.d/uhttpd disable
/etc/init.d/umount disable
/etc/init.d/usb disable
/etc/init.d/vnstat disable

mkdir /etc/modules.old
mv /etc/modules.d/* /etc/modules.old
mv /etc/modules.old/*crypto* /etc/modules.d
mv /etc/modules.old/*80211* /etc/modules.d
mv /etc/modules.old/*nls-base* /etc/modules.d
mv /etc/modules.old/*usb-core* /etc/modules.d
mv /etc/modules.old/*ath5k* /etc/modules.d
mv /etc/modules.old/*ath9k* /etc/modules.d
mv /etc/modules.old/26-ath  /etc/modules.d
mv /etc/modules.old/50-madwifi  /etc/modules.d

Wireless configuration used:

ACCESS POINT WIRELESS UCI Config:
root@OpenWrt:/# uci show wireless
wireless.radio0=wifi-device
wireless.radio0.type=mac80211
wireless.radio0.phy=phy0
wireless.radio0.ht_capab=SHORT-GI-40 TX-STBC RX-STBC1 DSSS_CCK-40
wireless.radio0.disabled=0
wireless.radio0.country=US
wireless.radio0.txpower=17
wireless.radio0.htmode=HT40-
wireless.radio0.channel=153
wireless.radio0.hwmode=11na
wireless.@wifi-iface[0]=wifi-iface
wireless.@wifi-iface[0].device=radio0
wireless.@wifi-iface[0].network=lan
wireless.@wifi-iface[0].mode=ap
wireless.@wifi-iface[0].ssid=gateworks
wireless.@wifi-iface[0].wds=1
wireless.@wifi-iface[0].encryption=psk2
wireless.@wifi-iface[0].key=abc

CLIENT WDS WIRELESS UCI Config:
root@OpenWrt:/# uci show wireless
wireless.radio0=wifi-device
wireless.radio0.type=mac80211
wireless.radio0.phy=phy0
wireless.radio0.ht_capab=SHORT-GI-40 TX-STBC RX-STBC1 DSSS_CCK-40
wireless.radio0.disabled=0
wireless.radio0.txpower=17
wireless.radio0.country=US
wireless.radio0.htmode=HT40-
wireless.radio0.channel=153
wireless.radio0.hwmode=11na
wireless.@wifi-iface[0]=wifi-iface
wireless.@wifi-iface[0].device=radio0
wireless.@wifi-iface[0].network=lan
wireless.@wifi-iface[0].ssid=gateworks
wireless.@wifi-iface[0].mode=sta
wireless.@wifi-iface[0].wds=1
wireless.@wifi-iface[0].encryption=psk2
wireless.@wifi-iface[0].key=abc

We then ran iperf between the two PCs, with one being the server and one the client.

Server:
iperf -s

Client:
iperf -t10 -w 512k -c 192.168.0.23

iperf TCP results:

------------------------------------------------------------
Client connecting to 192.168.0.23, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  512 KByte)
------------------------------------------------------------
[  3] local 192.168.0.22 port 52644 connected with 192.168.0.23 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  99.9 MBytes  83.7 Mbits/sec

Wireless Modulation Rate

The modulation rate is dynamic. The rate adjusts depending on the quality of the RF signal: the better the signal, the higher the rate. The table below shows which rate you are currently achieving. If the rate is poor, bandwidth will be much lower and you will need to adjust the antennas, distance or obstructions to get a better rate. The highest rates can be very hard to achieve.

root@OpenWrt:/# cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlan0/stations/a8\:54\:b2\:00\:04\:2e/rc_stats
type      rate     throughput  ewma prob   this prob  this succ/attempt   success    attempts
HT20/LGI    MCS0        6.1       92.0      100.0          0(  0)        106         111
HT20/LGI    MCS1       12.8       96.7      100.0          0(  0)        109         116
HT20/LGI    MCS2       19.5       99.7      100.0          0(  0)        112         119
HT20/LGI    MCS3       25.8       99.9      100.0          0(  0)        108         110
HT20/LGI    MCS4       36.2       95.2      100.0          0(  0)        102         112
HT20/LGI    MCS5       44.2       89.8      100.0          0(  0)        108         111
HT20/LGI    MCS6       48.4       88.7      100.0          0(  0)       2442        2816
HT20/LGI   PMCS7       48.5       79.1       56.6         68(120)       3616        4236
HT20/LGI    MCS8       12.1       92.1      100.0          0(  0)        103         110
HT20/LGI    MCS9       24.5       95.1      100.0          0(  0)        105         117
HT20/LGI    MCS10      37.8       99.5      100.0          0(  0)        118         123
HT20/LGI    MCS11      48.3       98.1      100.0          0(  0)        121         127
HT20/LGI    MCS12      58.8       81.7       33.3         13( 39)      14294       17331
HT20/LGI  t MCS13      66.9       74.2       60.8          0(  0)      62514       85221
HT20/LGI T  MCS14      67.9       67.2       58.2          0(  0)     110248      163970
HT20/LGI    MCS15      61.4       55.9       28.0         32(114)     141375      243633

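T = the rate with the highest estimated throughput (what we are on)
t = the rate with the second-highest estimated throughput (what we are on next)
P = the rate with the highest delivery probability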
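
To pick the current rate out of this table without reading it by eye, a small filter can be used. The following is a sketch only, assuming the column layout shown above (rate type in columns 1-8, the three flag characters in columns 10-12). Other kernel versions may format rc_stats differently, and the helper name current_rate is hypothetical.

```shell
#!/bin/sh
# current_rate: read rc_stats text on stdin and print the rate flagged "T"
# together with its estimated throughput, e.g. "MCS14 67.9".
current_rate() {
    awk 'substr($0, 10, 3) ~ /T/ {
        rest = substr($0, 13)        # "MCS14      67.9       67.2 ..."
        split(rest, f, " ")
        print f[1], f[2]
    }'
}

# Example use on the board (the station MAC directory will differ):
#   cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/rc_stats | current_rate
```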

Modulation Table Online

Measuring Performance

In most cases, measuring the performance of a particular system function can be done with specialized tests like iperf. When a more thorough measurement is desired of the overall system, a small and portable benchmarking suite like lmbench is more appropriate.

lmbench

lmbench is a micro-benchmark suite designed to focus attention on the basic building blocks of many common system applications, such as databases, simulations, software development, and networking. It provides a suite of benchmarks that attempt to measure the most commonly found performance bottlenecks in a wide range of system applications. lmbench is designed to identify, isolate, and reproduce these performance bottlenecks using a set of small microbenchmarks which measure system latency and bandwidth of data movement among the processor and memory, network, file system, and disk.

Installation can be done on Ubuntu systems via sudo apt-get install lmbench. Alternatively, you can compile the source yourself which can be found at the lmbench website.

Run the following commands to compile the source:

tar xvf lmbench3.tar.gz
cd lmbench3
mkdir SCCS; touch SCCS/s.ChangeSet
make -C src/

Running the benchmark suite is done via make results. The binaries for the individual benchmarks can be found in the bin/ directory.

The src/Makefile has additional make targets for your convenience:

# lmbench       [default] builds the benchmark suite for the current os/arch
# results       builds, configures run parameters, and runs the benchmark
# rerun         reruns the benchmark using the same parameters as last time
# scaling       reruns the benchmark using same parameters as last time,
#               except it asks what scaling value to use
# hardware      reruns the hardware benchmarks using the same parameters
# os            reruns the OS benchmarks using the same parameters
# clean         cleans out sources and run configuration
# clobber       clean and removes the bin directories
# shar          obsolete, use cd .. && make shar
# depend        builds make dependencies (needs gcc)
# debug         builds all the benchmarks with '-g' debugging flag
# assembler     builds the .s files for each benchmark

After the test has been completed, you can compare results from multiple runs with make -C results/ LIST=*

Read the man pages for the individual benchmarks in the doc/ directory, or the lmbench introduction to learn more about the test suite.
