Performance Tuning
Various performance aspects rely heavily on the configuration of your Linux operating system. Here are some things to keep in mind:
- L2 Cache - enabling L2 cache can greatly increase the performance of some things, but greatly hurt others - see here
- Kernel modules - kernel modules that are not needed can bog down certain paths, such as network routing (iptables/ebtables) (see below)
- Userland services - Various services and daemons that are not needed can chew up system resources (CPU cycles, memory footprint) (see below)
See also Multicore Processing Page
Routing Performance
If you are trying to optimize network routing you can try the following:
- GigE:
- make sure you have a GigE link (where appropriate) in every network segment between your test endpoints (switch segments, target endpoint computer etc)
- if using PoE make sure you have a PoE injector capable of GigE
- General:
- eliminate unnecessary kernel modules which may be present for packet filtering (such as ipt and ebtables related modules). To see an example on removing kernel modules see the OpenWrt/kernelconfig page. If you are using hardware that requires some modules be sure to leave them in place.
- Caution: Please make informed decisions when removing kernel modules as removing hardware related modules may have unintended effects.
- minimize hardware in-between in case it is problematic (direct connection between endpoints)
- eliminate unnecessary userspace applications which may be present. To disable virtually all of them (you may need to configure the network by hand after doing this) you can use 'for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done; /etc/init.d/boot enable; /etc/init.d/done enable'
- when using iperf as a network test tool, pay attention to the TCP window size, which can greatly affect throughput (understand what it means)
- be aware that generating traffic on an embedded node creates a performance hit on that node vs sending traffic 'through' the node
- run 'top' while testing to see where the bottlenecks may be: if virtually 100% of utilization occurs in sirq (soft irq) and irq (hard irq), then you have maxed out the performance due to raw interrupts and low level packet handling. Note that the nic column in BusyBox top reports time spent in niced processes, not in the network driver.
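The interrupt check described for 'top' can also be approximated from /proc/stat. The following is only a sketch: the function name irq_share is made up for illustration, and it assumes the usual Linux layout of the first "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal).

```sh
# irq_share PREV CUR  (hypothetical helper, not part of any package)
# PREV and CUR are two "cpu ..." summary lines from /proc/stat taken a few
# seconds apart. Prints the percentage of elapsed CPU time spent in hard
# and soft interrupt handling (awk fields 7 and 8: irq and softirq).
irq_share() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= NF; i++) prev[i] = $i; next }
        NR == 2 {
            n = (NF < 9) ? NF : 9          # ignore the guest columns
            for (i = 2; i <= n; i++) {
                d = $i - prev[i]
                total += d
                if (i == 7 || i == 8) irq += d
            }
            if (total > 0) printf "%.0f\n", 100 * irq / total
        }'
}

# Example use on the target while traffic is flowing (not run here):
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   irq_share "$a" "$b"
```

A value close to 100 while the test is running suggests the board is limited by interrupt and packet handling rather than by userspace.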
iperf
Use iperf to test throughput.
Please read up on iperf before testing, for example in this tutorial: http://openmaniak.com/iperf.php
Basic Setup:
iperf runs as a server on one endpoint and as a client on the other. The role is selected by the command-line flag.
Server:
iperf -s
Client:
iperf -c 192.168.1.1
Note: For UDP, a bandwidth limit is needed. Use the flag -b followed by the bandwidth limit desired (1m, 10m, 100m, 200m, 300m, 500m, 1g, etc).
Because iperf is processor intensive, there is no need to generate more traffic than the processor can handle. Increase the bandwidth limit incrementally until the reported throughput falls slightly below the limit; going beyond that point only adds overhead.
For example, a bandwidth limit of 10m will easily be hit on a GbE link. At a 200m limit, however, throughput may reach only about 187 Mbits/sec, so requesting more than that only adds processor overhead.
VERY IMPORTANT: ORDER MATTERS. The iperf manual (please read it) states that the bandwidth flag must be placed at the end of the command to work.
From the client type:
iperf -u -c 192.168.4.1 -b 10m
From the server type:
iperf -s -u
Increasing Bandwidth Limit Example:
root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 10m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.4.2 port 56615 connected with 192.168.4.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec
[ 3] Sent 8505 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 11.9 MBytes 10.0 Mbits/sec 0.028 ms 1/ 8506 (0.012%)

root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 100m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.4.2 port 33153 connected with 192.168.4.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 120 MBytes 100 Mbits/sec
[ 3] Sent 85304 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 119 MBytes 99.8 Mbits/sec 0.077 ms 514/85305 (0.6%)

root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 200m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.4.2 port 56998 connected with 192.168.4.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 222 MBytes 186 Mbits/sec
[ 3] Sent 158150 datagrams
[ 3] Server Report:
[ 3] 0.0-10.2 sec 59.3 MBytes 48.6 Mbits/sec 15.794 ms 115820/158151 (73%)

root@OpenWrt:/# iperf -u -c 192.168.4.1 -b 300m
------------------------------------------------------------
Client connecting to 192.168.4.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.4.2 port 52109 connected with 192.168.4.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 223 MBytes 187 Mbits/sec
[ 3] Sent 158814 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 58.0 MBytes 48.7 Mbits/sec 0.240 ms 117423/158815 (74%)
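The incremental approach shown above can be scripted. This is a rough sketch rather than a supported tool: parse_udp_report and sweep_udp are hypothetical helper names, and the parsing assumes the iperf (version 2) output format shown in the example, where the server report is the only bandwidth line that also carries a jitter value in ms.

```sh
# parse_udp_report: reads iperf UDP client output on stdin and prints
# "<Mbits/sec received by the server> <loss %>" from the server report line.
parse_udp_report() {
    awk '/Mbits\/sec/ && / ms / {
             for (i = 2; i <= NF; i++) if ($i == "Mbits/sec") bw = $(i - 1)
             loss = $NF
             gsub(/[()%]/, "", loss)
         }
         END { if (bw != "") print bw, loss }'
}

# sweep_udp HOST BW...: runs one UDP test per bandwidth limit and prints the
# requested limit next to what the server actually received.
sweep_udp() {
    host=$1
    shift
    for bw in "$@"; do
        result=$(iperf -u -c "$host" -b "$bw" | parse_udp_report)
        echo "$bw ${result:-no server report}"
    done
}

# Usage (assumes "iperf -s -u" is already running on 192.168.4.1):
#   sweep_udp 192.168.4.1 10m 100m 200m 300m
```

Watching the loss column as well as the bandwidth helps: in the output above, the 200m and 300m runs send about 186 Mbits/sec but the server receives under 50 Mbits/sec with over 70% loss.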
Examples
Here are some examples showing various tunings of a GW2388 (dual core 600MHz ARM with dual GigE ports) with iperf network bandwidth test between GW2388 and a PC through a Netgear GigE switch:
- pre-built firmware (12-10 release), unmodified (186/419mbps tx/rx)
root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.83 port 56088 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 223 MBytes 186 Mbits/sec

root@OpenWrt:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.83 port 5001 connected with 192.168.1.146 port 55929
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 500 MBytes 419 Mbits/sec
- disabling all kernel modules (306/732mbps tx/rx) (iptables/ebtables is a big performance hit)
root@OpenWrt:/# mv /etc/modules.d /etc/modules.old
root@OpenWrt:/# reboot

root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.83 port 58570 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 365 MBytes 306 Mbits/sec

root@gw2388-test:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.83 port 5001 connected with 192.168.1.146 port 56554
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 873 MBytes 732 Mbits/sec
- disabling all init scripts as well (378/754mbps tx/rx)
root@OpenWrt:/# for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done
root@OpenWrt:/# /etc/init.d/boot enable
root@OpenWrt:/# reboot

root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.93 port 52295 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 451 MBytes 378 Mbits/sec

root@OpenWrt:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.93 port 5001 connected with 192.168.1.146 port 37005
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 900 MBytes 754 Mbits/sec
Here are some examples using a GW2380 (single core 300MHz ARM with single GigE port) looking at transmit performance:
- pre-built firmware (12-10 release), unmodified (153/138mbps tx/rx)
root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.87 port 33327 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 183 MBytes 153 Mbits/sec

root@OpenWrt:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53363
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 166 MBytes 138 Mbits/sec
- disabling all kernel modules (207/214mbps tx/rx) (iptables/ebtables is a big performance hit)
root@OpenWrt:/# mv /etc/modules.d /etc/modules.old
root@OpenWrt:/# reboot

root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.97 port 60819 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 248 MBytes 207 Mbits/sec

root@OpenWrt:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53387
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 255 MBytes 214 Mbits/sec
- disabling all init scripts as well (222/214mbps tx/rx)
root@OpenWrt:/# for i in $(ls /etc/init.d); do /etc/init.d/$i disable; done
root@OpenWrt:/# /etc/init.d/boot enable
root@OpenWrt:/# /etc/init.d/done enable
root@OpenWrt:/# reboot

root@OpenWrt:/# iperf -c 192.168.1.146
------------------------------------------------------------
Client connecting to 192.168.1.146, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.97 port 34382 connected with 192.168.1.146 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 265 MBytes 222 Mbits/sec

root@OpenWrt:/# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.87 port 5001 connected with 192.168.1.146 port 53387
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 255 MBytes 214 Mbits/sec
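The disable-everything loop used in these examples takes down boot and done as well, which is why they are re-enabled afterwards. A slightly more cautious variant, sketched here with a made-up helper name (plan_disable), skips a keep-list and only prints the commands so they can be reviewed first.

```sh
# plan_disable DIR KEEP...  (hypothetical helper)
# Prints one "<script> disable" command for every file in DIR whose name is
# not in the KEEP list. Nothing is executed by this function.
plan_disable() {
    dir=$1
    shift
    for script in "$dir"/*; do
        [ -f "$script" ] || continue
        name=${script##*/}
        skip=0
        for keep in "$@"; do
            [ "$name" = "$keep" ] && skip=1
        done
        [ "$skip" -eq 1 ] || echo "$script disable"
    done
}

# Usage on the target:
#   plan_disable /etc/init.d boot done          # review the list
#   plan_disable /etc/init.d boot done | sh     # apply, then reboot
```

As with the original loop, the network may need to be configured by hand after the reboot.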
Notes:
- the above tests are using 'iperf -s' on an Ubuntu Linux GigE PC host through a GigE Netgear switch. This uses a 16KB TCP window size - you can achieve higher tx throughput by using larger TCP windows.
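One common way to pick a larger window is the bandwidth-delay product: the link rate multiplied by the round-trip time. The helper below is an illustrative sketch only; bdp_kbytes is a made-up name, and the 1000 Mbits/sec and 1 ms figures in the usage lines are assumptions, not measurements from these tests.

```sh
# bdp_kbytes MBPS RTT_MS  (hypothetical helper)
# Prints the bandwidth-delay product in KBytes (1 KByte = 1024 bytes),
# rounded up. This is roughly the TCP window needed to keep a path of
# MBPS Mbits/sec busy at a round-trip time of RTT_MS milliseconds.
bdp_kbytes() {
    awk -v mbps="$1" -v rtt="$2" 'BEGIN {
        bits = mbps * rtt * 1000           # Mbits/sec * ms = 1000 bits
        kb = bits / 8 / 1024
        printf "%d\n", (kb > int(kb)) ? int(kb) + 1 : kb
    }'
}

# Usage: request that window on both ends with iperf -w
#   iperf -s -w "$(bdp_kbytes 1000 1)k"
#   iperf -c 192.168.1.146 -w "$(bdp_kbytes 1000 1)k"
```

iperf prints the window size it actually obtained, which may differ from the requested value if the kernel limits socket buffer sizes.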
Wireless Tuning
Tuning wireless performance is a finicky process because it depends on many factors. Wireless N speeds of 300 Mbps are not typically achieved in real life, and an ARM processor on a Gateworks board is not comparable to a 3.4 GHz x86 machine.
Here are a few tips for wireless:
- Wireless encryption protocols can affect performance because of the computation involved. Choose wisely.
- Antenna type, orientation, and distance are VERY important. Be sure to hook up all antennas to the wireless card, and note that testing 2 boards 3 feet apart may not achieve the best results.
- Wireless N will sometimes run faster at 40 MHz vs 20 MHz.
- Try Channel 153 for Wireless N.
- Use an uncluttered channel that other devices are not using.
- The hardware mode in OpenWrt for the N radios would typically be 11na.
- 5.8 GHz will typically run faster than 2.4 GHz.
- Optimize the software as noted below. The more items running on the board, the less processor time is available for the wireless traffic.
We have removed some modules at times to reduce the software load and try to obtain higher throughput.
These are the commands that we have been using to tune performance for wireless testing on both the Cambria and Laguna platforms. This is not fully supported and will disable certain features. Please use at your own risk.
WARNING: Run these commands on the Gateworks board once it is booted up. Please only do so over serial, as telnet is disabled below.
/etc/init.d/batmand disable
/etc/init.d/collectd disable
/etc/init.d/cron disable
/etc/init.d/dnsmasq disable
/etc/init.d/dropbear disable
/etc/init.d/firewall disable
/etc/init.d/gpsd disable
/etc/init.d/gscd disable
/etc/init.d/led disable
/etc/init.d/luci_bwc disable
/etc/init.d/luci_dhcp_migrate disable
/etc/init.d/luci_fixtime disable
/etc/init.d/luci_statistics disable
/etc/init.d/miniupnpd disable
/etc/init.d/openvpn disable
/etc/init.d/qos disable
/etc/init.d/rcS disable
/etc/init.d/relayd disable
/etc/init.d/serialoverip disable
/etc/init.d/sysntpd disable
/etc/init.d/ntpd disable
/etc/init.d/telnet disable
/etc/init.d/tinyproxy disable
/etc/init.d/uhttpd disable
/etc/init.d/umount disable
/etc/init.d/usb disable
/etc/init.d/vnstat disable

mkdir /etc/modules.old
mv /etc/modules.d/* /etc/modules.old
mv /etc/modules.old/*crypto* /etc/modules.d
mv /etc/modules.old/*80211* /etc/modules.d
mv /etc/modules.old/*nls-base* /etc/modules.d
mv /etc/modules.old/*usb-core* /etc/modules.d
mv /etc/modules.old/*ath5k* /etc/modules.d
mv /etc/modules.old/*ath9k* /etc/modules.d
mv /etc/modules.old/26-ath /etc/modules.d
mv /etc/modules.old/50-madwifi /etc/modules.d
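Because the commands above only move the module lists aside, they can be put back later. The following is a sketch of one way to undo the module part; restore_modules is a hypothetical helper name, and it assumes nothing else has been placed in /etc/modules.old in the meantime.

```sh
# restore_modules OLD_DIR LIVE_DIR  (hypothetical helper)
# Moves every file parked in OLD_DIR back into LIVE_DIR and removes OLD_DIR
# if it ends up empty.
restore_modules() {
    old=$1
    live=$2
    for f in "$old"/*; do
        [ -e "$f" ] || continue
        mv "$f" "$live"/
    done
    rmdir "$old" 2>/dev/null || true
}

# Usage on the target, followed by a reboot:
#   restore_modules /etc/modules.old /etc/modules.d
#   /etc/init.d/dropbear enable     # re-enable whichever services you need
```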
Wireless configuration used:
ACCESS POINT WIRELESS UCI Config:
root@OpenWrt:/# uci show wireless
wireless.radio0=wifi-device
wireless.radio0.type=mac80211
wireless.radio0.phy=phy0
wireless.radio0.ht_capab=SHORT-GI-40 TX-STBC RX-STBC1 DSSS_CCK-40
wireless.radio0.disabled=0
wireless.radio0.country=US
wireless.radio0.txpower=17
wireless.radio0.htmode=HT40-
wireless.radio0.channel=153
wireless.radio0.hwmode=11na
wireless.@wifi-iface[0]=wifi-iface
wireless.@wifi-iface[0].device=radio0
wireless.@wifi-iface[0].network=lan
wireless.@wifi-iface[0].mode=ap
wireless.@wifi-iface[0].ssid=gateworks
wireless.@wifi-iface[0].wds=1
wireless.@wifi-iface[0].encryption=psk2
wireless.@wifi-iface[0].key=abc

CLIENT WDS WIRELESS UCI Config:
root@OpenWrt:/# uci show wireless
wireless.radio0=wifi-device
wireless.radio0.type=mac80211
wireless.radio0.phy=phy0
wireless.radio0.ht_capab=SHORT-GI-40 TX-STBC RX-STBC1 DSSS_CCK-40
wireless.radio0.disabled=0
wireless.radio0.txpower=17
wireless.radio0.country=US
wireless.radio0.htmode=HT40-
wireless.radio0.channel=153
wireless.radio0.hwmode=11na
wireless.@wifi-iface[0]=wifi-iface
wireless.@wifi-iface[0].device=radio0
wireless.@wifi-iface[0].network=lan
wireless.@wifi-iface[0].ssid=gateworks
wireless.@wifi-iface[0].mode=sta
wireless.@wifi-iface[0].wds=1
wireless.@wifi-iface[0].encryption=psk2
wireless.@wifi-iface[0].key=abc
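The radio options in the listings above can be set from the command line with uci. This is a minimal sketch: apply_radio_settings is a made-up wrapper name, and it assumes the radio section is called radio0 as in the listings.

```sh
# apply_radio_settings CHANNEL HTMODE HWMODE  (hypothetical wrapper)
# Sets three radio0 options, commits them, and reloads the wireless
# configuration with the OpenWrt 'wifi' command.
apply_radio_settings() {
    uci set wireless.radio0.channel="$1"
    uci set wireless.radio0.htmode="$2"
    uci set wireless.radio0.hwmode="$3"
    uci commit wireless
    wifi
}

# Usage on both the access point and the client:
#   apply_radio_settings 153 HT40- 11na
```

Running 'uci show wireless' afterwards should list the new values.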
We then ran iperf between the two PCs, with one being the server and one the client.
Server:
iperf -s
Client:
iperf -t10 -w 512k -c 192.168.0.23
iperf TCP results:
------------------------------------------------------------
Client connecting to 192.168.0.23, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 512 KByte)
------------------------------------------------------------
[ 3] local 192.168.0.22 port 52644 connected with 192.168.0.23 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 99.9 MBytes 83.7 Mbits/sec
Wireless Modulation Rate
The modulation rate is dynamic: it adjusts with the quality of the RF signal, and the better the signal, the higher the rate. The table below shows which rate you are currently achieving. If the rate is poor, you will see much lower bandwidth and will need to adjust the antennas, distance, or obstructions to get a better rate. The highest rates can be very hard to achieve.
root@OpenWrt:/# cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlan0/stations/a8\:54\:b2\:00\:04\:2e/rc_stats
type        rate   throughput  ewma prob  this prob  this succ/attempt  success  attempts
HT20/LGI    MCS0          6.1       92.0      100.0              0( 0)      106       111
HT20/LGI    MCS1         12.8       96.7      100.0              0( 0)      109       116
HT20/LGI    MCS2         19.5       99.7      100.0              0( 0)      112       119
HT20/LGI    MCS3         25.8       99.9      100.0              0( 0)      108       110
HT20/LGI    MCS4         36.2       95.2      100.0              0( 0)      102       112
HT20/LGI    MCS5         44.2       89.8      100.0              0( 0)      108       111
HT20/LGI    MCS6         48.4       88.7      100.0              0( 0)     2442      2816
HT20/LGI   PMCS7         48.5       79.1       56.6            68(120)     3616      4236
HT20/LGI    MCS8         12.1       92.1      100.0              0( 0)      103       110
HT20/LGI    MCS9         24.5       95.1      100.0              0( 0)      105       117
HT20/LGI    MCS10        37.8       99.5      100.0              0( 0)      118       123
HT20/LGI    MCS11        48.3       98.1      100.0              0( 0)      121       127
HT20/LGI    MCS12        58.8       81.7       33.3            13( 39)    14294     17331
HT20/LGI  t MCS13        66.9       74.2       60.8              0( 0)    62514     85221
HT20/LGI  T MCS14        67.9       67.2       58.2              0( 0)   110248    163970
HT20/LGI    MCS15        61.4       55.9       28.0            32(114)   141375    243633

T = what we are on
t = what we are on next
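To pull just the current rate out of that table, the row flagged with an upper-case T can be filtered with awk. This is a sketch under assumptions: best_rate is a hypothetical helper name, and the rc_stats layout differs between kernel versions, so the parsing below only targets the column order shown above.

```sh
# best_rate  (hypothetical helper)
# Reads an rc_stats table on stdin and prints the rate name and the
# throughput column of the row flagged with an upper-case T.
best_rate() {
    awk 'match($0, /MCS[0-9]+/) {
             flags = substr($0, 1, RSTART - 1)
             sub(/^ *[^ ]+/, "", flags)    # drop the type column, e.g. HT20/LGI
             if (flags ~ /T/) {
                 rate = substr($0, RSTART, RLENGTH)
                 split(substr($0, RSTART + RLENGTH), col, " ")
                 print rate, col[1]
             }
         }'
}

# Usage on the target (the station directory is named after the peer MAC):
#   cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/*/rc_stats | best_rate
```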
Measuring Performance
In most cases, measuring the performance of a particular system function can be done with specialized tests like iperf. When a more thorough measurement is desired of the overall system, a small and portable benchmarking suite like lmbench is more appropriate.
lmbench
lmbench is a micro-benchmark suite designed to focus attention on the basic building blocks of many common system applications, such as databases, simulations, software development, and networking. It provides a suite of benchmarks that attempt to measure the most commonly found performance bottlenecks in a wide range of system applications. lmbench is designed to identify, isolate, and reproduce these performance bottlenecks using a set of small microbenchmarks which measure system latency and bandwidth of data movement among the processor and memory, network, file system, and disk.
Installation can be done on Ubuntu systems via sudo apt-get install lmbench. Alternatively, you can compile the source yourself, which can be found at the lmbench website.
Run the following commands to compile the source:
tar xvf lmbench3.tar.gz
cd lmbench3
mkdir SCCS; touch SCCS/s.ChangeSet
make -C src/
Running the benchmark suite is done via make results. The binaries for the individual benchmarks can be found in the bin/ directory.
The src/Makefile has additional make targets for your convenience:
# lmbench    [default] builds the benchmark suite for the current os/arch
# results    builds, configures run parameters, and runs the benchmark
# rerun      reruns the benchmark using the same parameters as last time
# scaling    reruns the benchmark using same parameters as last time,
#            except it asks what scaling value to use
# hardware   reruns the hardware benchmarks using the same parameters
# os         reruns the OS benchmarks using the same parameters
# clean      cleans out sources and run configuration
# clobber    clean and removes the bin directories
# shar       obsolete, use cd .. && make shar
# depend     builds make dependencies (needs gcc)
# debug      builds all the benchmarks with '-g' debugging flag
# assembler  builds the .s files for each benchmark
After the test has been completed, you can compare results from multiple runs with make -C results/ LIST=*
Read the man pages for the individual benchmarks in the doc/ directory, or the lmbench introduction to learn more about the test suite.
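Individual benchmarks can also be run directly from the bin/ directory once the suite is built. The sketch below assumes the build places the binaries in a subdirectory of bin/ named after the OS and architecture; find_bench is a hypothetical helper name, and the benchmark arguments in the usage lines should be checked against the man pages in doc/.

```sh
# find_bench ROOT NAME  (hypothetical helper)
# Prints the path of the first file called NAME below ROOT/bin.
find_bench() {
    find "$1/bin" -type f -name "$2" | head -n 1
}

# Usage from the top of the unpacked source tree after building:
#   "$(find_bench . lat_syscall)" null     # latency of a trivial system call
#   "$(find_bench . bw_mem)" 8m rd         # memory read bandwidth over 8 MB
```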