Changes between Version 10 and Version 11 of PCI


Timestamp:
05/16/2022 08:13:06 PM (2 years ago)
Author:
Tim Harvey
Comment:

include more detail on PCI performance analysis and add enumeration section

  • PCI

    v10 v11  
    22
    33See also:
     4 * [wiki:venice/PCIe]
    45 * [wiki:newport/PCIe]
    56 * [wiki:ventana/PCIe]
     
    78Gateworks PCI support:
    89||= Product Family =||= Capabilities =||
    9 || Venice           || PCIe Gen2 ||
     10|| Venice           || PCIe Gen2 ^^^3^^^ ||
    1011|| Newport          || PCIe Gen2 ^^^2^^^ ||
    1112|| Ventana          || PCIe Gen1 ^^^1^^^ ||
    1213 1. Ventana boards with external clock generators can theoretically support Gen2; however, some software modification would be necessary for the PCIe clock configuration.
    1314 2. Newport can support PCIe Gen3 via a Gateworks special which modifies a strapping resistor to move the coprocessor clock (SCLK) from 350MHz to 550MHz (at the cost of ~500mW of power draw).
     15 3. Venice i.MX 8M has a limitation on inbound writes: when the inbound write data transfer size exceeds 400 bytes, the number of inbound MWr TLP transactions the controller can support is limited to a combination of 12 headers and 400 bytes of data (see [https://comm.eefocus.com/media/download/index/id-1021154 AN13164 iMX8MP PCIe Bandwidth Analysis]). Higher performance can be obtained by having the i.MX 8M Plus issue outbound MRd transactions instead of using inbound MWr.
     16
    1417
    1518= PCI
     
    6366[=#throughput]
    6467= PCI Throughput
     68There are several factors that can affect PCIe performance. The most obvious is the number of lanes (pairs of TX/RX SERDES channels) you have: x1, x2, x4 etc, which are pure multipliers of the rate that can be achieved over a single lane. The next most obvious is which generation of PCIe your host controller (root complex or RC) and device (endpoint or EP) support: Gen1, Gen2, Gen3 etc, which determines the transfer rate and the data encoding overhead [1]. Less obvious is the Transaction Layer Packet (TLP) overhead, as RCs and EPs have varying max payload sizes. Digging even deeper, you may run into limits that have to do with the implementation of the host controller and SoC resources.
     69
     70MiniPCIe connectors provide a single lane (x1), whereas M.2 sockets can allow additional lanes depending on the socket.
     71
    6572The PCI specification has evolved to support various generations capable of increasing bus speeds:
    6673 * PCI Gen3 (8.0GT/sec or ~7.9Gbps)
     
    7178
    7279The bus speed represents a theoretical maximum throughput and does not account for host processing speed or bus contention from multiple masters.
     80
     818B/10B encoding is used on the data for Gen1/Gen2 (every 8 data bits are sent as a 10-bit symbol), which is 20% overhead; thus 80% of 5000 is 4000Mbps theoretical max for a Gen2 link. For Gen3, 128B/130B encoding is used for a 98.46% efficiency. Additional data overhead would be specific to the PCIe device in question. A GbE and/or an NVMe device should have low data overhead, for example.
     82
     83PCIe max bw considering clock rate and data encoding (1x means 1 lane):
     84- pcie gen1 x1 : 2500MT/s*1lane*80% (8B/10B encoding) = 2000Mbps = 250MB/s (187MB/s with TLP=128)
     85- pcie gen2 x1 : 5000MT/s*1lane*80% (8B/10B encoding) = 4000Mbps = 500MB/s
     86- pcie gen3 x1 : 8000MT/s*1lane*98.46% (128B/130B encoding) = 7877Mbps = 984.6MB/s
     87
     88Next comes packet efficiency, which is based on Transaction Layer Packet (TLP) overhead (20 bytes per packet in the calculation below) and bounded by the Max Payload Size (MPS) in use on the link:
     89||= MPS (Bytes) =||= Calculation =||= Packet Efficiency (%) =||
     90|| 128  || 128 / (128 + 20) = 86 || 86 ||
     91|| 256 || 256 / (256 + 20) = 92  || 92 ||
     92|| 512 || 512 / (512 + 20) = 96 || 96 ||
     93|| 1024 || 1024 / (1024 + 20) = 98 || 98 ||
     94 * see [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems] Table 3
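
The efficiency column can be reproduced with a short shell sketch. It assumes the same fixed 20 bytes of per-TLP overhead used in the table; the helper name `tlp_efficiency` is made up for illustration, and the result is truncated to a whole percent as the table does.

```shell
#!/bin/sh
# Packet efficiency = payload / (payload + overhead), as a whole percent.
# Assumption: 20 bytes of per-TLP overhead, the figure used in the table above.
# tlp_efficiency is an illustrative helper, not an existing tool.
tlp_efficiency() {
    mps=$1
    overhead=${2:-20}
    # Integer division truncates, matching the table (256B: 92.75% is shown as 92)
    echo $(( mps * 100 / (mps + overhead) ))
}

for mps in 128 256 512 1024; do
    echo "MPS ${mps}: $(tlp_efficiency "$mps")%"
done
```

A second argument overrides the overhead, for example `tlp_efficiency 128 24`, if a different per-packet overhead applies to your link.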
     95
     96The lspci command will show you the "!MaxPayload" size of the specific ports:
     97{{{#!bash
     98lspci -vvv
     9900:06.0 System peripheral: Cavium, Inc. THUNDERX GPIO Controller (rev 02)
     100        Subsystem: Cavium, Inc. THUNDERX GPIO Controller
     101        Device tree node: /sys/firmware/devicetree/base/soc@0/pci@848000000000/gpio0@6,0
     102        Flags: bus master, fast devsel, latency 0, NUMA node 0
     103        Region 0: Memory at 8430a0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
     104        Region 4: Memory at 8430e0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
     105        Capabilities: [70] Express (v2) Endpoint, MSI 00
     106                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
     107                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
     108                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
     109                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
     110                        MaxPayload 128 bytes, MaxReadReq 128 bytes
     111}}}
     112 * !MaxPayload under !DevCap indicates what the device is capable of (up to 128B payloads here)
     113 * !MaxPayload under !DevCtl indicates what the device is configured for (128B payloads here)
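
On a system with many devices it can help to reduce the `lspci -vvv` output to just these two values per device. The following is a rough sketch using awk; `maxpayload_summary` is a hypothetical helper name, and it assumes the output layout matches the example above (device lines starting in the first column, !MaxPayload appearing under !DevCap and !DevCtl).

```shell
#!/bin/sh
# Summarize MaxPayload per device from `lspci -vvv` output read on stdin.
# Assumptions: device header lines start in column 0 with the bus address, and the
# DevCap/DevCtl layout matches the example above. maxpayload_summary is an
# illustrative name, not part of pciutils.
maxpayload_summary() {
    awk '
        /^[0-9a-f]/ { dev = $1; section = "" }   # new device: remember its address
        /DevCap:/   { section = "DevCap" }       # capability: what the device can do
        /DevCtl:/   { section = "DevCtl" }       # control: what is configured
        /MaxPayload/ && section != "" {
            for (i = 1; i < NF; i++)
                if ($i == "MaxPayload") print dev, section, $(i + 1)
        }
    '
}

# Usage on a live system (root is needed to read the capability registers):
#   lspci -vvv | maxpayload_summary
```

Each output line has the form `<address> <DevCap|DevCtl> <bytes>`, so a device whose !DevCtl value is lower than its !DevCap value is running with a smaller payload than it is capable of.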
     114
     115Taking into account lane encoding and 128B payloads, the theoretical max per lane would be:
     116||= gen =||= transfer rate (MT/s) =||= encoding =||= TLP rate =||
     117|| 1 || 2500 || 8B/10B 80% = 250MB/s || 86% = 215MB/s ||
     118|| 2 || 5000 || 8B/10B 80% = 500MB/s || 86% = 430MB/s ||
     119|| 3 || 8000 || 128B/130B 98.46% = 984.6MB/s || 86% = 846.8MB/s ||
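
The same arithmetic can be applied to other lane counts and payload sizes. The sketch below is only an estimate under stated assumptions: 20 bytes of TLP overhead per packet, and encoding ratios of 8/10 for Gen1/Gen2 and 128/130 for Gen3. `pcie_max_mbytes` is a made-up helper name. Because it does not round the intermediate percentages, its results can differ by a few MB/s from the hand-calculated table.

```shell
#!/bin/sh
# Rough per-link throughput ceiling in MB/s:
#   transfer rate * lanes * encoding efficiency * TLP efficiency / 8
# Assumptions: 20 bytes of TLP overhead per packet; encoding ratios of
# 8/10 (Gen1/Gen2) and 128/130 (Gen3). pcie_max_mbytes is an illustrative name.
pcie_max_mbytes() {
    gen=$1 lanes=$2 mps=$3
    awk -v gen="$gen" -v lanes="$lanes" -v mps="$mps" 'BEGIN {
        if (gen == 1)      { rate = 2500; enc = 8 / 10 }
        else if (gen == 2) { rate = 5000; enc = 8 / 10 }
        else if (gen == 3) { rate = 8000; enc = 128 / 130 }
        else               { exit 1 }                  # unknown generation
        eff = mps / (mps + 20)                         # TLP packet efficiency
        printf "%.0f\n", rate * lanes * enc * eff / 8  # Mbit/s to MB/s
    }'
}

pcie_max_mbytes 2 1 128    # Gen2 x1 with 128B payloads
pcie_max_mbytes 2 1 256    # same link with 256B payloads
```

The arguments are generation, lane count and MPS in bytes. These remain theoretical ceilings; host controller and SoC limits can reduce the achievable rate further.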
     120
     121References:
     122 - [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems]
     123
    73124
    74125[=#linux-pci-debug]
     
    209260Note that by default the Linux kernel will not alter ECRC Generation / Check and considers this configured by boot firmware. This can be overridden by enabling CONFIG_PCIE_ECRC in the kernel and passing the kernel cmdline 'pci=ecrc=off' to force disable or 'pci=ecrc=on' to force enable.
    210261
     262
     263[=#enumeration]
     264= PCIe Enumeration
     265PCIe enumeration (scanning of the devices on the bus) occurs during Linux kernel init. While MiniPCIe and M.2 sockets do not support hotplug from an electrical standpoint, you can get Linux to re-scan a bus via sysfs. This may be helpful, for example, if you have a device that needs to be programmed with firmware over a side-channel before it behaves like a PCIe endpoint (e.g. an FPGA).
     266
     267Example:
     268 * remove a device from the bus:
     269{{{#!bash
     270root@focal-venice:~# lspci -n
     27100:00.0 0604: 16c3:abcd (rev 01)
     27201:00.0 0280: 168c:003c
     273root@focal-venice:~# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
     274root@focal-venice:~# lspci -n
     27500:00.0 0604: 16c3:abcd (rev 01)
     276}}}
     277 * re-scan the bus:
     278{{{#!bash
     279root@focal-venice:~# echo 1 > /sys/bus/pci/rescan
     280[   78.881014] pci 0000:01:00.0: [168c:003c] type 00 class 0x028000
     281[   78.887205] pci 0000:01:00.0: reg 0x10: [mem 0x18000000-0x181fffff 64bit]
     282[   78.894245] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
     283[   78.901443] pci 0000:01:00.0: supports D1 D2
     284[   78.908375] pci 0000:01:00.0: BAR 0: assigned [mem 0x18000000-0x181fffff 64bit]
     285[   78.915804] pci 0000:01:00.0: BAR 6: assigned [mem 0x18300000-0x1830ffff pref]
     286[   78.925394] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
     287[   79.090892] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
     288[   79.100172] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
     289[   79.108490] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
     290[   79.162062] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
     291[   80.390818] ath10k_pci 0000:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
     292[   80.487289] ath: EEPROM regdomain: 0x0
     293[   80.491106] ath: EEPROM indicates default country code should be used
     294[   80.497584] ath: doing EEPROM country->regdmn map search
     295[   80.502926] ath: country maps to regdmn code: 0x3a
     296[   80.507742] ath: Country alpha2 being used: US
     297[   80.512231] ath: Regpair used: 0x3a
     298root@focal-venice:~# lspci -n
     29900:00.0 0604: 16c3:abcd (rev 01)
     30001:00.0 0280: 168c:003c
     301}}}
     302
     303Note that in some cases a re-scan is not possible: for example, if the PCIe link directly to the host controller breaks, such as on an IMX8M without a switch.
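
The remove and re-scan steps shown above can be wrapped in a small helper for the firmware-over-side-channel case. This is a minimal sketch, not a Gateworks-provided tool: `pci_reprobe` and the `SYSFS_PCI` override are names invented here, and it assumes it is run as root.

```shell
#!/bin/sh
# Remove one PCI device and rescan the bus through sysfs.
# Assumptions: run as root; SYSFS_PCI can be overridden (e.g. for a dry run
# against a scratch directory); pci_reprobe is an illustrative name.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci}

pci_reprobe() {
    bdf=$1                                    # e.g. 0000:01:00.0
    dev="$SYSFS_PCI/devices/$bdf"
    if [ -e "$dev/remove" ]; then
        echo 1 > "$dev/remove" || return 1    # detach the device from the bus
    fi
    # (a side-channel step would go here, e.g. loading the FPGA bitstream)
    echo 1 > "$SYSFS_PCI/rescan" || return 1  # re-enumerate all PCI buses
    # Succeed only if the device is present after the rescan
    [ -d "$dev" ]
}

# Usage: pci_reprobe 0000:01:00.0 && echo "device present after rescan"
```

The function returns non-zero if the device does not reappear, which is what you would see in the broken-link case described above.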
     304
    211305[=#mechanical]
    212306= Mini-PCIe Mechanical Specification =