| 80 | |
| 81 | 8B/10B encoding is used on the data for gen1/gen2: every 8 data bits are sent as a 10-bit symbol (the 2 extra bits provide clock recovery and DC balance, they are not a checksum). That is 20% overhead, so 80% of 5000 gives a theoretical max of 4000Mbps for a gen2 lane. For gen3, 128B/130B encoding is used for an efficiency of 128/130 = 98.46%. Additional data overhead is specific to the PCIe device in question; a GbE controller or an NVMe drive, for example, should have low data overhead. |
| 82 | |
| 83 | PCIe max bw considering clock rate and data encoding (1x means 1 lane): |
| 84 | - pcie gen1 x1 : 2500MT/s*1lane*80% (8B/10B encoding) = 2000Mbps = 250MB/s (187MB/s with TLP=128) |
| 85 | - pcie gen2 x1 : 5000MT/s*1lane*80% (8B/10B encoding) = 4000Mbps = 500MB/s |
| 86 | - pcie gen3 x1 : 8000MT/s*1lane*98.46% (128B/130B encoding) = 7877Mbps = 984.6MB/s |
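The arithmetic above can be reproduced with a small shell helper. This is only a sketch: the function name `pcie_lane_mbytes` is made up for illustration, and it assumes `awk` is available on the target.

```shell
# Usable per-lane bandwidth in MB/s after line encoding.
# Arguments: raw rate in MT/s, data bits per block, total bits per block.
# (pcie_lane_mbytes is a hypothetical helper name, not an existing tool.)
pcie_lane_mbytes() {
    LC_ALL=C awk -v rate="$1" -v data="$2" -v total="$3" \
        'BEGIN { printf "%.1f\n", rate * data / total / 8 }'
}

pcie_lane_mbytes 2500 8 10      # gen1, 8B/10B     -> 250.0
pcie_lane_mbytes 5000 8 10      # gen2, 8B/10B     -> 500.0
pcie_lane_mbytes 8000 128 130   # gen3, 128B/130B  -> 984.6
```

Multiply the result by the lane count for wider links.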
| 87 | |
| 88 | Next comes packet efficiency: each Transaction Layer Packet (TLP) carries a fixed overhead (20 bytes in the calculation below), so efficiency is bound by the Max Payload Size (MPS) configured between the link partners: |
| 89 | ||= MPS (Bytes) =||= Calculation =||= Packet Efficiency (%) =|| |
| 90 | || 128 || 128 / (128 + 20) = 86 || 86 || |
| 91 | || 256 || 256 / (256 + 20) = 92 || 92 || |
| 92 | || 512 || 512 / (512 + 20) = 96 || 96 || |
| 93 | || 1024 || 1024 / (1024 + 20) = 98 || 98 || |
| 94 | * see [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems] Table 3 |
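The table values can be recomputed for any payload size with a one-line formula. The sketch below assumes the same 20-byte per-TLP overhead used in the table; `tlp_efficiency` is a hypothetical helper name. It prints one decimal place, whereas the table lists whole percentages.

```shell
# Packet efficiency in percent for a given Max Payload Size.
# Arguments: MPS in bytes, optional per-TLP overhead in bytes (default 20).
# (tlp_efficiency is a hypothetical helper name.)
tlp_efficiency() {
    LC_ALL=C awk -v mps="$1" -v ovh="${2:-20}" \
        'BEGIN { printf "%.1f\n", 100 * mps / (mps + ovh) }'
}

for mps in 128 256 512 1024; do
    printf '%4d bytes: %s%%\n' "$mps" "$(tlp_efficiency "$mps")"
done
```

The optional second argument lets you try a different overhead if your link uses a different header size.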
| 95 | |
| 96 | The lspci command will show you the "!MaxPayload" size of the specific ports: |
| 97 | {{{#!bash |
| 98 | lspci -vvv |
| 99 | 00:06.0 System peripheral: Cavium, Inc. THUNDERX GPIO Controller (rev 02) |
| 100 | Subsystem: Cavium, Inc. THUNDERX GPIO Controller |
| 101 | Device tree node: /sys/firmware/devicetree/base/soc@0/pci@848000000000/gpio0@6,0 |
| 102 | Flags: bus master, fast devsel, latency 0, NUMA node 0 |
| 103 | Region 0: Memory at 8430a0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M] |
| 104 | Region 4: Memory at 8430e0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M] |
| 105 | Capabilities: [70] Express (v2) Endpoint, MSI 00 |
| 106 | DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us |
| 107 | ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W |
| 108 | DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- |
| 109 | RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- |
| 110 | MaxPayload 128 bytes, MaxReadReq 128 bytes |
| 111 | }}} |
| 112 | * !MaxPayload under !DevCap indicates what the device is capable of (up to 128B payloads here) |
| 113 | * !MaxPayload under !DevCtl indicates what the device is configured for (128B payloads here) |
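To pull just these two values out of a long lspci dump, a small filter such as the following can help. It is a sketch that assumes the output layout shown above (a `DevCap:` line followed later by a `DevCtl:` section); `maxpayload_sizes` is a hypothetical helper name.

```shell
# Read `lspci -vvv` text on stdin and print the MaxPayload size (bytes)
# found in each DevCap (capable) and DevCtl (configured) section.
# (maxpayload_sizes is a hypothetical helper name.)
maxpayload_sizes() {
    awk '
        /DevCap:/ { sec = "DevCap" }
        /DevCtl:/ { sec = "DevCtl" }
        /MaxPayload/ {
            for (i = 1; i < NF; i++)
                if ($i == "MaxPayload") { print sec, $(i + 1); break }
        }'
}

# Example (reading the capability list typically needs root):
#   lspci -vvv -s 00:06.0 | maxpayload_sizes
```

If the two numbers differ, the DevCtl value is the one that applies to the packet efficiency calculation.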
| 114 | |
| 115 | Taking into account lane encoding and 128B payloads, the theoretical max per lane would be: |
| 116 | ||= gen =||= transfer rate (MT/s) =||= encoding =||= TLP rate =|| |
| 117 | || 1 || 2500 || 8B/10B 80% = 250MB/s || 86% 215MB/s || |
| 118 | || 2 || 5000 || 8B/10B 80% = 500MB/s || 86% 430MB/s || |
| 119 | || 3 || 8000 || 128B/130B 98.46% = 984.6MB/s || 86% 846.8MB/s || |
| 120 | |
| 121 | References: |
| 122 | - [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems] |
| 123 | |
| 262 | |
| 263 | [=#enumeration] |
| 264 | = PCIe Enumeration |
| 265 | PCIe enumeration (scanning of the devices on the bus) occurs during Linux kernel init time. While MiniPCIe and M.2 sockets do not support hotplug from an electrical standpoint, you can get Linux to re-scan a bus via sysfs. This may be helpful if, for example, you have a device (e.g. an FPGA) that needs to be programmed with firmware over a side-channel before it behaves like a PCIe endpoint. |
| 266 | |
| 267 | Example: |
| 268 | * remove a device from the bus: |
| 269 | {{{#!bash |
| 270 | root@focal-venice:~# lspci -n |
| 271 | 00:00.0 0604: 16c3:abcd (rev 01) |
| 272 | 01:00.0 0280: 168c:003c |
| 273 | root@focal-venice:~# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove |
| 274 | root@focal-venice:~# lspci -n |
| 275 | 00:00.0 0604: 16c3:abcd (rev 01) |
| 276 | }}} |
| 277 | * re-scan the bus: |
| 278 | {{{#!bash |
| 279 | root@focal-venice:~# echo 1 > /sys/bus/pci/rescan |
| 280 | [ 78.881014] pci 0000:01:00.0: [168c:003c] type 00 class 0x028000 |
| 281 | [ 78.887205] pci 0000:01:00.0: reg 0x10: [mem 0x18000000-0x181fffff 64bit] |
| 282 | [ 78.894245] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref] |
| 283 | [ 78.901443] pci 0000:01:00.0: supports D1 D2 |
| 284 | [ 78.908375] pci 0000:01:00.0: BAR 0: assigned [mem 0x18000000-0x181fffff 64bit] |
| 285 | [ 78.915804] pci 0000:01:00.0: BAR 6: assigned [mem 0x18300000-0x1830ffff pref] |
| 286 | [ 78.925394] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0 |
| 287 | [ 79.090892] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000 |
| 288 | [ 79.100172] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0 |
| 289 | [ 79.108490] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258 |
| 290 | [ 79.162062] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08 |
| 291 | [ 80.390818] ath10k_pci 0000:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1 |
| 292 | [ 80.487289] ath: EEPROM regdomain: 0x0 |
| 293 | [ 80.491106] ath: EEPROM indicates default country code should be used |
| 294 | [ 80.497584] ath: doing EEPROM country->regdmn map search |
| 295 | [ 80.502926] ath: country maps to regdmn code: 0x3a |
| 296 | [ 80.507742] ath: Country alpha2 being used: US |
| 297 | [ 80.512231] ath: Regpair used: 0x3a |
| 298 | root@focal-venice:~# lspci -n |
| 299 | 00:00.0 0604: 16c3:abcd (rev 01) |
| 300 | 01:00.0 0280: 168c:003c |
| 301 | }}} |
| 302 | |
| 303 | Note that in some cases a re-scan is not possible if the PCIe link to the host controller breaks, for example on an IMX8M with no PCIe switch between the host controller and the device. |
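The remove and re-scan steps above can be wrapped in one helper, for example to run after loading an FPGA bitstream. This is a minimal sketch, not a tested recipe for any particular board: the function name `pci_reprobe` and the `PCI_SYSFS` and `PCI_SETTLE_SECS` variables are assumptions introduced here for illustration, and only the two sysfs files used in the example above are touched.

```shell
# Remove a device (if currently present) and re-scan the PCI bus.
# Argument: full device address, e.g. 0000:01:00.0
# PCI_SYSFS and PCI_SETTLE_SECS are hypothetical overrides for this sketch.
pci_reprobe() (
    dev="$1"
    root="${PCI_SYSFS:-/sys/bus/pci}"

    # Drop the stale device entry first so the re-scan enumerates it afresh.
    if [ -e "$root/devices/$dev/remove" ]; then
        echo 1 > "$root/devices/$dev/remove"
    fi

    # Give the endpoint a moment to come up (default 1 second).
    sleep "${PCI_SETTLE_SECS:-1}"

    echo 1 > "$root/rescan"
)

# Example (as root):
#   pci_reprobe 0000:01:00.0 && lspci -n
```

The function body runs in a subshell, so its variables do not leak into the calling shell.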
| 304 | |