[[PageOutline]]

See also:
 * [wiki:venice/PCIe]
 * [wiki:newport/PCIe]
 * [wiki:ventana/PCIe]

Gateworks PCI support:
||= Product Family =||= Capabilities =||
|| Venice || PCIe Gen2 ^^^3^^^ ||
|| Newport || PCIe Gen2 ^^^2^^^ ||
|| Ventana || PCIe Gen1 ^^^1^^^ ||
 1. Ventana boards with external clock generators can theoretically support Gen2; however, some software modification would be necessary for the PCIe clock configuration.
 2. Newport can support PCIe Gen3 via a Gateworks special which modifies a strapping resistor to move the coprocessor clock (SCLK) from 350MHz to 550MHz (at the cost of ~500mW of power draw).
 3. Venice i.MX 8M has a limitation on inbound writes: when the inbound write data transfer size exceeds 400 bytes, the number of inbound MWr TLP transactions the controller can support is limited to a combination of 12 headers and 400 bytes of data (see [https://comm.eefocus.com/media/download/index/id-1021154 AN13164 iMX8MP PCIe Bandwidth Analysis]). Higher performance can be obtained by having the i.MX 8M Plus issue outbound MRd transactions instead of using inbound MWr.

= PCI
Peripheral Component Interconnect (PCI) is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space. Bus mastering refers to the concept that PCI devices can directly access a processor's memory bus independent of the processor, similar to a Direct Memory Access (DMA) controller.

PCI History:
 * PCI 1.0 1992 Original issue
 * PCI 2.0 1993 Incorporated connector and add-in card specification
 * PCI 2.1 1995 Incorporated clarifications and added 66 MHz
 * PCI 2.2 1998 Added Mini PCI, incorporated ECNs and improved readability
 * PCI 2.3 2002 Incorporated ECNs, errata, and deleted 5 volt only keyed add-in cards
 * PCI 3.0 2004 Removed support for 5.0 volt keyed system board connector
 * PCI Express 2004

Conventional PCI has 4 shared level-triggered interrupts and uses a parallel bus architecture where the PCI host and all devices share a common set of address, data and control lines.

References:
 * https://en.wikipedia.org/wiki/Conventional_PCI

= Mini PCI
Mini PCI was added to PCI version 2.2 and differs from Conventional PCI in the following ways:
 * 32bit 33MHz
 * 3.3V only; 5V limited to 100mA
 * three form factors:
   - Type I card uses a 100pin stacking connector
   - Type II card uses a 100pin stacking connector and accommodates a larger size
   - Type III card uses a 124pin edge connector

Older generation Gateworks products such as some Laguna product families support the Mini PCI Type III cards.

= PCI Express (PCIe)
The original PCI bus, now referred to as 'Conventional PCI', was a parallel bus with shared address/data lines. In 2004 the PCI Express (PCIe) specification was released, defining a serialized version of PCI which is commonplace today.

PCI Express is based on a point-to-point topology with separate serial links connecting every device to the host, also known as the root complex (RC). Links may contain from one to 32 lanes (1x, 2x, 4x, 8x, 12x, 16x, 32x), with each lane consisting of a transmit and a receive differential pair. PCI Express interrupts are embedded within the serial data.

References:
 * https://en.wikipedia.org/wiki/PCI_Express

= PCI Express Mini Card (also known as 'Mini PCIe', 'mPCIe' or 'PEM')
The PCI Express Mini Card specification is based on PCI Express with the following differences:
 * uses a 52-pin edge connector with 2 rows of pins
 * incorporates 1x (1 lane) PCI Express, USB 2.0, and SIM connectivity on the connector

Modern generation Gateworks products such as the Laguna GW2391, Ventana, and Newport product families support Mini PCIe cards.

[=#throughput]
= PCI Throughput
There are several factors that can affect PCIe performance. The most obvious factor is how many lanes (pairs of TX/RX SERDES channels) you have: 1x, 2x, 4x etc., which are pure multipliers of the rate that can be achieved over a single lane. The next most obvious factor is which generation of PCIe your host controller (root complex or RC) and device (endpoint or EP) support: Gen1, Gen2, Gen3 etc., which factors into the transfer rate and data transfer overhead [1]. Less obvious is the Transaction Layer Packet (TLP) overhead, as RCs and EPs have varying maximum payload sizes. Digging even deeper, you may run into limits that come from the implementation of the host controller and SoC resources.

Mini PCIe connectors provide a single lane (1x), whereas M.2 sockets can allow additional lanes depending on the socket.

The PCI specification has evolved to support various generations capable of increasing bus speeds:
 * PCI Gen3 (8.0GT/sec or approximately 7.9Gbps)
 * PCI Gen2 (5.0GT/sec or 4Gbps)
 * PCI Gen1 (2.5GT/sec or 2Gbps)

These are backwards compatible: a PCI Gen3 link will only be established if both the device and the host controller support it; otherwise the link steps down to Gen2 and then Gen1 as needed. The bus speed represents a theoretical maximum throughput and does not account for host processing speed or bus contention from multiple masters.

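As a rough sketch of where the per-lane data rates listed above come from, the following Python snippet multiplies the raw transfer rate by the line-encoding efficiency and the lane count. It is an illustration only, not a measurement: the names {{{GENERATIONS}}} and {{{lane_bandwidth_mbps}}} are hypothetical, and it assumes 8b/10b encoding for Gen1/Gen2 and 128b/130b encoding for Gen3.

{{{#!python
# Raw transfer rate (MT/s) and line-encoding efficiency per PCIe generation.
# Assumption: Gen1/Gen2 use 8b/10b encoding, Gen3 uses 128b/130b encoding.
GENERATIONS = {
    1: (2500, 8 / 10),
    2: (5000, 8 / 10),
    3: (8000, 128 / 130),
}


def lane_bandwidth_mbps(gen, lanes=1):
    """Return the theoretical data rate in Mbps after encoding overhead."""
    rate_mts, efficiency = GENERATIONS[gen]
    return rate_mts * efficiency * lanes


if __name__ == "__main__":
    for gen in sorted(GENERATIONS):
        mbps = lane_bandwidth_mbps(gen)
        # Divide by 8 to convert megabits to megabytes.
        print(f"Gen{gen} x1: {mbps:.0f} Mbps = {mbps / 8:.1f} MB/s")
}}}

Calling {{{lane_bandwidth_mbps(2, lanes=4)}}} scales the Gen2 figure to a 4x link. The result is still only an upper bound, since packet overhead and host limits are not modelled.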
8B/10B bit encoding is used on the data for Gen1/Gen2 (every 8 data bits are carried in a 10-bit symbol), which is 20% overhead and leaves 80% for data; thus 80% of 5000 gives a 4000Mbps theoretical max for a Gen2 link. For Gen3, 128B/130B encoding is used for a 98.46% efficiency. Additional data overhead is specific to the PCIe device in question. A GbE controller or an NVMe drive should have low data overhead, for example.

PCIe max bandwidth considering clock rate and data encoding (x1 means 1 lane):
 - PCIe Gen1 x1: 2500MT/s * 1 lane * 80% (8B/10B encoding) = 2000Mbps = 250MB/s (187MB/s with TLP=128)
 - PCIe Gen2 x1: 5000MT/s * 1 lane * 80% (8B/10B encoding) = 4000Mbps = 500MB/s
 - PCIe Gen3 x1: 8000MT/s * 1 lane * 98.46% (128B/130B encoding) = 7877Mbps = 984.6MB/s

Next comes packet efficiency, based on the Transaction Layer Packet (TLP) overhead, which is bound by the max payload size (MPS) between links:
||= MPS (Bytes) =||= Calculation =||= Packet Efficiency (%) =||
|| 128 || 128 / (128 + 20) = 86 || 86 ||
|| 256 || 256 / (256 + 20) = 92 || 92 ||
|| 512 || 512 / (512 + 20) = 96 || 96 ||
|| 1024 || 1024 / (1024 + 20) = 98 || 98 ||
 * see [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems] Table 3

The lspci command will show you the "!MaxPayload" size of the specific ports:
{{{#!bash
lspci -vvv
00:06.0 System peripheral: Cavium, Inc. THUNDERX GPIO Controller (rev 02)
        Subsystem: Cavium, Inc. THUNDERX GPIO Controller
        Device tree node: /sys/firmware/devicetree/base/soc@0/pci@848000000000/gpio0@6,0
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Region 0: Memory at 8430a0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
        Region 4: Memory at 8430e0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
}}}
 * !MaxPayload under !DevCap indicates what the device is capable of (up to 128B payloads here)
 * !MaxPayload under !DevCtl indicates what the device is configured for (128B payloads here)

Taking into account lane encoding and 128B payloads, the theoretical max per lane would be:
||= Gen =||= Transfer rate (MT/s) =||= Encoding =||= TLP rate =||
|| 1 || 2500 || 8B/10B 80% = 250MB/s || 86% = 215MB/s ||
|| 2 || 5000 || 8B/10B 80% = 500MB/s || 86% = 430MB/s ||
|| 3 || 8000 || 128B/130B 98.46% = 984.6MB/s || 86% = 846.8MB/s ||

References:
 - [https://docs.xilinx.com/v/u/en-US/wp350 Understanding Performance of PCI Express Systems]

[=#linux-pci-debug]
= Linux PCI Debugging
PCI configuration registers can be used to debug various PCI bus issues. The easiest way to access these registers is via the Linux {{{lspci}}} command with the 'very verbose' flag (-vv), which decodes and displays the various PCI config space registers. Note that access to some parts of the PCI configuration space is restricted to root on many operating systems; if this is the case you will see certain data flagged as 'access denied'.

The various registers define bits that are either set (indicated with a '+') or unset (indicated with a '-').
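The '+'/'-' convention makes these register lines easy to post-process. As a minimal sketch (the helper name {{{parse_flags}}} is hypothetical, and the sample line is copied from the !DevCap output shown earlier), a line of flags can be split into a dictionary of booleans:

{{{#!python
import re

# Matches a token immediately followed by '+' (set) or '-' (unset),
# e.g. "CorrErr-" or "RBE+". Values such as "128 bytes" are ignored.
# Note: a token like "I/O-" is reported under the name "O".
FLAG_RE = re.compile(r"\b([A-Za-z0-9]+)([+-])(?=\s|$)")


def parse_flags(line):
    """Return {flag_name: True/False} for each +/- flag found in line."""
    return {name: sign == "+" for name, sign in FLAG_RE.findall(line)}


if __name__ == "__main__":
    sample = "ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W"
    for name, is_set in parse_flags(sample).items():
        print(f"{name}: {'set' if is_set else 'unset'}")
}}}

This only interprets the text that {{{lspci}}} prints; it does not read or modify the registers themselves.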
These bits typically have attributes of 'RW1C', meaning you can read and write them and need to write a '1' to clear them. Because these are status bits, if you wanted to count their occurrences you would need to write software that detects the bits getting set, increments counters, and clears them over time.

The 'Device Status Register' (!DevSta) shows at a high level if there have been correctable errors detected (!CorrErr), non-fatal errors detected (!UncorrErr), fatal errors detected (!FatalErr), unsupported requests detected (!UnsuppReq), if the device requires auxiliary power (!AuxPwr), and if there are transactions pending (non-posted requests that have not been completed). If you want to delve deeper into types of errors see [#aer PCI Advanced Error Reporting] below.

References:
 - [https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf Enable PCI Express Advanced Error Reporting in the Kernel]
 - [http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf PCI Express Base Specification Revision 3.0]
 - [https://intrepid.warped.com/~scotte/OldBlogEntries/web/index-5.html PCI Debugging 101]

[=#aer]
== PCI Advanced Error Reporting (AER)
Most modern PCI devices support 'Advanced Error Reporting' (AER).
For these devices a {{{lspci -vv}}} as root will show additional registers similar to the ones described above:
 * UESta - Uncorrectable Error Status
 * UEMsk - Uncorrectable Error Mask
 * UESvrt - Uncorrectable Error Severity
 * CESta - Correctable Error Status
 * CEMsk - Correctable Error Mask
 * AERCap - Advanced Error Reporting Capabilities

For specifics on the meaning of the bits in these registers see the PCI Express Base Specification Revision 3.0.

== Examples
Here are some examples:
 * Show all Atheros/QCA radios on the PCI bus (vendor 168c):
{{{#!bash
$ lspci -n | grep 168c
0001:20:00.0 0280: 168c:0046
}}}
 * Very verbose listing of a specific device:
{{{#!bash
$ sudo lspci -s 1:20:00 -vv
0001:20:00.0 Network controller: Qualcomm Atheros Device 0046
        Subsystem: Qualcomm Atheros Device cafe
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
...
}}}
 * remove a device:
{{{#!bash
root@focal-venice:~# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
root@focal-venice:~# lspci -n
00:00.0 0604: 16c3:abcd (rev 01)
}}}
 * re-scan the bus:
{{{#!bash
root@focal-venice:~# echo 1 > /sys/bus/pci/rescan
[ 78.881014] pci 0000:01:00.0: [168c:003c] type 00 class 0x028000
[ 78.887205] pci 0000:01:00.0: reg 0x10: [mem 0x18000000-0x181fffff 64bit]
[ 78.894245] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[ 78.901443] pci 0000:01:00.0: supports D1 D2
[ 78.908375] pci 0000:01:00.0: BAR 0: assigned [mem 0x18000000-0x181fffff 64bit]
[ 78.915804] pci 0000:01:00.0: BAR 6: assigned [mem 0x18300000-0x1830ffff pref]
[ 78.925394] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 79.090892] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[ 79.100172] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[ 79.108490] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
[ 79.162062] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[ 80.390818] ath10k_pci 0000:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[ 80.487289] ath: EEPROM regdomain: 0x0
[ 80.491106] ath: EEPROM indicates default country code should be used
[ 80.497584] ath: doing EEPROM country->regdmn map search
[ 80.502926] ath: country maps to regdmn code: 0x3a
[ 80.507742] ath: Country alpha2 being used: US
[ 80.512231] ath: Regpair used: 0x3a
root@focal-venice:~# lspci -n
00:00.0 0604: 16c3:abcd (rev 01)
01:00.0 0280: 168c:003c
}}}

Note that in some cases, if the PCIe link to the host controller breaks (for example on an IMX8M without a switch), a re-scan is not possible.

[=#mechanical]
= Mini-PCIe Mechanical Specification =
Gateworks SBCs adhere to the PCI Express Mini Card standard which uses a 52-pin edge connector. The majority of boards support full size cards which are 30mm x 50.95mm in size and 1.0mm thick. The Newport GW6903 supports half size cards which are 30mm x 26.8mm and 1.0mm thick. See the board's individual manual for which type of signaling (PCIe, USB, SIM) is supported for each specific connector.

The below mechanical drawing is for the standard Mini-PCIe connector used on the !Ventana/Newport/Venice boards. Note that other heights are available as special order (100 piece MOQ). Contact sales for available options.

[[Image(http://trac.gateworks.com/raw-attachment/wiki/minipciexpressmodules/minipcieconn.png)]]

== Mini-PCIe Screws ==
The hold down screws for the Mini-PCIe cards are pre-loaded into the standoffs on the board. If you need additional screws they can be purchased at McMaster Carr.
 - [https://www.mcmaster.com/#92005a016/=1eaais7 Standoff Screws]