Version 11 (modified by Tim Harvey, 4 months ago)

include more detail on PCI performance analysis and add enumeration section

Gateworks PCI support:

Product Family    Capabilities    Note
Venice            PCIe Gen2       3
Newport           PCIe Gen2       2
Ventana           PCIe Gen1       1
  1. Ventana boards with external clock generators can theoretically support Gen2; however, some software modification would be necessary for the PCIe clock configuration.
  2. Newport can support PCIe Gen3 via a Gateworks special which modifies a strapping resistor to move the coprocessor clock (SCLK) from 350MHz to 550MHz (at the cost of ~500mW of power draw).
  3. Venice i.MX 8M has a limitation on inbound writes: when the inbound write data transfer size exceeds 400 bytes, the number of inbound MWr TLP transactions the controller can support is limited to a combination of 12 headers and 400 bytes of data (see AN13164 iMX8MP PCIe Bandwidth Analysis). Higher performance can be obtained by having the i.MX 8M Plus issue outbound MRd transactions instead of using inbound MWr.

PCI

Peripheral Component Interconnect (PCI) is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space.

Bus mastering refers to the concept that PCI devices can directly access a processor's memory bus independently of the processor, similar to a Direct Memory Access (DMA) controller.

PCI History:

  • PCI 1.0 1992 Original issue
  • PCI 2.0 1993 Incorporated connector and add-in card specification
  • PCI 2.1 1995 Incorporated clarifications and added 66 MHz
  • PCI 2.2 1998 Added Mini PCI, Incorporated ECNs and improved readability
  • PCI 2.3 2002 Incorporated ECNs, errata, and deleted 5 volt only keyed add-in cards
  • PCI 3.0 2004 Removed support for 5.0 volt keyed system board connector
  • PCI Express 2004

Conventional PCI has 4 shared level-triggered interrupts and uses a parallel bus architecture where the PCI host and all devices share a common set of address, data and control lines.

Mini PCI

Mini PCI was added to PCI version 2.2 and differs from Conventional PCI in the following ways:

  • 32-bit, 33MHz
  • 3.3V only; 5V limited to 100mA
  • three form factors:
    • Type I card uses a 100pin stacking connector
    • Type II card uses a 100pin stacking connector and accommodates a larger size
    • Type III card uses a 124pin edge connector

Older generation Gateworks products such as some Laguna product families support the Mini PCI Type III cards.

PCI Express (PCIe)

The original PCI bus, now referred to as 'Conventional PCI', was a parallel bus with shared address/data lines. In 2004 the PCI Express (PCIe) specification was released, defining a serialized version of PCI which is commonplace today.

PCI Express is based on a point-to-point topology with separate serial links connecting every device to the host, also known as the root complex (RC). Links may contain from one to 32 lanes (1x, 2x, 4x, 8x, 12x, 16x, 32x) with each lane consisting of two differential pairs (one for transmit, one for receive). PCI Express interrupts are embedded within the serial data.

PCI Express Mini Card (also known as 'Mini PCIe', 'mPCIe' or 'PEM')

The PCI Express Mini Card specification is based on PCI Express with the following differences:

  • uses a 52-pin edge connector with 2 rows of pins
  • incorporates 1x (1 lane) PCI Express, USB 2.0, and SIM connectivity on the connector

Modern generation Gateworks products such as the Laguna GW2391, Ventana, and Newport product families support Mini PCIe cards.

PCI Throughput

There are several factors that can affect PCIe performance. The most obvious factor is how many lanes (pairs of TX/RX SERDES channels) you have: 1x, 2x, 4x etc., which are pure multipliers of the rate that can be achieved over a single lane. The next most obvious factor is which generation of PCIe your host controller (root complex or RC) and device (endpoint or EP) support: Gen1, Gen2, Gen3 etc., which determines the transfer rate and the encoding overhead [1]. Less obvious is the Transaction Layer Packet (TLP) overhead, because RCs and EPs have varying max payload sizes. Digging even deeper, you may run into limits that come from the implementation of the host controller and SoC resources.

MiniPCIe connectors provide a single lane (1x), whereas M.2 sockets can allow additional lanes depending on the socket.

The PCI specification has evolved to support various generations capable of increasing bus speeds:

  • PCI Gen3 (8.0GT/sec or ~7.88Gbps)
  • PCI Gen2 (5.0GT/sec or 4Gbps)
  • PCI Gen1 (2.5GT/sec or 2Gbps)

These are backwards compatible: a PCI Gen3 link will only be established if both the device and the host controller support it; otherwise the link will step down to Gen2 and then Gen1 as needed.
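
To check whether a link has stepped down, the speed and width a port supports (LnkCap) can be compared with what was actually negotiated (LnkSta) in lspci -vv output. The following is a minimal Python sketch, assuming lines formatted like the lspci output shown in the Examples section of this page; the helper names parse_link and link_downgraded are made up for illustration.

```python
import re

# Matches the "Speed <n>GT/s" and "Width x<n>" fields that lspci -vv
# prints on its LnkCap and LnkSta lines.
_LINK_RE = re.compile(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")

def parse_link(line):
    """Return (speed in GT/s, lane count) from an LnkCap or LnkSta line."""
    m = _LINK_RE.search(line)
    if m is None:
        raise ValueError("no Speed/Width fields in: %r" % line)
    return float(m.group(1)), int(m.group(2))

def link_downgraded(lnkcap_line, lnksta_line):
    """True if the trained link is slower or narrower than the port supports."""
    cap_speed, cap_width = parse_link(lnkcap_line)
    sta_speed, sta_width = parse_link(lnksta_line)
    return sta_speed < cap_speed or sta_width < cap_width

# Lines in the format lspci -vv prints for a Gen2 x1 device
lnkcap = "LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM not supported"
lnksta = "LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+"
print(link_downgraded(lnkcap, lnksta))
```

A downgraded link is not necessarily a fault: a Gen2 card in a Gen1 host is expected to report a lower LnkSta speed than its LnkCap.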

The bus speed represents a theoretical maximum throughput and does not account for host processing speed or bus contention from multiple masters.

8B/10B encoding is used on the data for gen1/gen2 (every 8 data bits are sent as a 10-bit symbol), which is 20% overhead and leaves 80% for data; thus 80% of 5000 gives a theoretical max of 4000Mbps for a gen2 link. For gen3, 128B/130B encoding is used for about 98.46% efficiency. Additional data overhead is specific to the PCIe device in question; a GbE controller or an NVMe drive should have low data overhead, for example.

PCIe max bw considering clock rate and data encoding (1x means 1 lane):

  • pcie gen1 x1 : 2500MT/s*1lane*80% (8B/10B encoding) = 2000Mbps = 250MB/s (187MB/s with TLP=128)
  • pcie gen2 x1 : 5000MT/s*1lane*80% (8B/10B encoding) = 4000Mbps = 500MB/s
  • pcie gen3 x1 : 8000MT/s*1lane*98.46% (128B/130B encoding) = ~7877Mbps = ~984.6MB/s
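
The arithmetic in this list can be reproduced with a few lines of Python. This is only a sketch of the calculation (transfer rate x lanes x encoding efficiency); the ENCODING table and the function name link_bandwidth_mbytes are illustrative and not part of any tool. The gen3 figure uses the exact 128/130 ratio.

```python
# Per generation: (transfer rate in MT/s, data bits, total bits on the wire)
ENCODING = {
    1: (2500, 8, 10),     # 8B/10B
    2: (5000, 8, 10),     # 8B/10B
    3: (8000, 128, 130),  # 128B/130B
}

def link_bandwidth_mbytes(gen, lanes=1):
    """Theoretical max in MB/s after line encoding, before any TLP overhead."""
    rate_mt_s, data_bits, total_bits = ENCODING[gen]
    mbps = rate_mt_s * lanes * data_bits / total_bits
    return mbps / 8  # bits to bytes

for gen in sorted(ENCODING):
    print("gen%d x1: %.1f MB/s" % (gen, link_bandwidth_mbytes(gen)))
```

Passing lanes=4 to the same function gives the figure for a 4x link, since lanes are a pure multiplier.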

Next comes packet efficiency, which depends on the Transaction Layer Packet (TLP) overhead (20 bytes per packet in the calculations below) and is bounded by the max payload size (MPS) supported across the link:

MPS (Bytes)    Calculation              Packet Efficiency (%)
128            128 / (128 + 20)         86
256            256 / (256 + 20)         92
512            512 / (512 + 20)         96
1024           1024 / (1024 + 20)       98
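
The same calculation as a small Python sketch, assuming the 20 bytes of per-packet overhead used in the table above. The function names are illustrative. The code prints un-rounded values, so they differ slightly from the whole percentages in the table.

```python
TLP_OVERHEAD_BYTES = 20  # per-packet overhead assumed by the table above

def packet_efficiency(mps_bytes, overhead_bytes=TLP_OVERHEAD_BYTES):
    """Fraction of each TLP on the wire that is payload."""
    return mps_bytes / (mps_bytes + overhead_bytes)

def effective_rate(link_mbytes, mps_bytes):
    """Scale a post-encoding link rate (MB/s) by the packet efficiency."""
    return link_mbytes * packet_efficiency(mps_bytes)

for mps in (128, 256, 512, 1024):
    print("MPS %4d: %.1f%%" % (mps, 100 * packet_efficiency(mps)))
```

For example, effective_rate(250.0, 128) estimates the payload rate of a gen1 1x link with 128 byte payloads.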

The lspci command will show you the "MaxPayload" size of the specific ports:

lspci -vvv
00:06.0 System peripheral: Cavium, Inc. THUNDERX GPIO Controller (rev 02)
        Subsystem: Cavium, Inc. THUNDERX GPIO Controller
        Device tree node: /sys/firmware/devicetree/base/soc@0/pci@848000000000/gpio0@6,0
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Region 0: Memory at 8430a0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
        Region 4: Memory at 8430e0000000 (32-bit, non-prefetchable) [disabled] [enhanced] [size=2M]
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
  • MaxPayload under DevCap indicates what the device is capable of (up to 128B payloads here)
  • MaxPayload under DevCtl indicates what the device is configured for (128B payloads here)
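
When checking many devices it can help to pull both values out programmatically. Below is a minimal Python sketch that parses lspci -vv text for a single device; the function name max_payloads is hypothetical and the sample text is illustrative (not captured from real hardware), chosen so that the capability and the configured value differ.

```python
import re

def max_payloads(lspci_text):
    """Return e.g. {'DevCap': 256, 'DevCtl': 128} from lspci -vv text of one device."""
    result = {}
    register = None
    for line in lspci_text.splitlines():
        # A line such as "DevCtl: ..." starts a new register; the indented
        # continuation lines that follow belong to the same register.
        start = re.match(r"\s*(\w+):", line)
        if start:
            register = start.group(1)
        payload = re.search(r"MaxPayload (\d+) bytes", line)
        if payload and register in ("DevCap", "DevCtl"):
            result[register] = int(payload.group(1))
    return result

# Illustrative sample: device capable of 256B payloads but configured for 128B
SAMPLE = """\
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
"""

sizes = max_payloads(SAMPLE)
print(sizes)
```

A DevCtl value lower than DevCap usually means the other end of the link (or something upstream) only supports the smaller payload size.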

Taking into account lane encoding and 128B payloads, the theoretical max per lane would be:

gen    transfer rate (MT/s)    encoding                        TLP efficiency    rate
1      2500                    8B/10B 80% = 250MB/s            86%               215MB/s
2      5000                    8B/10B 80% = 500MB/s            86%               430MB/s
3      8000                    128B/130B 98.46% = 984.6MB/s    86%               846.8MB/s

References:

Linux PCI Debugging

PCI configuration registers can be used to debug various PCI bus issues.

The easiest way to access these registers is via the Linux lspci command with the 'very verbose' flag (-vv) which will decode and display the various PCI config space registers. Note that access to some parts of the PCI configuration space is restricted to root permissions on many operating systems - if this is the case you will see certain data flagged as 'access denied'.

The various registers define bits that are either set (indicated with a '+') or unset (indicated with a '-'). These bits typically have attributes of 'RW1C' meaning you can read and write them and need to write a '1' to clear them. Because these are status bits, if you wanted to 'count' the occurrences of them you would need to write some software that detected the bits getting set, incremented counters, and cleared them over time.
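
As a rough illustration of the counting approach, the Python sketch below decodes the '+' and '-' flags from a line of lspci output and tallies how often each bit was seen set across successive readings. The function names are made up for this example. Reading the device and clearing the RW1C bits between samples (for instance with a config space tool such as setpci) is deliberately left out, so the readings are passed in as plain strings.

```python
import re
from collections import Counter

def parse_flags(line):
    """Turn 'DevSta: CorrErr- AuxPwr+ ...' into {'CorrErr': False, 'AuxPwr': True, ...}."""
    _, _, rest = line.partition(":")
    return {name: sign == "+" for name, sign in re.findall(r"(\w+)([+-])", rest)}

def count_set_bits(samples, counters=None):
    """Accumulate how often each flag was set across successive readings.

    This is only meaningful as an event count if the status bits are
    cleared after every reading; otherwise a single event is counted
    again on each poll.
    """
    counters = Counter() if counters is None else counters
    for line in samples:
        for name, is_set in parse_flags(line).items():
            if is_set:
                counters[name] += 1
    return counters

# Three hypothetical readings of the same register taken over time
readings = [
    "DevSta: CorrErr+ UncorrErr- FatalErr-",
    "DevSta: CorrErr+ UncorrErr+ FatalErr-",
    "DevSta: CorrErr- UncorrErr- FatalErr-",
]
totals = count_set_bits(readings)
print(dict(totals))
```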

The 'Device Status Register' (DevSta) shows at a high level if there have been correctable errors detected (CorrErr), non-fatal errors detected (UncorrErr), fatal errors detected (FatalErr), unsupported requests detected (UnsuppReq), if the device requires auxiliary power (AuxPwr), and if there are transactions pending (TransPend: non-posted requests that have not been completed).

If you want to delve deeper into types of errors see PCI Advanced Error Reporting below.

PCI Advanced Error Reporting (AER)

Most modern PCI devices support 'Advanced Error Reporting' (AER). For these devices, running lspci -vv as root will show additional registers similar to the ones described above:

  • UESta - Uncorrectable Error Status
  • UEMsk - Uncorrectable Error Mask
  • UESvrt - Uncorrectable Error Severity
  • CESta - Correctable Error Status
  • CEMsk - Correctable Error Mask
  • AERCap - Advanced Error Reporting Capabilities

For specifics on the meaning of the bits in these registers, see the PCI Express Base Specification Revision 3.0.

Examples

Here are some examples:

  • Show all Atheros/QCA radios on the PCI bus (vendor 168c):
    $ lspci -n | grep 168c
    0001:20:00.0 0280: 168c:0046
    
  • Very Verbose listing of a specific device:
    $ sudo lspci -s 1:20:00 -vv
    0001:20:00.0 Network controller: Qualcomm Atheros Device 0046
            Subsystem: Qualcomm Atheros Device cafe
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Latency: 0, Cache Line Size: 32 bytes
            Interrupt: pin A routed to IRQ 191
            Region 0: Memory at 881010000000 (64-bit, non-prefetchable) [size=2M]
            Capabilities: [40] Power Management version 3
                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
            Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit+
                    Address: 0000801000030040  Data: 0000
                    Masking: fffffffe  Pending: 00000000
            Capabilities: [70] Express (v2) Endpoint, MSI 00
                    DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                            MaxPayload 256 bytes, MaxReadReq 512 bytes
                    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                    LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <4us, L1 <64us
                            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                    LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                    DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                    LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                             Compliance De-emphasis: -6dB
                    LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
            Capabilities: [100 v2] Advanced Error Reporting
                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                    AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
            Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
            Capabilities: [158 v1] Latency Tolerance Reporting
                    Max snoop latency: 0ns
                    Max no snoop latency: 0ns
            Capabilities: [160 v1] L1 PM Substates
                    L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- L1_PM_Substates-
            Kernel driver in use: ath10k_pci
            Kernel modules: ath10k_pci
    
  • Looking at general device status (DevSta) for a specific device:
    $ sudo lspci -s 1:20:00 -vv | grep DevSta
                    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
    
    • Device Status Register above shows only AuxPwr bit is set (this device requires auxiliary power)
  • Looking at AER registers:
    $ sudo lspci -s 1:20:00 -vv | grep -e "UESta\|UEMsk\|UESvrt\|CESta\|CEMsk\|AERCap"
                    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                    AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    
    • The Uncorrectable Error Status (UESta) reports error status of individual uncorrectable error sources (no bits are set above):
      • Data Link Protocol Error (DLP)
      • Surprise Down Error (SDES)
      • Poisoned TLP (TLP)
      • Flow Control Protocol Error (FCP)
      • Completion Timeout (CmpltTO)
      • Completer Abort (CmpltAbrt)
      • Unexpected Completion (UnxCmplt)
      • Receiver Overflow (RxOF)
      • Malformed TLP (MalfTLP)
      • ECRC Error (ECRC)
      • Unsupported Request Error (UnsupReq)
      • ACS Violation (ACSViol)
    • The Uncorrectable Error Mask (UEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not recorded or reported. (Above shows no errors are being masked.)
    • The Uncorrectable Error Severity (UESvrt) controls whether an individual error is reported as a Non-fatal (clear) or Fatal error (set).
    • The Correctable Error Status reports error status of individual correctable error sources: (no bits are set above)
      • Receiver Error (RxErr)
      • Bad TLP status (BadTLP)
      • Bad DLLP status (BadDLLP)
      • REPLAY_NUM Rollover status (Rollover)
      • Replay Timer Timeout status (Timeout)
      • Advisory Non-Fatal Error (NonFatalErr)
    • The Correctable Error Mask (CEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not reported to the RC. Above shows that Advisory Non-Fatal Errors are being masked. This bit is set by default to enable compatibility with software that does not comprehend Role-Based error reporting.
    • The Advanced Error Capabilities and Control Register (AERCap) enables various capabilities (the above indicates the device is capable of generating and checking ECRC but neither is enabled):
      • First Error Pointer identifies the bit position of the first error reported in the Uncorrectable Error Status register
      • ECRC Generation Capable (GenCap) indicates if set that the function is capable of generating ECRC
      • ECRC Generation Enable (GenEn, shown as CGenEn in the output above) indicates if ECRC generation is enabled (set)
      • ECRC Check Capable (ChkCap) indicates if set that the function is capable of checking ECRC
      • ECRC Check Enable (ChkEn) indicates if ECRC checking is enabled

Note that by default the Linux kernel will not alter ECRC Generation / Check and considers this configured by boot firmware. This can be overridden by enabling CONFIG_PCIE_ECRC in the kernel and passing 'pci=ecrc=off' on the kernel cmdline to force disable or 'pci=ecrc=on' to force enable.
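
The way the UESta, UEMsk and UESvrt registers described above combine can be sketched in Python. This is an illustration only: the function names are hypothetical, the input lines are made-up readings with some bits set, and the 'masked' label simply means the corresponding UEMsk bit is set, so the error is not reported to the root complex.

```python
import re

def parse_flags(line):
    """Turn 'UESta:  DLP- TLP+ ...' into {'DLP': False, 'TLP': True, ...}."""
    _, _, rest = line.partition(":")
    return {name: sign == "+" for name, sign in re.findall(r"(\w+)([+-])", rest)}

def classify_uncorrectable(uesta, uemsk, uesvrt):
    """Return {error: 'masked' | 'fatal' | 'non-fatal'} for each bit set in UESta."""
    status = parse_flags(uesta)
    mask = parse_flags(uemsk)
    severity = parse_flags(uesvrt)
    report = {}
    for name, is_set in status.items():
        if not is_set:
            continue
        if mask.get(name):
            report[name] = "masked"
        else:
            # UESvrt: set means Fatal, clear means Non-fatal
            report[name] = "fatal" if severity.get(name) else "non-fatal"
    return report

# Hypothetical readings with three error bits set
uesta = "UESta:  DLP- SDES- TLP+ FCP- CmpltTO+ MalfTLP+"
uemsk = "UEMsk:  DLP- SDES- TLP- FCP- CmpltTO+ MalfTLP-"
uesvrt = "UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- MalfTLP+"
print(classify_uncorrectable(uesta, uemsk, uesvrt))
```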

PCIe Enumeration

PCIe enumeration (scanning of the devices on the bus) occurs during Linux kernel init time. While MiniPCIe and M.2 sockets do not support hotplug from an electrical standpoint, you can get Linux to re-scan a bus. This may be helpful, for example, if you have a device that needs to be programmed with firmware over a side-channel before it behaves like a PCIe endpoint (e.g. an FPGA). In this case you can rescan the bus via sysfs.

Example:

  • remove a device from the bus:
    root@focal-venice:~# lspci -n
    00:00.0 0604: 16c3:abcd (rev 01)
    01:00.0 0280: 168c:003c
    root@focal-venice:~# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
    root@focal-venice:~# lspci -n
    00:00.0 0604: 16c3:abcd (rev 01)
    
  • re-scan the bus:
    root@focal-venice:~# echo 1 > /sys/bus/pci/rescan
    [   78.881014] pci 0000:01:00.0: [168c:003c] type 00 class 0x028000
    [   78.887205] pci 0000:01:00.0: reg 0x10: [mem 0x18000000-0x181fffff 64bit]
    [   78.894245] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
    [   78.901443] pci 0000:01:00.0: supports D1 D2
    [   78.908375] pci 0000:01:00.0: BAR 0: assigned [mem 0x18000000-0x181fffff 64bit]
    [   78.915804] pci 0000:01:00.0: BAR 6: assigned [mem 0x18300000-0x1830ffff pref]
    [   78.925394] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
    [   79.090892] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
    [   79.100172] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
    [   79.108490] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
    [   79.162062] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
    [   80.390818] ath10k_pci 0000:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
    [   80.487289] ath: EEPROM regdomain: 0x0
    [   80.491106] ath: EEPROM indicates default country code should be used
    [   80.497584] ath: doing EEPROM country->regdmn map search
    [   80.502926] ath: country maps to regdmn code: 0x3a
    [   80.507742] ath: Country alpha2 being used: US
    [   80.512231] ath: Regpair used: 0x3a
    root@focal-venice:~# lspci -n
    00:00.0 0604: 16c3:abcd (rev 01)
    01:00.0 0280: 168c:003c
    

Note that in some cases a re-scan is not possible, for example if the PCIe link to the host controller breaks on an IMX8M without a switch.
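
The same remove and re-scan steps can be driven from a script, for example after loading firmware into an FPGA. The Python sketch below writes to the sysfs files used in the example above; the function names are made up for illustration, and the sysfs_pci parameter exists only so the code can be pointed at a different directory for testing. Writing to the real files requires root.

```python
import os

SYSFS_PCI = "/sys/bus/pci"  # sysfs location used in the example above

def remove_device(bdf, sysfs_pci=SYSFS_PCI):
    """Detach one device from the bus, e.g. bdf='0000:01:00.0'."""
    with open(os.path.join(sysfs_pci, "devices", bdf, "remove"), "w") as f:
        f.write("1")

def rescan_bus(sysfs_pci=SYSFS_PCI):
    """Ask the kernel to re-enumerate all PCI buses."""
    with open(os.path.join(sysfs_pci, "rescan"), "w") as f:
        f.write("1")

def device_present(bdf, sysfs_pci=SYSFS_PCI):
    """True if the kernel currently lists the device under sysfs."""
    return os.path.isdir(os.path.join(sysfs_pci, "devices", bdf))
```

A typical sequence would be remove_device("0000:01:00.0"), reprogramming the device, rescan_bus(), and then polling device_present("0000:01:00.0") until the endpoint shows up again.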

Mini-PCIe Mechanical Specification

Gateworks SBCs adhere to the PCI Express Mini Card standard which uses a 52-pin edge connector. The majority of boards support full size cards which are 30mm x 50.95mm in size and 1.0mm thick. The Newport GW6903 supports half size cards which are 30mm x 26.8mm and 1.0mm thick. See the board's individual manual for which type of signaling (PCIe, USB, SIM) is supported for each specific connector.

The below mechanical drawing is for the standard Mini-PCIe connector used on the Ventana/Newport/Venice boards. Note that other heights are available as special order (100 piece MOQ). Contact sales for available options. http://trac.gateworks.com/raw-attachment/wiki/minipciexpressmodules/minipcieconn.png

Mini-PCIe Screws

The hold down screws for the Mini-PCIe cards are pre-loaded into the standoffs on the board. If you need additional screws they can be purchased at McMaster-Carr.