| 1 | [[PageOutline]] |
| 2 | |
| 3 | = PCI |
| 4 | Peripheral Component Interconnect (PCI) is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space. |
| 5 | |
| 6 | Bus mastering means that a PCI device can access the processor's memory bus directly, independent of the processor, much like a Direct Memory Access (DMA) controller. |
| 7 | |
| 8 | PCI History: |
| 9 | * PCI 1.0 1992 Original issue |
| 10 | * PCI 2.0 1993 Incorporated connector and add-in card specification |
| 11 | * PCI 2.1 1995 Incorporated clarifications and added 66 MHz |
| 12 | * PCI 2.2 1998 Added Mini PCI, Incorporated ECNs and improved readability |
| 13 | * PCI 2.3 2002 Incorporated ECNs, errata, and deleted 5 volt only keyed add-in cards |
| 14 | * PCI 3.0 2004 Removed support for 5.0 volt keyed system board connector |
| 15 | * PCI Express 2004 |
| 16 | |
| 17 | Conventional PCI has 4 shared level-triggered interrupts and uses a parallel bus architecture where the PCI host and all devices share a common set of address, data and control lines. |
| 18 | |
| 19 | References: |
| 20 | * https://en.wikipedia.org/wiki/Conventional_PCI |
| 21 | |
| 22 | |
| 23 | = Mini PCI |
| 24 | Mini PCI was added to PCI version 2.2 and differs from Conventional PCI in the following ways: |
| 25 | * 32-bit, 33 MHz |
| 26 | * 3.3V only; 5V limited to 100mA |
| 27 | * three form factors: |
| 28 | - Type I card uses a 100-pin stacking connector |
| 29 | - Type II card uses a 100-pin stacking connector and accommodates a larger size |
| 30 | - Type III card uses a 124-pin edge connector |
| 31 | |
| 32 | Older generation Gateworks products such as Avila, Cambria, and some Laguna product families support the Mini PCI Type III cards. |
| 33 | |
| 34 | |
| 35 | = PCI Express (PCIe) |
| 36 | The original PCI bus, now referred to as 'Conventional PCI', was a parallel bus with shared address/data lines. In 2004 the PCI Express (PCIe) specification was released, defining a serialized version of PCI that is commonplace today. |
| 37 | |
| 38 | PCI Express is based on a point-to-point topology with separate serial links connecting every device to the host, also known as the root complex (RC). Links may contain from one to 32 lanes (1x, 2x, 4x, 8x, 12x, 16x, 32x), with each lane consisting of two differential pairs (one for transmit and one for receive). PCI Express interrupts are embedded within the serial data. |
| 39 | |
| 40 | References: |
| 41 | * https://en.wikipedia.org/wiki/PCI_Express |
| 42 | |
| 43 | |
| 44 | = PCI Express Mini Card (also known as 'Mini PCIe', 'mPCIe' or 'PEM') |
| 45 | The PCI Express Mini Card specification is based on PCI Express with the following differences: |
| 46 | * uses a 52-pin edge connector with 2 rows of pins |
| 47 | * incorporates 1x (1 lane) PCI Express, USB 2.0, and SIM connectivity on the connector |
| 48 | |
| 49 | Modern generation Gateworks products such as the Laguna GW2391, Ventana, and Newport product families support Mini PCIe cards. |
| 50 | |
| 51 | |
| 52 | [=#linux-pci-debug] |
| 53 | = Linux PCI Debugging |
| 54 | PCI configuration registers can be used to debug various PCI bus issues. |
| 55 | |
| 56 | The easiest way to access these registers is the Linux {{{lspci}}} command with the 'very verbose' flag ({{{-vv}}}), which decodes and displays the various PCI config space registers. Note that many operating systems restrict access to some parts of the PCI configuration space to root; when that is the case, certain data is flagged as 'access denied'. |
| 57 | |
| 58 | The various registers define bits that lspci shows as either set ('+') or unset ('-'). Status bits typically have the 'RW1C' attribute: they are readable, and writing a '1' clears them. Because a status bit only latches that an error occurred, 'counting' occurrences requires software that detects the bits getting set, increments counters, and clears the bits again over time. |
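
The counting approach described above can be sketched in Python. This is a minimal illustration, not a complete tool: it assumes the caller has already captured one {{{DevSta}}} line per poll (for example from {{{lspci -vv}}}) and has cleared the RW1C bits between polls by some other means, which is not shown. The helper names {{{parse_flags}}} and {{{count_set_events}}} are hypothetical, and only the first sample line comes from real output; the other two are made up for the example.

```python
import re
from collections import Counter

# Matches lspci flag tokens such as "CorrErr-" or "AuxPwr+".
FLAG_RE = re.compile(r"([A-Za-z0-9_]+)([+-])(?=\s|$)")


def parse_flags(line):
    """Return {flag_name: is_set} for one lspci register line."""
    _, _, body = line.partition(":")
    return {name: sign == "+" for name, sign in FLAG_RE.findall(body)}


def count_set_events(samples, counters=None):
    """Add one count per flag for every sample in which that flag is set.

    Assumes the status bits were cleared between samples; otherwise the
    count only reflects how many polls saw the bit still latched.
    """
    counters = Counter() if counters is None else counters
    for line in samples:
        for name, is_set in parse_flags(line).items():
            if is_set:
                counters[name] += 1
    return counters


# First line is taken from the lspci output on this page; the next two
# are hypothetical polls with error bits set.
samples = [
    "DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-",
    "DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-",
    "DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-",
]
counts = count_set_events(samples)
print(counts["CorrErr"], counts["UnsuppReq"], counts["AuxPwr"])
```

Passing an existing {{{Counter}}} as the second argument lets a long-running poller keep accumulating totals across calls.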
| 59 | |
| 60 | The 'Device Status Register' (!DevSta) shows at a high level if there have been correctable errors detected (!CorrErr), non-fatal errors detected (!UncorrErr), fatal errors detected (!FatalErr), unsupported requests detected (!UnsuppReq), if the device requires auxiliary power (!AuxPwr), and if there are transactions pending (!TransPend: non-posted requests that have not been completed). |
| 61 | |
| 62 | If you want to delve deeper into types of errors see [#aer PCI Advanced Error Reporting] below. |
| 63 | |
| 64 | References: |
| 65 | - [https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf Enable PCI Express Advanced Error Reporting in the Kernel] |
| 66 | - [http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf PCI Express Base Specification Revision 3.0] |
| 67 | - [https://intrepid.warped.com/~scotte/OldBlogEntries/web/index-5.html PCI Debugging 101] |
| 68 | |
| 69 | |
| 70 | [=#aer] |
| 71 | == PCI Advanced Error Reporting (AER) |
| 72 | Most modern PCI devices support 'Advanced Error Reporting' (AER). For these devices a {{{lspci -vv}}} as root will show additional registers similar to the ones described above: |
| 73 | * UESta - Uncorrectable Error Status |
| 74 | * UEMsk - Uncorrectable Error Mask |
| 75 | * UESvrt - Uncorrectable Error Severity |
| 76 | * CESta - Correctable Error Status |
| 77 | * CEMsk - Correctable Error Mask |
| 78 | * AERCap - Advanced Error Reporting Capabilities |
| 79 | |
| 80 | For specifics on the meaning of the bits in these registers see the [http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf PCI Express Base Specification Revision 3.0]. |
| 81 | |
| 82 | == Examples |
| 83 | |
| 84 | Here are some examples: |
| 85 | * Show all Atheros/QCA radios on the PCI bus (vendor 168c): |
| 86 | {{{#!bash |
| 87 | $ lspci -n | grep 168c |
| 88 | 0001:20:00.0 0280: 168c:0046 |
| 89 | }}} |
| 90 | * Very Verbose listing of a specific device: |
| 91 | {{{#!bash |
| 92 | $ sudo lspci -s 1:20:00 -vv |
| 93 | 0001:20:00.0 Network controller: Qualcomm Atheros Device 0046 |
| 94 | Subsystem: Qualcomm Atheros Device cafe |
| 95 | Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ |
| 96 | Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- |
| 97 | Latency: 0, Cache Line Size: 32 bytes |
| 98 | Interrupt: pin A routed to IRQ 191 |
| 99 | Region 0: Memory at 881010000000 (64-bit, non-prefetchable) [size=2M] |
| 100 | Capabilities: [40] Power Management version 3 |
| 101 | Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) |
| 102 | Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- |
| 103 | Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit+ |
| 104 | Address: 0000801000030040 Data: 0000 |
| 105 | Masking: fffffffe Pending: 00000000 |
| 106 | Capabilities: [70] Express (v2) Endpoint, MSI 00 |
| 107 | DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited |
| 108 | ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- |
| 109 | DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- |
| 110 | RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- |
| 111 | MaxPayload 256 bytes, MaxReadReq 512 bytes |
| 112 | DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- |
| 113 | LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <4us, L1 <64us |
| 114 | ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ |
| 115 | LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- |
| 116 | ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- |
| 117 | LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- |
| 118 | DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported |
| 119 | DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled |
| 120 | LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- |
| 121 | Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- |
| 122 | Compliance De-emphasis: -6dB |
| 123 | LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- |
| 124 | EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- |
| 125 | Capabilities: [100 v2] Advanced Error Reporting |
| 126 | UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- |
| 127 | UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- |
| 128 | UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- |
| 129 | CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- |
| 130 | CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ |
| 131 | AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- |
| 132 | Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00 |
| 133 | Capabilities: [158 v1] Latency Tolerance Reporting |
| 134 | Max snoop latency: 0ns |
| 135 | Max no snoop latency: 0ns |
| 136 | Capabilities: [160 v1] L1 PM Substates |
| 137 | L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- L1_PM_Substates- |
| 138 | Kernel driver in use: ath10k_pci |
| 139 | Kernel modules: ath10k_pci |
| 140 | }}} |
| 141 | * Looking at general device status (!DevSta) for a specific device: |
| 142 | {{{#!bash |
| 143 | $ sudo lspci -s 1:20:00 -vv | grep DevSta |
| 144 | DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- |
| 145 | }}} |
| 146 | - Device Status Register above shows only !AuxPwr bit is set (this device requires auxiliary power) |
| 147 | * Looking at AER registers: |
| 148 | {{{#!bash |
| 149 | $ sudo lspci -s 1:20:00 -vv | grep -e "UESta\|UEMsk\|UESvrt\|CESta\|CEMsk\|AERCap" |
| 150 | UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- |
| 151 | UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- |
| 152 | UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- |
| 153 | CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- |
| 154 | CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ |
| 155 | AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- |
| 156 | }}} |
| 157 | - The Uncorrectable Error Status (UESta) reports error status of individual uncorrectable error sources (no bits are set above): |
| 158 | * Data Link Protocol Error (DLP) |
| 159 | * Surprise Down Error (SDES) |
| 160 | * Poisoned TLP (TLP) |
| 161 | * Flow Control Protocol Error (FCP) |
| 162 | * Completion Timeout (CmpltTO) |
| 163 | * Completer Abort (!CmpltAbrt) |
| 164 | * Unexpected Completion (!UnxCmplt) |
| 165 | * Receiver Overflow (RxOF) |
| 166 | * Malformed TLP (MalfTLP) |
| 167 | * ECRC Error (ECRC) |
| 168 | * Unsupported Request Error (!UnsupReq) |
| 169 | * ACS Violation (ACSViol) |
| 170 | - The Uncorrectable Error Mask (UEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not logged in the header log or First Error Pointer and is not reported to the RC. (Above shows no errors are being masked.) |
| 171 | - The Uncorrectable Error Severity (UESvrt) controls whether an individual error is reported as a Non-fatal (clear) or Fatal error (set). |
| 172 | - The Correctable Error Status (CESta) reports error status of individual correctable error sources (no bits are set above): |
| 173 | * Receiver Error (!RxErr) |
| 174 | * Bad TLP status (BadTLP) |
| 175 | * Bad DLLP status (BadDLLP) |
| 176 | * REPLAY_NUM Rollover status (Rollover) |
| 177 | * Replay Timer Timeout status (Timeout) |
| 178 | * Advisory Non-Fatal Error (!NonFatalErr) |
| 179 | - The Correctable Error Mask (CEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not reported to the RC. Above shows that Advisory Non-Fatal Errors are being masked. This bit is set by default for compatibility with software that does not comprehend Role-Based error reporting. |
| 180 | - The Advanced Error Capabilities and Control Register (AERCap) enables various capabilities (the above indicates the device is capable of both generating and checking ECRC, but neither is enabled): |
| 181 | * First Error Pointer identifies the bit position of the first error reported in the Uncorrectable Error Status register |
| 182 | * ECRC Generation Capable (!GenCap) indicates if set that the function is capable of generating ECRC |
| 183 | * ECRC Generation Enable (!GenEn, printed as CGenEn in the lspci output above) indicates if ECRC generation is enabled (set) |
| 184 | * ECRC Check Capable (!ChkCap) indicates if set that the function is capable of checking ECRC |
| 185 | * ECRC Check Enable (!ChkEn) indicates if ECRC checking is enabled |
| 186 | |
| 187 | Note that by default the Linux kernel does not alter the ECRC Generation / Check settings and treats them as configured by boot firmware. This can be overridden by enabling CONFIG_PCIE_ECRC in the kernel and passing {{{pci=ecrc=off}}} on the kernel cmdline to force disable or {{{pci=ecrc=on}}} to force enable. |
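
To illustrate how the UESta, UEMsk and UESvrt registers described above combine, the following Python sketch reads {{{lspci -vv}}} text and lists each uncorrectable error whose status bit is set, together with its configured severity and whether it is masked. It is only a sketch under stated assumptions: the function names are hypothetical, and the sample text is the AER output from this page with the {{{CmpltTO}}} and {{{MalfTLP}}} status bits set by hand for demonstration.

```python
import re

# Matches lspci flag tokens such as "DLP-" or "MalfTLP+".
FLAG_RE = re.compile(r"([A-Za-z0-9_]+)([+-])(?=\s|$)")


def parse_registers(text, names):
    """Return {register: {flag: is_set}} for the named lspci registers."""
    regs = {}
    for line in text.splitlines():
        label, sep, body = line.strip().partition(":")
        if sep and label in names:
            regs[label] = {n: s == "+" for n, s in FLAG_RE.findall(body)}
    return regs


def classify_uncorrectable(text):
    """List (error, severity, masked) for every bit set in UESta."""
    regs = parse_registers(text, {"UESta", "UEMsk", "UESvrt"})
    report = []
    for name, is_set in regs.get("UESta", {}).items():
        if not is_set:
            continue
        # UESvrt: bit set means Fatal, bit clear means Non-fatal.
        severity = "fatal" if regs.get("UESvrt", {}).get(name) else "non-fatal"
        # UEMsk: bit set means the error is masked.
        masked = bool(regs.get("UEMsk", {}).get(name))
        report.append((name, severity, masked))
    return report


# Hypothetical sample: UESta edited by hand, UEMsk/UESvrt as shown above.
SAMPLE = """
    UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
"""

for name, severity, masked in classify_uncorrectable(SAMPLE):
    print(name, severity, "masked" if masked else "unmasked")
```

With the unmodified output from this page (no UESta bits set) the function returns an empty list.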