Changes between Initial Version and Version 1 of PCI


Timestamp:
06/01/2018 07:34:50 PM (6 years ago)
Author:
Tim Harvey
Comment:

initial page

  • PCI

    v1 v1  
     1[[PageOutline]]
     2
     3= PCI
     4Peripheral Component Interconnect (PCI) is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space.
     5
     6Bus mastering means that PCI devices can access a processor's memory bus directly, independent of the processor, similar to a Direct Memory Access (DMA) controller.
     7
     8PCI History:
     9* PCI 1.0 1992 Original issue
     10* PCI 2.0 1993 Incorporated connector and add-in card specification
     11* PCI 2.1 1995 Incorporated clarifications and added 66 MHz
     12* PCI 2.2 1998 Added Mini PCI, Incorporated ECNs and improved readability
     13* PCI 2.3 2002 Incorporated ECNs, errata, and deleted 5 volt only keyed add-in cards
     14* PCI 3.0 2004 Removed support for 5.0 volt keyed system board connector
     15* PCI Express 2004
     16
     17Conventional PCI has 4 shared level-triggered interrupts and uses a parallel bus architecture where the PCI host and all devices share a common set of address, data, and control lines.
     18
     19References:
     20 * https://en.wikipedia.org/wiki/Conventional_PCI
     21
     22
     23= Mini PCI
     24Mini PCI was added to PCI version 2.2 and differs from Conventional PCI in the following ways:
     25* 32-bit, 33 MHz
     26* 3.3V only; 5V limited to 100mA
     27* three form factors:
     28 - Type I card uses a 100pin stacking connector
     29 - Type II card uses a 100pin stacking connector and accommodates a larger size
     30 - Type III card uses a 124pin edge connector
     31
     32Older generation Gateworks products such as Avila, Cambria, and some Laguna product families support the Mini PCI Type III cards.
     33
     34
     35= PCI Express (PCIe)
     36The original PCI bus, now referred to as 'Conventional PCI', was a parallel bus with shared address/data lines. In 2004 the PCI Express (PCIe) specification was released, defining a serialized version of PCI that is commonplace today.
     37
     38PCI Express is based on a point-to-point topology with separate serial links connecting every device to the host, also known as the root complex (RC). Links may contain from one to 32 lanes (1x, 2x, 4x, 8x, 12x, 16x, 32x), with each lane consisting of two differential pairs (one to transmit, one to receive). PCI Express interrupts are embedded within the serial data.
     39
     40References:
     41* https://en.wikipedia.org/wiki/PCI_Express
     42
     43
     44= PCI Express Mini Card (also known as 'Mini PCIe', 'mPCIe' or 'PEM')
     45The PCI Express Mini Card specification is based on PCI Express with the following differences:
     46* uses a 52-pin edge connector with 2 rows of pins
     47* incorporates 1x (1 lane) PCI Express, USB 2.0, and SIM connectivity on the connector
     48
     49Modern generation Gateworks products such as the Laguna GW2391, Ventana, and Newport product families support Mini PCIe cards.
     50
     51
     52[=#linux-pci-debug]
     53= Linux PCI Debugging
     54PCI configuration registers can be used to debug various PCI bus issues.
     55
     56The easiest way to access these registers is via the Linux {{{lspci}}} command with the 'very verbose' flag (-vv), which decodes and displays the various PCI config space registers. Note that access to some parts of the PCI configuration space is restricted to root on many operating systems; in that case you will see certain data flagged as 'access denied'.
     57
     58The various registers define bits that are either set (indicated with a '+') or unset (indicated with a '-'). These bits typically have attributes of 'RW1C' meaning you can read and write them and need to write a '1' to clear them. Because these are status bits, if you wanted to 'count' the occurrences of them you would need to write some software that detected the bits getting set, incremented counters, and cleared them over time.
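
The counting approach described above can be sketched in shell. This is only a minimal sketch under stated assumptions: {{{count_devsta_errors}}} is a hypothetical helper name, and it expects one DevSta line per sample on stdin in the format printed by {{{lspci -vv}}}. It only observes the bits; because the bits stay set until cleared, a real tool would also have to clear them between samples for the tally to count new occurrences.

```shell
#!/bin/sh
# Hypothetical helper: tally how often each DevSta error bit is seen set.
# Input:  one "DevSta: ..." line per sample on stdin (lspci -vv format).
# Output: "<bit> <count>" for each of the four error bits.
count_devsta_errors() {
    awk '
        /DevSta:/ {
            for (i = 1; i <= NF; i++) {
                # A trailing "+" marks a set bit, e.g. "CorrErr+"
                if ($i ~ /^(CorrErr|UncorrErr|FatalErr|UnsuppReq)\+$/) {
                    name = substr($i, 1, length($i) - 1)
                    count[name]++
                }
            }
        }
        END {
            n = split("CorrErr UncorrErr FatalErr UnsuppReq", order, " ")
            for (i = 1; i <= n; i++)
                printf "%s %d\n", order[i], count[order[i]]
        }
    '
}

# Possible usage (needs root and pciutils), sampling once per second:
#   for n in 1 2 3 4 5; do
#       lspci -s 1:20:00 -vv | grep DevSta; sleep 1
#   done | count_devsta_errors
```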
     59
     60The 'Device Status Register' (!DevSta) shows at a high level if there have been correctable errors detected (!CorrErr), non-fatal errors detected (!UncorrErr), fatal errors detected (!FatalErr), unsupported requests detected (!UnsuppReq), if auxiliary power has been detected by a device that requires it (!AuxPwr), and if there are transactions pending (!TransPend: non-posted requests that have not been completed).
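
As an illustration of reading the Device Status bits, the following sketch turns the DevSta line into plain descriptions. {{{describe_devsta}}} is a hypothetical helper name, and the flag names are assumed to match the {{{lspci -vv}}} output shown on this page (other lspci versions may print different names).

```shell
#!/bin/sh
# Hypothetical helper: describe which Device Status bits are set.
# Reads lspci -vv output on stdin; prints one description per set bit.
describe_devsta() {
    # Emit each flag of the DevSta line on its own line, then map set flags.
    awk '/DevSta:/ { for (i = 2; i <= NF; i++) print $i }' |
    while read -r flag; do
        case "$flag" in
            CorrErr+)   echo "correctable error detected" ;;
            UncorrErr+) echo "non-fatal error detected" ;;
            FatalErr+)  echo "fatal error detected" ;;
            UnsuppReq+) echo "unsupported request detected" ;;
            AuxPwr+)    echo "auxiliary power detected" ;;
            TransPend+) echo "transactions pending" ;;
        esac
    done
}

# Possible usage (needs root and pciutils):
#   lspci -s 1:20:00 -vv | describe_devsta
```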
     61
     62If you want to delve deeper into types of errors see [#aer PCI Advanced Error Reporting] below.
     63
     64References:
     65- [https://www.kernel.org/doc/ols/2007/ols2007v2-pages-297-304.pdf Enable PCI Express Advanced Error Reporting in the Kernel]
     66- [http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf PCI Express Base Specification Revision 3.0]
     67- [https://intrepid.warped.com/~scotte/OldBlogEntries/web/index-5.html PCI Debugging 101]
     68
     69
     70[=#aer]
     71== PCI Advanced Error Reporting (AER)
     72Most modern PCI devices support 'Advanced Error Reporting' (AER). For these devices a {{{lspci -vv}}} as root will show additional registers similar to the ones described above:
     73* UESta - Uncorrectable Error Status
     74* UEMsk - Uncorrectable Error Mask
     75* UESvrt - Uncorrectable Error Severity
     76* CESta - Correctable Error Status
     77* CEMsk - Correctable Error Mask
     78* AERCap - Advanced Error Reporting Capabilities
     79
     80For specifics on the meaning of the bits in these registers, see the [http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf PCI Express Base Specification Revision 3.0].
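
To check quickly whether a device exposes these registers, one option is to look for the AER capability line in the {{{lspci -vv}}} output. The sketch below assumes the capability is printed with the text 'Advanced Error Reporting', as in the listing on this page; {{{has_aer}}} is a hypothetical helper name. Without root the extended capabilities may not be readable, so a negative result does not prove the device lacks AER.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the lspci -vv output on stdin
# lists the Advanced Error Reporting extended capability.
has_aer() {
    grep -q 'Advanced Error Reporting'
}

# Possible usage (needs root and pciutils):
#   if lspci -s 1:20:00 -vv | has_aer; then
#       echo "AER registers available"
#   fi
```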
     81
     82== Examples
     83
     84Here are some examples:
     85 * Show all Atheros/QCA radios on the PCI bus (vendor 168c):
     86{{{#!bash
     87$ lspci -n | grep 168c
     880001:20:00.0 0280: 168c:0046
     89}}}
     90 * Very Verbose listing of a specific device:
     91{{{#!bash
     92$ sudo lspci -s 1:20:00 -vv
     930001:20:00.0 Network controller: Qualcomm Atheros Device 0046
     94        Subsystem: Qualcomm Atheros Device cafe
     95        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
     96        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
     97        Latency: 0, Cache Line Size: 32 bytes
     98        Interrupt: pin A routed to IRQ 191
     99        Region 0: Memory at 881010000000 (64-bit, non-prefetchable) [size=2M]
     100        Capabilities: [40] Power Management version 3
     101                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
     102                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
     103        Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit+
     104                Address: 0000801000030040  Data: 0000
     105                Masking: fffffffe  Pending: 00000000
     106        Capabilities: [70] Express (v2) Endpoint, MSI 00
     107                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
     108                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
     109                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
     110                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
     111                        MaxPayload 256 bytes, MaxReadReq 512 bytes
     112                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
     113                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <4us, L1 <64us
     114                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
     115                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
     116                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
     117                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
     118                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
     119                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
     120                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
     121                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
     122                         Compliance De-emphasis: -6dB
     123                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
     124                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
     125        Capabilities: [100 v2] Advanced Error Reporting
     126                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
     127                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
     128                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
     129                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
     130                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
     131                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
     132        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
     133        Capabilities: [158 v1] Latency Tolerance Reporting
     134                Max snoop latency: 0ns
     135                Max no snoop latency: 0ns
     136        Capabilities: [160 v1] L1 PM Substates
     137                L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- L1_PM_Substates-
     138        Kernel driver in use: ath10k_pci
     139        Kernel modules: ath10k_pci
     140}}}
     141 * Looking at general device status (!DevSta) for a specific device:
     142{{{#!bash
     143$ sudo lspci -s 1:20:00 -vv | grep DevSta
     144                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
     145}}}
     146  - The Device Status Register above shows that only the !AuxPwr bit is set (the device requires auxiliary power and has detected it)
     147 * Looking at AER registers:
     148{{{#!bash
     149$ sudo lspci -s 1:20:00 -vv | grep -e "UESta\|UEMsk\|UESvrt\|CESta\|CEMsk\|AERCap"
     150                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
     151                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
     152                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
     153                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
     154                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
     155                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
     156}}}
     157  - The Uncorrectable Error Status (UESta) reports error status of individual uncorrectable error sources (no bits are set above):
     158   * Data Link Protocol Error (DLP)
     159   * Surprise Down Error (SDES)
     160   * Poisoned TLP (TLP)
     161   * Flow Control Protocol Error (FCP)
     162   * Completion Timeout (CmpltTO)
     163   * Completer Abort (!CmpltAbrt)
     164   * Unexpected Completion (!UnxCmplt)
     165   * Receiver Overflow (RxOF)
     166   * Malformed TLP (MalfTLP)
     167   * ECRC Error (ECRC)
     168   * Unsupported Request Error (!UnsupReq)
     169   * ACS Violation (ACSViol)
     170  - The Uncorrectable Error Mask (UEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not recorded or reported. (The above shows that no errors are being masked.)
     171  - The Uncorrectable Error Severity (UESvrt) controls whether an individual error is reported as a Non-fatal (clear) or Fatal error (set).
     172  - The Correctable Error Status (CESta) reports error status of individual correctable error sources (no bits are set above):
     173   * Receiver Error (RxErr)
     174   * Bad TLP status (BadTLP)
     175   * Bad DLLP status (BadDLLP)
     176   * REPLAY_NUM Rollover status (Rollover)
     177   * Replay Timer Timeout status (Timeout)
     178   * Advisory Non-Fatal Error (NonFatalErr)
     179  - The Correctable Error Mask (CEMsk) controls reporting of individual errors by the device to the PCIe root complex. A masked error (bit set) is not reported to the RC. The above shows that Advisory Non-Fatal Errors are being masked. This bit is set by default for compatibility with software that does not comprehend Role-Based error reporting.
     180  - The Advanced Error Capabilities and Control Register (AERCap) reports and enables various capabilities (the above indicates the device is capable of generating and checking ECRC, but neither is enabled):
     181   * First Error Pointer identifies the bit position of the first error reported in the Uncorrectable Error Status register
     182   * ECRC Generation Capable (!GenCap) indicates if set that the function is capable of generating ECRC
     183   * ECRC Generation Enable (!GenEn, printed as CGenEn in the output above) indicates if ECRC generation is enabled (set)
     184   * ECRC Check Capable (!ChkCap) indicates if set that the function is capable of checking ECRC
     185   * ECRC Check Enable (!ChkEn) indicates if ECRC checking is enabled
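
Scanning these registers by eye gets tedious, so the status registers can be filtered for set bits. The following is a minimal sketch, assuming the {{{lspci -vv}}} line format shown above; {{{aer_set_bits}}} is a hypothetical helper name. It reports only UESta and CESta, since set bits in the mask and severity registers are configuration rather than errors.

```shell
#!/bin/sh
# Hypothetical helper: list AER status bits that are currently set.
# Reads lspci -vv output on stdin; prints "<register> <bit>" per set bit.
aer_set_bits() {
    awk '
        $1 == "UESta:" || $1 == "CESta:" {
            reg = substr($1, 1, length($1) - 1)      # drop trailing ":"
            for (i = 2; i <= NF; i++)
                if ($i ~ /\+$/)                      # "+" marks a set bit
                    print reg, substr($i, 1, length($i) - 1)
        }
    '
}

# Possible usage (needs root and pciutils):
#   lspci -s 1:20:00 -vv | aer_set_bits
```

For the listing above this prints nothing, because no status bits are set.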
     186
     187Note that by default the Linux kernel will not alter ECRC Generation / Check and considers this configured by boot firmware. This can be overridden by enabling CONFIG_PCIE_ECRC in the kernel and passing 'pci=ecrc=off' on the kernel cmdline to force disable or 'pci=ecrc=on' to force enable.
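
To see which ECRC policy a running system was booted with, the kernel command line can be inspected. The sketch below assumes the {{{pci=ecrc=}}} option form from the kernel's parameter documentation (values {{{bios}}}, {{{on}}}, {{{off}}}, with {{{bios}}} as the default); {{{ecrc_policy}}} is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the ECRC policy requested on a kernel cmdline.
# $1: the command line text (e.g. the contents of /proc/cmdline).
# Prints the value of "ecrc=" inside "pci=...", or "bios" if it is absent.
ecrc_policy() {
    mode=bios
    for arg in $1; do
        case "$arg" in
            pci=*)
                # pci= takes a comma-separated list, e.g. pci=noaer,ecrc=on
                old_ifs=$IFS
                IFS=,
                for opt in ${arg#pci=}; do
                    case "$opt" in
                        ecrc=*) mode=${opt#ecrc=} ;;
                    esac
                done
                IFS=$old_ifs
                ;;
        esac
    done
    echo "$mode"
}

# Possible usage:
#   ecrc_policy "$(cat /proc/cmdline)"
```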