Ventana PCI/PCIe Support

The i.MX6 CPU has an internal address translation unit (iATU) that connects the i.MX6 PCI host controller to the memory bus. The size of the iATU window imposes a resource limit that can ultimately restrict the number of PCI devices you can have on the bus. The iATU window is 16MB. It can technically be divided in a variety of ways, but by default it is used as:

  • 512KB config space
  • 64KB io space
  • 15MB mem space

PCI devices can request 1 or more io regions and 1 or more mem regions. However, when a device sits behind a bridge, its resource requests must pass through that bridge, which imposes a 1MB granularity on mem regions. On the GW52xx, GW53xx, and GW54xx each PCIe socket is behind a bridge and is therefore subject to this 1MB granularity. The upstream port on a PCIe switch takes a mem resource itself, which leaves 14 more 1MB windows available.
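
The window arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the text. The constant and function names are made up for the example and do not come from any kernel source.

```python
KB = 1024
MB = 1024 * KB

# Default iATU split as described above (names are illustrative only).
IATU_WINDOW = 16 * MB
CONFIG_SPACE = 512 * KB
IO_SPACE = 64 * KB
MEM_SPACE = 15 * MB
BRIDGE_GRANULARITY = 1 * MB  # mem granularity imposed by a PCI bridge


def mem_windows(mem_space=MEM_SPACE, granularity=BRIDGE_GRANULARITY):
    """Number of bridge-sized mem windows that fit in the mem space."""
    return mem_space // granularity


# The default split has to fit inside the 16MB iATU window.
assert CONFIG_SPACE + IO_SPACE + MEM_SPACE <= IATU_WINDOW

total = mem_windows()
after_switch = total - 1  # the switch upstream port takes one window itself
print(total, after_switch)  # 15 14
```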

The outcome is complex and is likely best explained with a series of examples of what is possible. The following examples use various hardware combinations of:

  • baseboards:
    • GW54xx - 2 mem windows used by baseboard (1 for PCIe switch, 1 for eth1 GigE)
    • GW53xx - 2 mem windows used by baseboard (1 for PCIe switch, 1 for eth1 GigE)
    • GW52xx - 1 mem window used by baseboard (1 for PCIe switch)
    • GW51xx - 0 mem windows used by baseboard
  • Expansion Mezzanines:
    • GW16081 PCIe expansion mezz - 1 mem window used by PCIe bridge upstream port
    • GW16082 PCI expansion mezz - 1 mem window used by PCIe to PCI bridge upstream port
  • Various WiFi Radios:
    • WLE300 802.11n 3x3 MIMO radio - 2 mem windows required
    • SR71e 802.11n 2x2 MIMO radio - 1 mem window required
    • DNMA H-5 802.11abg radio - 1 mem window required
    • WLE900 802.11ac 3x3 MIMO radio - 2 mem windows (one 2MB window)
      • Note: the ath10k driver/firmware may also request coherent pool memory (coherent memory from the kernel's atomic memory pool). If you encounter allocation errors when using multiple radios, you may need to increase the kernel atomic coherent memory pool with the 'coherent_pool' kernel command line argument:
      • setenv extra 'coherent_pool=4M'
      • The amount required depends on the card(s) and mode(s) in use; 4M is a very safe value, given that the default is currently 256K. To verify that the kernel received the new setting, run 'cat /proc/cmdline' and check that 'coherent_pool=4M' appears in the output.
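
If Python happens to be available on the target, the 'cat /proc/cmdline' check can also be scripted. The following is a minimal sketch. The function name and the sample command line are made up for illustration, and on a real board you would pass in the contents of /proc/cmdline instead.

```python
def coherent_pool_arg(cmdline):
    """Return the coherent_pool value from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "coherent_pool" and sep:
            return value
    return None


# Hypothetical command line, used only to demonstrate the parser.
sample = "console=ttymxc1,115200 coherent_pool=4M"
print(coherent_pool_arg(sample))              # 4M
print(coherent_pool_arg("console=ttymxc1"))   # None
```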

Various Examples:

  • GW54xx + 6x SR71e (slots fully loaded - 6 radios) (with 6 unused 1MB resource windows remaining)
  • GW54xx + 6x WLE300 (slots fully loaded - 6 radios)
  • GW54xx + 6x WLE900 (slots fully loaded - 6 radios) (with 'coherent_pool=4M' kernel cmdline argument)
  • GW54xx + GW16081 + 12x SR71e (slots fully loaded - 12 radios)
  • GW54xx + GW16081 + 5x WLE300 + 1x SR71e (6 radios)
  • GW54xx + GW16082 + 5x WLE300 + 4x DNMA H-5 (slots fully loaded - 9 radios)
  • GW54xx + GW16082 + 5x SR71e + 4x DNMA H-5 (slots fully loaded - 9 radios)
  • GW54xx + GW16081 + GW16082 + 9x SR71e + 4x DNMA H-5 (13 radios)

Other configurations are possible, for example spreading some PCIe devices across a couple of GW16081 mezzanines to make room for many cellular radios (which use USB, not PCI). The basic rules can be summarized as follows:

  • i.MX6 has 14 available memory resources
  • most Atheros radios seem to require 1 (e.g. SR71e, Option GTM671WFS), but some (e.g. WLE300) require 2
  • each PCIe switch requires 1 (e.g. the GW54xx/GW53xx/GW52xx has one on-board; add another if you have a GW16081 mezz)
  • 2nd onboard eth1 GigE requires 1
  • the PCIe-to-PCI bridge (GW16082 mezz) requires 1 but has the unique case that everything behind it fits into 1 resource regardless of radio.
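
The rules above can be turned into a rough tally. The sketch below is an illustration only: the table and function names are invented for the example, it does not check socket counts or model how the kernel actually packs windows, and it assumes that cards behind the GW16082 are covered by the bridge's own window. It may therefore not agree with every configuration listed above.

```python
BUDGET = 14  # "i.MX6 has 14 available memory resources"

# Mem windows per item, taken from the lists above.
BASEBOARD = {"GW54xx": 2, "GW53xx": 2, "GW52xx": 1, "GW51xx": 0}
MEZZ = {"GW16081": 1, "GW16082": 1}
RADIO = {"WLE300": 2, "WLE900": 2, "SR71e": 1, "DNMA H-5": 1}


def windows_used(board, mezzanines=(), pcie_radios=None):
    """Tally mem windows for a baseboard, mezzanines and PCIe-socket radios.

    pcie_radios maps radio name -> count. Cards behind the GW16082
    PCIe-to-PCI bridge should be left out (assumption: they share the
    bridge's single window).
    """
    used = BASEBOARD[board]
    used += sum(MEZZ[m] for m in mezzanines)
    for name, count in (pcie_radios or {}).items():
        used += RADIO[name] * count
    return used


def windows_remaining(*args, **kwargs):
    return BUDGET - windows_used(*args, **kwargs)


print(windows_remaining("GW54xx", pcie_radios={"SR71e": 6}))   # 6
print(windows_remaining("GW54xx", pcie_radios={"WLE300": 6}))  # 0
print(windows_remaining("GW54xx", ["GW16081"],
                        {"WLE300": 5, "SR71e": 1}))            # 0
```

A negative result would indicate that the combination asks for more windows than this simple model allows.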

Notes:

  • The above examples refer to the PCIe host controller driver used in the OpenWrt (3.10+) kernel. The 3.0.35 kernel used for our Yocto and Android BSPs reserves a 14MB mem resource window, which leaves 1 fewer region and therefore affects the examples above.

Memory Calculation Example

To determine how many memory resources a WiFi or other card uses, the simplest method is to use lspci together with the kernel boot log.

For example, in the lspci output below the card we want (Atheros AR93xx) is at address 07:00.0:

root@OpenWrt:/# lspci
00:00.0 PCI bridge: Device 16c3:abcd (rev 01)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
01:00.1 System peripheral: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:04.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:05.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:06.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:08.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
02:09.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
07:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller

Now run the dmesg command and grep for that device (07:00.0). Note the two memory regions below, reg 10 and reg 30; this device therefore uses 2 memory resources.

root@OpenWrt:/# dmesg | grep 07:00.0
[    0.678291] pci 0000:07:00.0: [168c:0030] type 00 class 0x028000
[    0.678395] pci 0000:07:00.0: reg 10: [mem 0x00000000-0x0001ffff 64bit]
[    0.678590] pci 0000:07:00.0: reg 30: [mem 0x00000000-0x0000ffff pref]
[    0.678725] pci 0000:07:00.0: supports D1
[    0.678739] pci 0000:07:00.0: PME# supported from D0 D1 D3hot
[    0.682167] pci 0000:07:00.0: BAR 0: assigned [mem 0x01100000-0x0111ffff 64bit]
[    0.682222] pci 0000:07:00.0: BAR 6: assigned [mem 0x01400000-0x0140ffff pref]
[    8.326434] PCI: enabling device 0000:07:00.0 (0140 -> 0142)
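
Counting the 'reg' lines can also be automated. The sketch below parses dmesg text like the output above and reports the size of each memory region a device requests. The regular expression and function name are assumptions written for this example; the optional "0x" is there because some kernels may print the register as "reg 0x10".

```python
import re

# Matches lines such as:
#   pci 0000:07:00.0: reg 10: [mem 0x00000000-0x0001ffff 64bit]
REG_RE = re.compile(
    r"pci (?P<dev>\S+): reg (?:0x)?(?P<reg>[0-9a-f]+): "
    r"\[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def mem_regions(dmesg_text, device):
    """Return [(reg, size_in_bytes)] for each mem region the device requests."""
    regions = []
    for line in dmesg_text.splitlines():
        m = REG_RE.search(line)
        if m and m.group("dev").endswith(device):
            size = int(m.group("end"), 16) - int(m.group("start"), 16) + 1
            regions.append((m.group("reg"), size))
    return regions


# Lines copied from the dmesg output above.
SAMPLE = """\
[    0.678291] pci 0000:07:00.0: [168c:0030] type 00 class 0x028000
[    0.678395] pci 0000:07:00.0: reg 10: [mem 0x00000000-0x0001ffff 64bit]
[    0.678590] pci 0000:07:00.0: reg 30: [mem 0x00000000-0x0000ffff pref]
[    0.682167] pci 0000:07:00.0: BAR 0: assigned [mem 0x01100000-0x0111ffff 64bit]
"""

for reg, size in mem_regions(SAMPLE, "07:00.0"):
    print(f"reg {reg}: {size // 1024} KB")
# reg 10: 128 KB
# reg 30: 64 KB
```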

PCIe PLX Switch - Temperature discussion

Please see this discussion here

What throughput does the PCIe lane support?

The PCIe link into the switch is a single lane with a theoretical maximum of 2.5Gbits/sec on the Ventana boards.
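
As a rough worked example, the quoted figure can be converted into a usable data rate. This sketch assumes the 2.5Gbits/sec figure is the raw line rate of a link using 8b/10b encoding (as PCIe Gen1 does); that assumption is not stated above. Packet and protocol overhead are ignored, so real throughput will be lower still.

```python
LINE_RATE_BPS = 2_500_000_000  # raw rate quoted above, in bits per second

# Assumption: 8b/10b line encoding, i.e. 8 data bits per 10 bits on the wire.
ENCODED_BITS = 10
DATA_BITS = 8


def usable_rate_bps(line_rate=LINE_RATE_BPS):
    """Data rate left after line encoding, before any protocol overhead."""
    return line_rate * DATA_BITS // ENCODED_BITS


rate = usable_rate_bps()
print(rate // 1_000_000, "Mbit/s")      # 2000 Mbit/s
print(rate // 8 // 1_000_000, "MB/s")   # 250 MB/s
```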

Message Signaled Interrupts (MSI)

MSI replaces traditional out-of-band interrupt assertion with an in-band messaging construct. This was introduced in PCI 2.2 and is used by PCI Express. MSI-X was introduced in PCI 3.0 and permits a device to allocate up to 2048 interrupts.

While MSI is used for PCIe at a hardware level, an additional layer of support can be provided and used by the kernel and drivers to expand the 'legacy' PCI interrupts INTA/B/C/D into virtual software interrupts. For example, a GW54xx with 6 PCIe expansion sockets must share the 4 legacy PCI interrupts among its sockets. If MSI were used, the interrupts would all still fire a single hardware interrupt, but it would be cascaded to unique software interrupts, which theoretically could be split across CPU cores. The result would be better CPU core separation via smp-affinity. Note, however, that the i.MX6 PCIe host controller driver does not implement virtual MSI interrupts in a way that allows them to be steered towards different CPUs, and adding code to allow this would add overhead that burdens the single-CPU case.

Because MSI interrupts cannot be steered to different CPUs in the hardirq context, MSI offers no performance benefit, and we have MSI disabled in the Gateworks Ventana kernels. Additionally, we have from time to time encountered devices/drivers that do not work properly with MSI interrupts enabled.
