Version 2 (modified by Ryan Erbstoesser, 7 years ago)

add app note link for hab

i.MX6 Encryption

The i.MX6 processors offer hardware encryption through Freescale's Cryptographic Accelerator and Assurance Module (CAAM, also known as SEC4). The i.MX6 offers the following security support:

  • Security Control
  • Advanced High Assurance Boot (A-HAB) System (HAB with embedded enhancements)
  • SHA-256, 2048-bit RSA key
  • Version control mechanism
  • Warm boot
  • CSU and TZ initialization
  • IC Identification Module (IIM) and Central Security Unit (CSU)
  • CSU enhanced for the IIM
  • Configured during boot and by e-fuses
  • Determines the security level operation mode and the TZ policy
  • Tamper Detection

For encryption, the i.MX6 provides the following hardware cryptographic accelerators:

  • AES128, AES256
  • 3DES
  • ARC4
  • SHA1
  • SHA224
  • SHA256
  • MD-5

At a high level the Cryptographic Accelerator and Assurance Module (CAAM) is a DMA master supporting the following capabilities:

  • Secure memory feature with HW enforced access control
  • Cryptographic authentication
    • Hashing algorithms
      • MD5
      • SHA-1
      • SHA-224
      • SHA-256
    • Message authentication codes (MAC)
      • HMAC-all hashing algorithms
      • AES-CMAC
      • AES-XCBC-MAC
    • Auto padding
    • ICV checking
  • Authenticated encryption algorithms
    • AES-CCM (counter with CBC-MAC)
  • Symmetric key block ciphers
    • AES (128-bit, 192-bit or 256-bit keys)
    • DES (64-bit keys, including key parity)
    • 3DES (128-bit or 192-bit keys, including key parity)
  • Cipher modes
    • ECB, CBC, CFB, OFB for all block ciphers
    • CTR for AES
  • Symmetric key stream ciphers
    • ArcFour (alleged RC4 with 40-bit to 128-bit keys)
  • Random-number generation
    • Entropy is generated via an independent free running ring oscillator
    • Oscillator is off when not generating entropy, for lower power consumption
    • NIST-compliant, pseudo random-number generator seeded using hardware generated entropy

The above features are usable via the CAAM driver, which is available in our Yocto BSPs as well as in our latest OpenWrt on GitHub. To make use of some of these features, the Linux Crypto API must be used. The driver is integrated with the kernel's Crypto API service, so the algorithms supported by CAAM can replace the native software implementations.

References

i.MX6 Security Reference Manual

Please contact support@… to request this document.

Driver Information

The Cryptographic Accelerator and Assurance Module (CAAM) is Freescale's hardware crypto engine, and the Linux driver for it carries the same name. The driver configures the hardware to operate as a DPAA component and creates job ring devices. Please see here for more detail. This driver was added to Linux 4.3, but we also have support for it in our Yocto 1.6, Yocto 1.7, Yocto 1.8, and OpenWrt next (our latest OpenWrt branch on GitHub).

To enable the CAAM driver in the kernel, CONFIG_CRYPTO_DEV_FSL_CAAM must be set:

  • make menuconfig
    • Kernel Cryptographic API → Hardware crypto devices → Freescale CAAM-Multicore driver backend
      • You can either build as a module via M or statically via Y

Enabling the above will select the following in the kernel config:

CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_FSL_CAAM=m
CONFIG_CRYPTO_DEV_FSL_CAAM_JR=m
CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_RINGSIZE=9
CONFIG_CRYPTO_DEV_FSL_CAAM_INTC=n
CONFIG_CRYPTO_DEV_FSL_CAAM_DEBUG=n
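
As a quick sanity check, the config listing above can be verified against a kernel's .config (or a decompressed /proc/config.gz, if the kernel exposes one). The following is a minimal sketch: the function names parse_kconfig and missing_caam_symbols are hypothetical helpers written for this page, and the symbol list is simply copied from the listing above.

```python
import re

# Symbols copied from the config listing above (ring size and debug knobs omitted).
CAAM_SYMBOLS = [
    "CONFIG_CRYPTO_HW",
    "CONFIG_CRYPTO_DEV_FSL_CAAM",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_JR",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API",
]

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol in .config-style text to its value ('n' if unset)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            values[match.group(1)] = "n"
    return values


def missing_caam_symbols(text):
    """Return the CAAM symbols that are neither built in (y) nor modular (m)."""
    values = parse_kconfig(text)
    return [sym for sym in CAAM_SYMBOLS if values.get(sym, "n") not in ("y", "m")]


if __name__ == "__main__":
    # Illustrative fragment only; on a real system pass the text of your .config.
    sample = "CONFIG_CRYPTO_HW=y\nCONFIG_CRYPTO_DEV_FSL_CAAM=m\n"
    print(missing_caam_symbols(sample))
```

An empty list means every symbol from the listing is enabled either as a module or built in.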

When this is enabled, /proc/crypto will list out that system's cipher support and where that support comes from. For example:

root@OpenWrt:/# cat /proc/crypto
<snip>
name         : sha1
driver       : sha1-caam
module       : caamhash
priority     : 3000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 20
<snip>

We can see that the caamhash module offers the sha1 ahash function. Because the kernel Crypto API prefers the implementation with the highest priority, any consumer that requests sha1 through the Crypto API will automatically gain hardware acceleration.
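
To see at a glance which algorithms resolve to a CAAM driver, the /proc/crypto text can be parsed and reduced to the highest-priority entry per algorithm name. This is only a sketch: parse_proc_crypto and preferred_drivers are hypothetical helper names, and the second sample entry (sha1-generic at priority 100) is an illustrative stand-in for a software implementation, not output captured from a Ventana board.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per algorithm entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
        elif not line.strip() and current:
            # A blank line terminates the current entry.
            entries.append(current)
            current = {}
    if current:
        entries.append(current)
    return entries


def _priority(entry):
    return int(entry.get("priority", 0))


def preferred_drivers(entries):
    """For each algorithm name, keep the entry with the highest priority."""
    best = {}
    for entry in entries:
        name = entry.get("name")
        if name and (name not in best or _priority(entry) > _priority(best[name])):
            best[name] = entry
    return best


if __name__ == "__main__":
    # First entry is taken from the listing above; the second is illustrative.
    sample = (
        "name         : sha1\n"
        "driver       : sha1-caam\n"
        "module       : caamhash\n"
        "priority     : 3000\n"
        "\n"
        "name         : sha1\n"
        "driver       : sha1-generic\n"
        "priority     : 100\n"
    )
    for name, entry in preferred_drivers(parse_proc_crypto(sample)).items():
        print(name, "->", entry["driver"])
```

On a target, the same functions can be fed the contents of /proc/crypto instead of the sample string.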

cryptodev vs. af_alg vs. ocf-linux

cryptodev, af_alg, and ocf-linux are three userspace crypto APIs into the Linux kernel. While both cryptodev and af_alg use the native Linux crypto interface, ocf-linux does not. ocf-linux also conflicts with cryptodev in that they both create a /dev/crypto interface, so the two drivers cannot co-exist. Gateworks has decided to include cryptodev over ocf-linux for these reasons.

af_alg and cryptodev both use the native Linux crypto interface, but go about it in different ways. According to the cryptodev site, cryptodev outperforms af_alg, mainly due to how each was created. Both are acceptable ways of interacting with the kernel, and many programs default to one or the other. Programs such as openssl are able to pick the engine they use. cryptodev must be built out-of-tree because it is not a part of the kernel, whereas af_alg is in-tree and needs no special handling.

To build cryptodev out-of-tree:

# Download cryptodev tarball from here: http://download.gna.org/cryptodev-linux/
wget http://download.gna.org/cryptodev-linux/cryptodev-linux-1.8.tar.gz
tar xvf cryptodev-linux-1.8.tar.gz
cd cryptodev-linux-1.8
# Make sure you have kernel build directory for the kernel you are compiling for and point to it via KERNEL_DIR= (if cross compiling)
KERNEL_DIR=/usr/src/psidhu/linux/linux-imx6 make
make install # Only do this if compiling on target system

Gateworks has written an example cryptodev program for the cbc(aes) cipher called gw-cryptodev-example. To get the source and compile, please follow these instructions:

git clone https://github.com/Gateworks/gateworks-sample-apps.git
cd gateworks-sample-apps/gw-cryptodev-example
# (optional) Source your env. if cross compiling. In this case, we'll use the Yocto 1.8 SDK.
. /opt/poky/1.8/environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi # Please make sure this is the updated version with cryptodev.h.
make

To run:

root@ventana:~# ./gw-cryptodev-example
Using cbc-aes-caam driver! Accelerated through SEC4 engine.
Encrypted 'Hello, World!' to '���<�팻�m��5͎'
Decrypted '���<�팻�m��5͎' to 'Hello, World!'
Test passed!

An example of using this same cipher, but through af_alg, can be found here.

Note that the main difference between using cryptodev and af_alg is how messages are sent to the kernel. cryptodev relies on ioctl calls while af_alg relies on the kernel's socket family (called AF_ALG).
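
To illustrate the socket-based flow, here is a minimal sketch of cbc(aes) through AF_ALG using Python's standard socket module (Linux only, Python 3.6 or later). It is not the linked example; afalg_aes_cbc is a hypothetical helper, the key and IV are throwaway test values, and the message is zero-padded by hand because cbc(aes) only accepts whole 16-byte blocks. The helper returns None when the kernel or sandbox does not provide AF_ALG.

```python
import socket

BLOCK = 16  # AES block size in bytes


def afalg_aes_cbc(key, iv, data, decrypt=False):
    """Run cbc(aes) through the kernel via AF_ALG; None if AF_ALG is unavailable."""
    if len(data) % BLOCK:
        raise ValueError("cbc(aes) needs a multiple of 16 bytes")
    if not hasattr(socket, "AF_ALG"):
        return None
    op_flag = socket.ALG_OP_DECRYPT if decrypt else socket.ALG_OP_ENCRYPT
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            # bind() selects the algorithm; the kernel picks the highest-priority driver.
            alg.bind(("skcipher", "cbc(aes)"))
            alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
            # accept() returns the operation socket that carries the data.
            op, _ = alg.accept()
            with op:
                op.sendmsg_afalg([data], op=op_flag, iv=iv)
                return op.recv(len(data))
    except OSError:
        # No AF_ALG support, or cbc(aes) is not available in this kernel.
        return None


if __name__ == "__main__":
    key = bytes(range(16))           # throwaway 128-bit test key
    iv = bytes(16)                   # all-zero IV, for illustration only
    msg = b"Hello, World!".ljust(BLOCK, b"\x00")
    ct = afalg_aes_cbc(key, iv, msg)
    if ct is None:
        print("AF_ALG not available here")
    else:
        print(ct.hex(), afalg_aes_cbc(key, iv, ct, decrypt=True))
```

Which driver services the request (for instance a CAAM-backed one) is decided by the kernel, not by this code; /proc/crypto shows the candidates.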

BSP Support

Both Yocto and the latest OpenWrt have CAAM support.

For example, adding the CAAM driver exposes the hardware random number generator directly via /dev/hwrng. This greatly speeds up the generation of random data, as seen below:

# Generate 50 MiB of data via software
root@OpenWrt:/# time dd if=/dev/urandom of=/tmp/sw_random count=50 bs=1M
50+0 records in
50+0 records out
real    0m 17.29s
user    0m 0.00s
sys     0m 17.28s
# Now generate 50 MiB of data via hardware
root@OpenWrt:/# time dd if=/dev/hwrng of=/tmp/hw_random count=50 bs=1M
50+0 records in
50+0 records out
real    0m 1.05s
user    0m 0.00s
sys     0m 1.04s

As seen above, using the hardware accelerated rng, random data with good entropy was generated roughly 16x faster (17.29s vs. 1.05s).
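
The arithmetic behind that comparison can be reproduced from the two dd timings shown above. This is a small sketch; throughput_mib_s is a hypothetical helper name, and the only inputs are the 50 x 1 MiB transfer size and the two "real" times from the listing.

```python
MIB = 1024 * 1024


def throughput_mib_s(total_bytes, seconds):
    """Average throughput in MiB per second."""
    return total_bytes / MIB / seconds


TOTAL = 50 * MIB       # count=50 bs=1M
SW_SECONDS = 17.29     # "real" time reading /dev/urandom
HW_SECONDS = 1.05      # "real" time reading /dev/hwrng

sw_rate = throughput_mib_s(TOTAL, SW_SECONDS)
hw_rate = throughput_mib_s(TOTAL, HW_SECONDS)
speedup = SW_SECONDS / HW_SECONDS

if __name__ == "__main__":
    print(f"/dev/urandom: {sw_rate:.2f} MiB/s")
    print(f"/dev/hwrng:   {hw_rate:.2f} MiB/s")
    print(f"speedup:      {speedup:.1f}x")
```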

CAAM support also means programs using either cryptodev or af_alg will automatically have hardware-accelerated cryptography. However, some programs use their own software-based algorithms for portability reasons. One such program is openssl. Note that openssl must be compiled with the following flags in order to use the cryptodev engine: -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS

Yocto

In the Yocto BSP, openssl is built with cryptodev support. Please see below for a comparison with and without the cryptodev engine:

  • Yocto 1.8 WITHOUT cryptodev (using openssl software based algorithms)
    root@ventana:~# openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 6008244 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 1608835 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 256 size blocks: 411309 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 1024 size blocks: 103187 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 12923 aes-128 cbc's in 3.00s
    OpenSSL 1.0.2d 9 Jul 2015
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -marm  -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN    -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc      32043.97k    34436.60k    35215.75k    35221.16k    35288.41k
    
  • Yocto 1.8 with cryptodev (using kernel hardware accelerated algorithms)
    root@ventana:~# openssl speed -evp aes-128-cbc -engine cryptodev
    engine "cryptodev" set.
    Doing aes-128-cbc for 3s on 16 size blocks: 44146 aes-128-cbc's in 0.14s
    Doing aes-128-cbc for 3s on 64 size blocks: 43561 aes-128-cbc's in 0.11s
    Doing aes-128-cbc for 3s on 256 size blocks: 39724 aes-128-cbc's in 0.13s
    Doing aes-128-cbc for 3s on 1024 size blocks: 30733 aes-128-cbc's in 0.10s
    Doing aes-128-cbc for 3s on 8192 size blocks: 9122 aes-128-cbc's in 0.01s
    OpenSSL 1.0.2d 9 Jul 2015
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -marm  -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN    -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       5045.26k    25344.58k    78225.72k   314705.92k  7472742.40k
    

One of the biggest advantages of hardware encryption is how the CPU is utilized. In the above two cases, we found the following to be true:

  • With cryptodev disabled: 25% usr CPU usage (one core pegged to 100%)
  • With cryptodev enabled : 16% sys CPU usage, 2% sirq
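  • openssl reports far higher bytes-per-second figures with the cryptodev hardware engine, especially at larger block sizes. Note that these figures are derived from the CPU time charged to the openssl process (for example 0.01s for the 8192-byte case) rather than wall-clock time, so they overstate real throughput; pass -elapsed to openssl speed for a wall-clock comparison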
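
The bytes-per-second column in the openssl output is simply blocks x block size / reported seconds, which can be checked against the Yocto 1.8 8192-byte rows above. The sketch below also estimates a wall-clock figure for the cryptodev run; that estimate rests on the assumption that each measurement lasted the full 3 seconds that openssl announces ("for 3s"), which the listing itself does not confirm. kbytes_per_s is a hypothetical helper name.

```python
def kbytes_per_s(blocks, block_size, seconds):
    """Throughput in openssl's unit: 1000s of bytes per second."""
    return blocks * block_size / seconds / 1000.0


BLOCK_SIZE = 8192
RUN_SECONDS = 3.0  # assumption: wall-clock length of each "for 3s" measurement

# (blocks processed, seconds reported by openssl) from the Yocto 1.8 listings
software_blocks, software_secs = 12923, 3.00
cryptodev_blocks, cryptodev_secs = 9122, 0.01

software_reported = kbytes_per_s(software_blocks, BLOCK_SIZE, software_secs)
cryptodev_reported = kbytes_per_s(cryptodev_blocks, BLOCK_SIZE, cryptodev_secs)
cryptodev_wall_estimate = kbytes_per_s(cryptodev_blocks, BLOCK_SIZE, RUN_SECONDS)

if __name__ == "__main__":
    print(f"software, as reported:        {software_reported:.2f}k")
    print(f"cryptodev, as reported:       {cryptodev_reported:.2f}k")
    print(f"cryptodev, wall-clock (est.): {cryptodev_wall_estimate:.2f}k")
```

Under that assumption the engine moves a comparable amount of data per wall-clock second while leaving most of the CPU free, which matches the CPU usage figures listed above.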

OpenWrt

Our OpenWrt 16.02 BSP added support for CAAM and cryptodev. As with Yocto, openssl can utilize this engine. Please see below for some results:

  • OpenWrt 16.02 WITHOUT cryptodev (using openssl software based algorithms)
    root@OpenWrt:/# openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 2890377 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 767833 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 256 size blocks: 196252 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 49243 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 6165 aes-128 cbc's in 3.00s
    OpenSSL 1.0.2g  1 Mar 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
    compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc      15415.34k    16435.22k    16746.84k    16808.28k    16834.56k
    
  • OpenWrt 16.02 with cryptodev (using kernel hardware accelerated algorithms)
    root@OpenWrt:/# openssl speed -evp aes-128-cbc -engine cryptodev
    engine "cryptodev" set.
    Doing aes-128-cbc for 3s on 16 size blocks: 80789 aes-128-cbc's in 0.13s
    Doing aes-128-cbc for 3s on 64 size blocks: 67854 aes-128-cbc's in 0.15s
    Doing aes-128-cbc for 3s on 256 size blocks: 63909 aes-128-cbc's in 0.21s
    Doing aes-128-cbc for 3s on 1024 size blocks: 46740 aes-128-cbc's in 0.06s
    Doing aes-128-cbc for 3s on 8192 size blocks: 12239 aes-128-cbc's in 0.03s
    OpenSSL 1.0.2g  1 Mar 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
    compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       9943.26k    28951.04k    77908.11k   797696.00k  3342062.93k
    