i.MX6 Encryption
The i.MX6 processors offer hardware encryption through Freescale's Cryptographic Accelerator and Assurance Module (CAAM, also known as SEC4). The i.MX6 offers the following security support:
- Security Control
- Advanced High Assurance Boot (A-HAB) System (HAB with embedded enhancements)
- SHA-256, 2048-bit RSA key
- Version control mechanism
- Warm boot
- CSU and TZ initialization
- IC Identification Module (IIM) and Central Security Unit (CSU)
- CSU enhanced for the IIM
- Configured during boot and by e-fuses
- Determines the security level operation mode and the TZ policy
- Tamper Detection
For encryption, the i.MX6 provides the following hardware cryptographic accelerators on board:
- AES128, AES256
- 3DES
- ARC4
- SHA1
- SHA224
- SHA256
- MD-5
At a high level the Cryptographic Accelerator and Assurance Module (CAAM) is a DMA master supporting the following capabilities:
- Secure memory feature with HW enforced access control
- Cryptographic authentication
  - Hashing algorithms
    - MD5
    - SHA-1
    - SHA-224
    - SHA-256
  - Message authentication codes (MAC)
    - HMAC-all hashing algorithms
    - AES-CMAC
    - AES-XCBC-MAC
  - Auto padding
  - ICV checking
- Authenticated encryption algorithms
  - AES-CCM (counter with CBC-MAC)
- Symmetric key block ciphers
  - AES (128-bit, 192-bit or 256-bit keys)
  - DES (64-bit keys, including key parity)
  - 3DES (128-bit or 192-bit keys, including key parity)
  - Cipher modes
    - ECB, CBC, CFB, OFB for all block ciphers
    - CTR for AES
- Symmetric key stream ciphers
  - ArcFour (alleged RC4 with 40 - 128 bit keys)
- Random-number generation
  - Entropy is generated via an independent free running ring oscillator
  - Oscillator is off when not generating entropy, for lower power consumption
  - NIST-compliant pseudo random-number generator seeded using hardware generated entropy
The above features are usable via the CAAM driver, which is available in our Yocto BSPs as well as in our latest OpenWrt on GitHub. Some of these features can only be reached through the Linux Crypto API. The driver is integrated with the kernel's Crypto API service, so the algorithms supported by CAAM can replace the native software implementations.
References
- https://community.freescale.com/thread/303229
- https://community.freescale.com/thread/319374
- https://community.freescale.com/thread/311605
- https://community.freescale.com/thread/309499
- http://www.freescale.com/webapp/sps/site/overview.jsp?code=NETWORK_SECURITY_CRYPTOG
- https://community.freescale.com/docs/DOC-96451
- Freescale Code Signing Tool for the High Assurance Boot library. Provides software code signing support designed for use with i.MX processors that integrate the HAB library in the internal boot ROM
- Freescale HAB App Note
i.MX6 Security Reference Manual
Please contact support@… to request this document.
Driver Information
The CAAM driver is the Linux kernel driver for Freescale's hardware crypto block, the Cryptographic Accelerator and Assurance Module (CAAM). It configures the hardware to operate as a DPAA component and creates the job ring devices. Please see here for more detail. This driver was added to Linux 4.3, but we also have support for it in our Yocto 1.6, Yocto 1.7, Yocto 1.8, and OpenWrt next (our latest OpenWrt branch on GitHub).
To enable the CAAM driver in the kernel, CONFIG_CRYPTO_DEV_FSL_CAAM must be set:

    make menuconfig

- Kernel Cryptographic API → Hardware crypto devices → Freescale CAAM-Multicore driver backend
  - You can build it either as a module via M or statically via Y
Enabling the above will select the following in the kernel config:
    CONFIG_CRYPTO_HW=y
    CONFIG_CRYPTO_DEV_FSL_CAAM=m
    CONFIG_CRYPTO_DEV_FSL_CAAM_JR=m
    CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API=m
    CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API=m
    CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API=m
    CONFIG_CRYPTO_DEV_FSL_CAAM_RINGSIZE=9
    CONFIG_CRYPTO_DEV_FSL_CAAM_INTC=n
    CONFIG_CRYPTO_DEV_FSL_CAAM_DEBUG=n
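As a quick sanity check, the option list above can be compared against a kernel configuration. The following is a minimal sketch in Python, not part of any BSP: the helper names `parse_kconfig` and `missing_caam_options` are made up for illustration, and the script assumes the configuration is available as plain text (for example a `.config` file from the kernel build directory).

```python
# Options taken from the list above that must be built in (y) or modular (m).
REQUIRED_OPTIONS = (
    "CONFIG_CRYPTO_HW",
    "CONFIG_CRYPTO_DEV_FSL_CAAM",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_JR",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API",
    "CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API",
)


def parse_kconfig(text):
    """Return a {option: value} dict from kernel .config style text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_caam_options(text):
    """List the required options that are neither 'y' nor 'm'."""
    options = parse_kconfig(text)
    return [name for name in REQUIRED_OPTIONS
            if options.get(name) not in ("y", "m")]


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python3 check_caam.py /path/to/linux/.config
    with open(sys.argv[1]) as config_file:
        missing = missing_caam_options(config_file.read())
    print("missing:", ", ".join(missing) if missing else "none")
```

If the running kernel was built with CONFIG_IKCONFIG_PROC, the same text can be read on the target from /proc/config.gz using Python's `gzip.open(path, "rt")`.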
When this is enabled, /proc/crypto will list that system's cipher support and where that support comes from. For example:

    root@OpenWrt:/# cat /proc/crypto
    <snip>
    name         : sha1
    driver       : sha1-caam
    module       : caamhash
    priority     : 3000
    refcnt       : 1
    selftest     : passed
    internal     : no
    type         : ahash
    async        : yes
    blocksize    : 64
    digestsize   : 20
    <snip>

We can see that the caamhash module offers the sha1 ahash function. This effectively means that any program requesting this hash through the kernel crypto API will automatically gain hardware acceleration.
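To see which implementation the kernel would hand out for a given algorithm, the /proc/crypto listing can be parsed and the entries compared by priority (the crypto API prefers the highest-priority implementation of a name). The sketch below assumes the usual blank-line-separated layout of /proc/crypto. The function names are illustrative, and the second sample entry (sha1-generic with priority 100) is a made-up stand-in for a software implementation.

```python
import re

# First entry is the one shown above; the second is a hypothetical software entry.
SAMPLE = """\
name         : sha1
driver       : sha1-caam
module       : caamhash
priority     : 3000
type         : ahash

name         : sha1
driver       : sha1-generic
module       : kernel
priority     : 100
type         : shash
"""


def parse_proc_crypto(text):
    """Split /proc/crypto style text into one dict per algorithm entry."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def preferred_driver(entries, name):
    """Return the driver with the highest priority for `name`, or None."""
    candidates = [e for e in entries if e.get("name") == name]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", "0")))
    return best.get("driver")


def caam_algorithms(entries):
    """Return the sorted algorithm names that have a CAAM-backed driver."""
    return sorted({e["name"] for e in entries
                   if "name" in e and "caam" in e.get("driver", "")})


if __name__ == "__main__":
    entries = parse_proc_crypto(SAMPLE)
    print(preferred_driver(entries, "sha1"))
    print(caam_algorithms(entries))
```

On a target, replacing `SAMPLE` with the contents of /proc/crypto gives the full list of algorithms that CAAM accelerates.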
cryptodev vs. af_alg vs. ocf-linux
cryptodev, af_alg, and ocf-linux are three userspace crypto APIs into the Linux kernel. While both cryptodev and af_alg use the native Linux crypto interface, ocf-linux does not. ocf-linux also conflicts with cryptodev in that they both create a /dev/crypto interface, so these two drivers cannot co-exist. Gateworks has decided to include cryptodev over ocf-linux for these reasons.
Although af_alg and cryptodev both use the native Linux crypto interface, they go about it in differing ways. According to the cryptodev site, cryptodev outperforms af_alg, mainly due to how each was created. Both are acceptable ways of interacting with the kernel, and many programs default to one or the other. Programs such as openssl are able to pick the engine they use. Note that cryptodev must be built out-of-tree because it is not a part of the kernel, whereas af_alg is, so no special handling is needed there.
To build cryptodev out-of-tree:

    # Download cryptodev tarball from here: http://download.gna.org/cryptodev-linux/
    wget http://download.gna.org/cryptodev-linux/cryptodev-linux-1.8.tar.gz
    tar xvf cryptodev-linux-1.8.tar.gz
    cd cryptodev-linux-1.8
    # Make sure you have kernel build directory for the kernel you are compiling for and point to it via KERNEL_DIR= (if cross compiling)
    KERNEL_DIR=/usr/src/psidhu/linux/linux-imx6 make
    make install # Only do this if compiling on target system
Gateworks has written an example cryptodev program for the cbc(aes) cipher called gw-cryptodev-example. To get the source and compile it, please follow these instructions:

    git clone https://github.com/Gateworks/gateworks-sample-apps.git
    cd gateworks-sample-apps/gw-cryptodev-example
    # (optional) Source your env. if cross compiling. In this case, we'll use the Yocto 1.8 SDK.
    . /opt/poky/1.8/environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi # Please make sure this is the updated version with cryptodev.h.
    make
To run:
    root@ventana:~# ./gw-cryptodev-example
    Using cbc-aes-caam driver! Accelerated through SEC4 engine.
    Encrypted 'Hello, World!' to '���<�팻�m��5͎'
    Decrypted '���<�팻�m��5͎' to 'Hello, World!'
    Test passed!
An example of using this same cipher, but through af_alg, can be found here.
Note that the main difference between cryptodev and af_alg is how messages are sent to the kernel: cryptodev relies on ioctl calls on /dev/crypto, while af_alg relies on a kernel socket address family (called AF_ALG).
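To make the socket-based flow concrete, here is a minimal sketch of cbc(aes) through AF_ALG using Python's standard `socket` module. It is not the Gateworks example. It assumes a Linux kernel with the userspace skcipher interface enabled (CONFIG_CRYPTO_USER_API_SKCIPHER) and uses the single-block test vector from RFC 3602 (case 1). The function names are illustrative. Which driver actually performs the work (for instance cbc-aes-caam) is decided by the kernel, not by this code.

```python
import socket


def aes_cbc_af_alg(key, iv, data, decrypt=False):
    """Run cbc(aes) over `data` in the kernel via an AF_ALG socket.

    `data` must be a multiple of the 16-byte AES block size; no padding is added.
    """
    op_code = socket.ALG_OP_DECRYPT if decrypt else socket.ALG_OP_ENCRYPT
    # The "transform" socket is bound to an algorithm type and name.
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        alg.bind(("skcipher", "cbc(aes)"))
        alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
        # accept() returns the operation socket that carries the data.
        op, _ = alg.accept()
        with op:
            op.sendmsg_afalg([data], op=op_code, iv=iv)
            return op.recv(len(data))


def demo():
    """Encrypt and decrypt one block; return None if AF_ALG is unavailable."""
    key = bytes.fromhex("06a9214036b8a15b512e03d534120006")
    iv = bytes.fromhex("3dafba429d9eb430b422da802c9fac41")
    plaintext = b"Single block msg"  # exactly 16 bytes
    try:
        ciphertext = aes_cbc_af_alg(key, iv, plaintext)
        recovered = aes_cbc_af_alg(key, iv, ciphertext, decrypt=True)
    except (AttributeError, OSError):
        # Non-Linux platform, or the kernel lacks AF_ALG / cbc(aes) support.
        return None
    return ciphertext, recovered


if __name__ == "__main__":
    result = demo()
    if result is None:
        print("AF_ALG cbc(aes) is not available on this system")
    else:
        print("ciphertext:", result[0].hex())
        print("round trip ok:", result[1] == b"Single block msg")
```

The bind/setsockopt/accept sequence is the socket counterpart of the session setup that cryptodev performs with ioctl calls.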
BSP Support
Both Yocto and the latest OpenWrt have CAAM support.
For example, adding the CAAM driver grants the ability to directly access the hardware random number generator via /dev/hwrng. This tremendously speeds up the generation of random data, as seen below:

    # Generate 50 MB of data via software
    root@OpenWrt:/# time dd if=/dev/urandom of=/tmp/sw_random count=50 bs=1M
    50+0 records in
    50+0 records out
    real    0m 17.29s
    user    0m 0.00s
    sys     0m 17.28s

    # Now generate 50 MB of data via hardware
    root@OpenWrt:/# time dd if=/dev/hwrng of=/tmp/hw_random count=50 bs=1M
    50+0 records in
    50+0 records out
    real    0m 1.05s
    user    0m 0.00s
    sys     0m 1.04s
As seen above, using the hardware accelerated rng, random data with good entropy was generated roughly 16x faster (17.29s versus 1.05s).
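The speed-up can be worked out directly from the two dd timings. The short Python sketch below only restates the arithmetic. The sizes and times are the ones from the transcript, and the variable names are illustrative.

```python
SIZE_MIB = 50        # dd count=50 bs=1M
SW_SECONDS = 17.29   # real time reading /dev/urandom
HW_SECONDS = 1.05    # real time reading /dev/hwrng


def mib_per_second(mebibytes, seconds):
    """Average throughput in MiB/s."""
    return mebibytes / seconds


sw_rate = mib_per_second(SIZE_MIB, SW_SECONDS)
hw_rate = mib_per_second(SIZE_MIB, HW_SECONDS)
speedup = SW_SECONDS / HW_SECONDS

if __name__ == "__main__":
    print(f"/dev/urandom: {sw_rate:.2f} MiB/s")
    print(f"/dev/hwrng:   {hw_rate:.2f} MiB/s")
    print(f"speed-up:     {speedup:.1f}x")
```

This works out to about 2.9 MiB/s in software against about 47.6 MiB/s from the hardware generator.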
It also means that programs using either cryptodev or af_alg will automatically have hardware accelerated cryptography. However, some programs use their own software based algorithms for portability reasons. One such program is openssl. Note that openssl must be compiled with the following flags in order to use the cryptodev engine: -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
Yocto
In the Yocto BSP, openssl is built with cryptodev support. Please see below for a comparison with and without the cryptodev engine:
- Yocto 1.8 WITHOUT cryptodev (using openssl software based algorithms)

    root@ventana:~# openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 6008244 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 1608835 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 256 size blocks: 411309 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 1024 size blocks: 103187 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 12923 aes-128 cbc's in 3.00s
    OpenSSL 1.0.2d 9 Jul 2015
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc -march=armv7-a -marm -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc      32043.97k    34436.60k    35215.75k    35221.16k    35288.41k
- Yocto 1.8 with cryptodev (using kernel hardware accelerated algorithms)

    root@ventana:~# openssl speed -evp aes-128-cbc -engine cryptodev
    engine "cryptodev" set.
    Doing aes-128-cbc for 3s on 16 size blocks: 44146 aes-128-cbc's in 0.14s
    Doing aes-128-cbc for 3s on 64 size blocks: 43561 aes-128-cbc's in 0.11s
    Doing aes-128-cbc for 3s on 256 size blocks: 39724 aes-128-cbc's in 0.13s
    Doing aes-128-cbc for 3s on 1024 size blocks: 30733 aes-128-cbc's in 0.10s
    Doing aes-128-cbc for 3s on 8192 size blocks: 9122 aes-128-cbc's in 0.01s
    OpenSSL 1.0.2d 9 Jul 2015
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
    compiler: arm-poky-linux-gnueabi-gcc -march=armv7-a -marm -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       5045.26k    25344.58k    78225.72k   314705.92k  7472742.40k
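One of the biggest advantages of using hardware encryption is how the CPU is utilized. In the above two cases, we found the following to be true:
- With cryptodev disabled: 25% usr CPU usage (one core pegged to 100%)
- With cryptodev enabled: 16% sys CPU usage, 2% sirq
- With the cryptodev hardware engine, openssl reports a far higher number of bytes per second processed, especially at the larger block sizes. Note that, without the -elapsed option, openssl speed divides by the CPU time charged to the openssl process (the 0.01s to 0.14s shown above) rather than by the 3 seconds each test ran, so these figures overstate wall-clock throughput.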
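Because the cryptodev run reports only 0.01s to 0.14s of CPU time per test, it is worth recomputing its figures against wall-clock time. The Python sketch below does this with the block counts from the Yocto cryptodev transcript. It assumes each test really ran for the nominal 3 seconds that openssl prints ("for 3s"), which the transcript itself does not confirm. The names are illustrative.

```python
BLOCK_SIZES = (16, 64, 256, 1024, 8192)
# Counts and CPU times from the "Yocto 1.8 with cryptodev" run.
HW_COUNTS = (44146, 43561, 39724, 30733, 9122)
HW_CPU_SECONDS = (0.14, 0.11, 0.13, 0.10, 0.01)
# Reported figures from the software-only run, in 1000s of bytes per second.
SW_REPORTED = (32043.97, 34436.60, 35215.75, 35221.16, 35288.41)
NOMINAL_SECONDS = 3.0  # assumption: each test lasted the nominal 3 s


def kbytes_per_second(count, block_size, seconds):
    """Throughput in 1000s of bytes per second, as printed by openssl speed."""
    return count * block_size / seconds / 1000.0


# What openssl printed: bytes divided by CPU time.
reported = [kbytes_per_second(c, b, t)
            for c, b, t in zip(HW_COUNTS, BLOCK_SIZES, HW_CPU_SECONDS)]
# The same bytes divided by the assumed wall-clock duration.
wall_clock = [kbytes_per_second(c, b, NOMINAL_SECONDS)
              for c, b in zip(HW_COUNTS, BLOCK_SIZES)]

if __name__ == "__main__":
    print(f"{'block':>6} {'reported':>12} {'wall-clock':>12} {'software':>12}")
    for size, rep, wall, sw in zip(BLOCK_SIZES, reported, wall_clock, SW_REPORTED):
        print(f"{size:>6} {rep:>11.2f}k {wall:>11.2f}k {sw:>11.2f}k")
```

Under that assumption the 8192-byte case comes to roughly 24909k bytes per second of wall-clock throughput, which is below the 35288k measured in software on this board. The benefit in this run is therefore mainly the lower CPU load. Re-running `openssl speed` with `-elapsed` would measure this directly.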
OpenWrt
Our OpenWrt 16.02 BSP added support for CAAM and cryptodev. openssl can utilize this engine as it does on Yocto. Please see below for some results:
- OpenWrt 16.02 WITHOUT cryptodev (using openssl software based algorithms)

    root@OpenWrt:/# openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 2890377 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 767833 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 256 size blocks: 196252 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 49243 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 6165 aes-128 cbc's in 3.00s
    OpenSSL 1.0.2g 1 Mar 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
    compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc      15415.34k    16435.22k    16746.84k    16808.28k    16834.56k
- OpenWrt 16.02 with cryptodev (using kernel hardware accelerated algorithms)

    root@OpenWrt:/# openssl speed -evp aes-128-cbc -engine cryptodev
    engine "cryptodev" set.
    Doing aes-128-cbc for 3s on 16 size blocks: 80789 aes-128-cbc's in 0.13s
    Doing aes-128-cbc for 3s on 64 size blocks: 67854 aes-128-cbc's in 0.15s
    Doing aes-128-cbc for 3s on 256 size blocks: 63909 aes-128-cbc's in 0.21s
    Doing aes-128-cbc for 3s on 1024 size blocks: 46740 aes-128-cbc's in 0.06s
    Doing aes-128-cbc for 3s on 8192 size blocks: 12239 aes-128-cbc's in 0.03s
    OpenSSL 1.0.2g 1 Mar 2016
    built on: reproducible build, date unspecified
    options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
    compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc       9943.26k    28951.04k    77908.11k   797696.00k  3342062.93k