[[PageOutline]]

= i.MX6 Encryption =

The i.MX6 processors offer hardware encryption through Freescale's Cryptographic Accelerator and Assurance Module (CAAM, also known as SEC4). It offers the following support:
 * Security Control
  * Advanced High Assurance Boot (A-HAB) System (HAB with embedded enhancements)
   * SHA-256, 2048-bit RSA key
   * Version control mechanism
   * Warm boot
   * CSU and TZ initialization
  * IC Identification Module (IIM) and Central Security Unit (CSU)
   * CSU enhanced for the IIM
   * Configured during boot and by e-fuses
   * Determines the security level operation mode and the TZ policy
  * Tamper Detection

For encryption, the i.MX6 has the following hardware cryptographic accelerators on board:
 * AES128, AES256
 * 3DES
 * ARC4
 * SHA1
 * SHA224
 * SHA256
 * MD5

At a high level the '''Cryptographic Accelerator and Assurance Module (CAAM)''' is a DMA master supporting the following capabilities:
 * Secure memory feature with HW enforced access control
 * Cryptographic authentication
  * Hashing algorithms
   * MD5
   * SHA-1
   * SHA-224
   * SHA-256
  * Message authentication codes (MAC)
   * HMAC (all hashing algorithms)
   * AES-CMAC
   * AES-XCBC-MAC
  * Auto padding
  * ICV checking
 * Authenticated encryption algorithms
  * AES-CCM (counter with CBC-MAC)
 * Symmetric key block ciphers
  * AES (128-bit, 192-bit or 256-bit keys)
  * DES (64-bit keys, including key parity)
  * 3DES (128-bit or 192-bit keys, including key parity)
  * Cipher modes
   * ECB, CBC, CFB, OFB for all block ciphers
   * CTR for AES
 * Symmetric key stream ciphers
  * ArcFour (alleged RC4 with 40-bit to 128-bit keys)
 * Random-number generation
  * Entropy is generated via an independent free-running ring oscillator
   * Oscillator is off when not generating entropy, for lower power consumption
  * NIST-compliant pseudo-random number generator seeded using hardware-generated entropy

The above features are usable via the CAAM driver, which is available on our Yocto BSPs, as well as our [wiki:OpenWrt/building latest OpenWrt] on
[https://github.com/Gateworks/openwrt GitHub]. To make use of some of these features, the Linux Crypto API must be used. The driver is integrated with the kernel's Crypto API service, so the algorithms supported by CAAM can replace the native software implementations.

== References ==
 * [https://community.freescale.com/thread/303229]
 * [https://community.freescale.com/thread/319374]
 * [https://community.freescale.com/thread/311605]
 * [https://community.freescale.com/thread/309499]
 * [http://www.freescale.com/webapp/sps/site/overview.jsp?code=NETWORK_SECURITY_CRYPTOG]
 * [https://community.freescale.com/docs/DOC-96451]
 * [https://www.freescale.com/webapp/Download?colCode=IMX_CST_TOOL&appType=license&location=null&fasp=1&WT_TYPE=Initialization/Boot/Device%20Driver%20Code%20Generation&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=tgz&WT_ASSET=Downloads&Parent_nodeId=13376371545356958310 Freescale Code Signing Tool] for the High Assurance Boot library. Provides software code signing support designed for use with i.MX processors that integrate the HAB library in the internal boot ROM
 * [https://www.nxp.com/docs/en/application-note/AN4581.pdf Freescale HAB App Note]

== i.MX6 Security Reference Manual ==

Please register on the NXP website and request the document by visiting the link [https://www.nxp.com/webapp/sps/download/mod_download.jsp?colCode=IMX6DQ6SDLSRM&appType=moderatedWithoutFAE here]

= Driver Information =

The CAAM driver is the Linux driver for Freescale's hardware crypto module, the Cryptographic Accelerator and Assurance Module (CAAM). It creates job ring devices and configures the hardware to operate as a DPAA component. Please see [https://www.kernel.org/doc/menuconfig/drivers-crypto-caam-Kconfig.html here] for more detail.

This driver was added to Linux 4.3, but we have support for it in our Yocto 1.6, Yocto 1.7, Yocto 1.8, and OpenWrt next (our latest OpenWrt branch on [https://github.com/Gateworks/openwrt GitHub]).
To enable the CAAM driver in the kernel, the {{{CONFIG_CRYPTO_DEV_FSL_CAAM}}} option must be set:
 * {{{make menuconfig}}}
 * Kernel Cryptographic API → Hardware crypto devices → Freescale CAAM-Multicore driver backend
 * You can either build as a module via {{{M}}} or statically via {{{Y}}}

Enabling the above will select the following in the kernel config:
{{{#!bash
CONFIG_CRYPTO_HW=y
CONFIG_CRYPTO_DEV_FSL_CAAM=m
CONFIG_CRYPTO_DEV_FSL_CAAM_JR=m
CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API=m
CONFIG_CRYPTO_DEV_FSL_CAAM_RINGSIZE=9
CONFIG_CRYPTO_DEV_FSL_CAAM_INTC=n
CONFIG_CRYPTO_DEV_FSL_CAAM_DEBUG=n
}}}

When this is enabled, {{{/proc/crypto}}} will list the algorithms the system supports and which driver provides each of them. For example:
{{{#!bash
root@OpenWrt:/# cat /proc/crypto
name         : sha1
driver       : sha1-caam
module       : caamhash
priority     : 3000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 20
}}}

We can see that the {{{caamhash}}} module offers the sha1 ahash function. This effectively means that any program requesting this hash through the kernel Crypto API will automatically gain hardware acceleration.

== cryptodev vs. af_alg vs. ocf-linux ==

{{{cryptodev}}}, {{{af_alg}}}, and {{{ocf-linux}}} are three userspace crypto APIs into the Linux kernel. While both {{{cryptodev}}} and {{{af_alg}}} use the native Linux crypto interface, {{{ocf-linux}}} does not. {{{ocf-linux}}} also conflicts with {{{cryptodev}}} in that they both create a {{{/dev/crypto}}} interface, so these two drivers cannot co-exist. Gateworks has decided to include {{{cryptodev}}} over {{{ocf-linux}}} for these reasons.

Although {{{af_alg}}} and {{{cryptodev}}} both use the native Linux crypto interface, they go about it in different ways. According to the [http://cryptodev-linux.org/comparison.html cryptodev] site, {{{cryptodev}}} outperforms {{{af_alg}}}, mainly due to how each was created.
Both are acceptable ways of interacting with the kernel, and many programs default to one or the other. Programs such as {{{openssl}}} can select which engine to use. Note that {{{cryptodev}}} must be built out-of-tree because it is not a part of the kernel, whereas {{{af_alg}}} is in the kernel and needs no special handling.

To build {{{cryptodev}}} out-of-tree:
{{{#!bash
# Download cryptodev tarball from here: http://download.gna.org/cryptodev-linux/
wget http://download.gna.org/cryptodev-linux/cryptodev-linux-1.8.tar.gz
tar xvf cryptodev-linux-1.8.tar.gz
cd cryptodev-linux-1.8

# Make sure you have the kernel build directory for the kernel you are compiling for
# and point to it via KERNEL_DIR= (if cross compiling)
KERNEL_DIR=/usr/src/psidhu/linux/linux-imx6 make
make install # Only do this if compiling on target system
}}}

Gateworks has written an example {{{cryptodev}}} program for the cbc(aes) cipher called [https://github.com/Gateworks/gateworks-sample-apps/tree/master/gw-cryptodev-example gw-cryptodev-example]. To get the source and compile, please follow these instructions:
{{{#!bash
git clone https://github.com/Gateworks/gateworks-sample-apps.git
cd gateworks-sample-apps/gw-cryptodev-example

# (optional) Source your env. if cross compiling. In this case, we'll use the Yocto 1.8 SDK.
# Please make sure this is the updated version with cryptodev.h.
. /opt/poky/1.8/environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi
make
}}}

To run:
{{{#!bash
root@ventana:~# ./gw-cryptodev-example
Using cbc-aes-caam driver! Accelerated through SEC4 engine.
Encrypted 'Hello, World!' to '���<�팻�m��5͎'
Decrypted '���<�팻�m��5͎' to 'Hello, World!'
Test passed!
}}}

An example of using this same cipher, but through {{{af_alg}}}, can be found [http://lwn.net/Articles/410833/ here]. Note that the main difference between using {{{cryptodev}}} and {{{af_alg}}} is how messages are sent to the kernel.
{{{cryptodev}}} relies on {{{ioctl}}} calls while {{{af_alg}}} relies on the kernel's socket interface (the {{{AF_ALG}}} address family).

 * References
  * https://en.wikipedia.org/wiki/Crypto_API_%28Linux%29
  * http://lwn.net/Articles/410833/
  * https://lwn.net/Articles/410536/
  * http://cryptodev-linux.org/

== BSP Support ==

Both Yocto and the [wiki:OpenWrt/building latest OpenWrt] have CAAM support. For example, adding the CAAM driver will grant the ability to directly access the hardware random number generator via {{{/dev/hwrng}}}. This tremendously speeds up generation of random data, as seen below:
{{{#!bash
# Generate 50 MiB of data via software
root@OpenWrt:/# time dd if=/dev/urandom of=/tmp/sw_random count=50 bs=1M
50+0 records in
50+0 records out
real    0m 17.29s
user    0m 0.00s
sys     0m 17.28s

# Now generate 50 MiB of data via hardware
root@OpenWrt:/# time dd if=/dev/hwrng of=/tmp/hw_random count=50 bs=1M
50+0 records in
50+0 records out
real    0m 1.05s
user    0m 0.00s
sys     0m 1.04s
}}}

As seen above, using the hardware accelerated rng, random data with good entropy was generated about 16x faster (17.29s versus 1.05s).

Likewise, programs using either {{{cryptodev}}} or {{{af_alg}}} will automatically have hardware-accelerated cryptography. However, some programs use their own software-based algorithms for portability reasons. One such program is {{{openssl}}}. Note that {{{openssl}}} must be compiled with the following flags in order to use the {{{cryptodev}}} engine: {{{-DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS}}}

=== Yocto ===

In the Yocto BSP, {{{openssl}}} is built with {{{cryptodev}}} support.
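One way to confirm that an installed {{{openssl}}} was built with these flags is to look at the compiler flags it reports about itself. The following is a minimal sketch, not part of the BSP: the helper name {{{has_cryptodev_flags}}} is hypothetical, and it assumes an OpenSSL 1.0.x style build where {{{openssl version -f}}} prints the same {{{compiler:}}} line that appears in the benchmark output.

```shell
#!/bin/sh
# has_cryptodev_flags (hypothetical helper): reads compiler flags on stdin and
# succeeds only if both cryptodev defines are present.
has_cryptodev_flags() {
    flags=$(cat)
    case "$flags" in
        *-DHAVE_CRYPTODEV*) ;;
        *) return 1 ;;
    esac
    case "$flags" in
        *-DUSE_CRYPTODEV_DIGESTS*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the target: feed it the flags openssl reports for its own build.
if command -v openssl >/dev/null 2>&1; then
    if openssl version -f | has_cryptodev_flags; then
        echo "openssl was built with the cryptodev flags"
    else
        echo "openssl was built without the cryptodev flags"
    fi
fi
```

A build can carry these flags and still run its software algorithms by default, so this only shows that the engine is available, not that it is in use.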
Please see below for a comparison with and without the {{{cryptodev}}} engine:

 * Yocto 1.8 WITHOUT {{{cryptodev}}} (using {{{openssl}}} software based algorithms)
{{{#!bash
root@ventana:~# openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 6008244 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 1608835 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 411309 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 103187 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 12923 aes-128 cbc's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc -march=armv7-a -marm -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      32043.97k    34436.60k    35215.75k    35221.16k    35288.41k
}}}

 * Yocto 1.8 with {{{cryptodev}}} (using kernel hardware accelerated algorithms)
{{{#!bash
root@ventana:~# openssl speed -evp aes-128-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-128-cbc for 3s on 16 size blocks: 44146 aes-128-cbc's in 0.14s
Doing aes-128-cbc for 3s on 64 size blocks: 43561 aes-128-cbc's in 0.11s
Doing aes-128-cbc for 3s on 256 size blocks: 39724 aes-128-cbc's in 0.13s
Doing aes-128-cbc for 3s on 1024 size blocks: 30733 aes-128-cbc's in 0.10s
Doing aes-128-cbc for 3s on 8192 size blocks: 9122 aes-128-cbc's in 0.01s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc -march=armv7-a -marm -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       5045.26k    25344.58k    78225.72k   314705.92k  7472742.40k
}}}

One of the biggest advantages of using hardware encryption is how the CPU is utilized. In the above two cases, we found the following to be true:
 * With {{{cryptodev}}} disabled: 25% usr CPU usage (one core pegged to 100%)
 * With {{{cryptodev}}} enabled: 16% sys CPU usage, 2% sirq
 * With the {{{cryptodev}}} engine, the bytes-per-second figures reported by {{{openssl speed}}} are far larger, especially at the larger block sizes. These figures are derived from the CPU time shown on each {{{Doing}}} line (for example 0.01s for the 8192-byte run) rather than the 3s each run is given, so they largely reflect how little CPU time the engine consumes; {{{openssl speed -elapsed}}} reports wall-clock throughput instead

=== OpenWrt ===

Our OpenWrt 16.02 BSP added support for CAAM and {{{cryptodev}}}. As on Yocto, {{{openssl}}} can utilize this engine.
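To see which algorithms the CAAM driver has registered on a running system, {{{/proc/crypto}}} can be filtered by driver name. The sketch below is illustrative only: the helper name {{{caam_algs}}} is hypothetical, and it assumes that CAAM-backed entries use a driver name ending in {{{-caam}}}, as in the {{{sha1-caam}}} and {{{cbc-aes-caam}}} examples on this page. Other kernel versions may name them differently.

```shell
#!/bin/sh
# caam_algs (hypothetical helper): reads /proc/crypto-style text on stdin and
# prints "name driver" for every entry whose driver name ends in "-caam".
caam_algs() {
    awk -F ' *: *' '
        $1 == "name"                    { name = $2 }
        $1 == "driver" && $2 ~ /-caam$/ { print name, $2 }
    '
}

# On the target: list the algorithms currently provided by CAAM.
if [ -r /proc/crypto ]; then
    caam_algs < /proc/crypto
fi
```

If nothing is printed, the CAAM modules are probably not loaded, or the driver names on that kernel do not match the assumed suffix.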
Please see below for some results:

 * OpenWrt 16.02 WITHOUT {{{cryptodev}}} (using {{{openssl}}} software based algorithms)
{{{#!bash
root@OpenWrt:/# openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 2890377 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 767833 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 196252 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 49243 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 6165 aes-128 cbc's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      15415.34k    16435.22k    16746.84k    16808.28k    16834.56k
}}}

 * OpenWrt 16.02 with {{{cryptodev}}} (using kernel hardware accelerated algorithms)
{{{#!bash
root@OpenWrt:/# openssl speed -evp aes-128-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-128-cbc for 3s on 16 size blocks: 80789 aes-128-cbc's in 0.13s
Doing aes-128-cbc for 3s on 64 size blocks: 67854 aes-128-cbc's in 0.15s
Doing aes-128-cbc for 3s on 256 size blocks: 63909 aes-128-cbc's in 0.21s
Doing aes-128-cbc for 3s on 1024 size blocks: 46740 aes-128-cbc's in 0.06s
Doing aes-128-cbc for 3s on 8192 size blocks: 12239 aes-128-cbc's in 0.03s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector
-D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       9943.26k    28951.04k    77908.11k   797696.00k  3342062.93k
}}}
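
Each figure in the result tables is the operation count multiplied by the block size and divided by the time printed on the matching {{{Doing}}} line. The sketch below reproduces that arithmetic so a figure can be recomputed against a different time base. The helper name {{{rate_k}}} is hypothetical, and the 3-second duration used in the second call is an assumption taken from the "for 3s" text in the output, not a measured value.

```shell
#!/bin/sh
# rate_k COUNT BLOCK_SIZE SECONDS (hypothetical helper)
# Prints throughput in the unit openssl speed uses: 1000s of bytes per second.
rate_k() {
    awk -v n="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.2fk\n", n * b / t / 1000 }'
}

# Yocto cryptodev run, 8192-byte blocks, 9122 operations.
rate_k 9122 8192 0.01   # time printed by openssl -> 7472742.40k, as in the table
rate_k 9122 8192 3      # assumed 3 s of wall-clock time -> 24909.14k
```

The first call matches the table because {{{openssl speed}}} divides by the CPU time it was charged unless it is run with {{{-elapsed}}}. Under the 3-second assumption, the same run works out to roughly 24.9 MB/s of wall-clock throughput.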