Changes between Initial Version and Version 1 of ventana/encryption


Timestamp:
10/22/2017 05:28:45 AM
Author:
trac

  • ventana/encryption

    v1 v1  
     1[[PageOutline]]
     2
     3= i.MX6 Encryption =
     4The i.MX6 Processors offer hardware encryption through Freescale's Cryptographic Accelerator and Assurance Module (CAAM, also known as SEC4). It offers the following support:
     5 * Security Control
     6 * Advanced High Assurance Boot (A-HAB) System (HAB with embedded enhancements)
     7 * SHA-256, 2048-bit RSA key
     8 * Version control mechanism
     9 * Warm boot
     10 * CSU and TZ initialization
     11 * IC Identification Module (IIM) and Central Security Unit (CSU)
     12 * CSU enhanced for the IIM
     13 * Configured during boot and by e-fuses
     14 * Determines the security level operation mode and the TZ policy
     15 * Tamper Detection
     16
     17For encryption, the i.MX6 has the following hardware cryptographic accelerators on board:
     18 * AES128, AES256
     19 * 3DES
     20 * ARC4
     21 * SHA1
     22 * SHA224
     23 * SHA256
     24 * MD5
     25
     26At a high level the '''Cryptographic Accelerator and Assurance Module (CAAM)''' is a DMA master supporting the following capabilities:
     27* Secure memory feature with HW enforced access control
     28* Cryptographic authentication
     29  * Hashing algorithms
     30     * MD5
     31     * SHA-1
     32     * SHA-224
     33     * SHA-256
     34  * Message authentication codes (MAC)
     35     * HMAC-all hashing algorithms
     36     * AES-CMAC
     37     * AES-XCBC-MAC
     38  * Auto padding
     39  * ICV checking
     40* Authenticated encryption algorithms
     41  * AES-CCM (counter with CBC-MAC)
     42* Symmetric key block ciphers
     43  * AES (128-bit, 192-bit or 256-bit keys)
     44  * DES (64-bit keys, including key parity)
     45  * 3DES (128-bit or 192-bit keys, including key parity)
     46* Cipher modes
     47  * ECB, CBC, CFB, OFB for all block ciphers
     48  * CTR for AES
     49* Symmetric key stream ciphers
     50  * ArcFour (alleged RC4 with 40-128 bit keys)
     51* Random-number generation
     52  * Entropy is generated via an independent free running ring oscillator
     53  * Oscillator is off when not generating entropy, for lower power consumption
     54  * NIST-compliant pseudo-random number generator seeded using hardware-generated entropy
     55
     56The above features are available through the CAAM driver, which is included in our Yocto BSPs as well as our [wiki:OpenWrt/building latest OpenWrt] on [https://github.com/Gateworks/openwrt GitHub]. To use some of these features, the Linux Crypto API must be used. The driver is integrated with the kernel Crypto API service, so the algorithms supported by CAAM can replace the native software implementations.
     57
     58== References ==
     59 * [https://community.freescale.com/thread/303229]
     60 * [https://community.freescale.com/thread/319374]
     61 * [https://community.freescale.com/thread/311605]
     62 * [https://community.freescale.com/thread/309499]
     63 * [http://www.freescale.com/webapp/sps/site/overview.jsp?code=NETWORK_SECURITY_CRYPTOG]
     64 * [https://community.freescale.com/docs/DOC-96451]
     65 * [https://www.freescale.com/webapp/Download?colCode=IMX_CST_TOOL&appType=license&location=null&fasp=1&WT_TYPE=Initialization/Boot/Device%20Driver%20Code%20Generation&WT_VENDOR=FREESCALE&WT_FILE_FORMAT=tgz&WT_ASSET=Downloads&Parent_nodeId=13376371545356958310 Freescale Code Signing Tool] for the High Assurance Boot library. Provides software code signing support designed for use with i.MX processors that integrate the HAB library in the internal boot ROM
     66
     67== i.MX6 Security Reference Manual ==
     68Please contact support@gateworks.com to request this document.
     69
     70= Driver Information =
     71The Cryptographic Accelerator and Assurance Module (CAAM) driver is the Linux kernel driver for Freescale's hardware crypto engine. It configures the hardware to operate as a DPAA component and creates job ring devices. Please see [https://www.kernel.org/doc/menuconfig/drivers-crypto-caam-Kconfig.html here] for more detail. This driver was added to Linux 4.3, but we also support it in our Yocto 1.6, Yocto 1.7, Yocto 1.8, and OpenWrt next (our latest OpenWrt branch on [https://github.com/Gateworks/openwrt GitHub]).
     72
     73To enable the CAAM driver in the kernel, {{{CONFIG_CRYPTO_DEV_FSL_CAAM}}} must be set:
     74 * {{{make menuconfig}}}
     75  * Kernel Cryptographic API → Hardware crypto devices → Freescale CAAM-Multicore driver backend
     76   * You can either build as a module via {{{M}}} or statically via {{{Y}}}
     77
     78Enabling the above will select the following in the kernel config:
     79{{{#!bash
     80CONFIG_CRYPTO_HW=y
     81CONFIG_CRYPTO_DEV_FSL_CAAM=m
     82CONFIG_CRYPTO_DEV_FSL_CAAM_JR=m
     83CONFIG_CRYPTO_DEV_FSL_CAAM_CRYPTO_API=m
     84CONFIG_CRYPTO_DEV_FSL_CAAM_AHASH_API=m
     85CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API=m
     86CONFIG_CRYPTO_DEV_FSL_CAAM_RINGSIZE=9
     87CONFIG_CRYPTO_DEV_FSL_CAAM_INTC=n
     88CONFIG_CRYPTO_DEV_FSL_CAAM_DEBUG=n
     89}}}
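One way to confirm that a kernel build actually has these options set is to inspect its {{{.config}}}. The following is a minimal sketch and not part of the BSP; the function name {{{check_caam_config}}} and the default path {{{.config}}} are assumptions made for illustration.

```shell
# Print the CAAM-related options found in a kernel config file and
# return success only if the core driver is built in (=y) or a module (=m).
# $1: path to the kernel config (assumed default: ./.config)
check_caam_config() {
    cfg="${1:-.config}"
    # Show every CAAM option that is set in the config.
    grep -E '^CONFIG_CRYPTO_DEV_FSL_CAAM' "$cfg"
    # The exit status of this last grep is the function's result.
    grep -Eq '^CONFIG_CRYPTO_DEV_FSL_CAAM=[ym]$' "$cfg"
}
```

For example, {{{check_caam_config /path/to/linux/.config && echo "CAAM enabled"}}} would print the matching options followed by the message, assuming the path points at the config of the kernel being built.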
     90
     91When this is enabled, {{{/proc/crypto}}} lists the algorithms the system supports and the driver that provides each one. For example:
     92{{{#!bash
     93root@OpenWrt:/# cat /proc/crypto
     94<snip>
     95name         : sha1
     96driver       : sha1-caam
     97module       : caamhash
     98priority     : 3000
     99refcnt       : 1
     100selftest     : passed
     101internal     : no
     102type         : ahash
     103async        : yes
     104blocksize    : 64
     105digestsize   : 20
     106<snip>
     107}}}
     108
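     109We can see that the {{{caamhash}}} module provides the sha1 ahash implementation {{{sha1-caam}}}. Because the Crypto API selects the highest-priority implementation of a requested algorithm, any kernel Crypto API user that asks for sha1 will get the hardware-accelerated version, as long as {{{sha1-caam}}} has the highest priority registered for that algorithm.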
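To see every algorithm that CAAM provides rather than a single entry, the {{{name}}} and {{{driver}}} fields of {{{/proc/crypto}}} can be filtered. This is a rough sketch; the helper name {{{list_caam_algs}}} is hypothetical, and it assumes CAAM-backed drivers carry "caam" in their driver name, as {{{sha1-caam}}} does above.

```shell
# List "algorithm -> driver" pairs whose driver name contains "caam".
# $1: file in /proc/crypto format (defaults to /proc/crypto itself)
list_caam_algs() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "name"                  { name = $2 }            # remember the current algorithm
        $1 == "driver" && $2 ~ /caam/ { print name " -> " $2 } # report CAAM-backed drivers
    ' "${1:-/proc/crypto}"
}
```

Running {{{list_caam_algs}}} with no argument on the target reads the live {{{/proc/crypto}}}; for the entry shown above it would print {{{sha1 -> sha1-caam}}}.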
     110
     111== cryptodev vs. af_alg vs. ocf-linux ==
     112{{{cryptodev}}}, {{{af_alg}}}, and {{{ocf-linux}}} are three userspace crypto APIs into the Linux kernel. {{{cryptodev}}} and {{{af_alg}}} use the native Linux crypto interface; {{{ocf-linux}}} does not. {{{ocf-linux}}} also conflicts with {{{cryptodev}}} because both create a {{{/dev/crypto}}} interface, so the two drivers cannot co-exist. For these reasons, Gateworks has chosen to include {{{cryptodev}}} rather than {{{ocf-linux}}}.
     113
     114{{{af_alg}}} and {{{cryptodev}}} both use the native Linux crypto interface, but go about it in different ways. According to the [http://cryptodev-linux.org/comparison.html cryptodev] site, {{{cryptodev}}} outperforms {{{af_alg}}}, mainly due to how each was designed. Both are acceptable ways of interacting with the kernel, and many programs default to one or the other. Programs such as {{{openssl}}} can select which engine to use. Note that {{{cryptodev}}} must be built out-of-tree because it is not a part of the kernel, whereas {{{af_alg}}} is in the kernel and needs no special handling.
     115
     116To build {{{cryptodev}}} out-of-tree:
     117{{{#!bash
     118# Download cryptodev tarball from here: http://download.gna.org/cryptodev-linux/
     119wget http://download.gna.org/cryptodev-linux/cryptodev-linux-1.8.tar.gz
     120tar xvf cryptodev-linux-1.8.tar.gz
     121cd cryptodev-linux-1.8
     122# Make sure you have kernel build directory for the kernel you are compiling for and point to it via KERNEL_DIR= (if cross compiling)
     123KERNEL_DIR=/usr/src/psidhu/linux/linux-imx6 make
     124make install # Only do this if compiling on target system
     125}}}
     126
     127Gateworks has written an example {{{cryptodev}}} program for the cbc(aes) cipher called [https://github.com/Gateworks/gateworks-sample-apps/tree/master/gw-cryptodev-example gw-cryptodev-example]. To get the source and compile, please follow these instructions:
     128{{{#!bash
     129git clone https://github.com/Gateworks/gateworks-sample-apps.git
     130cd gateworks-sample-apps/gw-cryptodev-example
     131# (optional) Source your env. if cross compiling. In this case, we'll use the Yocto 1.8 SDK.
     132. /opt/poky/1.8/environment-setup-cortexa9hf-vfp-neon-poky-linux-gnueabi # Please make sure this is the updated version with cryptodev.h.
     133make
     134}}}
     135
     136To run:
     137{{{#!bash
     138root@ventana:~# ./gw-cryptodev-example
     139Using cbc-aes-caam driver! Accelerated through SEC4 engine.
     140Encrypted 'Hello, World!' to '���<�팻�m��5͎'
     141Decrypted '���<�팻�m��5͎' to 'Hello, World!'
     142Test passed!
     143}}}
     144
     145An example of using this same cipher, but through {{{af_alg}}}, can be found [http://lwn.net/Articles/410833/ here].
     146
     147Note that the main difference between {{{cryptodev}}} and {{{af_alg}}} is how messages are sent to the kernel: {{{cryptodev}}} relies on {{{ioctl}}} calls, while {{{af_alg}}} relies on a kernel socket address family (called AF_ALG).
     148
     149* References
     150 * https://en.wikipedia.org/wiki/Crypto_API_%28Linux%29
     151 * http://lwn.net/Articles/410833/
     152 * https://lwn.net/Articles/410536/
     153 * http://cryptodev-linux.org/
     154
     155== BSP Support ==
     156Both Yocto and the [wiki:OpenWrt/building latest OpenWrt] have CAAM support.
     157
     158For example, adding the CAAM driver makes the hardware random number generator directly accessible via {{{/dev/hwrng}}}. This greatly speeds up the generation of random data, as seen below:
     159{{{#!bash
     160# Generate 50 MiB of data via software
     161root@OpenWrt:/# time dd if=/dev/urandom of=/tmp/sw_random count=50 bs=1M
     16250+0 records in
     16350+0 records out
     164real    0m 17.29s
     165user    0m 0.00s
     166sys     0m 17.28s
     167# Now generate 50 MiB of data via hardware
     168root@OpenWrt:/# time dd if=/dev/hwrng of=/tmp/hw_random count=50 bs=1M
     16950+0 records in
     17050+0 records out
     171real    0m 1.05s
     172user    0m 0.00s
     173sys     0m 1.04s
     174}}}
     175
     176As seen above, the hardware-accelerated RNG generated random data with good entropy about 16.5x faster (1.05 s vs. 17.29 s).
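The speedup can be worked out directly from the two {{{real}}} times in the {{{dd}}} output above. A small sketch of the arithmetic (the helper name {{{speedup}}} is hypothetical):

```shell
# speedup SLOW_SECONDS FAST_SECONDS
# Prints how many times faster the second run was, to two decimals.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Wall-clock times from the dd runs: /dev/urandom vs. /dev/hwrng
speedup 17.29 1.05   # prints 16.47
```

In throughput terms, 50 MiB in 17.29 s is roughly 2.9 MiB/s, while 50 MiB in 1.05 s is roughly 47.6 MiB/s.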
     177
     178This also means that programs using either {{{cryptodev}}} or {{{af_alg}}} will automatically get hardware-accelerated cryptography. However, some programs use their own software-based algorithms for portability reasons. One such program is {{{openssl}}}, which must be compiled with the following flags in order to use the {{{cryptodev}}} engine: {{{-DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS}}}
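Whether an installed {{{openssl}}} was built with these two flags can be checked from its reported compiler flags ({{{openssl version -f}}} prints them). The sketch below only scans text for the two defines; the function name {{{openssl_has_cryptodev_flags}}} is an assumption for illustration.

```shell
# Read compiler flags on stdin and succeed only if both cryptodev
# defines named above are present.
openssl_has_cryptodev_flags() {
    flags=$(cat)
    case "$flags" in *-DHAVE_CRYPTODEV*) ;; *) return 1 ;; esac
    case "$flags" in *-DUSE_CRYPTODEV_DIGESTS*) ;; *) return 1 ;; esac
}

# Usage on the target (not run here):
#   openssl version -f | openssl_has_cryptodev_flags && echo "built with cryptodev flags"
```

The same check can be applied to the {{{compiler:}}} line that {{{openssl speed}}} prints at the end of a run.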
     179
     180=== Yocto ===
     181In the Yocto BSP, {{{openssl}}} is built with {{{cryptodev}}} support. Please see below for a comparison with and without the {{{cryptodev}}} engine:
     182* Yocto 1.8 WITHOUT {{{cryptodev}}} (using {{{openssl}}} software based algorithms)
     183{{{#!bash
     184root@ventana:~# openssl speed aes-128-cbc
     185Doing aes-128 cbc for 3s on 16 size blocks: 6008244 aes-128 cbc's in 3.00s
     186Doing aes-128 cbc for 3s on 64 size blocks: 1608835 aes-128 cbc's in 2.99s
     187Doing aes-128 cbc for 3s on 256 size blocks: 411309 aes-128 cbc's in 2.99s
     188Doing aes-128 cbc for 3s on 1024 size blocks: 103187 aes-128 cbc's in 3.00s
     189Doing aes-128 cbc for 3s on 8192 size blocks: 12923 aes-128 cbc's in 3.00s
     190OpenSSL 1.0.2d 9 Jul 2015
     191built on: reproducible build, date unspecified
     192options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
     193compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -marm  -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN    -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
     194The 'numbers' are in 1000s of bytes per second processed.
     195type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
     196aes-128 cbc      32043.97k    34436.60k    35215.75k    35221.16k    35288.41k
     197}}}
     198
     199* Yocto 1.8 with {{{cryptodev}}} (using kernel hardware accelerated algorithms)
     200{{{#!bash
     201root@ventana:~# openssl speed -evp aes-128-cbc -engine cryptodev
     202engine "cryptodev" set.
     203Doing aes-128-cbc for 3s on 16 size blocks: 44146 aes-128-cbc's in 0.14s
     204Doing aes-128-cbc for 3s on 64 size blocks: 43561 aes-128-cbc's in 0.11s
     205Doing aes-128-cbc for 3s on 256 size blocks: 39724 aes-128-cbc's in 0.13s
     206Doing aes-128-cbc for 3s on 1024 size blocks: 30733 aes-128-cbc's in 0.10s
     207Doing aes-128-cbc for 3s on 8192 size blocks: 9122 aes-128-cbc's in 0.01s
     208OpenSSL 1.0.2d 9 Jul 2015
     209built on: reproducible build, date unspecified
     210options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
     211compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -marm  -mthumb-interwork -mfloat-abi=hard -mfpu=neon -mtune=cortex-a9 --sysroot=/usr/src/psidhu/gw-yocto-1.8/build/tmp/sysroots/ventana -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN    -DTERMIO  -O2 -pipe -g -feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
     212The 'numbers' are in 1000s of bytes per second processed.
     213type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
     214aes-128-cbc       5045.26k    25344.58k    78225.72k   314705.92k  7472742.40k
     215}}}
     216
     217One of the biggest advantages of hardware encryption is reduced CPU utilization. In the above two cases, we found the following to be true:
     218 * With {{{cryptodev}}} disabled: 25% usr CPU usage (one core pegged to 100%)
     219 * With {{{cryptodev}}} enabled: 16% sys CPU usage, 2% sirq
     220 * With the {{{cryptodev}}} hardware engine, {{{openssl}}} reported far more bytes processed per second, especially for the larger block sizes. Note that {{{openssl speed}}} divides by the time printed on each "Doing" line (0.01s to 0.14s of CPU time for the {{{cryptodev}}} runs), not by the 3s test interval; pass {{{-elapsed}}} to measure against wall-clock time instead.
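The bytes-per-second figures in the tables above can be reproduced from the "Doing" lines as count x block size / the seconds printed on that line. The sketch below (the helper name {{{throughput_k}}} is hypothetical) does this for the 8192-byte {{{cryptodev}}} row, then divides the same count by 3 seconds for comparison, on the assumption that the loop ran for the nominal 3s interval shown in the output.

```shell
# throughput_k COUNT BLOCK_BYTES SECONDS
# Prints throughput in 1000s of bytes per second, the unit openssl speed uses.
throughput_k() {
    awk -v n="$1" -v bs="$2" -v t="$3" \
        'BEGIN { printf "%.2fk\n", n * bs / t / 1000 }'
}

# Yocto 1.8, cryptodev engine, 8192-byte blocks: 9122 operations
throughput_k 9122 8192 0.01   # prints 7472742.40k (matches the table)
throughput_k 9122 8192 3      # prints 24909.14k (same count over 3 s)
```

For the software-only run, {{{throughput_k 12923 8192 3.00}}} gives 35288.41k, which also matches its table entry.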
     221
     222=== OpenWrt ===
     223Our OpenWrt 16.02 BSP added support for CAAM and {{{cryptodev}}}. As in Yocto, {{{openssl}}} can use this engine. Please see below for some results:
     224
     225* OpenWrt 16.02 WITHOUT {{{cryptodev}}} (using {{{openssl}}} software based algorithms)
     226{{{#!bash
     227root@OpenWrt:/# openssl speed aes-128-cbc
     228Doing aes-128 cbc for 3s on 16 size blocks: 2890377 aes-128 cbc's in 3.00s
     229Doing aes-128 cbc for 3s on 64 size blocks: 767833 aes-128 cbc's in 2.99s
     230Doing aes-128 cbc for 3s on 256 size blocks: 196252 aes-128 cbc's in 3.00s
     231Doing aes-128 cbc for 3s on 1024 size blocks: 49243 aes-128 cbc's in 3.00s
     232Doing aes-128 cbc for 3s on 8192 size blocks: 6165 aes-128 cbc's in 3.00s
     233OpenSSL 1.0.2g  1 Mar 2016
     234built on: reproducible build, date unspecified
     235options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
     236compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
     237The 'numbers' are in 1000s of bytes per second processed.
     238type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
     239aes-128 cbc      15415.34k    16435.22k    16746.84k    16808.28k    16834.56k
     240}}}
     241
     242* OpenWrt 16.02 with {{{cryptodev}}} (using kernel hardware accelerated algorithms)
     243{{{#!bash
     244root@OpenWrt:/# openssl speed -evp aes-128-cbc -engine cryptodev
     245engine "cryptodev" set.
     246Doing aes-128-cbc for 3s on 16 size blocks: 80789 aes-128-cbc's in 0.13s
     247Doing aes-128-cbc for 3s on 64 size blocks: 67854 aes-128-cbc's in 0.15s
     248Doing aes-128-cbc for 3s on 256 size blocks: 63909 aes-128-cbc's in 0.21s
     249Doing aes-128-cbc for 3s on 1024 size blocks: 46740 aes-128-cbc's in 0.06s
     250Doing aes-128-cbc for 3s on 8192 size blocks: 12239 aes-128-cbc's in 0.03s
     251OpenSSL 1.0.2g  1 Mar 2016
     252built on: reproducible build, date unspecified
     253options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
     254compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/usr/include -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include/fortify -I/usr/src/psidhu/openwrt/openwrt-next/staging_dir/toolchain-arm_cortex-a9+neon_gcc-5.2.0_musl-1.1.12_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv7-a -mtune=cortex-a9 -mfpu=neon -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap /usr/src/psidhu/openwrt/openwrt-next/build_dir/target-arm_cortex-a9+neon_musl-1.1.12_eabi/openssl-1.0.2g:openssl-1.0.2g -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -fomit-frame-pointer -Wall
     255The 'numbers' are in 1000s of bytes per second processed.
     256type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
     257aes-128-cbc       9943.26k    28951.04k    77908.11k   797696.00k  3342062.93k
     258}}}