Changes between Version 4 and Version 5 of venice/npu


Timestamp:
07/26/2024 12:10:32 AM (4 months ago)
Author:
Blake Stewart
Comment:

Completed guide on using NPU for 74xx, with infographics

  • venice/npu

    v4 v5  
    77The NPU operates at up to 2.25 TOPS.
    88
    9 NXP uses the term eIQ, which stands for 'edge intelligence'. NXP provides an eIQ ML software environment for neural networks (NN).
    10 
    11 With eIQ, there are 4 inference engines:
    12  1. OpenCV
    13  1. Arm® NN
    14  1. Arm CMSIS-NN
    15  1. !TensorFlow Lite
    16 
    17 
    18 Some of the default NXP Yocto software has examples in the directory /usr/bin/tensorflow-lite-2.4.0/examples
    19 
    20 PyeIQ is demo software written on top of the eIQ Machine Learning software environment. Its Python classes provide a simple and efficient baseline to get started.
    21  * [https://pypi.org/project/pyeiq/]
    22  *
    23 {{{
    24 apt install pip
    25 pip3 install pyeiq
    26 }}}
    27  * PyeIQ Examples are shown here: [https://community.nxp.com/t5/Blog/PyeIQ-3-x-Release-User-Guide/ba-p/1305998]
    28 
    29 More information can also be found on the NXP eIQ community page:
    30  * eIQ Edge Intelligence Starter PDF: [https://www.nxp.com/docs/en/fact-sheet/EIQ-FS.pdf]
    31  * eIQ Edge Intelligence Community Page: https://community.nxp.com/t5/eIQ-Machine-Learning-Software/bd-p/eiq
    32 
    33 Other links:
    34  * [https://www.nxp.com/video/using-i-mx-8m-plus-applications-processors-to-enable-ai-in-factory:USING-I.MX-8M-PLUS-APPS-PROCESSORS-TO-ENABLE-AI Using i.MX 8M Plus Applications Processors to Enable AI in Factory]
     9[[Image(https://i.imgur.com/Jw1JTHp.png)]]
     10
     11The easiest way to get started with the NPU is to use an image from the NXP BSP. This image contains the libraries and kernel needed to interface with the NPU without much configuration. You can either [[https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf | follow the guide to build their image]] or [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | download a pre-built one]] (recommended).
     12
     13This guide assumes you have:
     14- A Gateworks board with an i.MX 8M Plus processor.
     15- An NXP account, which is necessary to download their image and models.
     16- A flash drive, SD card, or other removable block storage of at least 16GB, used to install a Rescue Image, the NXP image, and updated device trees (DTBs) onto the board.
     17
     18== Getting Started with the NPU
     19=== 1. Download the Gateworks Venice Rescue Image to removable media.
     20Find which device your flash drive/SD card is. For example, {{{/dev/sdc}}}
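One way to narrow down which device node belongs to the flash drive or SD card is to look at the `removable` flag that Linux exposes under `/sys/block`. The following is a minimal sketch, not part of the original guide: the function name is hypothetical, and some USB enclosures report themselves as non-removable, so always confirm the result (for example with `lsblk`) before writing to a device.

```python
from pathlib import Path


def removable_block_devices(sys_block="/sys/block"):
    """Return (device_path, size_in_GiB) for each block device flagged removable."""
    found = []
    for entry in sorted(Path(sys_block).iterdir()):
        flag = entry / "removable"
        size = entry / "size"
        # Skip devices that do not advertise themselves as removable.
        if not flag.is_file() or flag.read_text().strip() != "1":
            continue
        # sysfs reports the size in 512-byte sectors.
        sectors = int(size.read_text().strip()) if size.is_file() else 0
        found.append((f"/dev/{entry.name}", sectors * 512 / 2**30))
    return found
```

Calling `removable_block_devices()` on the host before and after plugging in the drive and comparing the two lists shows which node appeared.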
     21
     22Then, use dd to flash this image onto your device byte-for-byte.
     23{{{
     24DEVICE=<your device, no trailing />
     25wget https://dev.gateworks.com/buildroot/venice/minimal/rescue.img.gz
     26zcat rescue.img.gz | dd of=${DEVICE} bs=1M oflag=sync
     27}}}
     28
     29In this guide, we will have the NXP image and our rescue image on the same drive, so we will resize the partition and file system to fit both. If you're using separate devices, this is not necessary.
     30
     31{{{
     32parted ${DEVICE} "resizepart 1 -0" # resize the partition to fit the size of the drive
     33resize2fs ${DEVICE}1 # resize ext fs on device partition 1
     34}}}
     35
     36=== 2. Download the NXP BSP evaluation kit image to removable media.
     37On your host machine, download the Linux 6.6.3_1.0.0 image for the i.MX 8M Plus EVK [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | here]].
     38
     39In this download you will find imx-image-full-imx8mpevk.wic, which is a Yocto-generated image with all of the ML libraries.
     40Copy this image to our device.
     41{{{
     42sudo mount ${DEVICE}1 /mnt
     43sudo cp imx-image-full-imx8mpevk.wic /mnt/
     44}}}
     45
     46=== 3. Patch and build the Venice DTBs from the kernel source.
     47Due to small inconsistencies between the NXP and Gateworks devicetrees for bleeding-edge peripherals, a patch is required until mainline compatibility is reached.
     48
     49{{{
     50git clone https://github.com/nxp-imx/linux-imx -b lf-6.6.y
     51cd linux-imx
     52wget <patches>
     53patch -p1 < 0001-arm64-dts-imx8mp-venice-fix-USB_OC-pinmux.patch
     54patch -p1 < 0002-arm64-dts-imx8mm-venice-gw700x-remove-ddrc.patch
     55patch -p1 < 0003-arm64-dts-freescale-add-Gateworks-venice-board-dtbs.patch
     56patch -p1 < 0004-arm64-dts-imx8mp-venice-gw74xx-enable-gpu-nodes.patch
     57ARCH=arm64 make imx_v8_defconfig
     58ARCH=arm64 make dtbs
     59}}}
     60
     61Copy these patched DTBs to a separate directory on your flash drive, such as {{{/nxp/}}}, so as not to overwrite the ones needed for booting into the rescue image.
     62
     63{{{
     64mkdir /mnt/nxp
     65cp arch/arm64/boot/dts/freescale/*venice*.dtb /mnt/nxp/
     66}}}
     67
     68
     69
     70Now, the contents of the device should include:
     71- Rescue image
     72- Rescue image dtbs
     73- rootfs.cpio.xz and boot.scr for booting Rescue Image
     74- nxp/ with new, updated dtbs
     75
     76=== 4. Boot Rescue Image ramdisk on board
     77Connect the removable media, in our case a USB stick, to the board before powering it on. Many boards have built-in SD readers, which would change the device commands slightly.
     78
     79Connect serial console via JTAG and power on the board.
     80Enter the U-Boot console by stopping autoboot.
     81
     82Sanity check: is the USB device properly detected?
     83{{{
     84usb start
     85part list usb 0
     86}}}
     87
     88The output should look similar to the following:
     89{{{
     90Partition Map for USB device 0  --   Partition Type: DOS
     91
     92Part    Start Sector    Num Sectors     UUID            Type
     93  1     2048            204800          6fd772a2-01     83 Boot
     94}}}
     95
     96Temporarily override the boot_targets variable to ensure the board boots into the Rescue Image, then boot into it. If you are not using usb0, run {{{print boot_targets}}} to list the available targets.
     97
     98{{{
     99setenv boot_targets usb0
     100run bootcmd_${boot_targets}
     101}}}
     102
     103If everything works normally, you will be presented with a login prompt; log in as root to enter the shell.
     104
     105=== 5. Flash NXP .wic and patched DTBs onto eMMC
     106
     107
     108You are now booted into the ramdisk rescue image. The next step is to flash the .wic onto the eMMC.
     109
     110Your removable media will likely have a different device name than it had on the host computer; in our case, it is now {{{/dev/sda}}} instead of {{{/dev/sdc}}}. This is expected.
     111{{{
     112DEVICE=<flash device, with no trailing />
     113mkdir /mnt/src
     114mkdir /mnt/dst
     115mount ${DEVICE}1 /mnt/src
     116dd if=/mnt/src/imx-image-full-imx8mpevk.wic of=/dev/mmcblk2 bs=16M oflag=sync
     117mount /dev/mmcblk2p1 /mnt/dst
     118cp /mnt/src/*.dtb /mnt/dst/
     119cp /mnt/src/nxp/*.dtb /mnt/dst/
     120}}}
     121
     122This flashes the prebuilt .wic image (both partitions, the kernel and fs) to our eMMC, then copies over both the old and the new device trees. Next, we will create the boot script. If the block below does not paste correctly, the file can be created or edited in a text editor such as vi; just leave out the EOF line.
     123{{{
     124cat <<\EOF > boot.scr.txt
     125setenv bootargs 'root=/dev/mmcblk2p2'
     126load mmc 2:1 $kernel_addr_r Image
     127
     128setenv fdt_addr
     129setenv fdt_list $fdt_file $fdt_file1 $fdt_file2 $fdt_file3 $fdt_file4 $fdt_file5
     130setenv load_fdt 'echo Loading $fdt...; load ${devtype} ${devnum}:${distro_bootpart} ${fdt_addr_r} ${prefix}${fdt} && setenv fdt_addr ${fdt_addr_r}'
     131for fdt in ${fdt_list}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${fdt}; then run load_fdt; fi; done
     132if test -z "$fdt_addr"; then echo "Warning: Using bootloader DTB"; setenv fdt_addr $fdtcontroladdr; fi
     133# Disable PCIe in the selected device tree; without this, the kernel hangs.
     134fdt addr $fdt_addr && fdt resize && fdt set /soc@0/pcie@33800000 status disabled
     135booti $kernel_addr_r - $fdt_addr
     136
     137EOF
     138}}}
     139
     140Compile the boot script text into a U-Boot script image with mkimage and write it to the eMMC:
     141{{{
     142mkimage -A arm64 -T script -C none -d boot.scr.txt /mnt/dst/boot.scr
     143umount /mnt/dst
     144umount /mnt/src
     145}}}
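If the board later refuses to run the script, it can help to confirm that the file really is a script image. `mkimage -l boot.scr` prints the header; the same check can be sketched in Python wherever an interpreter is available (for example on the host, against a copy of the file). This is an illustrative sketch rather than part of the original guide: it assumes the legacy 64-byte U-Boot image header that `mkimage -T script` produces by default, and `check_boot_script` is a hypothetical helper name.

```python
import struct
import zlib

# Legacy U-Boot image header: 7 big-endian 32-bit words, 4 bytes, 32-byte name.
HEADER = struct.Struct(">7I4B32s")
IH_MAGIC = 0x27051956   # magic number of a legacy image
IH_TYPE_SCRIPT = 6      # image type written by "mkimage -T script"


def check_boot_script(blob):
    """Return the image name if blob is a valid script image, else raise ValueError."""
    if len(blob) < HEADER.size:
        raise ValueError("file is shorter than the 64-byte header")
    (magic, _hcrc, _time, size, _load, _entry, dcrc,
     _os, _arch, img_type, _comp, name) = HEADER.unpack_from(blob)
    if magic != IH_MAGIC:
        raise ValueError("bad magic: not a legacy mkimage file")
    if img_type != IH_TYPE_SCRIPT:
        raise ValueError("image type is not 'script'")
    # The data CRC covers exactly `size` bytes following the header.
    payload = blob[HEADER.size:HEADER.size + size]
    if len(payload) != size or zlib.crc32(payload) != dcrc:
        raise ValueError("payload is truncated or corrupt")
    return name.rstrip(b"\0").decode("ascii", "replace")
```

For example, `check_boot_script(open("boot.scr", "rb").read())` raises `ValueError` if the plain-text `boot.scr.txt` was copied to the eMMC by mistake instead of the compiled image.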
     146
     147=== 6. Boot into the NXP image.
     148
     149Power cycle the board, and note which kernel it boots into. The board should automatically boot into the eMMC image we just flashed, so the removable media no longer needs to be connected.
     150
     151If there is an error, look at the logs and the boot scripts in U-Boot.
     152
     153At this point, everything at the kernel level and below is properly enabled. An application that uses TensorFlow Lite can run on the NPU or GPU by loading the external delegate {{{/usr/lib/libvx_delegate.so}}}. Follow the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]] for more information.
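As a rough illustration of what using the delegate looks like in application code, the sketch below builds a TensorFlow Lite interpreter and attaches the delegate library when it is present, falling back to the CPU otherwise. It assumes the `tflite_runtime` Python package is installed on the image; the helper names `existing_delegate` and `make_interpreter` are hypothetical and not taken from the NXP guide.

```python
import os

VX_DELEGATE = "/usr/lib/libvx_delegate.so"  # delegate path named in this guide


def existing_delegate(path=VX_DELEGATE):
    """Return path if the delegate library exists, else None (CPU fallback)."""
    return path if path and os.path.exists(path) else None


def make_interpreter(model_path, delegate_path=VX_DELEGATE):
    """Build a TFLite interpreter, offloading through the delegate when available."""
    # Imported lazily so the module can still be imported off-target.
    # Assumption: tflite_runtime is available on the board image.
    import tflite_runtime.interpreter as tflite

    delegates = []
    library = existing_delegate(delegate_path)
    if library is not None:
        delegates.append(tflite.load_delegate(library))
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

Passing `delegate_path=None` forces CPU execution, which gives a quick way to compare timings for the same model.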
     154
     155=== Image Classification Example
     156
     157As per the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]], we will test a simple image labeling script on both the CPU and NPU.
     158
     159{{{
     160$ cd /usr/bin/tensorflow-lite-2.15.0/examples
     161$ python3 label_image.py
     162$ python3 label_image.py -e /usr/lib/libvx_delegate.so
     163}}}
     164
     165Result from either label_image script:
     166{{{
     1670.878431: military uniform
     1680.027451: Windsor tie
     1690.011765: mortarboard
     1700.011765: bulletproof vest
     1710.007843: sax
     172}}}
     173
     174Without the NPU: {{{Inference time: 170.5 ms}}}
     175With the NPU: {{{Inference time: 3.2 ms}}}
     176
     177Ignoring warm-up time, this is a **>98% reduction in inference time**, or roughly a 53x speedup: for every frame the CPU processes, the NPU can process about 53.
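The figures above follow directly from the two measured inference times. A short worked calculation, using only the numbers reported in this section:

```python
cpu_ms = 170.5  # inference time without the delegate, as measured above
npu_ms = 3.2    # inference time with libvx_delegate.so, as measured above

speedup = cpu_ms / npu_ms                # NPU inferences per CPU inference
reduction = (1 - npu_ms / cpu_ms) * 100  # percentage of inference time removed

print(f"speedup: {speedup:.1f}x, time reduction: {reduction:.1f}%")
# prints "speedup: 53.3x, time reduction: 98.1%"
```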
     178
     179[[Image(https://i.imgur.com/Jw1JTHp.png)]]
     180
     181=== GStreamer Example
     182
     183Section 8.2 of the Machine Learning User's Guide details this process, including how to download the necessary models. After following the download steps, the {{{/home/root/nxp-nnstreamer-examples/}}} directory on your board should contain a {{{downloads}}} directory with {{{models}}} and {{{media}}} subdirectories. If not, run the update script on your host to compile the models and scp them to the board.
     184
     185On your host, execute the following command to have GStreamer take in video over UDP.
     186{{{ gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,payload=96 ! rtpjpegdepay ! jpegdec ! autovideosink }}}
     187
     188On your board, execute the following to send a stream over UDP to port 5000 on the host. This script was derived from Section 8.1 of the Machine Learning User's Guide.
     189{{{
     190CAMERA=<your camera device, such as /dev/video2>
     191HOST_IP=<desktop ip addr>
     192gst-launch-1.0 v4l2src name=cam_src device=${CAMERA} num-buffers=-1 ! video/x-raw,width=640,height=480,framerate=30/1 ! tee name=t t. ! queue name=thread-nn max-size-buffers=2 leaky=2 ! imxvideoconvert_g2d ! video/x-raw,width=300,height=300,format=RGBA ! videoconvert ! video/x-raw,format=RGB ! tensor_converter ! tensor_filter framework=tensorflow-lite model=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so ! tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/coco_labels_list.txt option3=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/box_priors.txt option4=640:480 option5=300:300 ! videoconvert ! queue ! mix. t. ! queue name=thread-img max-size-buffers=2 leaky=2 ! videoconvert ! mix. imxcompositor_g2d name=mix latency=30000000 min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1 ! videoconvert ! jpegenc ! rtpjpegpay ! udpsink host=${HOST_IP} port=5000
     193}}}
     194
     195If everything works properly, your video input should appear on your desktop host almost immediately. After a few seconds of warm-up, the bounding boxes from the [[https://nnstreamer.github.io/gst/nnstreamer/README.html | TensorFlow filter]] will be overlaid on the video. The stream properties can be changed for different resolutions and framerates; see [[https://trac.gateworks.com/wiki/Yocto/gstreamer/streaming | gstreamer/streaming]].
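Because the board-side pipeline is a single long line, changing the resolution or framerate by hand is error-prone. The sketch below assembles the same pipeline from its four logical parts (camera source, inference branch, pass-through branch, compositor and network sink). It is only an illustration: `detection_pipeline` and its parameters are hypothetical names, the model directory is written without the `detection/..` detour used in the original command, and tying `option4` to the capture size is an assumption based on the original values (640:480 in both places).

```python
MODELS = "/home/root/nxp-nnstreamer-examples/downloads/models/detection"


def detection_pipeline(camera, host_ip, width=640, height=480, fps=30, port=5000):
    """Return the argument string for gst-launch-1.0 on the board."""
    source = (
        f"v4l2src name=cam_src device={camera} num-buffers=-1 ! "
        f"video/x-raw,width={width},height={height},framerate={fps}/1 ! "
        "tee name=t")
    # Branch 1: scale to the 300x300 model input, run the detector, draw boxes.
    inference = (
        "t. ! queue name=thread-nn max-size-buffers=2 leaky=2 ! "
        "imxvideoconvert_g2d ! video/x-raw,width=300,height=300,format=RGBA ! "
        "videoconvert ! video/x-raw,format=RGB ! tensor_converter ! "
        "tensor_filter framework=tensorflow-lite "
        f"model={MODELS}/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite "
        "custom=Delegate:External,ExtDelegateLib:libvx_delegate.so ! "
        "tensor_decoder mode=bounding_boxes option1=mobilenet-ssd "
        f"option2={MODELS}/coco_labels_list.txt "
        f"option3={MODELS}/box_priors.txt "
        f"option4={width}:{height} option5=300:300 ! "
        "videoconvert ! queue ! mix.")
    # Branch 2: the unmodified camera frames.
    passthrough = (
        "t. ! queue name=thread-img max-size-buffers=2 leaky=2 ! "
        "videoconvert ! mix.")
    # Composite the boxes over the video, JPEG-encode, and send as RTP over UDP.
    sink = (
        "imxcompositor_g2d name=mix latency=30000000 "
        "min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1 ! "
        "videoconvert ! jpegenc ! rtpjpegpay ! "
        f"udpsink host={host_ip} port={port}")
    return " ".join([source, inference, passthrough, sink])
```

The returned string can be printed and pasted after `gst-launch-1.0`, or split with `shlex.split` and passed to `subprocess.run`. With the default arguments it reproduces the command above apart from the shortened model path.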
     196
     197[[Image(https://i.imgur.com/7KK4Wo8.png)]]
     205