Changes between Version 24 and Version 25 of venice/npu


Timestamp:
02/26/2026 10:45:59 PM (19 hours ago)
Author:
Ryan Erbstoesser
Comment:

adjust wiki, note for mesa and yocto

  • venice/npu

    v24 v25  
    1212
    1313[[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/gw74xx_npu_benchmark_new.png)]]
    14 
    15 
    16 == NXP Yocto BSP
    17 The easiest way to get started with the NPU is to use a image from the NXP BSP. This image contains the necessary libraries and kernel to interface the NPU with !TensorFlow without much configuration. You can either [[https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf | follow the guide to build their image]] or [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | download a pre-built one]] (recommended).
    18 
    19 This guide assumes you have:
    20 - A Gateworks board with a i.MX8M Plus procesor.
    21 - A NXP account, which is necessary to download their image and models.
    22 - A >= 16GB flash drive, SD card, or other removable block storage to install a Rescue Image, NXP Image, and updated device trees (DTBs) onto the board.
    23 
    24 The steps are as generalized as possible to not depend on the boards available RAM to load an image, or the low speeds of JTAG uploading, as the .wic from NXP is >8GB. We will use a ramdisk to boot a "rescue image" fully in RAM, then use dd to write from the removable multimedia (flash drive) to the onboard eMMC (/dev/mmcblk2).
    25 
    26 **NOTE**: In the scripts below, we disable PCIe as a temporary fix to prevent the NXP 6.6.3_1.0.0 kernel from hanging on boot. This is caused by a missing patch necessary to work around a PCIe switch quirk when used on the IMX8MP, which can be found specifically [[https://github.com/Gateworks/linux-venice/commit/cf983e4a04eecb5be93af7b53cb10805ee448998|here]] from our kernel.
    27 
    28 === 1. Download the Gateworks Venice Rescue Image to removable multimedia.
    29 Find which device your flash drive/SD card is. For example, {{{/dev/sdc}}}
    30 
    31 Then, use dd to flash this image onto your device byte-for-byte.
    32 {{{#!bash
    33 DEVICE=<your device, no trailing />
    34 wget https://dev.gateworks.com/buildroot/venice/minimal/rescue.img.gz
    35 zcat rescue.img.gz | sudo dd of=${DEVICE} bs=1M oflag=sync
    36 }}}
    37 
    38 In this guide, we will have the NXP image and our rescue image on the same drive, so we will resize the partition and file system to fit both. If you're using separate devices, this is not necessary.
    39 
    40 {{{#!bash
    41 sudo parted ${DEVICE} "resizepart 1 -0" # resize the partition to fit the size of the drive
    42 sudo resize2fs ${DEVICE}1 # resize ext fs on device partition 1
    43 }}}
    44 
    45 === 2. Download the NXP BSP evaluation kit image to removable multimedia.
    46 On your host machine, install the Linux 6.6.3_1.0.0 image for the i.MX 8M Plus EVK [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | here]].
    47 
    48 In this download you will find imx-image-full-imx8mpevk.wic, which is a Yocto-generated image with all of the ML libraries.
    49 Copy this image to our device.
    50 {{{#!bash
    51 sudo mount ${DEVICE}1 /mnt
    52 sudo cp imx-image-full-imx8mpevk.wic /mnt/
    53 }}}
    54 
    55 === 3. Patch & Build patch Venice DTBs from the Kernel source.
    56 Due to small inconsistencies between the NXP and Gateworks devicetrees for bleeding-edge peripherals, a patch is required until mainline compatibility is reached. The below script gets the patches from the attachments at the bottom of this page.
    57 
    58 {{{#!bash
    59 git clone https://github.com/nxp-imx/linux-imx -b lf-6.6.y
    60 cd linux-imx
    61 wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0001-arm64-dts-imx8mp-venice-fix-USB_OC-pinmux.patch
    62 wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0002-arm64-dts-imx8mm-venice-gw700x-remove-ddrc.patch
    63 wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0003-arm64-dts-freescale-add-Gateworks-venice-board-dtbs.patch
    64 wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0004-arm64-dts-imx8mp-venice-gw74xx-enable-gpu-nodes.patch
    65 patch -p1 < 0001-arm64-dts-imx8mp-venice-fix-USB_OC-pinmux.patch
    66 patch -p1 < 0002-arm64-dts-imx8mm-venice-gw700x-remove-ddrc.patch
    67 patch -p1 < 0003-arm64-dts-freescale-add-Gateworks-venice-board-dtbs.patch
    68 patch -p1 < 0004-arm64-dts-imx8mp-venice-gw74xx-enable-gpu-nodes.patch
    69 ARCH=arm64 make imx_v8_defconfig
    70 ARCH=arm64 make dtbs
    71 }}}
    72 
    73 Copy these patched dtbs to a directory on your flash such as {{{/nxp/}}}, as to not overwrite the ones necessary for booting into the rescue image.
    74 
    75 {{{#!bash
    76 sudo mkdir /mnt/nxp
    77 sudo cp arch/arm64/boot/dts/freescale/*venice*.dtb /mnt/nxp/
    78 }}}
    79 
    80 Now, the contents of the device should include:
    81 - Rescue image
    82 - Rescue image dtbs
    83 - rootfs.cpio.xz and boot.scr for booting Rescue Image
    84 - nxp/ with new, updated dtbs
    85 
    86 === 4. Boot Rescue Image ramdisk on board
    87 Connect the removable multimedia, in our case a USB stick, to the board before powering. Many boards have built-in SD readers, which would change the device commands slightly.
    88 
    89 Connect serial console via JTAG and power on the board.
    90 Enter the U-Boot console by stopping autoboot.
    91 
    92 Sanity check: is the USB device properly detected?
    93 {{{#!bash
    94 usb start
    95 part list usb 0
    96 }}}
    97 
    98 This command should have an expected output like below
    99 {{{#!bash
    100 Partition Map for USB device 0  --   Partition Type: DOS
    101 
    102 Part    Start Sector    Num Sectors     UUID            Type
    103   1     2048            204800          6fd772a2-01     83 Boot
    104 }}}
    105 
    106 Override the boot_targets variable temporarily to ensure booting into the Rescue Image, then boot into it. If you are not using usb0, run {{{print boot_targets}}} to see a list.
    107 
    108 {{{#!bash
    109 setenv boot_targets usb0
    110 run bootcmd
    111 }}}
    112 
    113 If all functions normally, you should be met with a login; login with root and you will enter the shell.
    114 
    115 === Flash NXP .wic and patched DTBs onto eMMC
    116 You are now booted into the ramdisk rescue image. The next steps are to flash the .wic onto the emmc.
    117 
    118 For venice boards the emmc that we are imaging is {{{/dev/mmcblk2}}} and with only one removable storage device your rescue image with be {{{/dev/sda}}}.
    119 
    120 Image the emmc as followes:
    121 {{{#!bash
    122 mkdir /mnt/src
    123 mkdir /mnt/dst
    124 mount /dev/sda1 /mnt/src
    125 dd if=/mnt/src/imx-image-full-imx8mpevk.wic of=/dev/mmcblk2 bs=16M oflag=sync # this will take a couple of minutes
    126 mount /dev/mmcblk2p1 /mnt/dst
    127 cp /mnt/src/*.dtb /mnt/dst/
    128 cp /mnt/src/nxp/*.dtb /mnt/dst/
    129 }}}
    130 
    131 This flashes the prebuilt .wic image (both partitions, the kernel and fs) to our eMMC, then also brings over the old and new device trees. Next, we will create the boot script. If the below doesn't copy right, the file can be created/edited in a text editor like vi; just remove the EOF line.
    132 {{{#!bash
    133 cat <<\EOF > boot.scr.txt
    134 setenv bootargs 'root=/dev/mmcblk2p2'
    135 load mmc 2:1 $kernel_addr_r Image
    136 
    137 setenv fdt_addr
    138 setenv fdt_list $fdt_file $fdt_file1 $fdt_file2 $fdt_file3 $fdt_file4 $fdt_file5
    139 setenv load_fdt 'echo Loading $fdt...; load ${devtype} ${devnum}:${distro_bootpart} ${fdt_addr_r} ${prefix}${fdt} && setenv fdt_addr ${fdt_addr_r}'
    140 for fdt in ${fdt_list}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${fdt}; then run load_fdt; fi; done
    141 if test -z "$fdt_addr"; then echo "Warning: Using bootloader DTB"; setenv fdt_addr $fdtcontroladdr; fi
    142 #Disables PCI; patch is needed, otherwise kernel hangs: See note at start of wiki page.
    143 fdt addr $fdt_addr_r && fdt resize && fdt set /soc@0/pcie@33800000 status disabled
    144 booti $kernel_addr_r - $fdt_addr_r
    145 
    146 EOF
    147 }}}
    148 
    149 'Compile' the boot script txt and flash it onto the MMC
    150 {{{#!bash
    151 mkimage -A arm64 -T script -C none -d boot.scr.txt /mnt/dst/boot.scr
    152 umount /mnt/dst
    153 umount /mnt/src
    154 }}}
    155 
    156 === Boot into the NXP image.
    157 
    158 Power cycle the board, and note the kernel which it boots into. The board should automatically boot into the eMMC image we just flashed, meaning the removable multimedia need not be connected.
    159 
    160 If there is an error, look at the logs and the boot scripts in U-Boot.
    161 
    162 At this point, all features regarding the Kernel and below are properly enabled. If you have an application that uses !TensorFlow, it will run on the NPU or GPU using {{{/usr/lib/libvx_delegate.so}}}. Follow the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]] for more information.
    163 
    164 === Image Classification Example
    165 
    166 As per the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]], we will test a simple image labeling script on both the CPU and NPU.
    167 
    168 {{{#!bash
    169 $ cd /usr/bin/tensorflow-lite-2.15.0/examples
    170 $ python3 label_image.py # without NPU acceleration
    171 $ python3 label_image.py -e /usr/lib/libvx_delegate.so # with NPU accerlation via the libvx_delegate external TensorFlow delegate
    172 }}}
    173 
    174 Result from either label_image script:
    175 {{{#!bash
    176 0.878431: military uniform
    177 0.027451: Windsor tie
    178 0.011765: mortarboard
    179 0.011765: bulletproof vest
    180 0.007843: sax
    181 }}}
    182 
    183 Without the NPU: {{{Image Classification time: 170.5 ms}}}
    184 With the NPU: {{{Image Classification: 3.2 ms}}}
    185 
    186 Without considering the warmup times, this is a >**98% speedup**! For every CPU frame, the NPU can process 53.
    187 
    188 This data is derived from {{{classifications/sec = 1/(image classification time)}}}
    189 
    190 [[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/gw74xx_npu_benchmark_new.png)]]
    191 
    192 === GStreamer Example for Detection
    193 
    194 Section 8.2 of the Machine Learning Users Guide details this process, such as how to download the necessary models. After following the download steps, the {{{home/root/nxp-nnstreamer-examples/}}} directory on your board should have a {{{downloads}}} directory with {{{models}}} and {{{media}}} directories. If not, you need to run the update script on your host to compile the models and scp them to the board.
    195 
    196 On your host, execute the following command to have GStreamer take in video over UDP.
    197 {{{ gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,payload=96 ! rtpjpegdepay ! jpegdec ! autovideosink }}}
    198 
    199 [[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/hostpipeline.svg, width=1200)]]
    200 \\Host GStreamer pipeline (SVG)
    201 
    202 
    203 On your board, execute the following to send a stream over UDP to the host port 5000. This script was derived from Section 8.1 of the Machine Learning Users Guide. The GStreamer command takes in a video input and overlays both bounding boxes and labels on it using !TensorFlow and NXP filters.
    204 {{{#!bash
    205 CAMERA= <your camera device, such as /dev/video2>
    206 HOST_IP= <desktop ip addr>
    207 gst-launch-1.0 v4l2src name=cam_src device=${CAMERA} num-buffers=-1
    208 ! video/x-raw,width=640,height=480,framerate=30/1
    209 ! tee name=t t.
    210 ! queue name=thread-nn max-size-buffers=2 leaky=2 ! imxvideoconvert_g2d
    211 ! video/x-raw,width=300,height=300,format=RGBA ! videoconvert
    212 ! video/x-raw,format=RGB ! tensor_converter
    213 ! tensor_filter framework=tensorflow-lite model=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so
    214 ! tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/coco_labels_list.txt option3=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/box_priors.txt option4=640:480 option5=300:300
    215 ! videoconvert ! queue
    216 ! mix. t.
    217 ! queue name=thread-img max-size-buffers=2 leaky=2 ! videoconvert
    218 ! mix. imxcompositor_g2d name=mix latency=30000000 min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1
    219 ! videoconvert ! jpegenc ! rtpjpegpay ! udpsink host=${HOST_IP} port=5000
    220 }}}
    221 
    222 [[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/pipeline.svg, width=1200)]]
    223 \\GW74xx AI-detection GStreamer pipeline (SVG)
    224 
    225 
    226 
    227 If everything works properly, you should instantly see your video input streamed to your desktop host. After a few seconds of warming up, the bounding boxes from the [[https://nnstreamer.github.io/gst/nnstreamer/README.html | TensorFlow Detection filter]] will be overlaid on the video. The stream properties can be changed for different resolutions and framerates; see [[https://trac.gateworks.com/wiki/Yocto/gstreamer/streaming | gstreamer/streaming]]. NOTE: This example is object detection, which differs from the image classification that we got benchmark data from in the previous section.
    228 
    229 [[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/imx8mp_border.png)]]
    230 
    23114
    23215
     
    370153== Other AI Chipsets and Solutions
    371154
    372 Please note other AI accelerators can also be added via expansion slots described [wiki:TPU here]
    373 
    374 
    375 
     155Please note that other AI accelerators can also be added via the expansion slots described [wiki:TPU here].
     156
     157
     158
     159
     160== NXP Yocto BSP
     161Another option to explore the NPU is to use an image from the NXP BSP.
     162
     163'''NOTE''': Gateworks typically uses a mainline kernel with Ubuntu. The NXP image is a completely different ecosystem with its own learning curve. It is handy for exploring the NPU, but for a larger project the mainline Mesa option above may be a better fit.
     164
     165This NXP image contains the necessary libraries and kernel to interface the NPU with !TensorFlow without much configuration. You can either [[https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf | follow the guide to build their image]] or [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | download a pre-built one]] (recommended).
     166
     167This guide assumes you have:
     168- A Gateworks board with an i.MX8M Plus processor.
     169- An NXP account, which is necessary to download their image and models.
     170- A flash drive, SD card, or other removable block storage of at least 16GB, from which to install the Rescue Image, the NXP image, and the updated device trees (DTBs) onto the board.
     171
     172The steps are kept as general as possible so that they do not depend on the board's available RAM to load an image or on slow JTAG uploads, since the .wic from NXP is larger than 8GB. We will boot a "rescue image" that runs entirely from a ramdisk, then use dd to write from the removable multimedia (flash drive) to the onboard eMMC (/dev/mmcblk2).
     173
     174**NOTE**: The scripts below disable PCIe as a temporary fix to keep the NXP 6.6.3_1.0.0 kernel from hanging on boot. The NXP kernel is missing a patch that works around a PCIe switch quirk on the i.MX8MP; that patch is [[https://github.com/Gateworks/linux-venice/commit/cf983e4a04eecb5be93af7b53cb10805ee448998|this commit]] in our kernel.
     175
     176=== 1. Download the Gateworks Venice Rescue Image to removable multimedia.
     177Identify which device node corresponds to your flash drive or SD card, for example {{{/dev/sdc}}}.
     178
     179Then, use dd to flash this image onto your device byte-for-byte.
     180{{{#!bash
     181DEVICE=<your device, no trailing />
     182wget https://dev.gateworks.com/buildroot/venice/minimal/rescue.img.gz
     183zcat rescue.img.gz | sudo dd of=${DEVICE} bs=1M oflag=sync
     184}}}
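Because dd overwrites its target without asking, it can be worth guarding the step above with a quick check. The following is a minimal sketch rather than part of the Gateworks procedure: {{{check_device}}} is a hypothetical helper name, and it assumes a Linux host where {{{/proc/mounts}}} lists the mounted file systems.

{{{#!bash
# check_device: hypothetical helper; succeeds only if $1 is a block device
# with no mounted partitions (assumes a Linux host with /proc/mounts).
check_device() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "error: $dev is not a block device" >&2
        return 1
    fi
    # A prefix match also catches partitions such as /dev/sdc1.
    if grep -q "^$dev" /proc/mounts; then
        echo "error: $dev or one of its partitions is mounted" >&2
        return 1
    fi
    return 0
}

# Example: only flash if the check passes.
# check_device "${DEVICE}" && zcat rescue.img.gz | sudo dd of="${DEVICE}" bs=1M oflag=sync
}}}

The prefix match is deliberately conservative: it may reject a device whose name is a prefix of another mounted device, which errs on the side of not writing.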
     185
     186In this guide, we will have the NXP image and our rescue image on the same drive, so we will resize the partition and file system to fit both. If you're using separate devices, this is not necessary.
     187
     188{{{#!bash
     189sudo parted ${DEVICE} "resizepart 1 -0" # resize the partition to fit the size of the drive
     190sudo resize2fs ${DEVICE}1 # resize ext fs on device partition 1
     191}}}
     192
     193=== 2. Download the NXP BSP evaluation kit image to removable multimedia.
     194On your host machine, install the Linux 6.6.3_1.0.0 image for the i.MX 8M Plus EVK [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | here]].
     195
     196In this download you will find imx-image-full-imx8mpevk.wic, which is a Yocto-generated image with all of the ML libraries.
     197Copy this image to our device.
     198{{{#!bash
     199sudo mount ${DEVICE}1 /mnt
     200sudo cp imx-image-full-imx8mpevk.wic /mnt/
     201}}}
     202
     203=== 3. Patch and build the Venice DTBs from the kernel source.
     204Due to small inconsistencies between the NXP and Gateworks device trees for bleeding-edge peripherals, patches are required until mainline compatibility is reached. The script below downloads the patches from the attachments at the bottom of this page, applies them, and builds the DTBs.
     205
     206{{{#!bash
     207git clone https://github.com/nxp-imx/linux-imx -b lf-6.6.y
     208cd linux-imx
     209wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0001-arm64-dts-imx8mp-venice-fix-USB_OC-pinmux.patch
     210wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0002-arm64-dts-imx8mm-venice-gw700x-remove-ddrc.patch
     211wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0003-arm64-dts-freescale-add-Gateworks-venice-board-dtbs.patch
     212wget https://trac.gateworks.com/raw-attachment/wiki/venice/npu/0004-arm64-dts-imx8mp-venice-gw74xx-enable-gpu-nodes.patch
     213patch -p1 < 0001-arm64-dts-imx8mp-venice-fix-USB_OC-pinmux.patch
     214patch -p1 < 0002-arm64-dts-imx8mm-venice-gw700x-remove-ddrc.patch
     215patch -p1 < 0003-arm64-dts-freescale-add-Gateworks-venice-board-dtbs.patch
     216patch -p1 < 0004-arm64-dts-imx8mp-venice-gw74xx-enable-gpu-nodes.patch
     217ARCH=arm64 make imx_v8_defconfig
     218ARCH=arm64 make dtbs
     219}}}
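As written, the script above keeps going if one of the patches fails to apply, which can leave incomplete DTBs behind. One possible variation, shown here as a sketch with a hypothetical helper named {{{apply_patches}}}, stops at the first failure instead. It would be run from inside the {{{linux-imx}}} directory after the {{{wget}}} steps.

{{{#!bash
# apply_patches: hypothetical helper; applies each patch file in order and
# stops at the first one that fails, so later steps never see a half-patched tree.
apply_patches() {
    for p in "$@"; do
        echo "Applying $p"
        patch -p1 < "$p" || { echo "error: $p did not apply" >&2; return 1; }
    done
}

# Example: apply the four downloaded patches, then build the DTBs.
# apply_patches 000[1-4]-*.patch && ARCH=arm64 make imx_v8_defconfig && ARCH=arm64 make dtbs
}}}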
     220
     221Copy the patched DTBs to a separate directory on your flash drive, such as {{{/nxp/}}}, so as not to overwrite the ones needed to boot the rescue image.
     222
     223{{{#!bash
     224sudo mkdir /mnt/nxp
     225sudo cp arch/arm64/boot/dts/freescale/*venice*.dtb /mnt/nxp/
     226}}}
     227
     228Now, the contents of the device should include:
     229- Rescue image
     230- Rescue image dtbs
     231- rootfs.cpio.xz and boot.scr for booting Rescue Image
     232- nxp/ with new, updated dtbs
     233
     234=== 4. Boot Rescue Image ramdisk on board
     235Connect the removable multimedia, in our case a USB stick, to the board before powering. Many boards have built-in SD readers, which would change the device commands slightly.
     236
     237Connect serial console via JTAG and power on the board.
     238Enter the U-Boot console by stopping autoboot.
     239
     240Sanity check: is the USB device properly detected?
     241{{{#!bash
     242usb start
     243part list usb 0
     244}}}
     245
     246The output should look similar to the following:
     247{{{#!bash
     248Partition Map for USB device 0  --   Partition Type: DOS
     249
     250Part    Start Sector    Num Sectors     UUID            Type
     251  1     2048            204800          6fd772a2-01     83 Boot
     252}}}
     253
     254Override the boot_targets variable temporarily to ensure booting into the Rescue Image, then boot into it. If you are not using usb0, run {{{print boot_targets}}} to see a list.
     255
     256{{{#!bash
     257setenv boot_targets usb0
     258run bootcmd
     259}}}
     260
     261If all goes well, you will be greeted with a login prompt; log in as root to reach the shell.
     262
     263=== 5. Flash NXP .wic and patched DTBs onto eMMC
     264You are now booted into the ramdisk rescue image. The next step is to flash the .wic onto the eMMC.
     265
     266On Venice boards the eMMC that we are imaging is {{{/dev/mmcblk2}}}, and with only one removable storage device attached, the drive holding your rescue image will be {{{/dev/sda}}}.
     267
     268Image the eMMC as follows:
     269{{{#!bash
     270mkdir /mnt/src
     271mkdir /mnt/dst
     272mount /dev/sda1 /mnt/src
     273dd if=/mnt/src/imx-image-full-imx8mpevk.wic of=/dev/mmcblk2 bs=16M oflag=sync # this will take a couple of minutes
     274mount /dev/mmcblk2p1 /mnt/dst
     275cp /mnt/src/*.dtb /mnt/dst/
     276cp /mnt/src/nxp/*.dtb /mnt/dst/
     277}}}
     278
     279This writes the prebuilt .wic image (both partitions: the kernel and the root file system) to the eMMC, then copies over both the old and the new device trees. Next, we will create the boot script. If the block below does not paste correctly, create {{{boot.scr.txt}}} in a text editor such as vi instead, leaving out the {{{cat}}} and {{{EOF}}} lines.
     280{{{#!bash
     281cat <<\EOF > boot.scr.txt
     282setenv bootargs 'root=/dev/mmcblk2p2'
     283load mmc 2:1 $kernel_addr_r Image
     284
     285setenv fdt_addr
     286setenv fdt_list $fdt_file $fdt_file1 $fdt_file2 $fdt_file3 $fdt_file4 $fdt_file5
     287setenv load_fdt 'echo Loading $fdt...; load ${devtype} ${devnum}:${distro_bootpart} ${fdt_addr_r} ${prefix}${fdt} && setenv fdt_addr ${fdt_addr_r}'
     288for fdt in ${fdt_list}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${fdt}; then run load_fdt; fi; done
     289if test -z "$fdt_addr"; then echo "Warning: Using bootloader DTB"; setenv fdt_addr $fdtcontroladdr; fi
     290# Disable PCIe; without the patch noted at the start of this page the kernel hangs on boot.
     291fdt addr $fdt_addr && fdt resize && fdt set /soc@0/pcie@33800000 status disabled
     292booti $kernel_addr_r - $fdt_addr
     293
     294EOF
     295}}}
     296
     297'Compile' the boot script text with mkimage and write it to the eMMC:
     298{{{#!bash
     299mkimage -A arm64 -T script -C none -d boot.scr.txt /mnt/dst/boot.scr
     300umount /mnt/dst
     301umount /mnt/src
     302}}}
     303
     304=== 6. Boot into the NXP image.
     305
     306Power cycle the board, and note the kernel which it boots into. The board should automatically boot into the eMMC image we just flashed, meaning the removable multimedia need not be connected.
     307
     308If there is an error, look at the logs and the boot scripts in U-Boot.
     309
     310At this point, everything at the kernel level and below is enabled. An application that uses !TensorFlow can run on the NPU or GPU by loading {{{/usr/lib/libvx_delegate.so}}} as an external delegate. Follow the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]] for more information.
     311
     312=== Image Classification Example
     313
     314As per the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]], we will test a simple image labeling script on both the CPU and NPU.
     315
     316{{{#!bash
     317$ cd /usr/bin/tensorflow-lite-2.15.0/examples
     318$ python3 label_image.py # without NPU acceleration
     319$ python3 label_image.py -e /usr/lib/libvx_delegate.so # with NPU acceleration via the libvx_delegate external TensorFlow delegate
     320}}}
     321
     322Result from either label_image script:
     323{{{#!bash
     3240.878431: military uniform
     3250.027451: Windsor tie
     3260.011765: mortarboard
     3270.011765: bulletproof vest
     3280.007843: sax
     329}}}
     330
     331Without the NPU: {{{Image Classification time: 170.5 ms}}}
     332With the NPU: {{{Image Classification: 3.2 ms}}}
     333
     334Ignoring warm-up time, the NPU cuts classification time by **about 98%**, which is roughly a 53x speedup: for every frame the CPU classifies, the NPU can classify about 53.
     335
     336This data is derived from {{{classifications/sec = 1/(image classification time)}}}
     337
     338[[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/gw74xx_npu_benchmark_new.png)]]
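The figures above follow from the two measured times with a little arithmetic. The sketch below works them through with awk; {{{npu_speedup}}} is a hypothetical helper name, and the two arguments are the classification times in milliseconds reported by {{{label_image.py}}}.

{{{#!bash
# npu_speedup: hypothetical helper; $1 = CPU time in ms, $2 = NPU time in ms.
npu_speedup() {
    awk -v cpu="$1" -v npu="$2" 'BEGIN {
        # classifications/sec = 1 / (classification time in seconds)
        printf "CPU: %.1f classifications/sec\n", 1000 / cpu
        printf "NPU: %.1f classifications/sec\n", 1000 / npu
        printf "Speedup: %.1fx\n", cpu / npu
        printf "Time saved: %.1f%%\n", (1 - npu / cpu) * 100
    }'
}

npu_speedup 170.5 3.2
}}}

With the times measured above, this prints about 5.9 and 312.5 classifications per second, a 53.3x speedup, and 98.1% of the time saved.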
     339
     340=== GStreamer Example for Detection
     341
     342Section 8.2 of the Machine Learning User's Guide details this process, including how to download the necessary models. After following the download steps, the {{{/home/root/nxp-nnstreamer-examples/}}} directory on your board should contain a {{{downloads}}} directory with {{{models}}} and {{{media}}} subdirectories. If not, run the update script on your host to compile the models and scp them to the board.
     343
     344On your host, execute the following command to have GStreamer take in video over UDP.
     345{{{ gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,payload=96 ! rtpjpegdepay ! jpegdec ! autovideosink }}}
     346
     347[[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/hostpipeline.svg, width=1200)]]
     348\\Host GStreamer pipeline (SVG)
     349
     350
     351On your board, execute the following to send a stream over UDP to port 5000 on the host. This script is derived from Section 8.1 of the Machine Learning User's Guide. The GStreamer command takes a video input and overlays bounding boxes and labels on it using !TensorFlow and NXP filters.
     352{{{#!bash
     353CAMERA=<your camera device, such as /dev/video2>
     354HOST_IP=<desktop ip addr>
     355gst-launch-1.0 v4l2src name=cam_src device=${CAMERA} num-buffers=-1 \
     356! video/x-raw,width=640,height=480,framerate=30/1 \
     357! tee name=t t. \
     358! queue name=thread-nn max-size-buffers=2 leaky=2 ! imxvideoconvert_g2d \
     359! video/x-raw,width=300,height=300,format=RGBA ! videoconvert \
     360! video/x-raw,format=RGB ! tensor_converter \
     361! tensor_filter framework=tensorflow-lite model=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so \
     362! tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/coco_labels_list.txt option3=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/box_priors.txt option4=640:480 option5=300:300 \
     363! videoconvert ! queue \
     364! mix. t. \
     365! queue name=thread-img max-size-buffers=2 leaky=2 ! videoconvert \
     366! mix. imxcompositor_g2d name=mix latency=30000000 min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1 \
     367! videoconvert ! jpegenc ! rtpjpegpay ! udpsink host=${HOST_IP} port=5000
     368}}}
     369
     370[[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/pipeline.svg, width=1200)]]
     371\\GW74xx AI-detection GStreamer pipeline (SVG)
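If the board pipeline exits immediately, a missing model or label file is a likely cause. The sketch below checks for the three files that the pipeline references before launching it; {{{check_models}}} is a hypothetical helper name, and the file names are taken from the pipeline above.

{{{#!bash
# check_models: hypothetical helper; verifies that the files referenced by the
# detection pipeline exist in the models directory given as $1.
check_models() {
    dir="$1"
    missing=0
    for f in \
        ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite \
        coco_labels_list.txt \
        box_priors.txt
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, using the directory that the pipeline paths resolve to:
# check_models /home/root/nxp-nnstreamer-examples/downloads/models/detection
}}}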
     372
     373
     374
     375If everything works properly, you should see your video input streamed to your desktop host right away. After a few seconds of warm-up, the bounding boxes from the [[https://nnstreamer.github.io/gst/nnstreamer/README.html | TensorFlow Detection filter]] will be overlaid on the video. The stream properties can be changed for different resolutions and framerates; see [[https://trac.gateworks.com/wiki/Yocto/gstreamer/streaming | gstreamer/streaming]]. NOTE: This example performs object detection, which differs from the image classification used for the benchmark data in the previous section.
     376
     377[[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/imx8mp_border.png)]]
     378
     379
     380
     381