Changes between Version 8 and Version 9 of venice/npu


Timestamp: 08/12/2024 10:12:07 PM
Author: Blake Stewart
Comment: Added clarity to PCIe disabling in NXP image and formatting fixes

Legend: each excerpt line below is prefixed with its line number in v8 and in v9. A "-" in a column means the line does not exist in that version (i.e. it was removed from v8 or added in v9); lines numbered in both columns are unchanged context.
  • venice/npu

    v8   v9
    9    9    [[Image(https://trac.gateworks.com/raw-attachment/wiki/venice/npu/gw74xx_npu_benchmark_new.png)]]
    10   10
    11   -    The easiest way to get started with the NPU is to use an image from the NXP BSP. This image contains the necessary libraries and kernel to interface the NPU with TensorFlow without much configuration. You can either [[https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf | follow the guide to build their image]] or [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | download a pre-built one]] (recommended).
    -    11   The easiest way to get started with the NPU is to use an image from the NXP BSP. This image contains the necessary libraries and kernel to interface the NPU with !TensorFlow without much configuration. You can either [[https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf | follow the guide to build their image]] or [[https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/embedded-linux-for-i-mx-applications-processors:IMXLINUX | download a pre-built one]] (recommended).
    12   12
    13   13   This guide assumes you have:

    18   18   The steps are kept as general as possible so they do not depend on the board's available RAM to load an image or on the low speed of JTAG uploading, since the .wic from NXP is >8 GB. We will use a ramdisk to boot a "rescue image" fully in RAM, then use dd to write from the removable multimedia (flash drive) to the onboard eMMC (/dev/mmcblk2), as sketched below.
    19   19
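    A minimal sketch of that dd step, once the rescue image has booted and is running from RAM. The USB partition and the .wic file name below are assumptions (they vary by setup and NXP release); the eMMC target /dev/mmcblk2 is the one named above:
    {{{
    # Mount the flash drive that holds the NXP .wic (partition name may differ)
    mkdir -p /mnt/usb
    mount /dev/sda1 /mnt/usb

    # Write the raw image straight to the onboard eMMC (file name is an example only)
    dd if=/mnt/usb/imx-image-full.wic of=/dev/mmcblk2 bs=1M conv=fsync
    # If the release ships compressed (e.g. .wic.zst), decompress on the fly instead:
    #   zstdcat /mnt/usb/imx-image-full.wic.zst | dd of=/dev/mmcblk2 bs=1M conv=fsync

    sync
    }}}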
    -    20   **NOTE**: In the scripts below, we disable PCIe as a temporary fix to prevent the NXP 6.6.3_1.0.0 kernel from hanging on boot. The hang is caused by a missing patch needed for the PCIe switch, which can be found [[https://github.com/Gateworks/linux-venice/commit/cf983e4a04eecb5be93af7b53cb10805ee448998|here]] in our kernel.
    20   21   == Getting Started with the NPU
    21   22   === 1. Download the Gateworks Venice Rescue Image to removable multimedia.
     
    136  137  for fdt in ${fdt_list}; do if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${fdt}; then run load_fdt; fi; done
    137  138  if test -z "$fdt_addr"; then echo "Warning: Using bootloader DTB"; setenv fdt_addr $fdtcontroladdr; fi
    138  -    #Disables PCI; patch is needed, otherwise kernel hangs.
    -    139  #Disables PCI; patch is needed, otherwise kernel hangs: See note at start of wiki page.
    139  140  fdt addr $fdt_addr_r && fdt resize && fdt set /soc@0/pcie@33800000 status disabled
    140  141  booti $kernel_addr_r - $fdt_addr_r
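    Once Linux is up, a quick way to confirm the override took effect is to read the node's status back from the live device tree (same node path as the fdt command above):
    {{{
    # Prints "disabled" if the U-Boot fdt override was applied (the property has no trailing newline)
    cat /proc/device-tree/soc@0/pcie@33800000/status; echo
    }}}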
     
    156  157  If there is an error, look at the logs and the boot scripts in U-Boot.
    157  158
    158  -    At this point, everything at the kernel level and below is properly enabled. If you have an application that uses TensorFlow, it will run on the NPU or GPU using {{{/usr/lib/libvx_delegate.so}}}. Follow the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]] for more information.
    -    159  At this point, everything at the kernel level and below is properly enabled. If you have an application that uses !TensorFlow, it will run on the NPU or GPU using {{{/usr/lib/libvx_delegate.so}}}. Follow the [[https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf | NXP Machine Learning User's Guide]] for more information.
    159  160
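    As a quick sanity check that inference actually goes through the VX delegate, the stock TensorFlow Lite {{{label_image}}} example shipped in the NXP image can be pointed at it. A sketch only: the directory, model, and label file names are assumptions that vary between BSP releases:
    {{{
    # The examples directory is versioned, e.g. tensorflow-lite-2.x.y (adjust to your image)
    cd /usr/bin/tensorflow-lite-*/examples

    # Run the bundled demo through the external VX delegate (NPU/GPU)
    ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt \
        --external_delegate_path=/usr/lib/libvx_delegate.so
    }}}
    A much shorter inference time than the same command without the delegate flag is the usual sign the model is no longer running on the CPU.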
    160  161  === Image Classification Example
     
    193  194  {{{ gst-launch-1.0 udpsrc port=5000 ! application/x-rtp,payload=96 ! rtpjpegdepay ! jpegdec ! autovideosink }}}
    194  195
    195  -    On your board, execute the following to send a stream over UDP to the host port 5000. This script was derived from Section 8.1 of the Machine Learning User's Guide.
    196  -    {{{
    197  -    CAMERA=<your camera device, such as /dev/video2>
    198  -    HOST_IP=<desktop ip addr>
    199  -    gst-launch-1.0 v4l2src name=cam_src device=${CAMERA} num-buffers=-1 ! video/x-raw,width=640,height=480,framerate=30/1 ! tee name=t t. ! queue name=thread-nn max-size-buffers=2 leaky=2 ! imxvideoconvert_g2d ! video/x-raw,width=300,height=300,format=RGBA ! videoconvert ! video/x-raw,format=RGB ! tensor_converter ! tensor_filter framework=tensorflow-lite model=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so ! tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/coco_labels_list.txt option3=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/box_priors.txt option4=640:480 option5=300:300 ! videoconvert ! queue ! mix. t. ! queue name=thread-img max-size-buffers=2 leaky=2 ! videoconvert ! mix. imxcompositor_g2d name=mix latency=30000000 min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1 ! videoconvert ! jpegenc ! rtpjpegpay ! udpsink host=${HOST_IP} port=5000
    200  -    }}}
    -    196
    -    197  On your board, execute the following to send a stream over UDP to the host port 5000. This script was derived from Section 8.1 of the Machine Learning User's Guide. The GStreamer command takes in a video input and overlays both bounding boxes and labels on it using !TensorFlow and NXP filters.
    -    198  {{{
    -    199  CAMERA=<your camera device, such as /dev/video2>
    -    200  HOST_IP=<desktop ip addr>
    -    201  gst-launch-1.0 v4l2src name=cam_src device=${CAMERA} num-buffers=-1 \
    -    202  ! video/x-raw,width=640,height=480,framerate=30/1 \
    -    203  ! tee name=t t. \
    -    204  ! queue name=thread-nn max-size-buffers=2 leaky=2 ! imxvideoconvert_g2d \
    -    205  ! video/x-raw,width=300,height=300,format=RGBA ! videoconvert \
    -    206  ! video/x-raw,format=RGB ! tensor_converter \
    -    207  ! tensor_filter framework=tensorflow-lite model=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/ssdlite_mobilenet_v2_coco_quant_uint8_float32_no_postprocess.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so \
    -    208  ! tensor_decoder mode=bounding_boxes option1=mobilenet-ssd option2=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/coco_labels_list.txt option3=/home/root/nxp-nnstreamer-examples/detection/../downloads/models/detection/box_priors.txt option4=640:480 option5=300:300 \
    -    209  ! videoconvert ! queue \
    -    210  ! mix. t. \
    -    211  ! queue name=thread-img max-size-buffers=2 leaky=2 ! videoconvert \
    -    212  ! mix. imxcompositor_g2d name=mix latency=30000000 min-upstream-latency=30000000 sink_0::zorder=2 sink_1::zorder=1 \
    -    213  ! videoconvert ! jpegenc ! rtpjpegpay ! udpsink host=${HOST_IP} port=5000
    -    214  }}}
    -    215
    -    216
    201  217
    202  218  If everything works properly, you should instantly see your video input streamed to your desktop host. After a few seconds of warming up, the bounding boxes from the [[https://nnstreamer.github.io/gst/nnstreamer/README.html | TensorFlow Detection filter]] will be overlaid on the video. The stream properties can be changed for different resolutions and framerates; see [[https://trac.gateworks.com/wiki/Yocto/gstreamer/streaming | gstreamer/streaming]]. NOTE: This example is object detection, which differs from the image classification used for the benchmark data in the previous section.
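    For example, to try a different capture mode, first check what the camera supports, then adjust the caps after {{{v4l2src}}} and {{{option4}}} on the tensor_decoder (which should match the output frame size). A hedged sketch; which modes actually work depends on your camera:
    {{{
    # List the formats/resolutions/framerates the camera supports
    v4l2-ctl --device=${CAMERA} --list-formats-ext

    # Then, in the pipeline above, swap the capture caps, e.g. 1280x720 @ 15 fps:
    #   ... v4l2src name=cam_src device=${CAMERA} ! video/x-raw,width=1280,height=720,framerate=15/1 ! tee name=t ...
    # and change option4=640:480 to option4=1280:720 on the tensor_decoder.
    }}}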