Changes between Version 9 and Version 10 of expansion/gw16168

Timestamp: 06/05/2026 08:48:20 PM (8 weeks ago)
Author: Tim Harvey
Comment: added gstreamer plugin details and some image and video detection examples

expansion/gw16168

-              v9
+              v10
  - on bootup make sure you wait for the console messages indicating the Proxy is launched before using it as it can take a couple of minutes
  - the binary tools and libs are all currently dynamic linked against stdlibc
  - the GStreamer libs have compatibility issues with modern GStreamer
+ - the GStreamer libs require GStreamer 1.26 or newer
 Verification steps:
 …
 }}}
+[=#gstreamer]
+=== GStreamer plugins
+The rt-sdk-ara2 provides a set of gstreamer plugins for inference:
+ - dvPre
+ - dvInf
+ - dvPost
+Without more documentation or source for these, it's probably best to think of them this way: dvPre prepares buffers, dvInf hands them off to the NPU, and dvPost processes the response.
+The dvPre element requires 32-bit pixel samples (i.e. format=BGRA, 4 bytes per pixel: blue, green, red, alpha), not the 24-bit format=RGB (3 bytes per pixel: red, green, blue). The alpha byte is unused padding that acts as a structural spacer; it is not used for transparency.
+All three elements require the model to be specified via the 'model' property. For example, if using yolov8x you would specify the path to yolov8x.dvm.
+For detection models the dvPost output buffer contains a 32-bit detection count followed by a series of detection structures, each containing the bounding box, confidence level, and COCO class ID of a detected object.
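As a rough illustration of that buffer layout, here is a minimal parsing sketch. The record layout is an assumption copied from the AraDetection structure in image_detect.py further down this page (five 32-bit floats, one 32-bit integer and an 8-byte pointer, packed, little-endian, 64-bit target); it is not taken from vendor documentation. The name `parse_detections` is hypothetical.

```python
import struct

# Assumed record layout (from the AraDetection ctypes structure on this page):
# xmin, ymin, xmax, ymax, confidence as float32, class_id as int32,
# followed by an 8-byte pointer that is ignored here.
RECORD = struct.Struct("<5fiQ")  # 32 bytes when packed


def parse_detections(buf):
    """Return a list of (xmin, ymin, xmax, ymax, confidence, class_id) tuples."""
    if len(buf) < 4:
        return []
    # The first 4 bytes are read as a little-endian 32-bit count.
    (count,) = struct.unpack_from("<I", buf, 0)
    detections = []
    offset = 4
    for _ in range(count):
        if offset + RECORD.size > len(buf):
            break  # truncated buffer: stop rather than read past the end
        xmin, ymin, xmax, ymax, conf, class_id, _ptr = RECORD.unpack_from(buf, offset)
        detections.append((xmin, ymin, xmax, ymax, conf, class_id))
        offset += RECORD.size
    return detections
```

Using `struct` instead of `ctypes` keeps the sketch independent of the Python version; the bounds check mirrors the one in the full script.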
+The units for the bounding box are relative to the model's input size and need to be scaled back to your original image size. For example, the YOLO models operate on 640x640 pixel data. You can pass in something larger and it will essentially tile, but it's unclear whether there is any advantage in doing that.
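The scaling back to the original image can be sketched as below. This assumes the whole image was stretched to the model size (no letterboxing or cropping), so the horizontal and vertical scale factors are independent. The helper name `scale_box` is hypothetical.

```python
def scale_box(box, model_size, image_size):
    """Map a box given in model coordinates onto the original image."""
    model_w, model_h = model_size
    image_w, image_h = image_size
    # Independent scale factors: the aspect ratio may differ from the model's.
    sx = image_w / model_w
    sy = image_h / model_h
    xmin, ymin, xmax, ymax = box
    return (round(xmin * sx), round(ymin * sy), round(xmax * sx), round(ymax * sy))


# A box covering the lower-right quarter of a 640x640 model frame,
# mapped onto a 1920x1080 source image.
print(scale_box((320, 320, 640, 640), (640, 640), (1920, 1080)))
# -> (960, 540, 1920, 1080)
```

Note that the horizontal factor here is 3.0 while the vertical factor is 1.6875, which is why a single uniform scale does not work for images that are not square.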
+The gstreamer plugins are currently provided as binary-only shared objects. They are linked against stdlibc (libc.so.6) and libgstreamer-1.0.so.0, and are compatible with GStreamer 1.26 or newer.
+If you are using a rootfs that does not have GStreamer 1.26 or newer you will need to build it or provide it via virtualization. For example Ubuntu 24.x Noble has GStreamer 1.24, Ubuntu 25.x has GStreamer 1.26 and Ubuntu 26.x Resolute has GStreamer 1.28. So if you are running Ubuntu Noble you could use distrobox/docker to install GStreamer 1.26 and its dependencies from Ubuntu 25.x.
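A quick way to express the version requirement is a major/minor comparison, sketched below. The function names are hypothetical, and the version string is assumed to be a plain dotted version such as the one reported by `gst-launch-1.0 --version`.

```python
def major_minor(version):
    """Turn a dotted version string such as '1.26.3' into (1, 26)."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def meets_minimum(version, minimum=(1, 26)):
    """True if the version is at least the minimum the ARA plugins need."""
    return major_minor(version) >= minimum


for candidate in ("1.24.2", "1.26.0", "1.28.1"):
    print(candidate, meets_minimum(candidate))
```

Tuple comparison handles the ordering correctly even if the major version changes, which a plain string comparison would not.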
+Examples:
+ * Ubuntu noble (24.04):
+  - Ubuntu noble has GStreamer 1.24 which is not compatible with the 1.26 plugins
+  - one solution could be a GStreamer 1.26 PPA backport but we have not found any
+  - one solution is an Ubuntu 25.04 container on the Ubuntu 24.04 rootfs:
+{{{#!bash
+apt update && apt install -y distrobox docker.io
+# Create a 25.04 container that can see your hardware
+distrobox create --image ubuntu:25.04 --name gst126 \
+  --volume /usr/lib/gstreamer-1.0:/opt/ara2/plugins:ro \
+  --volume /usr/lib:/opt/ara2/libs:ro \
+  --volume /usr/share/cnn:/usr/share/cnn \
+  --volume /usr/share/llm:/usr/share/llm \
+  --volume /dev/bus/usb:/dev/bus/usb
+# enter the container to use it
+distrobox enter gst126
+# export vars via ~/.bashrc (exit and enter the distrobox to take effect)
+echo "export GST_PLUGIN_PATH=/opt/ara2/plugins" >> ~/.bashrc
+echo "export LD_LIBRARY_PATH=/opt/ara2/libs:\$LD_LIBRARY_PATH" >> ~/.bashrc
+}}}
+   - whenever using the ARA plugins you will need to make sure you do so in the gst126 environment
+   - the volume param creates bind mounts between the host and the virtual target
+   - you can also always access the host rootfs via /run/host
+   - also make sure you install gstreamer and anything that uses it within that virtual environment
+   - this uses virtualization, not emulation: there is no performance hit or added latency, it's just a different set of executables
+   - disk space for the ubuntu 25.04 base above is about 1.54GB
+ * Ubuntu 26.04 resolute
+  - Ubuntu resolute (26.04) has GStreamer 1.28, which is compatible with the plugins built for 1.26
+  - GStreamer 1.28 decodebin picks the hardware-accelerated v4l2jpegdec (on Venice) instead of the standard software decoder jpegdec. v4l2jpegdec does not support YUV3 (typical for standard JPEG images), so you will need to disable it or make decodebin prefer jpegdec. For example you can set GST_PLUGIN_FEATURE_RANK="v4l2jpegdec:NONE" or set the rank at runtime, as is done in the detection examples below
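When launching GStreamer tools from Python, the rank override can be passed through the child environment. The following is a minimal sketch; `with_feature_rank` is a hypothetical helper, and it relies on GST_PLUGIN_FEATURE_RANK being a comma-separated list of `feature:rank` entries.

```python
import os


def with_feature_rank(env, feature, rank="NONE"):
    """Return a copy of env with feature:rank set in GST_PLUGIN_FEATURE_RANK."""
    new_env = dict(env)
    current = new_env.get("GST_PLUGIN_FEATURE_RANK", "")
    entries = [entry for entry in current.split(",") if entry]
    # Drop any earlier setting for the same feature so the new rank wins.
    entries = [entry for entry in entries if entry.split(":")[0] != feature]
    entries.append(f"{feature}:{rank}")
    new_env["GST_PLUGIN_FEATURE_RANK"] = ",".join(entries)
    return new_env


env = with_feature_rank(os.environ, "v4l2jpegdec")
print(env["GST_PLUGIN_FEATURE_RANK"])
```

The returned dictionary can be handed to `subprocess.run(..., env=env)` so that only the child process sees the changed rank.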
+Install GStreamer:
+{{{#!bash
+apt-get update && apt install -y \
+   gstreamer1.0-x \
+   gstreamer1.0-tools \
+   gstreamer1.0-plugins-base \
+   gstreamer1.0-plugins-good \
+   gstreamer1.0-plugins-bad \
+   gstreamer1.0-plugins-ugly \
+   gstreamer1.0-libav \
+   v4l-utils
+}}}
+ - this uses about 500MiB of disk space
+Specify Plugin path:
+{{{#!bash
+# export now to current shell
+export GST_PLUGIN_PATH=/usr/lib/gstreamer-1.0/
+# put in .bashrc so it happens for any new bash shell
+echo "export GST_PLUGIN_PATH=/usr/lib/gstreamer-1.0/" >> ~/.bashrc
+}}}
+ - this tells GStreamer to look for plugins in the non-standard location of the ARA gstreamer plugins
+At this point you should be able to inspect the dvPre, dvInf, and dvPost elements:
+{{{#!bash
+gst-inspect-1.0 dvPre
+gst-inspect-1.0 dvInf
+gst-inspect-1.0 dvPost
+}}}
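The same check can be scripted. The sketch below relies on the `--exists` option of gst-inspect-1.0, which exits with status 0 when the element is found; the function names are hypothetical, and the check is injectable so the logic can be exercised without GStreamer installed.

```python
import shutil
import subprocess

ARA_ELEMENTS = ("dvPre", "dvInf", "dvPost")


def element_exists(name):
    """True if gst-inspect-1.0 finds the element; False if it or the tool is missing."""
    tool = shutil.which("gst-inspect-1.0")
    if tool is None:
        return False
    return subprocess.run([tool, "--exists", name]).returncode == 0


def missing_elements(names=ARA_ELEMENTS, exists=element_exists):
    """Return the names for which the exists() check fails."""
    return [name for name in names if not exists(name)]


if __name__ == "__main__":
    missing = missing_elements()
    print("missing elements:", ", ".join(missing) if missing else "none")
```

If any element is reported missing, GST_PLUGIN_PATH is the first thing to verify.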
+=== Detection Examples
+Examples:
+ * gst-launch pipeline prototyping:
+  - enabling debug level 6 on dvPost shows the number of object detections in its debug output, but to do anything with that data you need to write an application that can decode the frame buffers. Still, this is useful for prototyping:
+   * perform detection on a v4l2 video device like a webcam:
+{{{#!bash
+DEV=/dev/video2
+MODEL=/usr/share/cnn/detection/yolov8n/model.dvm
+GST_DEBUG="dvPost:6" \
+gst-launch-1.0 -v \
+  v4l2src device=$DEV ! \
+  video/x-raw,width=640,height=480,framerate=30/1 ! \
+  videoconvert ! video/x-raw,format=BGRA ! \
+  dvPre model=$MODEL ! \
+  dvInf model=$MODEL sock=/var/run/proxy.sock use-shm=false ! \
+  dvPost model=$MODEL ! \
+  fakesink sync=false | grep Detected
+}}}
+    - see wiki:linux/persistent_device_naming#video for details about making video devices have persistent device names
+   * perform a detection on an image:
+{{{#!bash
+URI=file://$PWD/traffic.png
+MODEL=/usr/share/cnn/detection/yolov8n/model.dvm
+GST_DEBUG="dvPost:6" \
+GST_PLUGIN_FEATURE_RANK="v4l2jpegdec:NONE" \
+gst-launch-1.0 -v \
+  urisourcebin uri=$URI ! decodebin ! \
+  videoconvert ! video/x-raw,format=BGRA ! \
+  dvPre model=$MODEL ! \
+  dvInf model=$MODEL sock=/var/run/proxy.sock use-shm=false ! \
+  dvPost model=$MODEL ! \
+  fakesink sync=false | grep Detected
+}}}
+    - GST_PLUGIN_FEATURE_RANK is set to disable the v4l2jpegdec hardware decoder on GStreamer 1.28, as it does not support a format compatible with dvPre (yet jpegdec does)
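The two gst-launch pipelines above differ only in their source stage, so the shared part can be assembled programmatically. This is a minimal sketch with a hypothetical helper name; the element properties (`model`, `sock`, `use-shm`) are the ones used in the pipelines on this page.

```python
def detection_pipeline(source, model, sock="/var/run/proxy.sock",
                       sink="fakesink sync=false"):
    """Build a gst-launch style description: source, BGRA conversion, ARA elements, sink."""
    stages = [
        source,
        "videoconvert",
        "video/x-raw,format=BGRA",  # dvPre needs 32-bit BGRA samples
        f"dvPre model={model}",
        f"dvInf model={model} sock={sock} use-shm=false",
        f"dvPost model={model}",
        sink,
    ]
    return " ! ".join(stages)


MODEL = "/usr/share/cnn/detection/yolov8n/model.dvm"
print(detection_pipeline("v4l2src device=/dev/video2", MODEL))
```

The resulting string has the same form as the description passed to `Gst.parse_launch` in the Python script below.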
+ * Image detection with boxing via Python
+  - Python is incredibly useful for accessing GStreamer and handling the ARA detection frame data and imagemagick provides excellent tools for converting and drawing on images:
+  - (optional) install lighttpd so that we can easily see our resulting images via a browser
+{{{#!bash
+apt-get install -y lighttpd
+# add configuration for directory listing and mapping of /root to /
+cat << EOF >> /etc/lighttpd/lighttpd.conf
+dir-listing.encoding    = "utf-8"
+server.dir-listing      = "enable"
+# directory access
+alias.url += (
+        "/root" => "/root",
+)
+EOF
+# make the dir executable
+chmod ugo+x .
+# restart the web server
+/etc/init.d/lighttpd restart
+}}}
+  - install imagemagick which we will use to draw named boxes for detections
+{{{#!bash
+apt-get install -y imagemagick
+}}}
+  - create a dir for us to work in and create the script
+{{{#!bash
+mkdir image-detect; cd image-detect
+# create python script
+cat <<\EOF > image_detect.py
+#!/usr/bin/env python3
+"""
+Ara NPU Multi-Format Universal Image Decoder
+============================================
+"""
+import ctypes
+import os
+import sys
+import subprocess
+import gi
+gi.require_version('Gst', '1.0')
+from gi.repository import Gst
+Gst.init(None)
+# Standard COCO Class Mapping for printing human-readable labels
+COCO_CLASSES = {
+    0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 4: "airplane", 5: "bus",
+    6: "train", 7: "truck", 8: "boat", 9: "traffic light", 10: "fire hydrant",
+    11: "stop sign", 12: "parking meter", 13: "bench", 14: "bird", 15: "cat",
+    16: "dog", 17: "horse", 18: "sheep", 19: "cow", 20: "elephant", 21: "bear",
+    22: "zebra", 23: "giraffe", 24: "backpack", 25: "umbrella", 26: "handbag",
+    27: "tie", 28: "suitcase", 29: "frisbee", 30: "skis", 31: "snowboard",
+    32: "sports ball", 33: "kite", 34: "baseball bat", 35: "baseball glove",
+    36: "skateboard", 37: "surfboard", 38: "tennis racket", 39: "bottle",
+    40: "wine glass", 41: "cup", 42: "fork", 43: "knife", 44: "spoon", 45: "bowl",
+    46: "banana", 47: "apple", 48: "sandwich", 49: "orange", 50: "broccoli",
+    51: "carrot", 52: "hot dog", 53: "pizza", 54: "donut", 55: "cake",
+    56: "chair", 57: "couch", 58: "potted plant", 59: "bed", 60: "dining table",
+    61: "toilet", 62: "tv", 63: "laptop", 64: "mouse", 65: "remote", 66: "keyboard",
+    67: "cell phone", 68: "microwave", 69: "oven", 70: "toaster", 71: "sink",
+    72: "refrigerator", 73: "book", 74: "clock", 75: "vase", 76: "scissors",
+    77: "teddy bear", 78: "hair drier", 79: "toothbrush"
+}
+class AraDetection(ctypes.Structure):
+    _layout_ = "ms"
+    _pack_ = 1
+    _fields_ = [
+        ("xmin", ctypes.c_float), ("ymin", ctypes.c_float),
+        ("xmax", ctypes.c_float), ("ymax", ctypes.c_float),
+        ("confidence", ctypes.c_float), ("class_id", ctypes.c_int32),
+        ("class_name_ptr", ctypes.c_void_p)
+    ]
+def main():
+    if len(sys.argv) < 3:
+        print(f"Usage: {sys.argv[0]} <input_image> <output_image> [model]")
+        sys.exit(1)
+    input_image = sys.argv[1]
+    output_image = sys.argv[2]
+    model = "/usr/share/cnn/detection/yolov8n/model.dvm"
+    if len(sys.argv) > 3:
+        model = sys.argv[3]
+    if not os.path.exists(input_image):
+        print(f"ERROR: File '{input_image}' could not be located.")
+        sys.exit(1)
+    # Fetch native dimensions using ImageMagick
+    try:
+        # pass an argument list (no shell) so file names with spaces or shell characters are safe
+        dimensions = subprocess.check_output(["identify", "-format", "%w %h", input_image]).decode().split()
+        w_native, h_native = int(dimensions[0]), int(dimensions[1])
+    except Exception as e:
+        print(f"ERROR: Failed to read image properties using ImageMagick: {e}")
+        sys.exit(1)
+    # Print target properties cleanly
+    print(f"\nmodel: {model}")
+    print(f"image: {os.path.basename(input_image)} {w_native}x{h_native}")
+    MODEL_W, MODEL_H = 640, 640
+    pipe_str = (
+        f"multifilesrc location={input_image} loop=false num-buffers=2 ! decodebin name=d ! "
+        f"videoconvert ! videoscale ! video/x-raw,width={MODEL_W},height={MODEL_H} ! "
+        f"videoconvert ! video/x-raw,format=BGRA ! "
+        f"dvPre model={model} ! "
+        f"dvInf model={model} sock=/var/run/proxy.sock use-shm=true shm-path=/dev/shm/ara_inf_ ! "
+        f"dvPost model={model} orig-width={MODEL_W} orig-height={MODEL_H} ! "
+        f"appsink name=mysink sync=false async=false emit-signals=true"
+    )
+    # Before creating the launcher, adjust the system plugin registry ranking
+    # so GStreamer ignores v4l2jpegdec element (as it doesn't support BGRA output)
+    registry = Gst.Registry.get()
+    feature = registry.lookup_feature("v4l2jpegdec")
+    if feature:
+        # Lower its rank to ZERO so decodebin skips over it permanently
+        feature.set_rank(0)
+    pipeline = Gst.parse_launch(pipe_str)
+    sink = pipeline.get_by_name("mysink")
+    pipeline.set_state(Gst.State.PLAYING)
+    last_valid_raw_bytes = None
+    while True:
+        sample = sink.emit("pull-sample")
+        if not sample:
+            break
+        buffer = sample.get_buffer()
+        last_valid_raw_bytes = buffer.extract_dup(0, buffer.get_size())
+    pipeline.set_state(Gst.State.NULL)
+    processed_detections = []
+    if last_valid_raw_bytes and len(last_valid_raw_bytes) >= 4:
+        num_detections = int.from_bytes(last_valid_raw_bytes[:4], byteorder='little')
+        if 0 < num_detections < 1000:
+            print(f"DETECTIONS LOGGED: FOUND {num_detections} ACTIVE OBJECTS")
+            print("-" * 70)
+            offset = 4
+            ds = ctypes.sizeof(AraDetection)
+            for i in range(num_detections):
+                if offset + ds > len(last_valid_raw_bytes): break
+                det = AraDetection.from_buffer_copy(last_valid_raw_bytes[offset:offset+ds])
+                offset += ds
+                # Map model coordinates back onto the native image size
+                x1_mapped = det.xmin * (w_native / MODEL_W)
+                x2_mapped = det.xmax * (w_native / MODEL_W)
+                y1_mapped = det.ymin * (h_native / MODEL_H)
+                y2_mapped = det.ymax * (h_native / MODEL_H)
+                coco_name = COCO_CLASSES.get(det.class_id, "unknown")
+                print(f"Object {i+1}: ID={det.class_id} | Name={coco_name} | Confidence={det.confidence * 100:.1f}%")
+                print(f"          Bounding Box -> [{int(x1_mapped)}, {int(y1_mapped)}] to [{int(x2_mapped)}, {int(y2_mapped)}]")
+                print("-" * 70)
+                processed_detections.append((coco_name, det.confidence, x1_mapped, y1_mapped, x2_mapped, y2_mapped))
+    # Render final multi-object annotated canvas
+    if processed_detections:
+        cmd_args = [f"convert {input_image}"]
+        for coco_name, conf, x1, y1, x2, y2 in processed_detections:
+            ix1, iy1, ix2, iy2 = int(x1), int(y1), int(x2), int(y2)
+            label = f"{coco_name} {conf*100:.1f}%"
+            cmd_args.append(f'-stroke green -strokewidth 2 -fill none -draw "rectangle {ix1},{iy1} {ix2},{iy2}"')
+            cmd_args.append(f'-stroke none -fill white -pointsize 16 -annotate +{ix1}+{iy1 - 6} "{label}"')
+        cmd_args.append(output_image)
+        draw_cmd = " ".join(cmd_args)
+        try:
+            subprocess.run(draw_cmd, shell=True, check=True)
+            print(f"SUCCESS: Mapped all boxes and text labels onto -> '{output_image}'\n")
+        except subprocess.CalledProcessError:
+            print("ERROR: ImageMagick rendering execution failed.\n")
+    else:
+        print("INFO: No operational object targets were captured by the NPU context.\n")
+if __name__ == '__main__':
+    main()
+EOF
+}}}
+  - The script uses PyGObject, a Python package that provides bindings for libraries based on GObject Introspection such as GTK, !WebKit, and GStreamer. It allows you to use C-based frameworks in Python. We need to install the C libs for GStreamer for this:
+{{{#!bash
+apt-get install -y \
+  libcairo2-dev \
+  libgirepository-2.0-dev \
+  python3-dev \
+  python3-gst-1.0 \
+  cmake pkg-config
+# we are also going to need to install gstreamer and its dev packages
+apt-get install -y \
+  libgstreamer1.0-dev \
+  libgstreamer-plugins-base1.0-dev \
+  libgstreamer-plugins-bad1.0-dev \
+  gstreamer1.0-plugins-base \
+  gstreamer1.0-plugins-good \
+  gstreamer1.0-plugins-bad \
+  gstreamer1.0-plugins-ugly \
+  gstreamer1.0-libav \
+  gstreamer1.0-tools
+}}}
+  - create a python virtual env (always a good idea to keep python dependencies isolated) and install the python libs we need:
+{{{#!bash
+# create a venv (.venv)
+uv venv
+# install our script's dependencies
+uv pip install pygobject
+}}}
+  - (optional) fetch some images for detection
+{{{#!bash
+# fetch a coco validation image; it contains a dog on a bench and the dog is at 208,147 to 293,289
+wget http://images.cocodataset.org/val2017/000000546829.jpg -O dog.jpg
+# use ffmpeg to grab a frame from within an MP4
+apt install -y ffmpeg
+ffmpeg -i /usr/share/ara2-vision-examples/sample_videos/video_0.mp4 -f null - # shows how long it is (time=00:00:15.50)
+ffmpeg -i /usr/share/ara2-vision-examples/sample_videos/video_0.mp4 -ss 00:00:5 -frames:v 1 traffic.png
+}}}
+  - run the script (image_detect.py <source-image> <destination-image> [model-path])
+{{{#!bash
+uv run image_detect.py dog.jpg coco_detections.jpg
+}}}
+   - Note that without shm the pipeline needs to copy the raw image bytes over a local socket connection. By using a dedicated shared-memory path under /dev/shm (use-shm=true with shm-path, as in the script) you can eliminate that transfer (zero-copy): dvPre writes the processed frame directly into a designated block of system RAM and dvInf uses a pointer to it
+   - you might expect that if your original image is 1080x1920 and you resize it to the model size of 640x640, telling dvPost orig-width=1080 orig-height=1920 would scale the bounding boxes properly. In practice it does not seem to, unless your image has the same aspect ratio as the model. Mapping as above (telling dvPost that the image is 640x640 and scaling the boxes ourselves) resolves this
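The ImageMagick annotation step in image_detect.py builds a shell command string. As an alternative sketch, the same `convert` options can be assembled as an argument list, which avoids quoting problems with file names and labels. The helper name `annotate_args` is hypothetical; the options are the ones the script already uses.

```python
def annotate_args(src, dst, detections):
    """Build a convert argument list that draws a box and a label per detection.

    detections: iterable of (name, confidence, x1, y1, x2, y2) in image pixels.
    """
    args = ["convert", src]
    for name, conf, x1, y1, x2, y2 in detections:
        ix1, iy1, ix2, iy2 = int(x1), int(y1), int(x2), int(y2)
        # Green outline for the bounding box
        args += ["-stroke", "green", "-strokewidth", "2", "-fill", "none",
                 "-draw", f"rectangle {ix1},{iy1} {ix2},{iy2}"]
        # White label just above the top-left corner of the box
        args += ["-stroke", "none", "-fill", "white", "-pointsize", "16",
                 "-annotate", f"+{ix1}+{iy1 - 6}", f"{name} {conf * 100:.1f}%"]
    args.append(dst)
    return args


print(annotate_args("dog.jpg", "out.jpg", [("dog", 0.9, 208.4, 147.9, 293.0, 289.0)]))
```

The list can be passed directly to `subprocess.run(args, check=True)` without `shell=True`.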
 [=#eiq-aaf-connector]