Changes between Version 9 and Version 10 of expansion/gw16168


Timestamp:
06/05/2026 08:48:20 PM (8 days ago)
Author:
Tim Harvey
Comment:

added gstreamer plugin details and some image and video detection examples

  • expansion/gw16168

    v9 v10  
    135135 - on bootup make sure you wait for the console messages indicating the Proxy is launched before using it as it can take a couple of minutes
    136136 - the binary tools and libs are all currently dynamic linked against stdlibc
    137  - the GStreamer libs have compatibility issues with modern GStreamer
     137 - the GStreamer libs require GStreamer 1.26 or newer
    138138
    139139Verification steps:
     
    174174}}}
    175175
     176[=#gstreamer]
     177=== GStreamer plugins
     178The rt-sdk-ara2 provides a set of gstreamer plugins for inference:
     179 - dvPre
     180 - dvInf
     181 - dvPost
     182
     183Without more documentation or source for these, it is probably best to think of them as a three-stage pipeline: dvPre prepares buffers, dvInf hands them off to the NPU, and dvPost processes the response.
     184
     185The dvPre element requires 32-bit pixel samples (i.e. format=BGRA: 4 bytes per pixel for blue, green, red, and alpha, where the alpha byte is unused padding that acts as a structural spacer rather than carrying transparency), not 24-bit format=RGB (3 bytes per pixel for red, green, and blue).
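To make the difference between the two layouts concrete, the sketch below repacks tightly packed 24-bit RGB bytes into 32-bit BGRA. The helper names `rgb_to_bgra` and `frame_size` and the default padding value are assumptions for illustration only; in a real pipeline `videoconvert` does this conversion for you.

```python
def rgb_to_bgra(rgb: bytes, pad: int = 0xFF) -> bytes:
    """Repack 3-byte RGB pixels as 4-byte BGRA pixels.

    `pad` is the value written to the unused 4th byte (an assumed default).
    """
    if len(rgb) % 3:
        raise ValueError("RGB data must be a multiple of 3 bytes")
    pixels = len(rgb) // 3
    out = bytearray(pixels * 4)
    out[0::4] = rgb[2::3]              # blue
    out[1::4] = rgb[1::3]              # green
    out[2::4] = rgb[0::3]              # red
    out[3::4] = bytes([pad]) * pixels  # padding byte, not used for transparency
    return bytes(out)


def frame_size(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one tightly packed frame."""
    return width * height * bytes_per_pixel
```

For a 640x640 frame, `frame_size(640, 640, 3)` gives 1228800 bytes for RGB and `frame_size(640, 640, 4)` gives 1638400 bytes for BGRA.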
     186
     187All three elements require the model to be specified via the 'model' property. For example, if using yolov8x you would specify the path to the yolov8x.dvm file.
     188
     189For detection models the dvPost element frame data will contain a buffer that starts with a 32-bit count (the Python example below reads it as the number of detections) followed by a series of detection structures containing the bounding box, confidence level, and COCO class ID of each object detected.
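A minimal sketch of decoding such a buffer with the standard `struct` module follows. The record layout is an assumption taken from the packed ctypes structure in the Python script later on this page (five floats, a 32-bit class ID, then a 64-bit pointer that is ignored), with the leading 32-bit little-endian value treated as the number of records. `Detection` and `parse_detections` are hypothetical names, not part of the SDK.

```python
import struct
from typing import NamedTuple


class Detection(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    confidence: float
    class_id: int


# Assumed packed record: 5 x float32, 1 x int32, 1 x 64-bit pointer (ignored)
RECORD = struct.Struct("<5fiQ")


def parse_detections(buf: bytes) -> list[Detection]:
    """Decode a dvPost-style buffer into a list of Detection tuples."""
    if len(buf) < 4:
        return []
    (count,) = struct.unpack_from("<I", buf, 0)
    detections = []
    offset = 4
    for _ in range(count):
        if offset + RECORD.size > len(buf):
            break  # truncated buffer: stop rather than read past the end
        *fields, _ptr = RECORD.unpack_from(buf, offset)
        detections.append(Detection(*fields))
        offset += RECORD.size
    return detections
```

The coordinates returned here are still in model units and need to be scaled to the source image.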
     190
     191The units for the bounding box are relative to the model's input size and will need to be scaled back to your original image size. For example, the YOLO models operate on 640x640 pixel data. You can pass something larger in and it will essentially tile, but it's unclear if there is an advantage to doing that.
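The scaling step can be sketched as below. This assumes the source image was stretched to the model size (each axis scaled independently, no letterboxing); `scale_box` is a hypothetical helper name.

```python
def scale_box(box, model_size, image_size):
    """Map (xmin, ymin, xmax, ymax) from model units to image pixels."""
    xmin, ymin, xmax, ymax = box
    model_w, model_h = model_size
    image_w, image_h = image_size
    sx = image_w / model_w  # horizontal scale factor
    sy = image_h / model_h  # vertical scale factor
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)
```

For example, a box of (160, 320, 320, 480) reported by a 640x640 model on a 1920x1080 image maps to (480.0, 540.0, 960.0, 810.0).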
     192
     193The gstreamer plugins are currently provided as binary-only shared objects. They are linked against stdlibc (libc.so.6) and libgstreamer-1.0.so.0 and are compatible with GStreamer 1.26 or newer.
     194
     195If you are using a rootfs that does not have GStreamer 1.26 you will need to build it or provide it via virtualization. For example, Ubuntu 24.x Noble has GStreamer 1.24, Ubuntu 25.x has GStreamer 1.26, and Ubuntu 26.x Resolute has GStreamer 1.28. So if you were running Ubuntu Noble you could use distrobox/docker to install GStreamer 1.26 and its dependencies using Ubuntu 25.x.
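A quick way to decide whether a given rootfs needs this workaround is to compare its GStreamer version against the 1.26 minimum. The sketch below only does the comparison; `parse_version` and `meets_minimum` are hypothetical helper names, and the version string is assumed to come from something like the output of `gst-inspect-1.0 --version`.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract (major, minor, micro) from text such as 'GStreamer 1.24.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    # a missing micro component is treated as 0
    return tuple(int(group) for group in match.groups(default="0"))


def meets_minimum(version: str, minimum: str = "1.26") -> bool:
    """True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)
```

With this, `meets_minimum("GStreamer 1.24.2")` is false (Ubuntu noble) while `meets_minimum("1.28.0")` is true.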
     196
     197Examples:
     198 * Ubuntu noble (24.04):
     199  - Ubuntu noble has GStreamer 1.24 which is not compatible with the 1.26 plugins
     200  - one solution could be a GStreamer 1.26 PPA backport but we have not found any
     201  - one solution is to run an Ubuntu 25.04 container on the Ubuntu 24.04 rootfs:
     202{{{#!bash
     203apt update && apt install -y distrobox docker.io
     204# Create a 25.04 container that can see your hardware
     205distrobox create --image ubuntu:25.04 --name gst126 --volume /usr/lib/gstreamer-1.0:/opt/ara2/plugins:ro \
     206  --volume /usr/lib:/opt/ara2/libs:ro \
     207  --volume /usr/share/cnn:/usr/share/cnn \
     208  --volume /usr/share/llm:/usr/share/llm \
     209  --volume /dev/bus/usb:/dev/bus/usb
     210# enter the container to use it
     211distrobox enter gst126
     212# export vars via ~/.bashrc (exit and enter the distrobox to take effect)
     213echo "export GST_PLUGIN_PATH=/opt/ara2/plugins" >> ~/.bashrc
     214echo "export LD_LIBRARY_PATH=/opt/ara2/libs:\$LD_LIBRARY_PATH" >> ~/.bashrc
     215}}}
     216   - whenever using the ARA plugins you will need to make sure you do so in the gst126 environment
     217   - the volume param creates bind mounts between the host and the virtual target
     218   - you can also always access the host rootfs via /run/host
     219   - also make sure you install gstreamer and anything that uses it within that virtual environment
     220   - this uses virtualization, not emulation: there is no performance hit or added latency, it's just a different set of executables
     221   - disk space for the ubuntu 25.04 base above is about 1.54GB
     222 * Ubuntu resolute (26.04):
     223  - Ubuntu resolute has GStreamer 1.28, which is backwards compatible with the 1.26 plugins
     224  - on GStreamer 1.28 decodebin picks the hardware-accelerated v4l2jpegdec (on Venice) instead of the standard software decoder jpegdec, and v4l2jpegdec does not support YUV3 (typical for standard JPEG images). If you use decodebin you will need to disable v4l2jpegdec or make jpegdec preferred over it. For example, you can set GST_PLUGIN_FEATURE_RANK="v4l2jpegdec:NONE" or set the rank at runtime, as is done in the detection examples below
     225
     226Install GStreamer:
     227{{{#!bash
     228apt-get update && apt install -y \
     229   gstreamer1.0-x \
     230   gstreamer1.0-tools \
     231   gstreamer1.0-plugins-base \
     232   gstreamer1.0-plugins-good \
     233   gstreamer1.0-plugins-bad \
     234   gstreamer1.0-plugins-ugly \
     235   gstreamer1.0-libav \
     236   v4l-utils
     237}}}
     238 - this adds about 500MiB of disk space
     239
     240Specify Plugin path:
     241{{{#!bash
     242# export now to current shell
     243export GST_PLUGIN_PATH=/usr/lib/gstreamer-1.0/
     244# put in .bashrc so it happens for any new bash shell
     245echo "export GST_PLUGIN_PATH=/usr/lib/gstreamer-1.0/" >> ~/.bashrc
     246}}}
     247 - this tells GStreamer to look for plugins in the non-standard location of the ARA gstreamer plugins
     248
     249At this point you should be able to inspect the dvPre, dvInf, and dvPost elements:
     250{{{#!bash
     251gst-inspect-1.0 dvPre
     252gst-inspect-1.0 dvInf
     253gst-inspect-1.0 dvPost
     254}}}
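The same check can be scripted, for example as a sanity test before launching a pipeline. This sketch relies on the `--exists` option of `gst-inspect-1.0`, which exits non-zero when an element is not found; `missing_elements` is a hypothetical helper name.

```python
import shutil
import subprocess


def missing_elements(names, tool="gst-inspect-1.0"):
    """Return the element names that the inspect tool reports as absent."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found in PATH")
    missing = []
    for name in names:
        result = subprocess.run(
            [tool, "--exists", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(name)
    return missing
```

Calling `missing_elements(["dvPre", "dvInf", "dvPost"])` should return an empty list once GST_PLUGIN_PATH is set correctly.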
     255
     256=== Detection Examples
     257Examples:
     258 * gst-launch pipeline prototyping:
     259  - enabling debug level 6 on dvPost will show the number of object detections in its debug output but if you want to do anything with that data you need to write an application that can decode frame buffers. Still this is useful for prototyping:
     260   * perform detection on a v4l2 video device like a webcam:
     261{{{#!bash
     262DEV=/dev/video2
     263MODEL=/usr/share/cnn/detection/yolov8n/model.dvm
     264GST_DEBUG="dvPost:6" \
     265gst-launch-1.0 -v \
     266  v4l2src device=$DEV ! \
     267  video/x-raw,width=640,height=480,framerate=30/1 ! \
     268  videoconvert ! video/x-raw,format=BGRA ! \
     269  dvPre model=$MODEL ! \
     270  dvInf model=$MODEL sock=/var/run/proxy.sock use-shm=false ! \
     271  dvPost model=$MODEL ! \
     272  fakesink sync=false | grep Detected
     273}}}
     274    - see wiki:linux/persistent_device_naming#video for details about making video devices have persistent device names
     275   * perform a detection on an image:
     276{{{#!bash
     277URI=file://$PWD/traffic.png
     278MODEL=/usr/share/cnn/detection/yolov8n/model.dvm
     279GST_DEBUG="dvPost:6" \
     280GST_PLUGIN_FEATURE_RANK="v4l2jpegdec:NONE" \
     281gst-launch-1.0 -v \
     282  urisourcebin uri=$URI ! decodebin ! \
     283  videoconvert ! video/x-raw,format=BGRA ! \
     284  dvPre model=$MODEL ! \
     285  dvInf model=$MODEL sock=/var/run/proxy.sock use-shm=false ! \
     286  dvPost model=$MODEL ! \
     287  fakesink sync=false | grep Detected
     288}}}
     289    - the GST_PLUGIN_FEATURE_RANK setting disables the v4l2jpegdec hardware decoder on GStreamer 1.28, as it does not support a format compatible with dvPre (yet jpegdec does)
     290 * Image detection with boxing via Python
     291  - Python is incredibly useful for accessing GStreamer and handling the ARA detection frame data and imagemagick provides excellent tools for converting and drawing on images:
     292  - (optional) install lighttpd so that we can easily see our resulting images via a browser
     293{{{#!bash
     294apt-get install -y lighttpd
     295# add configuration for directory listing and mapping of /root to /
     296cat << EOF >> /etc/lighttpd/lighttpd.conf
     297dir-listing.encoding    = "utf-8"
     298server.dir-listing      = "enable"
     299
     300# directory access
     301alias.url += (
     302        "/root" => "/root",
     303)
     304EOF
     305# make the dir executable
     306chmod ugo+x .
     307# restart the web server
     308/etc/init.d/lighttpd restart
     309}}}
     310  - install imagemagick which we will use to draw named boxes for detections
     311{{{#!bash
     312apt-get install -y imagemagick
     313}}}
     314  - create a dir for us to work in and create the script
     315{{{#!bash
     316mkdir image-detect; cd image-detect
     317# create python script
     318cat <<\EOF > image_detect.py
     319#!/usr/bin/env python3
     320"""
     321Ara NPU Multi-Format Universal Image Decoder
     322============================================
     323"""
     324
     325import ctypes
     326import os
     327import sys
     328import subprocess
     329import gi
     330
     331gi.require_version('Gst', '1.0')
     332from gi.repository import Gst
     333
     334Gst.init(None)
     335
     336# Standard COCO Class Mapping for printing human-readable labels
     337COCO_CLASSES = {
     338    0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 4: "airplane", 5: "bus",
     339    6: "train", 7: "truck", 8: "boat", 9: "traffic light", 10: "fire hydrant",
     340    11: "stop sign", 12: "parking meter", 13: "bench", 14: "bird", 15: "cat",
     341    16: "dog", 17: "horse", 18: "sheep", 19: "cow", 20: "elephant", 21: "bear",
     342    22: "zebra", 23: "giraffe", 24: "backpack", 25: "umbrella", 26: "handbag",
     343    27: "tie", 28: "suitcase", 29: "frisbee", 30: "skis", 31: "snowboard",
     344    32: "sports ball", 33: "kite", 34: "baseball bat", 35: "baseball glove",
     345    36: "skateboard", 37: "surfboard", 38: "tennis racket", 39: "bottle",
     346    40: "wine glass", 41: "cup", 42: "fork", 43: "knife", 44: "spoon", 45: "bowl",
     347    46: "banana", 47: "apple", 48: "sandwich", 49: "orange", 50: "broccoli",
     348    51: "carrot", 52: "hot dog", 53: "pizza", 54: "donut", 55: "cake",
     349    56: "chair", 57: "couch", 58: "potted plant", 59: "bed", 60: "dining table",
     350    61: "toilet", 62: "tv", 63: "laptop", 64: "mouse", 65: "remote", 66: "keyboard",
     351    67: "cell phone", 68: "microwave", 69: "oven", 70: "toaster", 71: "sink",
     352    72: "refrigerator", 73: "book", 74: "clock", 75: "vase", 76: "scissors",
     353    77: "teddy bear", 78: "hair drier", 79: "toothbrush"
     354}
     355
     356class AraDetection(ctypes.Structure):
     357    _layout_ = "ms"
     358    _pack_ = 1
     359    _fields_ = [
     360        ("xmin", ctypes.c_float), ("ymin", ctypes.c_float),
     361        ("xmax", ctypes.c_float), ("ymax", ctypes.c_float),
     362        ("confidence", ctypes.c_float), ("class_id", ctypes.c_int32),
     363        ("class_name_ptr", ctypes.c_void_p)
     364    ]
     365
     366def main():
     367    if len(sys.argv) < 3:
     368        print(f"Usage: {sys.argv[0]} <input_image> <output_image> [model]")
     369        sys.exit(1)
     370
     371    input_image = sys.argv[1]
     372    output_image = sys.argv[2]
     373    model = "/usr/share/cnn/detection/yolov8n/model.dvm"
     374    if len(sys.argv) > 3:
     375        model = sys.argv[3]
     376
     377    if not os.path.exists(input_image):
     378        print(f"ERROR: File '{input_image}' could not be located.")
     379        sys.exit(1)
     380
     381    # Fetch native dimensions using ImageMagick
     382    try:
     383        dimensions = subprocess.check_output(["identify", "-format", "%w %h", input_image]).decode().split()
     384        w_native, h_native = int(dimensions[0]), int(dimensions[1])
     385    except Exception as e:
     386        print(f"ERROR: Failed to read image properties using ImageMagick: {e}")
     387        sys.exit(1)
     388   
     389    # Print target properties cleanly
     390    print(f"\nmodel: {model}")
     391    print(f"image: {os.path.basename(input_image)} {w_native}x{h_native}")
     392
     393    MODEL_W, MODEL_H = 640, 640
     394
     395    pipe_str = (
     396        f"multifilesrc location={input_image} loop=false num-buffers=2 ! decodebin name=d ! "
     397        f"videoconvert ! videoscale ! video/x-raw,width={MODEL_W},height={MODEL_H} ! "
     398        f"videoconvert ! video/x-raw,format=BGRA ! "
     399        f"dvPre model={model} ! "
     400        f"dvInf model={model} sock=/var/run/proxy.sock use-shm=true shm-path=/dev/shm/ara_inf_ ! "
     401        f"dvPost model={model} orig-width={MODEL_W} orig-height={MODEL_H} ! "
     402        f"appsink name=mysink sync=false async=false emit-signals=true"
     403    )
     404
     405    # Before creating the launcher, adjust the system plugin registry ranking
     406    # so GStreamer ignores v4l2jpegdec element (as it doesn't support BGRA output)
     407    registry = Gst.Registry.get()
     408    feature = registry.lookup_feature("v4l2jpegdec")
     409    if feature:
     410        # Lower its rank to ZERO so decodebin skips over it permanently
     411        feature.set_rank(0)
     412
     413    pipeline = Gst.parse_launch(pipe_str)
     414    sink = pipeline.get_by_name("mysink")
     415    pipeline.set_state(Gst.State.PLAYING)
     416
     417    last_valid_raw_bytes = None
     418
     419    while True:
     420        sample = sink.emit("pull-sample")
     421        if not sample:
     422            break
     423        buffer = sample.get_buffer()
     424        last_valid_raw_bytes = buffer.extract_dup(0, buffer.get_size())
     425
     426    pipeline.set_state(Gst.State.NULL)
     427   
     428    processed_detections = []
     429
     430    if last_valid_raw_bytes and len(last_valid_raw_bytes) >= 4:
     431        num_detections = int.from_bytes(last_valid_raw_bytes[:4], byteorder='little')
     432       
     433        if 0 < num_detections < 1000:
     434            print(f"DETECTIONS LOGGED: FOUND {num_detections} ACTIVE OBJECTS")
     435            print("-" * 70)
     436           
     437            offset = 4
     438            ds = ctypes.sizeof(AraDetection)
     439           
     440            for i in range(num_detections):
     441                if offset + ds > len(last_valid_raw_bytes): break
     442                det = AraDetection.from_buffer_copy(last_valid_raw_bytes[offset:offset+ds])
     443                offset += ds
     444               
     445                # Compute native image coordinate translation mapping
     446                x1_mapped = det.xmin * (w_native / MODEL_W)
     447                x2_mapped = det.xmax * (w_native / MODEL_W)
     448                y1_mapped = det.ymin * (h_native / MODEL_H)
     449                y2_mapped = det.ymax * (h_native / MODEL_H)
     450               
     451                coco_name = COCO_CLASSES.get(det.class_id, "unknown")
     452               
     453                print(f"Object {i+1}: ID={det.class_id} | Name={coco_name} | Confidence={det.confidence * 100:.1f}%")
     454                print(f"          Bounding Box -> [{int(x1_mapped)}, {int(y1_mapped)}] to [{int(x2_mapped)}, {int(y2_mapped)}]")
     455                print("-" * 70)
     456               
     457                processed_detections.append((coco_name, det.confidence, x1_mapped, y1_mapped, x2_mapped, y2_mapped))
     458
     459    # Render final multi-object annotated canvas
     460    if processed_detections:
     461        cmd_args = ["convert", input_image]
     462        for coco_name, conf, x1, y1, x2, y2 in processed_detections:
     463            ix1, iy1, ix2, iy2 = int(x1), int(y1), int(x2), int(y2)
     464            label = f"{coco_name} {conf*100:.1f}%"
     465            cmd_args += ["-stroke", "green", "-strokewidth", "2", "-fill", "none",
     466                         "-draw", f"rectangle {ix1},{iy1} {ix2},{iy2}"]
     467            cmd_args += ["-stroke", "none", "-fill", "white", "-pointsize", "16",
     468                         "-annotate", f"+{ix1}+{iy1 - 6}", label]
     469
     470        cmd_args.append(output_image)
     471        try:
     472            subprocess.run(cmd_args, check=True)  # argument list, no shell: file names with spaces work
     473            print(f"SUCCESS: Mapped all boxes and text labels onto -> '{output_image}'\n")
     474        except subprocess.CalledProcessError:
     475            print("ERROR: ImageMagick rendering execution failed.\n")
     476    else:
     477        print("INFO: No operational object targets were captured by the NPU context.\n")
     478
     479if __name__ == '__main__':
     480    main()
     481EOF
     482}}}
     483  - The script uses PyGObject, a Python package that provides bindings for libraries based on GObject Introspection such as GTK, !WebKit, and GStreamer. It allows you to use C-based frameworks from Python. We need to install the C libs for GStreamer for this:
     484{{{#!bash
     485apt-get install -y \
     486  libcairo2-dev \
     487  libgirepository-2.0-dev \
     488  python3-dev \
     489  python3-gst-1.0 \
     490  cmake pkg-config
     491# we are also going to need to install gstreamer and its dev packages
     492apt-get install -y \
     493  libgstreamer1.0-dev \
     494  libgstreamer-plugins-base1.0-dev \
     495  libgstreamer-plugins-bad1.0-dev \
     496  gstreamer1.0-plugins-base \
     497  gstreamer1.0-plugins-good \
     498  gstreamer1.0-plugins-bad \
     499  gstreamer1.0-plugins-ugly \
     500  gstreamer1.0-libav \
     501  gstreamer1.0-tools
     502}}}
     503  - create a python virtual env (always a good idea to keep python dependencies containerized) and install python libs we need:
     504{{{#!bash
     505# create a venv (.venv)
     506uv venv
     507# install our scripts dependencies
     508uv pip install pygobject
     509}}}
     510  - (optional) fetch some images for detection
     511{{{#!bash
     512# fetch a coco validation image; it contains a dog on a bench and the dog is at 208,147 to 293,289
     513wget http://images.cocodataset.org/val2017/000000546829.jpg -O dog.jpg
     514# use ffmpeg to grab a frame from within an MP4
     515apt install -y ffmpeg
     516ffmpeg -i /usr/share/ara2-vision-examples/sample_videos/video_0.mp4 -f null - # shows how long it is (time=00:00:15.50)
     517ffmpeg -i /usr/share/ara2-vision-examples/sample_videos/video_0.mp4 -ss 00:00:05 -frames:v 1 traffic.png
     518}}}
     519  - run the script (image_detect.py <source-image> <destination-image> [model-path])
     520{{{#!bash
     521uv run image_detect.py dog.jpg coco_detections.jpg
     522}}}
     523   - Note that without shm the pipeline needs to copy the raw image bytes over a local network-style socket connection. By enabling use-shm with a shm-path under /dev/shm you can eliminate that transfer (zero-copy): dvPre writes the processed frame directly into a designated block of system RAM and dvInf uses a pointer to it
     524   - you would expect that if your original image was 1080x1920 and you resized it to the model size of 640x640, telling dvPost orig-width=1080 orig-height=1920 would make it scale the bounding boxes properly. In practice it does not seem to unless your image has the same aspect ratio as the model. Mapping it as above (telling dvPost that the image is 640x640 and scaling the boxes ourselves) resolves this
    176525
    177526[=#eiq-aaf-connector]