Changes between Version 5 and Version 6 of expansion/gw16168


Timestamp:
05/01/2026 08:32:45 PM
Author:
Tim Harvey
Comment:

add ara-rt-sdk and eqi-aaf-connector install and examples

  • expansion/gw16168

    v5 v6  
    1616 * PPA: Power, Performance and Accuracy - metrics reported by the compiler
    1717 * SOF: Schedule Optimization Factor - a measure reported by the compiler
     18 * CNN: Convolutional Neural Network - a deep learning model designed to analyze and process grid-like data such as images, videos and sometimes audio and text
     19 * LLM: Large Language Model - an AI model trained on massive amounts of text data to understand, summarize, and generate human-like language
     20 * VLM: Vision Language Model - a multimodal AI that bridges the gap between sight and language. It essentially gives an LLM the ability to "see" by integrating a vision encoder with a language
     21 * sLLM: small Language Model - a lightweight version of an LLM designed to be more efficient, especially for "edge" devices with limited hardware resources
    1822
    1923== Documentation and links
    2024Public:
     25 * [https://www.nxp.com/design/design-center/software/embedded-software/ara-software-development-kit:ARA-SDK NXP ARA SDK Landing page]
    2126 * https://github.com/nxp-imx/rt-sdk-ara2 - NXP's repo for ara2 runtime SDK v2.04 with dynamic linked binaries
    2227 * https://github.com/nxp-imx-support/uiodma-driver - Kernel driver GPL-2.0
     
    2530= NXP Ara240 DNPU AI Accelerator Quick Start
    2631
    27 Coming soon!
    28 
     32== Using NXP deb distribution packages
     33Currently NXP distributes the Ara2 runtime in binary form. They have released the kernel driver as open source, which resolves kernel compatibility issues and is a huge step, but the userspace apps and libraries remain dynamically linked binary objects.
     34
     35The current deb packages have some shortcomings:
     36 - packages are not very consistent; some ship a systemd service in the package data, others create one via postinst
     37 - they were intended to install on top of the NXP Embedded Linux Firmware (version L6.12.34-2.1.0) and to support only NXP dev kit boards, so the dependencies are incomplete and don't match what would be on other Linux-based root filesystems (an Ubuntu system, for example)
     38
     39If you extract the debs and examine the DEBIAN directory you can see how to install them on other boards and root filesystems.
     40
     41It is fairly common for AI models to make use of Python, and NXP does so here. The rt-sdk-ara2 includes a couple of Python Wheels that are used in the examples. A Python Wheel is a standard built-package format for distributing Python libraries: essentially a ZIP-format archive with a .whl extension that contains all the files needed for a package to run immediately after being installed. It is also common when using Python to run into package version incompatibilities, which is why user-based Python virtual environments are used.
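Because a wheel is just a ZIP archive, the format can be poked at with nothing but the Python stdlib. This sketch builds a minimal wheel-shaped archive in memory and lists it (the package name and metadata here are made up for illustration; real wheels are produced by packaging tools):

```python
import io
import zipfile

# Build a minimal wheel-shaped archive in memory (illustrative names only;
# a real wheel is produced by packaging tools, not assembled by hand).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("example_pkg/__init__.py", "VERSION = '1.0.0'\n")
    whl.writestr("example_pkg-1.0.0.dist-info/METADATA",
                 "Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0.0\n")
    whl.writestr("example_pkg-1.0.0.dist-info/WHEEL",
                 "Wheel-Version: 1.0\nTag: py3-none-any\n")

# Since .whl files are plain ZIP, zipfile can list (or extract) them,
# just like 'python -m zipfile -l some.whl' on the command line.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as whl:
    names = whl.namelist()
print("\n".join(names))
```

The same zipfile approach works on the wheels shipped in /usr/share/python-wheels if you want to inspect what they contain.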
     42
     43Note the deb files require an NXP account to download (from [https://www.nxp.com/design/design-center/software/embedded-software/ara-software-development-kit:ARA-SDK NXP ARA SDK Landing page]) so the instructions below assume you have them already in the current directory.
     44
     45[=#rt-sdk-ara2]
     46=== rt-sdk-ara2
     47The ara2 runtime should not really be considered an 'SDK' - it has nothing to do with software development; it's simply the set of utils and libs needed to use the Ara2.
     48
     49The rt-sdk-ara2 provides a complete runtime environment for AI/ML acceleration using the Ara240 NPU on aarch64. This package includes:
     50 * Runtime libraries for Ara240 NPU integration
     51 * Python bindings (DVAPI) for custom inference applications
     52 * Optimum-Ara framework for LLMs and VLMs
     53 * GStreamer plugins for real-time object detection applications
     54 * Helper scripts for monitoring, benchmarking, and model management
     55 * Systemd service for automatic hardware initialization
     56
     57Installation on a Gateworks board with Ubuntu based OS:
     58 - extract the debian 'data' (do not install the package!)
     59{{{#!bash
     60# extract data (but don't install)
     61dpkg-deb --vextract rt-sdk-ara2_2.0.4.deb /
     62}}}
     63 - take care of postinst steps
     64  - miscellaneous
     65{{{#!bash
     66# create app dirs (used for models)
     67mkdir -pv /usr/share/{cnn,llm}
     68# get rid of circular symlink
     69rm /usr/share/rt-sdk-ara240_2.0.4/rt-sdk-ara240_2.0.4
     70}}}
     71  - install the uv package manager for Python virtualization and packaging for the local user (it installs to ~/.local/bin so we create symlinks in /usr/bin)
     72{{{#!bash
     73apt update && apt install -y curl
     74curl -LsSf https://astral.sh/uv/install.sh | sh
     75ln -s /root/.local/bin/uv /usr/bin/uv
     76ln -s /root/.local/bin/uvx /usr/bin/uvx
     77}}}
     78  - build the driver (the one in the deb is specific to the i.MX BSP kernel)
     79{{{#!bash
     80apt update && apt install -y build-essential git bc file flex bison
     81git clone https://github.com/nxp-imx-support/uiodma-driver
     82( cd uiodma-driver/uiodma; make )
     83# install it where the rt service expects to find it (over the top of the non-compatible one)
     84cp uiodma-driver/uiodma/uiodma.ko /usr/share/rt-sdk-ara240/driver/
     85}}}
     86  - enable service:
     87{{{#!bash
     88# enable service
     89systemctl enable rt-sdk-ara2.service
     90# start service now (unless you reboot)
     91systemctl start rt-sdk-ara2.service
     92}}}
     93  - use 'fetch_models' to download pre-compiled models for testing; the script fetches models from !HuggingFace.
     94{{{#!bash
     95# list models available for nxp/ara
     96fetch_models --list
     97# install YOLOv8
     98fetch_models --repo-id nxp/YOLOv8 # 746MB (711MiB)
     99}}}
     100   - the script is a python wrapper that uses uvx and the fetch-models python wheel (/usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl) to fetch and install models from !HuggingFace HUB
     101   - the models will be installed in either /usr/share/cnn (Convolutional Neural Network) or /usr/share/llm (Large Language Model)
     102   - NXP has Ara2 optimized models at https://huggingface.co/nxp
     103   - the script has a hard-coded list of models available and where to install them locally. You can use 'python -m zipfile -e /usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl ./fetch_models' to see what it's doing
     104
     105Notable Files:
     106 - /usr/lib/
     107  - libaraclient_aarch64.so - base library for interfacing with ara2
     108  - libara_vision_inference.so - inference lib that builds on libaraclient
     109 - /usr/lib/gstreamer-1.0
     110  - libgstdvPre.so
     111  - libgstdvInfo.so
     112  - libgstdvPost.so
     113 - /usr/share/rt-sdk-ara240 (symlink to a version independent dir at same location)
     114  - hw_utils/boot_img - firmware files
     115  - hw_utils/ddr_config - ddr binaries
     116  - hw_utils/bins/ - the hw utils for bringup/programming
     117  - optimum-ara/ - extension of the Hugging Face library that integrates with Ara240 DNPU
     118  - scripts - various wrappers around the tools etc
     119  - nnapp - tool for benchmarking models
     120  - config - various example yaml config files used for proxy/nnapp
     121  - include/dvapi.py - python bindings to dvapi
     122  - driver/uiodma.ko - driver (where the setup script expects to find it)
     123 - /usr/share/python-wheels - python wheels for fetch_models and optimum_ara
     124 - /usr/share/doc/rt-sdk-ara2 - license info
     125 - /usr/include/sdk_ara - headers for C libs
     126 - /usr/bin - various scripts
     127 - /etc/udev/rules.d/99-ara2.rules - udev rule which makes the PCI ID dependent on the systemd service
     128 - /etc/systemd/system/rt-sdk-ara2.service - systemd service that handles the various hw util config
     129 - /etc/rt-sdk-ara240/cnn_config.yaml - config for nnapp
     130 - /etc/rt-sdk-ara240/proxy_config.yaml - config for proxy
     131
     132Notes:
     133 - This will not program flash - that is a manual step only required if there is an update
     134 - The 'uv' package manager is a fast all-in-one Python package and project manager written in Rust. It makes it easy to work with virtual envs, which is essential to avoid Python package version clashes.
     135 - on bootup, make sure you wait for the console messages indicating the Proxy is launched before using it, as this can take a couple of minutes
     136 - the binary tools and libs are all currently dynamically linked against the standard C library
     137 - the GStreamer libs have compatibility issues with modern GStreamer
     138
     139Verification steps:
     140 1. show chip_info
     141{{{#!bash
     142chip_info.sh
     143}}}
     144 1. verify service
     145{{{#!bash
     146# show service status
     147systemctl status rt-sdk-ara2.service --no-pager -l
     148# view detailed service logs
     149journalctl -u rt-sdk-ara2.service
     150# verify proxy is running (critical)
     151ps -eaf | grep proxy_ara240
     152}}}
     153
     154Examples:
     155 - Download pre-compiled models for testing:
     156  - The fetch_models script from the ara2-rt will fetch models from !HuggingFace.
     157{{{#!bash
     158# list models available for nxp/ara
     159fetch_models --list
     160# install YOLOv8
     161fetch_models --repo-id nxp/YOLOv8 # 746MB (711MiB)
     162}}}
     163  - the 'fetch_models' script is a python wrapper that uses uvx and the fetch-models python wheel (/usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl) to fetch and install models from !HuggingFace HUB
     164  - the models will be installed in /usr/share/cnn (Convolutional Neural Network) and /usr/share/llm (Large Language Model)
     165  - NXP has Ara2 optimized models at https://huggingface.co/nxp
     166 - Run performance benchmark (uses nnapp)
     167{{{#!bash
     168run_model_perf.sh
     169}}}
     170  - the 'run_model_perf.sh' script makes it easy to list model categories and models; it is a wrapper around the nnapp app, which has many options and a config file
     171 - monitor real-time NPU metrics including utilization, temperature, DRAM usage and device state (interactively during benchmarking or model execution)
     172{{{#!bash
     173ara2_metrics.sh
     174}}}
     175
     176
     177[=#eiq-aaf-connector]
     178=== eIQ AAF Connector
     179The eIQ AAF Connector (edge Intelligence Ara Application Framework)
     180is a REST-based server that enables LLM inference on NXP i.MX processors with the ARA-240 DNPU. The API implemented is the de-facto API standard created by OpenAI for ChatGPT. It provides a simple Chat Completions-based HTTP interface for serving models to client applications.
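As a concrete sketch of that interface, the function below issues one non-streaming Chat Completions call using only the Python stdlib; the endpoint path and payload shape follow the OpenAI convention, and the host, port, and model name in the comment are assumptions matching the configuration used later on this page:

```python
import json
from urllib import request

def chat_once(base_url, model, prompt):
    """POST one non-streaming Chat Completions request and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,
    }).encode()
    req = request.Request(
        base_url + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]

# e.g. (assumed host/port/model):
# print(chat_once("http://127.0.0.1:8000", "Qwen2.5-7B-Instruct", "Hello"))
```

A streaming request ("stream": true) returns Server-Sent Events instead, which the chatbot examples later on this page parse line by line.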
     181
     182Requirements:
     183 - python 3.13 (we will install in a virtual env)
     184 - uv - used for the user-specific Python virtual environment
     185 - Optimum Ara framework for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Ara240 (part of rt-sdk)
     186 - OpenCV (dependency of the QwenVL engine)
     187 - Models
     188
     189Installation on a Gateworks board with Ubuntu based OS:
     190 - extract the debian 'data' (do not install the package!)
     191{{{#!bash
     192# extract data (but don't install)
     193dpkg-deb --vextract eiq-aaf-connector_2.0.deb /
     194}}}
     195 - take care of postinst steps
     196  1. Create the /usr/share/eiq/aaf-connector/venv (used by /usr/share/eiq/aaf-connector/venv/bin/connector)
     197{{{#!bash
     198# needs python 3.13 so we will install it in a virtual env for this user
     199uv python install 3.13
     200uv venv --python 3.13 "/usr/share/eiq/aaf-connector/venv"
     201# activate venv
     202source "/usr/share/eiq/aaf-connector/venv/bin/activate"
     203# install Python dependencies in venv from the Optimum Ara wheel
     204uv pip install --no-progress /usr/share/python-wheels/optimum_ara-2.0.0.2-py3-none-any.whl
     205# install Python dependencies in venv from the eIQ wheel in this package
     206uv pip install --no-progress /usr/share/python-wheels/eiq_aaf_connector-2.0.0-py3-none-any.whl
     207# ditch the default opencv-python which depends on libgl1-mesa and install the headless version instead
     208uv pip uninstall opencv-python
     209uv pip install opencv-python-headless
     210# deactivate venv
     211deactivate
     212}}}
     213  1. Create systemd service file (not sure why this wasn't in the deb)
     214{{{#!bash
     215cat > /etc/systemd/system/eiq-aaf-connector.service << EOF
     216[Unit]
     217Description=eIQ AAF Connector Service
     218After=network.target rt-sdk-ara2.service
     219
     220[Service]
     221Type=simple
     222User=root
     223WorkingDirectory=/usr/share/eiq/aaf-connector
     224ExecStart=/usr/share/eiq/aaf-connector/venv/bin/connector --host 127.0.0.1 --port 8000
     225Restart=on-failure
     226RestartSec=5s
     227StandardOutput=journal
     228StandardError=journal
     229
     230[Install]
     231WantedBy=multi-user.target
     232EOF
     233}}}
     234  - If you wish this to be accessible from the network, set the host to '0.0.0.0' instead of '127.0.0.1':
     235{{{#!bash
     236sed -i 's|--host 127.0.0.1|--host 0.0.0.0|g' /etc/systemd/system/eiq-aaf-connector.service
     237}}}
     238  1. add Ara2 optimized LLM models (these get installed to /usr/share/llm)
     239{{{#!bash
     240fetch_models --repo-id nxp/Qwen2.5-7B-Instruct-Ara240 # 7.7GiB
     241fetch_models --repo-id nxp/Qwen2.5-Coder-1.5B-Ara240 # 1.67GiB
     242}}}
     243  1. edit the config file to enable the two models we just downloaded (using jq):
     244{{{#!bash
     245apt update && apt install -y jq
     246jq '(.available_models[] | select(.name == "Qwen2.5-Coder-1.5B") |  .enabled) = true' /usr/share/eiq/aaf-connector/server_config.json > /tmp/config.json && \
     247  mv /tmp/config.json /usr/share/eiq/aaf-connector/server_config.json
     248jq '(.available_models[] | select(.name == "Qwen2.5-7B-Instruct") |  .enabled) = true' /usr/share/eiq/aaf-connector/server_config.json > /tmp/config.json && \
     249  mv /tmp/config.json /usr/share/eiq/aaf-connector/server_config.json
     253}}}
     254  - you can just as easily edit the file manually if you want
     255  1. Enable and start service
     256{{{#!bash
     257# Enable service on boot
     258systemctl enable eiq-aaf-connector.service
     259# Start the service now (or reboot)
     260systemctl start eiq-aaf-connector.service
     261}}}
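The jq edits in the step above can also be scripted in Python with the stdlib json module. This is a sketch; it assumes server_config.json uses the available_models/name/enabled layout that the jq selectors target:

```python
import json

def enable_models(config, names):
    """Flip 'enabled' to true for the given model names in a
    server_config-style dict. The 'available_models'/'name'/'enabled'
    keys are an assumption based on the jq commands used above."""
    for entry in config.get("available_models", []):
        if entry.get("name") in names:
            entry["enabled"] = True
    return config

# Example: load, edit, write back (same path the jq commands operate on)
# with open("/usr/share/eiq/aaf-connector/server_config.json") as f:
#     cfg = json.load(f)
# cfg = enable_models(cfg, {"Qwen2.5-Coder-1.5B", "Qwen2.5-7B-Instruct"})
# with open("/usr/share/eiq/aaf-connector/server_config.json", "w") as f:
#     json.dump(cfg, f, indent=2)
```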
     262
     263Note that it takes several minutes for the service to actually be ready for connections as it must process the models (monitor with 'journalctl -u eiq-aaf-connector.service --no-pager -f' and check that it's listening with 'ss -tulpn | grep :8000').
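Rather than re-running ss by hand, a small Python helper can poll until the connector accepts TCP connections. This is a sketch; the timeout and poll interval are arbitrary choices:

```python
import socket
import time

def wait_for_port(host, port, timeout=600.0, interval=2.0):
    """Poll until a TCP connect to host:port succeeds; return True when it
    does, or False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(interval)
    return False

# e.g. wait_for_port("127.0.0.1", 8000) before sending the first request
```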
     264
     265By default, the connector configured above will start on 127.0.0.1:8000 which is the local loopback interface. To be able to run requests from another device, you can change the host to '0.0.0.0' in the service file.
     266
     267Notable Files:
     268 - /usr/share/eiq/aaf-connector/server_config.json (server config file)
     269 - /usr/share/python-wheels/eiq_aaf_connector-2.0.0-py3-none-any.whl - Python wheel
     270 - /usr/bin/aaf-connector - shell script that activates the venv and executes the connector
     271 - /usr/share/eiq/aaf-connector/venv - Python virtual env used by connector
     272 - /etc/systemd/system/eiq-aaf-connector.service - systemd service
     273
     274
     275The connector self-hosts API documentation at http://<serverip>:8000/docs
     276
     277Example Usage:
     278 - verify connector running
     279{{{#!bash
     280# show service status
     281systemctl status eiq-aaf-connector.service --no-pager -l
     282# view detailed service logs
     283journalctl -u eiq-aaf-connector.service
     284# verify process exists
     285ps -ef | grep aaf-connector
     286# verify port open
     287ss -tulpn | grep :8000 # show IP:PORT server is listening on
     288}}}
     289 - view API docs and interact with the server by opening !http://<serverip>:8000/docs (requires changing the host to '0.0.0.0' in the !ExecStart config of /etc/systemd/system/eiq-aaf-connector.service)
     290 - use API via curl/jq
     291{{{#!bash
     292# make sure curl and jq are installed (jq allows easy interaction with json data)
     293apt install -y curl jq
     294# list of models
     295curl -X 'GET' \
     296  'http://127.0.0.1:8000/v1/models' \
     297  -H 'accept: application/json' | jq
     298# get info about a specific model (Qwen2.5-7B-Instruct)
     299curl -X 'GET' \
     300  'http://127.0.0.1:8000/params/Qwen2.5-7B-Instruct' \
     301  -H 'accept: application/json' | jq
     302# send a LLM query
     303curl -X POST http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
     304  "model": "Qwen2.5-7B-Instruct",
     305  "messages": [
     306    {"role": "system", "content": "You are a helpful assistant running on NXP i.MX hardware."},
     307    {"role": "user", "content": "Explain what an NPU is in one sentence."}
     308  ],
     309  "max_tokens": 50
     310}' | jq
     311}}}
     312 - run connector by hand (useful for troubleshooting or monitoring)
     313{{{#!bash
     314systemctl stop eiq-aaf-connector.service
     315source "/usr/share/eiq/aaf-connector/venv/bin/activate"
     316connector --host 0.0.0.0 --port 8000 # will run until stopped
     317deactivate
     318}}}
     319
     320
     321== Ara2 SDK examples
     322
     323Here are some Ara2 SDK examples that were 'vibe coded' within minutes.
     324
     325=== dvapi stats
     326This is an ANSI C app that provides an example of using the dvapi to connect to the proxy and obtain NPU endpoint stats such as temperature, clocks, and usage. It is basically a re-implementation of the closed-source /usr/share/rt-sdk-ara240/scripts/ara2_metrics_bin/hw_metrics.out.
     327
     328ara_status.c:
     329{{{#!c
     330#include <stdio.h>
     331#include <stdlib.h>
     332#include "dvapi.h"
     333
     334int main() {
     335    dv_session_t *session = NULL;
     336    dv_endpoint_t *ep_list = NULL;
     337    int ep_count = 0;
     338    dv_status_code_t status;
     339    const char *socket_path = "/run/proxy.sock";
     340
     341    // 1. Establish session
     342    status = dv_session_create_via_unix_socket(socket_path, &session);
     343    if (status != DV_SUCCESS) {
     344        fprintf(stderr, "Failed to connect: %s\n", dv_stringify_status_code(status));
     345        return 1;
     346    }
     347
     348    // 2. Get list of NPU endpoints
     349    dv_endpoint_get_list(session, &ep_list, &ep_count);
     350
     351    for (int i = 0; i < ep_count; i++) {
     352        dv_endpoint_t *ep = &ep_list[i];
     353        dv_endpoint_statistics_t *stats = NULL;
     354        int s_count = 0;
     355        bool is_busy = false;
     356
     357        // 3. Retrieve status and statistics
     358        dv_get_endpoint_busyness(session, ep, &is_busy);
     359        status = dv_endpoint_get_statistics(session, ep, &stats, &s_count);
     360
     361        if (status == DV_SUCCESS && s_count > 0) {
     362            // DRAM Calculations (Bytes to GB)
     363            double used_gb = (double)stats->ep_dram_stats.ep_total_dram_occupancy_size / 1073741824.0;
     364            double total_gb = (double)stats->ep_dram_stats.ep_total_dram_size / 1073741824.0;
     365            double dram_pct = (total_gb > 0) ? (used_gb / total_gb) * 100.0 : 0.0;
     366
     367            // NPU Utilization (Queue occupancy)
     368            double npu_load = 0.0;
     369            if (stats->ep_infq_stats && stats->ep_infq_stats->length > 0) {
     370                npu_load = ((double)stats->ep_infq_stats->occupancy_count / stats->ep_infq_stats->length) * 100.0;
     371            }
     372
     373            printf("--- NPU Endpoint %d Statistics ---\n", i);
     374            printf("Busy State:       %s\n", is_busy ? "TRUE" : "FALSE");
     375            printf("NPU Utilization:  %.1f%%\n", npu_load);
     376            printf("Temperature:      %.1f C\n", stats->ep_temp);
     377            printf("NNP Clock:        %d MHz\n", stats->ep_nnp_clk);
     378            printf("SBP Clock:        %d MHz\n", stats->ep_sbp_clk);
     379            printf("DRAM Clock:       %d MHz\n", stats->ep_dram_clk);
     380           
     381            // Format: DRAM Usage: 8.2GB/16.0GB (51.3%)
     382            printf("DRAM Usage:       %.1fGB/%.1fGB (%.1f%%)\n", used_gb, total_gb, dram_pct);
     383            printf("\n");
     384
     385            dv_endpoint_free_statistics(stats, s_count);
     386        }
     387    }
     388
     389    // 4. Cleanup
     390    dv_endpoint_free_group(ep_list);
     391    dv_session_close(session);
     392    return 0;
     393}
     394}}}
     395
     396Compile:
     397{{{#!bash
     398apt update && apt install -y build-essential
     399gcc ara_status.c -I/usr/include/sdk_ara/ -L/usr/lib/ -laraclient_aarch64 -o ara_status
     400}}}
     401
     402Execution:
     403{{{#!bash
     404# ./ara_status
     405--- NPU Endpoint 0 Statistics ---
     406Busy State:       FALSE
     407NPU Utilization:  0.0%
     408Temperature:      56.0 C
     409NNP Clock:        900 MHz
     410SBP Clock:        355 MHz
     411DRAM Clock:       1066 MHz
     412DRAM Usage:       10.0GB/16.0GB (62.5%)
     413}}}
     414
     415=== command-line python chatbot
     416This is a command-line chatbot written in Python using the eIQ AAF Connector.
     417
     418chat.py:
     419{{{#!python
     420import json
     421import requests
     422import time
     423import sys
     424
     425API_URL = "http://127.0.0.1:8000/v1/chat/completions"
     426MODEL_NAME = "Qwen2.5-7B-Instruct"
     427
     428def chat():
     429    print(f"--- i.MX LLM Session (Model: {MODEL_NAME}) ---")
     430    print("Type 'exit' to stop.\n")
     431   
     432    history = [{"role": "system", "content": "You are a helpful AI assistant."}]
     433
     434    while True:
     435        user_input = input("You: ")
     436        if user_input.lower() in ['exit', 'quit']:
     437            break
     438
     439        history.append({"role": "user", "content": user_input})
     440        payload = {
     441            "model": MODEL_NAME,
     442            "messages": history,
     443            "temperature": 0.7,
     444            "stream": True
     445        }
     446
     447        print("AI: ", end="", flush=True)
     448       
     449        # Start timing
     450        start_time = time.time()
     451        full_reply = ""
     452        token_count = 0
     453
     454        try:
     455            response = requests.post(API_URL, json=payload, stream=True)
     456            response.raise_for_status()
     457
     458            for line in response.iter_lines():
     459                if line:
     460                    decoded_line = line.decode('utf-8')
     461                    if decoded_line.startswith("data: "):
     462                        content = decoded_line[6:]
     463                        if content.strip() == "[DONE]":
     464                            break
     465                       
     466                        chunk = json.loads(content)
     467                        if "choices" in chunk and chunk["choices"][0]["delta"].get("content"):
     468                            text = chunk["choices"][0]["delta"]["content"]
     469                            print(text, end="", flush=True)
     470                            full_reply += text
     471                            token_count += 1 # Rough estimate of tokens
     472           
     473            # End timing
     474            end_time = time.time()
     475            duration = end_time - start_time
     476            tps = token_count / duration if duration > 0 else 0
     477
     478            print(f"\n\n--- Stats ---")
     479            print(f"Time taken: {duration:.2f} seconds")
     480            print(f"Throughput: {tps:.2f} tokens/sec")
     481            print(f"-------------\n")
     482           
     483            history.append({"role": "assistant", "content": full_reply})
     484
     485        except Exception as e:
     486            print(f"\nError: {e}")
     487
     488if __name__ == "__main__":
     489   chat()
     490}}}
     491
     492Execution:
     493{{{#!bash
     494$ uv venv # create virtual python env in current dir
     495$ uv pip install requests # install python deps
     496$ uv run chat.py # run in venv
     497--- i.MX LLM Session (Model: Qwen2.5-7B-Instruct) ---
     498Type 'exit' to stop.
     499
     500You: Why is the sky blue
     501AI: The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with molecules and small particles in the air. Sunlight is made up of different colors, each of which has a different wavelength. Blue light has a shorter wavelength and is scattered more than other colors by the gases and particles in the atmosphere. This scattering makes the sky appear blue to our eyes.
     502
     503During sunrise and sunset, the sky can appear red or orange because the light has to travel through more of the Earth's atmosphere. This longer path means that more blue and green light is scattered out of the beam, leaving the red and orange wavelengths to dominate the light that reaches our eyes.
     504
     505So, the blue color of the sky is primarily due to the way shorter wavelength light is scattered by the Earth's atmosphere.
     506
     507--- Stats ---
     508Time taken: 29.18 seconds
     509Throughput: 5.04 tokens/sec
     510-------------
     511
     512You: exit
     513}}}
     514
     515=== Web based python chatbot
     516This is a web-based chatbot written in Python using the eIQ AAF Connector.
     517
     518webchat.py:
     519{{{#!python
     520import sys
     521import os
     522from datetime import datetime
     523
     524# --- KINARA SDK PATH INJECTION ---
     525DVAPI_DIR = "/usr/share/rt-sdk-ara240_2.0.4/include"
     526if os.path.exists(DVAPI_DIR):
     527    sys.path.append(DVAPI_DIR)
     528
     529import streamlit as st
     530import requests
     531import json
     532import time
     533import psutil
     534import threading
     535import argparse
     536
     537# Attempt to import the Kinara Python APIs
     538try:
     539    from dvapi import DVSession, dv_endpoint_get_statistics, dv_endpoint_free_statistics
     540except ImportError:
     541    st.error(f"Critical: dvapi.py not found at {DVAPI_DIR}")
     542    st.stop()
     543
     544# --- ARGUMENT PARSING ---
     545parser = argparse.ArgumentParser()
     546parser.add_argument("--host", type=str, default="127.0.0.1", help="AAF Connector Host")
     547parser.add_argument("--port", type=str, default="8000", help="AAF Connector Port")
     548parser.add_argument("--proxy-sock", type=str, default="/var/run/proxy.sock", help="Kinara Proxy socket")
     549args, _ = parser.parse_known_args()
     550
     551# --- CONFIGURATION ---
     552MODEL_NAME = "Qwen2.5-7B-Instruct"
     553API_URL = f"http://{args.host}:{args.port}/v1/chat/completions"
     554LOGO_URL = "/root/gateworks_logo.png"
     555
     556# --- HARDWARE TELEMETRY HELPERS ---
     557def get_dvapi_npu_stats():
     558    try:
     559        ret, session = DVSession.create_via_unix_socket(args.proxy_sock)
     560        if ret != 0: return None
     561        with session:
     562            ret, ep_list = session.get_endpoint_list()
     563            if ret != 0 or not ep_list: return None
     564            ret, stats_ptr, count = dv_endpoint_get_statistics(session._session, ep_list[0]._endpoint)
     565            if ret == 0 and count.value > 0:
     566                s = stats_ptr[0]
     567                TOTAL_CAPACITY_GB = 16.0
     568                free_gb = s.ep_dram_stats.ep_total_free_size / 1073741824
     569                used_gb = max(0, TOTAL_CAPACITY_GB - free_gb)
     570                dram_pct = (used_gb / TOTAL_CAPACITY_GB) * 100
     571                is_busy = st._npu_lock.locked()
     572                data = {"temp": s.ep_temp, "util": 100 if is_busy else 0, "ram_pct": dram_pct}
     573                dv_endpoint_free_statistics(stats_ptr, count)
     574                return data
     575    except: return None
     576
     577def get_system_thermals():
     578    zones = []
     579    try:
     580        for zone in sorted(os.listdir("/sys/class/thermal/")):
     581            if zone.startswith("thermal_zone"):
     582                with open(f"/sys/class/thermal/{zone}/temp", "r") as f:
     583                    z_temp = int(f.read().strip()) / 1000.0
     584                zones.append(z_temp)
     585    except: pass
     586    return zones
     587
     588def build_sidebar_html():
     589    n_stats = get_dvapi_npu_stats()
     590    cpu_usage = psutil.cpu_percent()
     591    sys_ram = psutil.virtual_memory().percent
     592    thermals = get_system_thermals()
     593   
     594    npu_html = f"<div style='border-top:1px solid #444; padding-top:5px; font-size:0.82rem;'><b>🔥 Ara2 NPU</b><br>"
     595    if n_stats:
     596        npu_html += f"NPU: {n_stats['util']}% {n_stats['temp']:.1f}C | RAM: {n_stats['ram_pct']:.1f}%"
     597    else:
     598        npu_html += "NPU Telemetry Unavailable"
     599    npu_html += "</div>"
     600
     601    sys_html = f"<div style='border-top:1px solid #444; margin-top:8px; padding-top:5px; font-size:0.82rem;'><b>💻 System</b><br>"
     603    temp_str = "/".join([f"{t:.1f}C" for t in thermals])
     604    sys_html += f"CPU: {cpu_usage:.1f}% {temp_str} | RAM: {sys_ram:.1f}%</div>"
     605
     606    perf_val = st.session_state.get('last_perf', 'N/A')
     607    perf_html = f"<div style='border-top:1px solid #444; margin-top:8px; padding-top:5px; font-size:0.82rem;'><b>⚡ Last Result</b><br>{perf_val}</div>"
     609    return npu_html + sys_html + perf_html
     610
     611# --- GLOBAL STATE ---
     612if not hasattr(st, '_npu_lock'): st._npu_lock = threading.Lock()
     613if not hasattr(st, '_active_user'): st._active_user = "None"
     614
     615st.set_page_config(page_title="Gateworks Venice AI", layout="wide")
     616
     617# --- SIDEBAR ---
     618with st.sidebar:
     619    try: st.image(LOGO_URL, width=220)
     620    except: st.write("### Gateworks Venice")
     621   
     622    status_slot = st.empty()
     623    # Simplified to just show the IP address
     624    user_id = st.context.ip_address or "127.0.0.1"
     625
     626    if st._npu_lock.locked():
     627        status_slot.warning(f"⚠️ BUSY: {st._active_user}")
     628    else:
     629        status_slot.success("🟢 READY")
     630   
     631    st.caption(f"User: {user_id}")
     632   
     633    stats_slot = st.empty()
     634    stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True)
     635
     636# --- MAIN INTERFACE ---
     637st.title("🤖 i.MX Edge LLM")
     638
     639if "messages" not in st.session_state: st.session_state.messages = []
     640for msg in st.session_state.messages:
     641    with st.chat_message(msg["role"]): st.markdown(msg["content"])
     642
     643if prompt := st.chat_input("Ask the NPU..."):
     644    st.chat_message("user").markdown(prompt)
     645    st.session_state.messages.append({"role": "user", "content": prompt})
     646
     647    # Console: Log the Incoming Request / Queue status
     648    ts_in = datetime.now().strftime("%H:%M:%S")
     649    print(f"[{ts_in}] QUEUED: Request from {user_id} -> '{prompt[:40]}...'")
     650
     651    with st.chat_message("assistant"):
     652        response_placeholder = st.empty()
     653       
     654        # This lock handles the "Queued" logic—it will block here if someone else is talking
     655        with st._npu_lock:
     656            st._active_user = user_id
     657            status_slot.warning(f"⚠️ BUSY: {user_id}")
     658           
     659            ts_start = datetime.now().strftime("%H:%M:%S")
     660            print(f"[{ts_start}] PROCESSING: Active inference for {user_id}")
     661           
     662            full_response, token_count, start_time = "", 0, time.time()
     663
     664            try:
     665                payload = {"model": MODEL_NAME, "messages": st.session_state.messages, "stream": True}
     666                r = requests.post(API_URL, json=payload, stream=True, timeout=120)
     667               
     668                for line in r.iter_lines():
     669                    if line:
     670                        decoded = line.decode('utf-8').replace('data: ', '')
     671                        if decoded.strip() == "[DONE]": break
     672                        try:
     673                            chunk = json.loads(decoded)
     674                            content = chunk["choices"][0]["delta"].get("content", "")
     675                            if content:
     676                                full_response += content
     677                                token_count += 1
     678                                response_placeholder.markdown(full_response + "▌")
     679                               
     680                                if token_count % 12 == 0:
     681                                    stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True)
     682                        except: continue
     683
     684                duration = time.time() - start_time
     685                tps = token_count / duration if duration > 0 else 0
     686                st.session_state.last_perf = f"{token_count} tokens @ {tps:.1f} t/s"
     687               
     688                response_placeholder.markdown(full_response)
     689                st.session_state.messages.append({"role": "assistant", "content": full_response})
     690
     691                # Console: Log Completion
     692                ts_out = datetime.now().strftime("%H:%M:%S")
     693                print(f"[{ts_out}] COMPLETE: {user_id} | {token_count} tokens | {tps:.1f} t/s")
     694
     695            except Exception as e:
     696                st.error(f"Error: {e}")
     697                print(f"[{datetime.now().strftime('%H:%M:%S')}] ERROR: {e}")
     698            finally:
     699                st._active_user = "None"
     700                stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True)
     701                status_slot.success("🟢 READY")
     702                st.rerun()
     703}}}
     704
     705Execution:
     706{{{#!bash
     707$ uv venv # create virtual python env in current dir
     708$ uv pip install streamlit requests psutil # install python deps (argparse is in the stdlib)
     709$ uv run streamlit run webchat.py --server.address 0.0.0.0 --server.port 8501 -- --user-map users.json --host 127.0.0.1 --port 8000
     710}}}
    29711
    30712