| | 32 | == Using NXP deb distribution packages |
Currently NXP distributes the Ara2 runtime in binary form. They have released the kernel driver as open source, which resolves kernel compatibility issues and is a huge step, but the userspace applications and libraries remain dynamically linked binary objects.
| | 34 | |
| | 35 | The current deb packages have some shortcomings: |
- packages are not very consistent; some ship a systemd service in the data archive, others create one via postinst
- they were intended to be installed on top of the NXP Embedded Linux Firmware (version L6.12.34-2.1.0) and to support only NXP dev kit boards, so the dependencies are incomplete and don't match what would be on other Linux root filesystems (an Ubuntu system, for example)
| | 38 | |
If you extract the debs and examine the DEBIAN directory you can see how to install them on other boards and root filesystems.
| | 40 | |
It is fairly common for AI models to make use of Python, and NXP does so here. The rt-sdk-ara2 includes a couple of Python Wheels that are used in the examples. A Python Wheel is a standard built-package format for distributing Python libraries: essentially a ZIP-format archive with a .whl extension that contains all the files needed for a package to run immediately after being installed. It is also standard when using Python to run into package version incompatibilities, which is why per-user Python virtual environments are used.
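
Since a wheel is just a ZIP archive, Python's own zipfile module is enough to poke at one. The sketch below builds and lists a minimal wheel-shaped archive for a made-up 'demo' package (not one of the NXP wheels, which live under /usr/share/python-wheels but follow the same layout):

```python
import io
import zipfile

# A .whl is just a ZIP archive: package code plus a dist-info metadata dir.
# 'demo' is a throwaway name used purely for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("demo/__init__.py", "VERSION = '1.0.0'\n")
    whl.writestr("demo-1.0.0.dist-info/METADATA",
                 "Metadata-Version: 2.1\nName: demo\nVersion: 1.0.0\n")
    whl.writestr("demo-1.0.0.dist-info/RECORD", "")

# Listing the archive members is all 'python -m zipfile -l some.whl' does
with zipfile.ZipFile(buf) as whl:
    names = whl.namelist()
    metadata = whl.read("demo-1.0.0.dist-info/METADATA").decode()

print(names)
```

The same namelist()/read() calls work on any real wheel file opened by path.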
| | 42 | |
| | 43 | Note the deb files require an NXP account to download (from [https://www.nxp.com/design/design-center/software/embedded-software/ara-software-development-kit:ARA-SDK NXP ARA SDK Landing page]) so the instructions below assume you have them already in the current directory. |
| | 44 | |
| | 45 | [=#rt-sdk-ara2] |
| | 46 | === rt-sdk-ara2 |
The ara2 runtime should not really be considered an 'SDK' - it has nothing to do with software development; it's simply the set of utilities and libraries needed to use the Ara2.
| | 48 | |
The rt-sdk-ara2 provides a complete runtime environment for AI/ML acceleration using the Ara240 NPU on aarch64. This package includes:
| | 50 | * Runtime libraries for Ara240 NPU integration |
| | 51 | * Python bindings (DVAPI) for custom inference applications |
| | 52 | * Optimum-Ara framework for LLMs and VLMs |
* GStreamer plugins for real-time object detection applications
| | 54 | * Helper scripts for monitoring, benchmarking, and model management |
| | 55 | * Systemd service for automatic hardware initialization |
| | 56 | |
| | 57 | Installation on a Gateworks board with Ubuntu based OS: |
| | 58 | - extract the debian 'data' (do not install the package!) |
| | 59 | {{{#!bash |
| | 60 | # extract data (but don't install) |
| | 61 | dpkg-deb --vextract rt-sdk-ara2_2.0.4.deb / |
| | 62 | }}} |
| | 63 | - take care of postinst steps |
- miscellaneous
| | 65 | {{{#!bash |
| | 66 | # create app dirs (used for models) |
| | 67 | mkdir -pv /usr/share/{cnn,llm} |
| | 68 | # get rid of circular symlink |
| | 69 | rm /usr/share/rt-sdk-ara240_2.0.4/rt-sdk-ara240_2.0.4 |
| | 70 | }}} |
- install the uv package manager, used for Python virtual environments and packaging (it installs to ~/.local/bin for the local user, so we create symlinks in /usr/bin)
| | 72 | {{{#!bash |
| | 73 | apt update && apt install -y curl |
| | 74 | curl -LsSf https://astral.sh/uv/install.sh | sh |
| | 75 | ln -s /root/.local/bin/uv /usr/bin/uv |
| | 76 | ln -s /root/.local/bin/uvx /usr/bin/uvx |
| | 77 | }}} |
- build the driver (the one in the deb is specific to the i.MX BSP kernel)
| | 79 | {{{#!bash |
| | 80 | apt update && apt install -y build-essential git bc file flex bison |
| | 81 | git clone https://github.com/nxp-imx-support/uiodma-driver |
| | 82 | ( cd uiodma-driver/uiodma; make ) |
| | 83 | # install it where the rt service expects to find it (over the top of the non-compatible one) |
| | 84 | cp uiodma-driver/uiodma/uiodma.ko /usr/share/rt-sdk-ara240/driver/ |
| | 85 | }}} |
| | 86 | - enable service: |
| | 87 | {{{#!bash |
| | 88 | # enable service |
| | 89 | systemctl enable rt-sdk-ara2.service |
| | 90 | # start service now (unless you reboot) |
| | 91 | systemctl start rt-sdk-ara2.service |
| | 92 | }}} |
- use the 'fetch_models' script to download pre-compiled models for testing; it fetches models from !HuggingFace.
| | 94 | {{{#!bash |
| | 95 | # list models available for nxp/ara |
| | 96 | fetch_models --list |
| | 97 | # install YOLOv8 |
| | 98 | fetch_models --repo-id nxp/YOLOv8 # 746MB (711MiB) |
| | 99 | }}} |
- the script is a Python wrapper that uses uvx and the fetch_models Python wheel (/usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl) to fetch and install models from the !HuggingFace Hub
- the models will be installed in either /usr/share/cnn (Convolutional Neural Network) or /usr/share/llm (Large Language Model)
| | 102 | - NXP has Ara2 optimized models at https://huggingface.co/nxp |
- the script has a hard-coded list of the models available and where to install them locally. You can use 'python -m zipfile -e /usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl ./fetch_models' to extract it and see what it's doing
| | 104 | |
| | 105 | Notable Files: |
| | 106 | - /usr/lib/ |
| | 107 | - libaraclient_aarch64.so - base library for interfacing with ara2 |
| | 108 | - libara_vision_inference.so - inference lib that builds on libaraclient |
| | 109 | - /usr/lib/gstreamer-1.0 |
| | 110 | - libgstdvPre.so |
| | 111 | - libgstdvInfo.so |
| | 112 | - libgstdvPost.so |
- /usr/share/rt-sdk-ara240 (version-independent symlink to the versioned directory at the same location)
| | 114 | - hw_utils/boot_img - firmware files |
| | 115 | - hw_utils/ddr_config - ddr binaries |
| | 116 | - hw_utils/bins/ - the hw utils for bringup/programming |
| | 117 | - optimum-ara/ - extension of the Hugging Face library that integrates with Ara240 DNPU |
| | 118 | - scripts - various wrappers around the tools etc |
| | 119 | - nnapp - tool for benchmarking models |
| | 120 | - config - various example yaml config files used for proxy/nnapp |
| | 121 | - include/dvapi.py - python bindings to dvapi |
| | 122 | - driver/uiodma.ko - driver (where the setup script expects to find it) |
| | 123 | - /usr/share/python-wheels - python wheels for fetch_models and optimum_ara |
- /usr/share/doc/rt-sdk-ara2 - license info
| | 125 | - /usr/include/sdk_ara - headers for C libs |
| | 126 | - /usr/bin - various scripts |
- /etc/udev/rules.d/99-ara2.rules - udev rule that ties the Ara2 PCI ID to the systemd service
| | 128 | - /etc/systemd/system/rt-sdk-ara2.service - systemd service that handles the various hw util config |
| | 129 | - /etc/rt-sdk-ara240/cnn_config.yaml - config for nnapp |
| | 130 | - /etc/rt-sdk-ara240/proxy_config.yam - config for proxy |
| | 131 | |
Notes:
- This will not program flash - that is a manual step, only required when there is an update
- The 'uv' package manager is a fast all-in-one Python package and project manager written in Rust; it makes it easy to work with virtual environments, which is essential to avoid Python package version clashes
- On bootup, make sure you wait for the console messages indicating the Proxy has launched before using it, as this can take a couple of minutes
- The binary tools and libs are all currently dynamically linked against the standard C library
- The GStreamer libs have compatibility issues with modern GStreamer
| | 138 | |
| | 139 | Verification steps: |
| | 140 | 1. show chip_info |
| | 141 | {{{#!bash |
| | 142 | chip_info.sh |
| | 143 | }}} |
| | 144 | 1. verify service |
| | 145 | {{{#!bash |
| | 146 | # show service status |
| | 147 | systemctl status rt-sdk-ara2.service --no-pager -l |
| | 148 | # view detailed service logs |
| | 149 | journalctl -u rt-sdk-ara2.service |
| | 150 | # verify proxy is running (critical) |
| | 151 | ps -eaf | grep proxy_ara240 |
| | 152 | }}} |
| | 153 | |
| | 154 | Examples: |
| | 155 | - Download pre-compiled models for testing: |
- The fetch_models script from the rt-sdk-ara2 will fetch models from !HuggingFace.
| | 157 | {{{#!bash |
| | 158 | # list models available for nxp/ara |
| | 159 | fetch_models --list |
| | 160 | # install YOLOv8 |
| | 161 | fetch_models --repo-id nxp/YOLOv8 # 746MB (711MiB) |
| | 162 | }}} |
- the 'fetch_models' script is a Python wrapper that uses uvx and the fetch_models Python wheel (/usr/share/python-wheels/fetch_models-1.0.0-py3-none-any.whl) to fetch and install models from the !HuggingFace Hub
| | 164 | - the models will be installed in /usr/share/cnn (Convolutional Neural Network) and /usr/share/llm (Large Language Model) |
| | 165 | - NXP has Ara2 optimized models at https://huggingface.co/nxp |
| | 166 | - Run performance benchmark (uses nnapp) |
| | 167 | {{{#!bash |
| | 168 | run_model_perf.sh |
| | 169 | }}} |
- the 'run_model_perf.sh' script makes it easy to list and run model categories and models; it is a wrapper around the nnapp app, which has many options and a config file
| | 171 | - monitor real-time NPU metrics including utilization, temperature, DRAM usage and device state (interactively during benchmarking or model execution) |
| | 172 | {{{#!bash |
| | 173 | ara2_metrics.sh |
| | 174 | }}} |
| | 175 | |
| | 176 | |
| | 177 | [=#eiq-aaf-connector] |
| | 178 | === eIQ AAF Connector |
| | 179 | The eIQ AAF Connector (edge Intelligence Ara Application Framework) |
| | 180 | is a REST-based server that enables LLM inference on NXP i.MX processors with the ARA-240 DNPU. The API implemented is the de-facto API standard created by OpenAI for ChatGPT. It provides a simple Chat Completions-based HTTP interface for serving models to client applications. |
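
Because the connector exposes the standard Chat Completions schema, any OpenAI-style client can talk to it. As a minimal sketch (the base URL and model name below are placeholders for whatever your connector actually serves), a non-streaming request/response round trip looks like this:

```python
import json
import urllib.request

def chat_once(base_url, model, user_msg):
    """One non-streaming Chat Completions request (OpenAI-compatible schema)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": 50,
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply text lives at choices[0].message.content in this schema
    return body["choices"][0]["message"]["content"]

# e.g. chat_once("http://127.0.0.1:8000", "Qwen2.5-7B-Instruct", "What is an NPU?")
```

The curl examples further below exercise the same endpoint from the shell.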
| | 181 | |
| | 182 | Requirements: |
| | 183 | - python 3.13 (we will install in a virtual env) |
| | 184 | - uv - used for the user-specific Python virtual environment |
| | 185 | - Optimum Ara framework for running Large Language Models (LLMs) and Vision-Language Models (VLMs) on Ara240 (part of rt-sdk) |
| | 186 | - OpenCV (dependency of the QwenVL engine) |
| | 187 | - Models |
| | 188 | |
| | 189 | Installation on a Gateworks board with Ubuntu based OS: |
| | 190 | - extract the debian 'data' (do not install the package!) |
| | 191 | {{{#!bash |
| | 192 | # extract data (but don't install) |
| | 193 | dpkg-deb --vextract eiq-aaf-connector_2.0.deb / |
| | 194 | }}} |
| | 195 | - take care of postinst steps |
| | 196 | 1. Create the /usr/share/eiq/aaf-connector/venv (used by /usr/share/eiq/aaf-connector/venv/bin/connector) |
| | 197 | {{{#!bash |
| | 198 | # needs python 3.13 so we will install it in a virtual env for this user |
| | 199 | uv python install 3.13 |
| | 200 | uv venv --python 3.13 "/usr/share/eiq/aaf-connector/venv" |
| | 201 | # activate venv |
| | 202 | source "/usr/share/eiq/aaf-connector/venv/bin/activate" |
| | 203 | # install Python dependencies in venv from the Optimum Ara wheel |
| | 204 | uv pip install --no-progress /usr/share/python-wheels/optimum_ara-2.0.0.2-py3-none-any.whl |
| | 205 | # install Python dependencies in venv from the eIQ wheel in this package |
| | 206 | uv pip install --no-progress /usr/share/python-wheels/eiq_aaf_connector-2.0.0-py3-none-any.whl |
| | 207 | # ditch the default opencv-python which depends on libgl1-mesa and install the headless version instead |
| | 208 | uv pip uninstall opencv-python |
| | 209 | uv pip install opencv-python-headless |
| | 210 | # deactivate venv |
| | 211 | deactivate |
| | 212 | }}} |
| | 213 | 1. Create systemd service file (not sure why this wasn't in the deb) |
| | 214 | {{{#!bash |
| | 215 | cat > /etc/systemd/system/eiq-aaf-connector.service << EOF |
| | 216 | [Unit] |
| | 217 | Description=eIQ AAF Connector Service |
| | 218 | After=network.target rt-sdk-ara2.service |
| | 219 | |
| | 220 | [Service] |
| | 221 | Type=simple |
| | 222 | User=root |
| | 223 | WorkingDirectory=/usr/share/eiq/aaf-connector |
| | 224 | ExecStart=/usr/share/eiq/aaf-connector/venv/bin/connector --host 127.0.0.1 --port 8000 |
| | 225 | Restart=on-failure |
| | 226 | RestartSec=5s |
| | 227 | StandardOutput=journal |
| | 228 | StandardError=journal |
| | 229 | |
| | 230 | [Install] |
| | 231 | WantedBy=multi-user.target |
| | 232 | EOF |
| | 233 | }}} |
- If you wish this to be accessible from the network, set the host to '0.0.0.0' instead of '127.0.0.1':
| | 235 | {{{#!bash |
| | 236 | sed -i 's|--host 127.0.0.1|--host 0.0.0.0|g' /etc/systemd/system/eiq-aaf-connector.service |
| | 237 | }}} |
| | 238 | 1. add Ara2 optimized LLM models (these get installed to /usr/share/llm) |
| | 239 | {{{#!bash |
| | 240 | fetch_models --repo-id nxp/Qwen2.5-7B-Instruct-Ara240 # 7.7GiB |
| | 241 | fetch_models --repo-id nxp/Qwen2.5-Coder-1.5B-Ara240 # 1.67GiB |
| | 242 | }}} |
| | 243 | 1. edit the config file to enable the two models we just downloaded (using jq): |
| | 244 | {{{#!bash |
| | 245 | apt update && apt install -y jq |
| | 246 | jq '(.available_models[] | select(.name == "Qwen2.5-Coder-1.5B") | .enabled) = true' /usr/share/eiq/aaf-connector/server_config.json > /tmp/config.json && \ |
| | 247 | mv /tmp/config.json /usr/share/eiq/aaf-connector/server_config.json |
| | 248 | jq '(.available_models[] | select(.name == "Qwen2.5-7B-Instruct") | .enabled) = true' /usr/share/eiq/aaf-connector/server_config.json > /tmp/config.json && \ |
| | 249 | mv /tmp/config.json /usr/share/eiq/aaf-connector/server_config.json |
| | 253 | }}} |
| | 254 | - you can just as easily edit the file manually if you want |
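
For reference, the same edit the jq one-liners perform can be done in a few lines of Python. The config layout (an 'available_models' list of objects with 'name' and 'enabled' keys) is assumed from those jq commands:

```python
import json

def enable_model(config_path, model_name):
    """Set enabled=true for one entry of available_models (the same edit
    the jq one-liners perform); config layout assumed from those commands."""
    with open(config_path) as f:
        cfg = json.load(f)
    for model in cfg.get("available_models", []):
        if model.get("name") == model_name:
            model["enabled"] = True
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)

# e.g. enable_model("/usr/share/eiq/aaf-connector/server_config.json",
#                   "Qwen2.5-7B-Instruct")
```
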
| | 255 | 1. Enable and start service |
| | 256 | {{{#!bash |
| | 257 | # Enable service on boot |
| | 258 | systemctl enable eiq-aaf-connector.service |
| | 259 | # Start the service now (or reboot) |
| | 260 | systemctl start eiq-aaf-connector.service |
| | 261 | }}} |
| | 262 | |
Note that it takes several minutes for the service to actually be ready for connections as it must process the models (monitor with 'journalctl -u eiq-aaf-connector.service --no-pager -f' and check that it is listening with 'ss -tulpn | grep :8000').
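
If you want to script that readiness check instead of watching journalctl, a small poll-until-listening helper does the job (the timeout value here is arbitrary):

```python
import socket
import time

def wait_for_port(host, port, timeout=600.0):
    """Poll until a TCP connect succeeds, i.e. the server is accepting
    connections; returns False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)  # not up yet; model loading can take minutes
    return False

# e.g. wait_for_port("127.0.0.1", 8000)
```
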
| | 264 | |
| | 265 | By default, the connector configured above will start on 127.0.0.1:8000 which is the local loopback interface. To be able to run requests from another device, you can change the host to '0.0.0.0' in the service file. |
| | 266 | |
| | 267 | Notable Files: |
| | 268 | - /usr/share/eiq/aaf-connector/server_config.json (server config file) |
| | 269 | - /usr/share/python-wheels/eiq_aaf_connector-2.0.0-py3-none-any.whl - Python wheel |
| | 270 | - /usr/bin/aaf-connector - shell script that activates the venv and executes the connector |
| | 271 | - /usr/share/eiq/aaf-connector/venv - Python virtual env used by connector |
| | 272 | - /etc/systemd/system/eiq-aaf-connector.service - systemd service |
| | 273 | |
| | 274 | |
| | 275 | The connector self-hosts API documentation at http://<serverip>:8000/docs |
| | 276 | |
| | 277 | Example Usage: |
| | 278 | - verify connector running |
| | 279 | {{{#!bash |
| | 280 | # show service status |
| | 281 | systemctl status eiq-aaf-connector.service --no-pager -l |
| | 282 | # view detailed service logs |
| | 283 | journalctl -u eiq-aaf-connector.service |
| | 284 | # verify process exists |
| | 285 | ps -ef | grep aaf-connector |
| | 286 | # verify port open |
| | 287 | ss -tulpn | grep :8000 # show IP:PORT server is listening on |
| | 288 | }}} |
- view the API docs and interact with the server by opening !http://<serverip>:8000/docs (requires changing the host to '0.0.0.0' in the !ExecStart line of /etc/systemd/system/eiq-aaf-connector.service)
| | 290 | - use API via curl/jq |
| | 291 | {{{#!bash |
| | 292 | # make sure curl and jq are installed (jq allows easy interaction with json data) |
| | 293 | apt install -y curl jq |
| | 294 | # list of models |
| | 295 | curl -X 'GET' \ |
| | 296 | 'http://127.0.0.1:8000/v1/models' \ |
| | 297 | -H 'accept: application/json' | jq |
| | 298 | # get info about a specific model (Qwen2.5-7B-Instruct) |
| | 299 | curl -X 'GET' \ |
| | 300 | 'http://127.0.0.1:8000/params/Qwen2.5-7B-Instruct' \ |
| | 301 | -H 'accept: application/json' | jq |
| | 302 | # send a LLM query |
| | 303 | curl -X POST http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{ |
| | 304 | "model": "Qwen2.5-7B-Instruct", |
| | 305 | "messages": [ |
| | 306 | {"role": "system", "content": "You are a helpful assistant running on NXP i.MX hardware."}, |
| | 307 | {"role": "user", "content": "Explain what an NPU is in one sentence."} |
| | 308 | ], |
| | 309 | "max_tokens": 50 |
| | 310 | }' | jq |
| | 311 | }}} |
| | 312 | - run connector by hand (useful for troubleshooting or monitoring) |
| | 313 | {{{#!bash |
| | 314 | systemctl stop eiq-aaf-connector.service |
| | 315 | source "/usr/share/eiq/aaf-connector/venv/bin/activate" |
| | 316 | connector --host 0.0.0.0 --port 8000 # will run until stopped |
| | 317 | deactivate |
| | 318 | }}} |
| | 319 | |
| | 320 | |
| | 321 | == Ara2 SDK examples |
| | 322 | |
Here are some Ara2 SDK examples that were 'vibe coded' within minutes.
| | 324 | |
| | 325 | === dvapi stats |
This is an ANSI C app that provides an example of using the dvapi to connect to the proxy and obtain NPU endpoint stats such as temperature, clocks, and usage. Basically it's a re-implementation of the closed-source /usr/share/rt-sdk-ara240/scripts/ara2_metrics_bin/hw_metrics.out.
| | 327 | |
| | 328 | ara_status.c: |
| | 329 | {{{#!c |
| | 330 | #include <stdio.h> |
| | 331 | #include <stdlib.h> |
| | 332 | #include "dvapi.h" |
| | 333 | |
| | 334 | int main() { |
| | 335 | dv_session_t *session = NULL; |
| | 336 | dv_endpoint_t *ep_list = NULL; |
| | 337 | int ep_count = 0; |
| | 338 | dv_status_code_t status; |
| | 339 | const char *socket_path = "/run/proxy.sock"; |
| | 340 | |
| | 341 | // 1. Establish session |
| | 342 | status = dv_session_create_via_unix_socket(socket_path, &session); |
| | 343 | if (status != DV_SUCCESS) { |
| | 344 | fprintf(stderr, "Failed to connect: %s\n", dv_stringify_status_code(status)); |
| | 345 | return 1; |
| | 346 | } |
| | 347 | |
| | 348 | // 2. Get list of NPU endpoints |
| | 349 | dv_endpoint_get_list(session, &ep_list, &ep_count); |
| | 350 | |
| | 351 | for (int i = 0; i < ep_count; i++) { |
| | 352 | dv_endpoint_t *ep = &ep_list[i]; |
| | 353 | dv_endpoint_statistics_t *stats = NULL; |
| | 354 | int s_count = 0; |
| | 355 | bool is_busy = false; |
| | 356 | |
| | 357 | // 3. Retrieve status and statistics |
| | 358 | dv_get_endpoint_busyness(session, ep, &is_busy); |
| | 359 | status = dv_endpoint_get_statistics(session, ep, &stats, &s_count); |
| | 360 | |
| | 361 | if (status == DV_SUCCESS && s_count > 0) { |
| | 362 | // DRAM Calculations (Bytes to GB) |
| | 363 | double used_gb = (double)stats->ep_dram_stats.ep_total_dram_occupancy_size / 1073741824.0; |
| | 364 | double total_gb = (double)stats->ep_dram_stats.ep_total_dram_size / 1073741824.0; |
| | 365 | double dram_pct = (total_gb > 0) ? (used_gb / total_gb) * 100.0 : 0.0; |
| | 366 | |
| | 367 | // NPU Utilization (Queue occupancy) |
| | 368 | double npu_load = 0.0; |
| | 369 | if (stats->ep_infq_stats && stats->ep_infq_stats->length > 0) { |
| | 370 | npu_load = ((double)stats->ep_infq_stats->occupancy_count / stats->ep_infq_stats->length) * 100.0; |
| | 371 | } |
| | 372 | |
| | 373 | printf("--- NPU Endpoint %d Statistics ---\n", i); |
| | 374 | printf("Busy State: %s\n", is_busy ? "TRUE" : "FALSE"); |
| | 375 | printf("NPU Utilization: %.1f%%\n", npu_load); |
| | 376 | printf("Temperature: %.1f C\n", stats->ep_temp); |
| | 377 | printf("NNP Clock: %d MHz\n", stats->ep_nnp_clk); |
| | 378 | printf("SBP Clock: %d MHz\n", stats->ep_sbp_clk); |
| | 379 | printf("DRAM Clock: %d MHz\n", stats->ep_dram_clk); |
| | 380 | |
| | 381 | // Format: DRAM Usage: 8.2GB/16.0GB (51.3%) |
| | 382 | printf("DRAM Usage: %.1fGB/%.1fGB (%.1f%%)\n", used_gb, total_gb, dram_pct); |
| | 383 | printf("\n"); |
| | 384 | |
| | 385 | dv_endpoint_free_statistics(stats, s_count); |
| | 386 | } |
| | 387 | } |
| | 388 | |
| | 389 | // 4. Cleanup |
| | 390 | dv_endpoint_free_group(ep_list); |
| | 391 | dv_session_close(session); |
| | 392 | return 0; |
| | 393 | } |
| | 394 | }}} |
| | 395 | |
| | 396 | Compile: |
| | 397 | {{{#!bash |
apt update && apt install -y build-essential
| | 399 | gcc ara_status.c -I/usr/include/sdk_ara/ -L/usr/lib/ -laraclient_aarch64 -o ara_status |
| | 400 | }}} |
| | 401 | |
| | 402 | Execution: |
| | 403 | {{{#!bash |
| | 404 | # ./ara_status |
| | 405 | --- NPU Endpoint 0 Statistics --- |
| | 406 | Busy State: FALSE |
| | 407 | NPU Utilization: 0.0% |
| | 408 | Temperature: 56.0 C |
| | 409 | NNP Clock: 900 MHz |
| | 410 | SBP Clock: 355 MHz |
| | 411 | DRAM Clock: 1066 MHz |
| | 412 | DRAM Usage: 10.0GB/16.0GB (62.5%) |
| | 413 | }}} |
| | 414 | |
| | 415 | === command-line python chatbot |
This is a command-line chatbot written in Python that uses the eIQ AAF Connector.
| | 417 | |
| | 418 | chat.py: |
| | 419 | {{{#!python |
| | 420 | import json |
| | 421 | import requests |
| | 422 | import time |
| | 423 | import sys |
| | 424 | |
| | 425 | API_URL = "http://127.0.0.1:8000/v1/chat/completions" |
| | 426 | MODEL_NAME = "Qwen2.5-7B-Instruct" |
| | 427 | |
| | 428 | def chat(): |
| | 429 | print(f"--- i.MX LLM Session (Model: {MODEL_NAME}) ---") |
| | 430 | print("Type 'exit' to stop.\n") |
| | 431 | |
| | 432 | history = [{"role": "system", "content": "You are a helpful AI assistant."}] |
| | 433 | |
| | 434 | while True: |
| | 435 | user_input = input("You: ") |
| | 436 | if user_input.lower() in ['exit', 'quit']: |
| | 437 | break |
| | 438 | |
| | 439 | history.append({"role": "user", "content": user_input}) |
| | 440 | payload = { |
| | 441 | "model": MODEL_NAME, |
| | 442 | "messages": history, |
| | 443 | "temperature": 0.7, |
| | 444 | "stream": True |
| | 445 | } |
| | 446 | |
| | 447 | print("AI: ", end="", flush=True) |
| | 448 | |
| | 449 | # Start timing |
| | 450 | start_time = time.time() |
| | 451 | full_reply = "" |
| | 452 | token_count = 0 |
| | 453 | |
| | 454 | try: |
| | 455 | response = requests.post(API_URL, json=payload, stream=True) |
| | 456 | response.raise_for_status() |
| | 457 | |
| | 458 | for line in response.iter_lines(): |
| | 459 | if line: |
| | 460 | decoded_line = line.decode('utf-8') |
| | 461 | if decoded_line.startswith("data: "): |
| | 462 | content = decoded_line[6:] |
| | 463 | if content.strip() == "[DONE]": |
| | 464 | break |
| | 465 | |
| | 466 | chunk = json.loads(content) |
| | 467 | if "choices" in chunk and chunk["choices"][0]["delta"].get("content"): |
| | 468 | text = chunk["choices"][0]["delta"]["content"] |
| | 469 | print(text, end="", flush=True) |
| | 470 | full_reply += text |
| | 471 | token_count += 1 # Rough estimate of tokens |
| | 472 | |
| | 473 | # End timing |
| | 474 | end_time = time.time() |
| | 475 | duration = end_time - start_time |
| | 476 | tps = token_count / duration if duration > 0 else 0 |
| | 477 | |
| | 478 | print(f"\n\n--- Stats ---") |
| | 479 | print(f"Time taken: {duration:.2f} seconds") |
| | 480 | print(f"Throughput: {tps:.2f} tokens/sec") |
| | 481 | print(f"-------------\n") |
| | 482 | |
| | 483 | history.append({"role": "assistant", "content": full_reply}) |
| | 484 | |
| | 485 | except Exception as e: |
| | 486 | print(f"\nError: {e}") |
| | 487 | |
| | 488 | if __name__ == "__main__": |
| | 489 | chat() |
| | 490 | }}} |
| | 491 | |
| | 492 | Execution: |
| | 493 | {{{#!bash |
| | 494 | $ uv venv # create virtual python env in current dir |
| | 495 | $ uv pip install requests # install python deps |
| | 496 | $ uv run chat.py # run in venv |
| | 497 | --- i.MX LLM Session (Model: Qwen2.5-7B-Instruct) --- |
| | 498 | Type 'exit' to stop. |
| | 499 | |
| | 500 | You: Why is the sky blue |
| | 501 | AI: The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with molecules and small particles in the air. Sunlight is made up of different colors, each of which has a different wavelength. Blue light has a shorter wavelength and is scattered more than other colors by the gases and particles in the atmosphere. This scattering makes the sky appear blue to our eyes. |
| | 502 | |
| | 503 | During sunrise and sunset, the sky can appear red or orange because the light has to travel through more of the Earth's atmosphere. This longer path means that more blue and green light is scattered out of the beam, leaving the red and orange wavelengths to dominate the light that reaches our eyes. |
| | 504 | |
| | 505 | So, the blue color of the sky is primarily due to the way shorter wavelength light is scattered by the Earth's atmosphere. |
| | 506 | |
| | 507 | --- Stats --- |
| | 508 | Time taken: 29.18 seconds |
| | 509 | Throughput: 5.04 tokens/sec |
| | 510 | ------------- |
| | 511 | |
| | 512 | You: exit |
| | 513 | }}} |
| | 514 | |
| | 515 | === Web based python chatbot |
This is a web-based chatbot written in Python that uses the eIQ AAF Connector.
| | 517 | |
| | 518 | webchat.py: |
| | 519 | {{{#!python |
| | 520 | import sys |
| | 521 | import os |
| | 522 | from datetime import datetime |
| | 523 | |
| | 524 | # --- KINARA SDK PATH INJECTION --- |
| | 525 | DVAPI_DIR = "/usr/share/rt-sdk-ara240_2.0.4/include" |
| | 526 | if os.path.exists(DVAPI_DIR): |
| | 527 | sys.path.append(DVAPI_DIR) |
| | 528 | |
| | 529 | import streamlit as st |
| | 530 | import requests |
| | 531 | import json |
| | 532 | import time |
| | 533 | import psutil |
| | 534 | import threading |
| | 535 | import argparse |
| | 536 | |
| | 537 | # Attempt to import the Kinara Python APIs |
| | 538 | try: |
| | 539 | from dvapi import DVSession, dv_endpoint_get_statistics, dv_endpoint_free_statistics |
| | 540 | except ImportError: |
| | 541 | st.error(f"Critical: dvapi.py not found at {DVAPI_DIR}") |
| | 542 | st.stop() |
| | 543 | |
| | 544 | # --- ARGUMENT PARSING --- |
| | 545 | parser = argparse.ArgumentParser() |
| | 546 | parser.add_argument("--host", type=str, default="127.0.0.1", help="AAF Connector Host") |
| | 547 | parser.add_argument("--port", type=str, default="8000", help="AAF Connector Port") |
| | 548 | parser.add_argument("--proxy-sock", type=str, default="/var/run/proxy.sock", help="Kinara Proxy socket") |
| | 549 | args, _ = parser.parse_known_args() |
| | 550 | |
| | 551 | # --- CONFIGURATION --- |
| | 552 | MODEL_NAME = "Qwen2.5-7B-Instruct" |
| | 553 | API_URL = f"http://{args.host}:{args.port}/v1/chat/completions" |
| | 554 | LOGO_URL = "/root/gateworks_logo.png" |
| | 555 | |
| | 556 | # --- HARDWARE TELEMETRY HELPERS --- |
| | 557 | def get_dvapi_npu_stats(): |
| | 558 | try: |
| | 559 | ret, session = DVSession.create_via_unix_socket(args.proxy_sock) |
| | 560 | if ret != 0: return None |
| | 561 | with session: |
| | 562 | ret, ep_list = session.get_endpoint_list() |
| | 563 | if ret != 0 or not ep_list: return None |
| | 564 | ret, stats_ptr, count = dv_endpoint_get_statistics(session._session, ep_list[0]._endpoint) |
| | 565 | if ret == 0 and count.value > 0: |
| | 566 | s = stats_ptr[0] |
| | 567 | TOTAL_CAPACITY_GB = 16.0 |
| | 568 | free_gb = s.ep_dram_stats.ep_total_free_size / 1073741824 |
| | 569 | used_gb = max(0, TOTAL_CAPACITY_GB - free_gb) |
| | 570 | dram_pct = (used_gb / TOTAL_CAPACITY_GB) * 100 |
| | 571 | is_busy = st._npu_lock.locked() |
| | 572 | data = {"temp": s.ep_temp, "util": 100 if is_busy else 0, "ram_pct": dram_pct} |
| | 573 | dv_endpoint_free_statistics(stats_ptr, count) |
| | 574 | return data |
| | 575 | except: return None |
| | 576 | |
| | 577 | def get_system_thermals(): |
| | 578 | zones = [] |
| | 579 | try: |
| | 580 | for zone in sorted(os.listdir("/sys/class/thermal/")): |
| | 581 | if zone.startswith("thermal_zone"): |
| | 582 | with open(f"/sys/class/thermal/{zone}/temp", "r") as f: |
| | 583 | z_temp = int(f.read().strip()) / 1000.0 |
| | 584 | zones.append(z_temp) |
| | 585 | except: pass |
| | 586 | return zones |
| | 587 | |
| | 588 | def build_sidebar_html(): |
| | 589 | n_stats = get_dvapi_npu_stats() |
| | 590 | cpu_usage = psutil.cpu_percent() |
| | 591 | sys_ram = psutil.virtual_memory().percent |
| | 592 | thermals = get_system_thermals() |
| | 593 | |
| | 594 | npu_html = f"<div style='border-top:1px solid #444; padding-top:5px; font-size:0.82rem;'><b>🔥 Ara2 NPU</b><br>" |
| | 595 | if n_stats: |
| | 596 | npu_html += f"NPU: {n_stats['util']}% {n_stats['temp']:.1f}C | RAM: {n_stats['ram_pct']:.1f}%" |
| | 597 | else: |
| | 598 | npu_html += "NPU Telemetry Unavailable" |
| | 599 | npu_html += "</div>" |
| | 600 | |
sys_html = f"<div style='border-top:1px solid #444; margin-top:8px; padding-top:5px; font-size:0.82rem;'><b>💻 System</b><br>"
| | 603 | temp_str = "/".join([f"{t:.1f}C" for t in thermals]) |
| | 604 | sys_html += f"CPU: {cpu_usage:.1f}% {temp_str} | RAM: {sys_ram:.1f}%</div>" |
| | 605 | |
| | 606 | perf_val = st.session_state.get('last_perf', 'N/A') |
perf_html = f"<div style='border-top:1px solid #444; margin-top:8px; padding-top:5px; font-size:0.82rem;'><b>⚡ Last Result</b><br>{perf_val}</div>"
| | 609 | return npu_html + sys_html + perf_html |
| | 610 | |
| | 611 | # --- GLOBAL STATE --- |
| | 612 | if not hasattr(st, '_npu_lock'): st._npu_lock = threading.Lock() |
| | 613 | if not hasattr(st, '_active_user'): st._active_user = "None" |
| | 614 | |
| | 615 | st.set_page_config(page_title="Gateworks Venice AI", layout="wide") |
| | 616 | |
| | 617 | # --- SIDEBAR --- |
| | 618 | with st.sidebar: |
| | 619 | try: st.image(LOGO_URL, width=220) |
| | 620 | except: st.write("### Gateworks Venice") |
| | 621 | |
| | 622 | status_slot = st.empty() |
| | 623 | # Simplified to just show the IP address |
| | 624 | user_id = st.context.ip_address or "127.0.0.1" |
| | 625 | |
| | 626 | if st._npu_lock.locked(): |
| | 627 | status_slot.warning(f"⚠️ BUSY: {st._active_user}") |
| | 628 | else: |
| | 629 | status_slot.success("🟢 READY") |
| | 630 | |
| | 631 | st.caption(f"User: {user_id}") |
| | 632 | |
| | 633 | stats_slot = st.empty() |
| | 634 | stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True) |
| | 635 | |
| | 636 | # --- MAIN INTERFACE --- |
| | 637 | st.title("🤖 i.MX Edge LLM") |
| | 638 | |
| | 639 | if "messages" not in st.session_state: st.session_state.messages = [] |
| | 640 | for msg in st.session_state.messages: |
| | 641 | with st.chat_message(msg["role"]): st.markdown(msg["content"]) |
| | 642 | |
| | 643 | if prompt := st.chat_input("Ask the NPU..."): |
| | 644 | st.chat_message("user").markdown(prompt) |
| | 645 | st.session_state.messages.append({"role": "user", "content": prompt}) |
| | 646 | |
| | 647 | # Console: Log the Incoming Request / Queue status |
| | 648 | ts_in = datetime.now().strftime("%H:%M:%S") |
| | 649 | print(f"[{ts_in}] QUEUED: Request from {user_id} -> '{prompt[:40]}...'") |
| | 650 | |
| | 651 | with st.chat_message("assistant"): |
| | 652 | response_placeholder = st.empty() |
| | 653 | |
| | 654 | # This lock handles the "Queued" logic—it will block here if someone else is talking |
| | 655 | with st._npu_lock: |
| | 656 | st._active_user = user_id |
| | 657 | status_slot.warning(f"⚠️ BUSY: {user_id}") |
| | 658 | |
| | 659 | ts_start = datetime.now().strftime("%H:%M:%S") |
| | 660 | print(f"[{ts_start}] PROCESSING: Active inference for {user_id}") |
| | 661 | |
| | 662 | full_response, token_count, start_time = "", 0, time.time() |
| | 663 | |
| | 664 | try: |
| | 665 | payload = {"model": MODEL_NAME, "messages": st.session_state.messages, "stream": True} |
| | 666 | r = requests.post(API_URL, json=payload, stream=True, timeout=120) |
| | 667 | |
| | 668 | for line in r.iter_lines(): |
| | 669 | if line: |
| | 670 | decoded = line.decode('utf-8').replace('data: ', '') |
| | 671 | if decoded.strip() == "[DONE]": break |
| | 672 | try: |
| | 673 | chunk = json.loads(decoded) |
| | 674 | content = chunk["choices"][0]["delta"].get("content", "") |
| | 675 | if content: |
| | 676 | full_response += content |
| | 677 | token_count += 1 |
| | 678 | response_placeholder.markdown(full_response + "▌") |
| | 679 | |
| | 680 | if token_count % 12 == 0: |
| | 681 | stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True) |
| | 682 | except: continue |
| | 683 | |
| | 684 | duration = time.time() - start_time |
| | 685 | tps = token_count / duration if duration > 0 else 0 |
| | 686 | st.session_state.last_perf = f"{token_count} tokens @ {tps:.1f} t/s" |
| | 687 | |
| | 688 | response_placeholder.markdown(full_response) |
| | 689 | st.session_state.messages.append({"role": "assistant", "content": full_response}) |
| | 690 | |
| | 691 | # Console: Log Completion |
| | 692 | ts_out = datetime.now().strftime("%H:%M:%S") |
| | 693 | print(f"[{ts_out}] COMPLETE: {user_id} | {token_count} tokens | {tps:.1f} t/s") |
| | 694 | |
| | 695 | except Exception as e: |
| | 696 | st.error(f"Error: {e}") |
| | 697 | print(f"[{datetime.now().strftime('%H:%M:%S')}] ERROR: {e}") |
| | 698 | finally: |
| | 699 | st._active_user = "None" |
| | 700 | stats_slot.markdown(build_sidebar_html(), unsafe_allow_html=True) |
| | 701 | status_slot.success("🟢 READY") |
| | 702 | st.rerun() |
| | 703 | }}} |
| | 704 | |
| | 705 | Execution: |
| | 706 | {{{#!bash |
| | 707 | $ uv venv # create virtual python env in current dir |
$ uv pip install streamlit requests psutil # install python deps (argparse is in the standard library)
$ uv run streamlit run webchat.py --server.address 0.0.0.0 --server.port 8501 -- --host 127.0.0.1 --port 8000
| | 710 | }}} |