URL: https://thenewstack.io/tutorial-accelerate-ai-at-edge-with-onnx-runtime-and-intel-neural-compute-stick-2/

Tutorial: Accelerate AI at Edge with ONNX Runtime and Intel Neural Compute Stick 2 - The New Stack



Tutorial: Accelerate AI at Edge with ONNX Runtime and Intel Neural Compute Stick 2

In this tutorial, we will walk you through the steps of accelerating an ONNX model on an edge device powered by Intel Movidius Neural Compute Stick (NCS) 2 and Intel’s Distribution of OpenVINO Toolkit.
Jul 31st, 2020 10:00am by Janakiram MSV
This post is the fifth and the last in a series of introductory tutorials on the Open Neural Network Exchange (ONNX), an initiative from AWS, Microsoft, and Facebook to define a standard for interoperability across machine learning platforms. See: Part 1, Part 2, Part 3, and Part 4.

In the previous parts of this series, we explored the ONNX model format and runtime. In this final tutorial, I will walk you through the steps of accelerating an ONNX model on an edge device powered by the Intel Movidius Neural Compute Stick (NCS) 2 and Intel’s Distribution of OpenVINO Toolkit. We will first run the Tiny YOLO v2 model on a desktop CPU and then on an edge device, with almost no change to the code.

Quick Recap — ONNX Runtime

Apart from bringing interoperability across deep learning frameworks, ONNX promises optimized execution of a neural network graph on whatever hardware is available. ONNX Runtime abstracts various hardware architectures such as AMD64 CPUs, ARM64 CPUs, GPUs, FPGAs, and VPUs.

For example, the same ONNX model can deliver better inference performance when it runs against a GPU backend, without any changes to the model itself. This is possible because ONNX Runtime uses a pluggable architecture that supports multiple execution providers.
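As a rough illustration of the execution-provider idea, the sketch below picks, in order of preference, the providers that a given ONNX Runtime build actually offers and always keeps the CPU provider as a fallback. The provider names are the strings ONNX Runtime reports; the `pick_providers` helper itself is hypothetical and not part of ONNX Runtime.

```python
def pick_providers(preferred, available):
    """Return the preferred execution providers that this ONNX Runtime build
    actually offers, in preference order, always ending with the CPU fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # In a real program, pass onnxruntime.get_available_providers() here.
    available = ["CPUExecutionProvider"]
    print(pick_providers(["OpenVINOExecutionProvider", "CPUExecutionProvider"], available))
```

With a recent ONNX Runtime release, you would pass `onnxruntime.get_available_providers()` as `available` and hand the result to `InferenceSession(..., providers=...)`. The 2020-era OpenVINO build used in this tutorial selects the device with `set_openvino_device()` instead.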


Telling ONNX Runtime which hardware to target, just before creating the inference session, can translate into a considerable performance boost.

The code snippet below is an example of such a hint, instructing ONNX Runtime to use an Intel Integrated Graphics backend.

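import onnxruntime as rt
# set_openvino_device() is only available in ONNX Runtime builds
# that include the OpenVINO execution provider
rt.capi._pybind_state.set_openvino_device("GPU_FP32")  # Intel Integrated Graphics, FP32
sess = rt.InferenceSession('TinyYOLO.onnx')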

When the same model is used in a smart camera powered by an Intel NCS device, the backend can be changed to target the MYRIAD Vision Processing Unit (VPU).

rt.capi._pybind_state.set_openvino_device("MYRIAD_FP16")
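To keep a single copy of the code for both targets, one option is to read the device string from an environment variable rather than hard-coding it. This is a minimal sketch: the `OPENVINO_DEVICE` variable name and the `openvino_device` helper are hypothetical, and `CPU_FP32` is assumed to be a valid device string in your OpenVINO build.

```python
import os

# "GPU_FP32" and "MYRIAD_FP16" appear in this tutorial; "CPU_FP32" is an
# assumption about the OpenVINO build of ONNX Runtime you are using.
KNOWN_DEVICES = {"CPU_FP32", "GPU_FP32", "MYRIAD_FP16"}


def openvino_device(env=None, default="CPU_FP32"):
    """Read the target device from the (hypothetical) OPENVINO_DEVICE variable."""
    env = os.environ if env is None else env
    device = env.get("OPENVINO_DEVICE", default).strip().upper()
    if device not in KNOWN_DEVICES:
        raise ValueError(f"unsupported OpenVINO device: {device!r}")
    return device
```

On the edge device you could then run `OPENVINO_DEVICE=MYRIAD_FP16 python infer.py` and pass `openvino_device()` to `rt.capi._pybind_state.set_openvino_device()`, leaving the rest of the script unchanged.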

In the sections below, we will build a simple object detection system based on the popular Tiny YOLO v2 model. We will first run it on a PC against a CPU backend before moving it to the edge device with a VPU.

Prerequisites

To finish this tutorial, you need the following:

Setting up the Environment

Start by creating a Python virtual environment for the project.

python -m venv demoenv
source demoenv/bin/activate

Create a requirements.txt file with the required Python modules.

onnxruntime
opencv-python

Install the modules into the virtual environment:

pip install -r requirements.txt

Since the model detects 20 object classes, create a file called labels.txt containing the labels below:

aeroplane,bicycle,bird,boat,bottle,bus,car,cat,chair,cow,diningtable,dog,horse,motorbike,person,pottedplant,sheep,sofa,train,tvmonitor
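Because labels.txt is a single comma-separated line, a file saved by most editors ends with a newline that would otherwise stick to the last label (tvmonitor). The following sketch parses it defensively; `parse_labels` and `load_labels` are illustrative helper names, not part of any library.

```python
def parse_labels(text):
    """Split the comma-separated contents of labels.txt into clean class names."""
    # strip() removes surrounding whitespace, including a trailing newline
    return [label.strip() for label in text.split(",") if label.strip()]


def load_labels(path="labels.txt"):
    # Read the file once and reuse the list for every frame
    with open(path) as f:
        return parse_labels(f.read())
```

The position of each label matters: index 14 (person) must line up with class 14 of the model's output.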

Finally, download the Tiny YOLO v2 model from the ONNX Model Zoo and save it as TinyYOLO.onnx in the project directory.
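Before writing the inference code, it can help to confirm the model's input and output names and shapes. The sketch below relies on the `get_inputs()` and `get_outputs()` methods of an ONNX Runtime `InferenceSession`, whose items expose `.name` and `.shape`. `describe_io` is a hypothetical helper, and the stub in the demo only mimics a session so the snippet runs without the model file.

```python
from types import SimpleNamespace


def describe_io(sess):
    """Return (name, shape) pairs for a session's inputs and outputs."""
    inputs = [(arg.name, arg.shape) for arg in sess.get_inputs()]
    outputs = [(arg.name, arg.shape) for arg in sess.get_outputs()]
    return inputs, outputs


if __name__ == "__main__":
    # A stand-in object that mimics InferenceSession, so this runs without the model.
    fake_sess = SimpleNamespace(
        get_inputs=lambda: [SimpleNamespace(name="input", shape=[1, 3, 416, 416])],
        get_outputs=lambda: [SimpleNamespace(name="output", shape=[1, 125, 13, 13])],
    )
    print(describe_io(fake_sess))
```

Against the real model, call `describe_io(rt.InferenceSession('TinyYOLO.onnx'))`. The inference code in this tutorial assumes a 1×3×416×416 input and a 125×13×13 grid per image in the first output.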

Object Detection with Tiny YOLO V2 on Desktop

We are now ready to code the inference program based on Tiny YOLO v2 and ONNX Runtime. Create a file called infer.py with the code below:

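import cv2
import numpy as np
import onnxruntime as rt

# Tiny YOLO v2 (VOC) output layout: 5 anchor boxes x 25 channels on a 13x13 grid
TINY_YOLO_CELL_WIDTH = 13
TINY_YOLO_CELL_HEIGHT = 13
NUM_BOXES = 5
TINY_YOLO_CLASSES = 20

def load_labels(path="labels.txt"):
    # Read the comma-separated labels once; strip() drops the trailing newline
    with open(path) as labels_file:
        return [label.strip() for label in labels_file.read().split(",")]

def softmax(x):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def preprocess(msg):
    # Decode the JPEG byte list back into a BGR image, then convert it to RGB
    inp = np.array(msg).reshape((len(msg), 1))
    frame = cv2.imdecode(inp.astype(np.uint8), cv2.IMREAD_COLOR)
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # The model expects a float32 tensor of shape (1, 3, 416, 416)
    frame = np.array(frame).astype(np.float32)
    frame = cv2.resize(frame, (416, 416))
    frame = frame.transpose(2, 0, 1)              # HWC -> CHW
    frame = np.reshape(frame, (1, 3, 416, 416))   # add the batch dimension
    return frame

def infer(frame, sess, labels, conf_threshold):
    input_name = sess.get_inputs()[0].name
    output = {}

    pred = sess.run(None, {input_name: frame})
    pred = np.array(pred[0][0])   # shape (125, 13, 13)

    for bx in range(TINY_YOLO_CELL_WIDTH):
        for by in range(TINY_YOLO_CELL_HEIGHT):
            for bound in range(NUM_BOXES):
                # Each box uses 25 channels: tx, ty, tw, th, tc, then 20 class scores
                channel = bound * 25
                # tx, ty, tw, th are box offsets; they are only needed to draw boxes
                tx = pred[channel][by][bx]
                ty = pred[channel + 1][by][bx]
                tw = pred[channel + 2][by][bx]
                th = pred[channel + 3][by][bx]
                tc = pred[channel + 4][by][bx]

                confidence = sigmoid(tc)
                # Select the class channels first, then the (row, column) of the cell
                class_out = pred[channel + 5:channel + 5 + TINY_YOLO_CLASSES, by, bx]
                class_out = softmax(class_out)
                class_detected = np.argmax(class_out)
                display_confidence = class_out[class_detected] * confidence
                # Keep the highest-scoring detection above the threshold
                if display_confidence > conf_threshold and \
                        display_confidence > output.get('confidence', 0):
                    output['object'] = labels[class_detected]
                    output['confidence'] = display_confidence
    return output

def main():
    cam = 0                 # webcam index; change it if you have several cameras
    conf_threshold = 0.10
    labels = load_labels("labels.txt")
    sess = rt.InferenceSession('TinyYOLO.onnx')
    cap = cv2.VideoCapture(cam)   # open the camera once, outside the loop
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            ret, enc = cv2.imencode('.jpg', frame)
            enc = enc.flatten()
            fr = preprocess(enc.tolist())
            p = infer(fr, sess, labels, conf_threshold)
            print(p)
    finally:
        cap.release()

if __name__ == "__main__":
    main()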
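The triple loop in infer() walks every anchor box of every grid cell. The same scoring (sigmoid of the objectness score multiplied by the softmax of the class scores) can be sketched with NumPy array operations instead. This is an illustrative alternative rather than the tutorial's code; it assumes the layout the loop uses (5 boxes × 25 channels on a 13×13 grid) and returns only the single highest-scoring detection.

```python
import numpy as np

# Layout assumed from the loop in infer(): 5 anchor boxes x 25 channels
# (tx, ty, tw, th, tc + 20 class scores) on a 13 x 13 grid.
NUM_BOXES, NUM_CLASSES, GRID = 5, 20, 13


def best_detection(pred, labels, conf_threshold):
    """Vectorized scoring of a (125, 13, 13) Tiny YOLO v2 output grid."""
    grid = pred.reshape(NUM_BOXES, 5 + NUM_CLASSES, GRID, GRID)
    objectness = 1.0 / (1.0 + np.exp(-grid[:, 4]))            # (5, 13, 13)
    logits = grid[:, 5:]                                       # (5, 20, 13, 13)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))   # stable softmax
    class_probs = exp / exp.sum(axis=1, keepdims=True)
    scores = class_probs * objectness[:, None]                 # (5, 20, 13, 13)
    box, cls, y, x = np.unravel_index(np.argmax(scores), scores.shape)
    score = float(scores[box, cls, y, x])
    if score <= conf_threshold:
        return {}
    return {"object": labels[cls], "confidence": score}
```

Given the `pred` array and the label list, `best_detection(pred, labels, conf_threshold)` returns a dictionary in the same shape as the printed output. Because the objectness term is positive, taking the argmax over every (box, class) pair yields the same top score as first picking each box's best class.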

If you are familiar with OpenCV and basic Convolutional Neural Networks (CNN), the code is self-explanatory.

It does three things:

    1. Grabs a frame from the webcam
    2. Converts and preprocesses the frame into the tensor layout the model expects
    3. Performs inference on the frame to detect objects above the confidence threshold and pairs each one with a label from labels.txt
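The preprocessing step mostly comes down to a change of memory layout: OpenCV returns images as height × width × channels, while the model takes a batch of channels × height × width tensors. The NumPy-only sketch below shows just that conversion; it assumes the frame has already been resized to 416×416 and converted to RGB, and `to_nchw` is a hypothetical helper name.

```python
import numpy as np


def to_nchw(frame_rgb):
    """Convert an RGB frame of shape (H, W, 3) into a float32 (1, 3, H, W) batch."""
    chw = np.asarray(frame_rgb, dtype=np.float32).transpose(2, 0, 1)  # HWC -> CHW
    return chw[np.newaxis, ...]                                        # add batch axis
```

A pixel at row r, column c, channel k ends up at index `[0, k, r, c]` of the result.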

If you have multiple cameras attached to the machine, don’t forget to update the camera index by changing the value of the cam variable.

Running the code prints the objects it finds along with their confidence scores. Adjust the confidence threshold (conf_threshold) to suit your requirements.
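To get a feel for how the threshold changes what gets reported, the sketch below filters a list of (label, confidence) pairs and sorts the survivors, using three values from the sample output that follows. `filter_detections` is a hypothetical helper for experimentation, not part of the tutorial's script.

```python
def filter_detections(detections, threshold):
    """Keep (label, confidence) pairs above threshold, highest confidence first."""
    kept = [(label, conf) for label, conf in detections if conf > threshold]
    return sorted(kept, key=lambda d: d[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("diningtable", 0.1934369369567218),
        ("chair", 0.13212954996625334),
        ("chair", 0.1632368686534813),
    ]
    for t in (0.10, 0.15, 0.20):
        print(t, filter_detections(sample, t))
```

Raising the threshold from 0.10 to 0.15 drops the weakest chair detection, and at 0.20 none of these three survive.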

{'object': 'diningtable', 'confidence': 0.1934369369567218}
{'object': 'diningtable', 'confidence': 0.12359955877868607}
{'object': 'diningtable', 'confidence': 0.11795787527541246}
{'object': 'chair', 'confidence': 0.13212954996625334}
{'object': 'diningtable', 'confidence': 0.1899228051957825}
{'object': 'chair', 'confidence': 0.1374235041020961}
{'object': 'chair', 'confidence': 0.1632368686534813}

This scenario represents ONNX Runtime performing inference against a CPU backend. In the next step, we will port this code to run on an edge device powered by Intel NCS 2.

Object Detection with Tiny YOLO V2 at the Edge

Assuming you have an Ubuntu 18.04 machine with the latest version of the Intel OpenVINO Toolkit installed and an Intel NCS 2 device attached, you are ready to execute the code at the edge. Otherwise, configure the Intel NCS 2 and OpenVINO Toolkit by following the steps in the documentation.

If you have an Up Squared AI Vision X Kit, you can use it for this tutorial.

Even if you don’t install the entire OpenVINO Toolkit, make sure you install the Myriad udev rules for the NCS on the host machine, as described in the reference documentation.

Microsoft provides Docker images and Dockerfiles for mainstream environments. Let’s start by pulling the ONNX Runtime container image built with the OpenVINO Toolkit for Myriad.

docker pull mcr.microsoft.com/azureml/onnxruntime:latest-openvino-myriad

Create a directory called tinyyolo on the Ubuntu machine and copy the project files from your PC into it. The directory should contain the files below:

infer.py
requirements.txt
labels.txt
TinyYOLO.onnx

Before we execute the code, let’s add a line that tells ONNX Runtime to target the Intel NCS device.

Open infer.py and add the line below in main(), just before the inference session is created.

rt.capi._pybind_state.set_openvino_device("MYRIAD_FP16")

We are now set to run the inference code on the Myriad device from within the Docker container.

Let’s launch the Docker container by mapping the /dev directory and mounting the tinyyolo directory. We also need to add the --privileged and --network host flags to provide appropriate permissions to access the camera and the NCS USB device.

While in the tinyyolo directory, execute the command below:

docker run \
--privileged \
-v /dev:/dev \
-v $PWD:/tinyyolo \
--network host \
-it --rm mcr.microsoft.com/azureml/onnxruntime:latest-openvino-myriad /bin/bash

Once inside the container’s shell, move into the mounted directory and install the prerequisites.

cd /tinyyolo
pip install -r requirements.txt

Execute the code to see the inference output in the terminal.

python infer.py

It may take a few minutes for the graph to get loaded and warmed up. You should now see the objects detected by the camera in the terminal.
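Because the first calls include loading and compiling the graph for the device, a fair CPU-versus-VPU comparison should time inference only after a warm-up. The sketch below does that with `time.perf_counter`; `benchmark` and its default warm-up and run counts are assumptions for illustration, not values from the tutorial.

```python
import time


def benchmark(run_once, warmup=5, runs=20):
    """Time a zero-argument inference callable after a few warm-up calls.

    Returns the mean latency in milliseconds over `runs` timed calls.
    """
    for _ in range(warmup):
        run_once()                      # early calls may include graph load/compile
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / runs
```

Inside infer.py you might call `benchmark(lambda: sess.run(None, {input_name: fr}))` once on the desktop and once in the container, keeping the same preprocessed frame so only the backend differs.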

This scenario can be easily extended to publish the inference output to an MQTT channel configured locally or in the cloud. Refer to my previous AIoT tutorial and a video demo of this use case.
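Publishing to MQTT mainly requires turning each inference result into a message payload. The sketch below serializes an infer() result as JSON; the `to_payload` helper and the message fields (`device`, `timestamp`) are hypothetical choices, not a format defined by the tutorial.

```python
import json
import time


def to_payload(detection, device_id, timestamp=None):
    """Serialize an infer() result as a JSON message for an MQTT topic.

    float() is applied because NumPy float32 values are not JSON serializable.
    """
    message = {
        "device": device_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "object": detection.get("object"),
        "confidence": float(detection["confidence"]) if "confidence" in detection else None,
    }
    return json.dumps(message)
```

With the paho-mqtt package installed, a payload could then be sent with `paho.mqtt.publish.single(topic, payload, hostname=...)`, where the topic and broker host are yours to choose.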

Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.

Feature image by Robert Balog from Pixabay.

Janakiram MSV (Jani) is a practicing architect, research analyst, and advisor to Silicon Valley startups. He focuses on the convergence of modern infrastructure powered by cloud-native technology and machine intelligence driven by generative AI. Before becoming an entrepreneur, he spent...
TNS owner Insight Partners is an investor in: Docker, Unit.