Object Detector in 5 Minutes

Xnor.ai makes it easy to embed machine learning-powered computer vision into applications on any device. This tutorial shows you how to build a simple object detector in Python using an Xnor Bundle.

Downloading the SDK

The latest version of the Xnor developer SDK can be found on AI2GO. This SDK includes samples and documentation that support developing applications using Xnor Bundles.

Once you’ve downloaded the SDK, extract it to a convenient location. You can use the unzip tool on the command line for this:

unzip ~/Downloads/<downloaded SDK name>.zip
cd ~/Downloads/xnor-sdk-<hardware-target>


You can install the Python wheel for any Xnor Bundle with pip:

python3 -m pip install --user lib/<model>/xnornet*.whl

The wheel files inside every Xnor Bundle expose the XnorNet Python API. Install one and import xnornet to use an Xnor Bundle in your Python application.

Getting Started

Start the Python interactive shell:

$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

All of Xnor’s functionality is included within the xnornet module, so import that:

>>> import xnornet


The first step is to load a model:

>>> model = xnornet.Model.load_built_in()

A model is a single “brain” with certain capabilities. For example, some models are designed to do object detection for people, pets, and cars, whereas other models might be able to distinguish different types of fish.

If you downloaded a bundle containing multiple models, you can pass it an argument with the name of the model you want to load. If you only have one model in your bundle, you don’t need to pass it any parameters, and it will load the one model you have.


Now that you’ve got a model, you’re going to need some image to test it on. The SDK’s samples/test-images directory contains several sample images. For this example, we’ll use dog.jpg. First read the data into a variable:

>>> dog_jpeg = open('samples/test-images/dog.jpg', 'rb').read()

This opens the dog.jpg file in binary mode and reads all the data out of it. Now dog_jpeg contains the JPEG-encoded bytes of the image.

Next, wrap this in an input object for Xnor:

>>> input = xnornet.Input.jpeg_image(dog_jpeg)

You can also use other types of input formats; see the reference for more information.


Once you’ve got a model and an image, you can tell the model to look at the image and tell you what it sees. Here’s what you might get from the person-pet-vehicle model included with this SDK:

>>> objects = model.evaluate(input)
>>> import pprint
>>> pprint.pprint(objects)
[BoundingBox(class_label=ClassLabel(class_id=1780030805, label='pet'), rectangle=Rectangle(x=0.11145900189876556, y=0.31542733311653137, width=0.3670963644981384, height=0.684572696685791)),
 BoundingBox(class_label=ClassLabel(class_id=1373721585, label='vehicle'), rectangle=Rectangle(x=0.597267746925354, y=0.1386657953262329, width=0.3302210569381714, height=0.1692405343055725)),
 BoundingBox(class_label=ClassLabel(class_id=1373721585, label='vehicle'), rectangle=Rectangle(x=0.08285203576087952, y=0.2904859483242035, width=0.8408514261245728, height=0.4245692789554596))]

Now objects contains a list of objects that it found in the image. All results are Python objects that you can manipulate as you please. For example, to get the labels of all the objects in the image:

>>> [object.class_label.label for object in objects]
['pet', 'vehicle', 'vehicle']

What’s Next?

  • Try using a classification model, which tells you what’s in the image but not where within the image the objects are located.

  • Try some of the samples, to see how to use camera input.

  • Read the reference, to see all the possible functions you can call.

  • Go out and build something, and post it in the showcase!