Object Detector in 5 Minutes

Xnor.ai makes it easy to embed machine learning-powered computer vision into applications on any device. This tutorial shows you how to build a simple object detector in Swift using an Xnor Bundle.

Downloading the SDK

The Swift SDK requires Xcode 10.2 and Swift 5, both of which are available for free.

The latest version of the Xnor developer SDK can be found on AI2GO. This SDK includes samples and documentation that support developing applications using Xnor Bundles.

Using the SDK

Let’s do a whirlwind tour of how to use Xnor, ending up with an application that finds all objects in an image and prints them out.

You’ll need to start by creating a project in Xcode. Pick File ‣ New ‣ Project…, then after clicking macOS, choose Command-Line Tool and click Next. Pick a name for your application, set the language to Swift, and click Next. Choose where to save your project and click Create.

You now have a brand-new Swift project. To bring in Xnor, select File ‣ Add Files to “(project name)”… from the menu, then locate and select lib/person-pet-vehicle-detector/XnorNet.framework. Make sure Copy items if needed is checked, along with the checkbox for your target under Add to targets, and click Add.

Time to write some code. Crack open main.swift and stick in an import for XnorNet, right under the import Foundation line:

import XnorNet

Here’s where the real fun starts.


The first step is to load a model. A model is a single “brain” with specific capabilities. For example, some models are designed to do object detection for people, pets, and cars, whereas other models might be able to distinguish different types of fish from each other.

Here’s how you load a model:

let model = try Model(builtIn: nil)

This tells Xnor to load the default built-in model. If your bundle contains multiple models, you can pass a model name here instead of nil, but most bundles contain only one model, so the default is fine.


Now that you’ve got a model, you’re going to need an image to test it on.

The SDK includes several sample images. For this example, we’ll use dog.jpg, which sits under samples/test-images in the path shown below. First read the data into memory:

let dogJPEG = try Data(contentsOf: URL(fileURLWithPath: "/Users/pat/Downloads/xnor-sdk-macos/samples/test-images/dog.jpg"))

(Make sure to replace this path with the actual full path of dog.jpg on your computer.)
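If you would rather not hard-code the path, one option is to take it from the command line. The following is a minimal sketch that uses only Foundation; the helper imagePath(from:fallback:) and the fallback file name are hypothetical and not part of the Xnor SDK. The resulting imageData plays the same role as dogJPEG above.

```swift
import Foundation

// Hypothetical helper (not part of the Xnor SDK): pick the image path from
// the first user-supplied command-line argument, or use a fallback.
func imagePath(from arguments: [String], fallback: String) -> String {
    // arguments[0] is the executable's own path, so the first user-supplied
    // argument, if any, is at index 1.
    return arguments.count > 1 ? arguments[1] : fallback
}

let path = imagePath(from: CommandLine.arguments, fallback: "dog.jpg")

do {
    // Data(contentsOf:) throws if the file is missing or unreadable.
    let imageData = try Data(contentsOf: URL(fileURLWithPath: path))
    print("Read \(imageData.count) bytes from \(path)")
} catch {
    print("Could not read \(path): \(error.localizedDescription)")
}
```

When running from Xcode, you can supply the argument under Product ‣ Scheme ‣ Edit Scheme… ‣ Run ‣ Arguments.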

Then, wrap it into an Input:

let input = try Input(jpegImage: dogJPEG)

We’ve used a JPEG file for convenience, but you can also create an Input in other ways; for example, with raw RGB data. Check out the reference for all the details.


At this point you’re ready to run an inference. It goes like this:

let result = try model.evaluate(input: input)

So what’s this result? It’s an instance of EvaluationResult, and you need to cast it down to the data type you actually expect. Since we’re using the person-pet-vehicle model, which is a detector that produces bounding boxes, we need to cast it down to BoundingBoxes and then read its value property to get the actual array of bounding boxes. Then we can just loop over the boxes and print them out:

for box in (result as! BoundingBoxes).value {
    print("Detected a \(box.classLabel.label) at x=\(box.rectangle.x), y=\(box.rectangle.y)!")
}

There you have it, an object detector in under 10 lines of Swift.
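The forced cast (as!) in the loop above stops the program if the model returns some other kind of result. Outside of a quick demo, a conditional cast (as?) is usually safer. The sketch below shows the pattern with stand-in types so that it compiles without the framework. The stand-ins mirror only the names used in this tutorial, their exact shapes (such as Float coordinates and EvaluationResult being a class) are assumptions, and describeDetections is a hypothetical helper. In a real project you would delete the stand-ins and import XnorNet instead.

```swift
// Stand-in types so this sketch compiles on its own. They mirror only the
// names used in this tutorial; the real definitions come from XnorNet, and
// their exact shapes here are assumed.
struct ClassLabel { let label: String }
struct Rectangle { let x: Float; let y: Float }
struct BoundingBox { let classLabel: ClassLabel; let rectangle: Rectangle }
class EvaluationResult {}
final class BoundingBoxes: EvaluationResult {
    let value: [BoundingBox]
    init(value: [BoundingBox]) { self.value = value }
}

// Hypothetical helper: returns one description per detected box, or nil if
// the result is not a BoundingBoxes (for example, a non-detector model).
func describeDetections(_ result: EvaluationResult) -> [String]? {
    guard let boxes = result as? BoundingBoxes else { return nil }
    return boxes.value.map { box in
        "Detected a \(box.classLabel.label) at x=\(box.rectangle.x), y=\(box.rectangle.y)"
    }
}

// Made-up sample data standing in for the output of model.evaluate(input:).
let sample: EvaluationResult = BoundingBoxes(value: [
    BoundingBox(classLabel: ClassLabel(label: "dog"),
                rectangle: Rectangle(x: 0.25, y: 0.5))
])

if let lines = describeDetections(sample) {
    lines.forEach { print($0) }
} else {
    print("This model did not return bounding boxes.")
}
```

With the real framework, the same guard can be applied directly to the result of model.evaluate(input:).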

What’s Next?