Object Detector in 10 Minutes

Xnor.ai makes it easy to embed machine-learning-powered computer vision into applications on any device. This tutorial shows how to build a simple object detector into your Android application using an Xnor Bundle.

Downloading the SDK

The latest version of the Xnor developer SDK can be found on AI2GO. This SDK includes samples and documentation that support developing applications using Xnor Bundles.

Once you’ve downloaded the SDK, extract it to a convenient location. You can use the unzip tool on the command line for this:

unzip ~/Downloads/<downloaded SDK name>.zip
cd xnor-sdk-<hardware-target>

Project Setup

Before we go any further, you’ll need an app to work in. You can use an existing app if you’d like, or create a new one: start Android Studio, select Start a new Android Studio project, and click through the prompts, switching the language to Java if Kotlin is selected. (Xnor Bundles are compatible with both Java and Kotlin, but this tutorial uses Java.)

All the functionality of your Xnor Bundle is contained in one file, xnornet.aar. You can find it in the unzipped SDK (and any Xnor Bundle from AI2GO) under lib/<model>/xnornet.aar. For this tutorial, use the “person pet vehicle detector” model. Once you’ve found it, you’ll need to add it to your project. In Android Studio, you can:

  1. In the menu bar, select File ‣ New ‣ New Module….

  2. Pick Import .JAR/.AAR Package from the list and click Next.

  3. Click the ellipsis button next to the File name label and select xnornet.aar.

  4. Click Finish.

Then you’ll need to tell Android Studio that you want the application to depend on it:

  • In the project navigator on the left, find app, then right-click it and choose Open Module Settings.

  • Go to the Dependencies section (button on the left).

  • Select the app module, then click the Plus icon at the top and choose Module Dependency.

  • Check the box next to xnornet and click OK.
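
If you’d rather edit the Gradle files by hand, those two procedures boil down to a few lines. This is just a sketch, assuming the module was imported under the name xnornet; your project layout may differ:

// settings.gradle: register the imported module alongside the app
include ':app', ':xnornet'

// app/build.gradle: make the app depend on the module
dependencies {
    implementation project(':xnornet')
}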

To reduce some boilerplate, we’ll also add a dependency on Apache Commons IO. This isn’t needed for Xnor Bundles in general; it just cuts down on some boring code for this tutorial. To add it:

  • In the same dependencies section from above, click the Plus icon again but choose Library Dependency.

  • Type commons-io:commons-io, press Enter, and select version 2.6; then click OK.

  • Press OK to dismiss the module settings dialog.
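
If you manage dependencies in Gradle directly, this step is equivalent to one line in the dependencies block of app/build.gradle:

implementation 'commons-io:commons-io:2.6'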

Your project is all set up, and you should be ready to start writing code. For simplicity, we’ll assume you’re going to write this in the overridden onCreate of MainActivity that Android Studio generates by default, but you can put it elsewhere if you know your way around Android.
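
If you’re starting from that default template, MainActivity looks roughly like this (the exact imports and layout name vary with your Android Studio version and template choices):

import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // The code from the rest of this tutorial goes here.
    }
}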

Models

The first step is to load a model. A model is a single “brain” with specific capabilities. For example, some models are designed to do object detection for people, pets, and cars, whereas other models might be able to distinguish different types of fish from each other.

Models are loaded using Model.loadBuiltIn:

try (Model model = Model.loadBuiltIn(this, null)) {
    // ...
}

We’re passing two arguments here. The first must be an Android Context; we pass this, since we’re running inside an Activity, which is a Context. The second selects which model to load when an Xnor Bundle contains more than one; passing null picks the default model.

We’re using a try-with-resources statement here to make sure that the model is properly disposed of when you’re finished with it. You don’t have to use a try-with-resources block, but if you don’t use one, make sure you call Model.close on the model when you’re done with it to avoid memory leaks. The try-with-resources just makes sure this happens for you with a minimum of effort.
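
For reference, here’s the same lifecycle spelled out without try-with-resources; the finally block guarantees the model is closed even if something throws partway through:

Model model = Model.loadBuiltIn(this, null);
try {
    // ... use the model ...
} finally {
    model.close();  // release the model's native resources
}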

Inputs

Now that you’ve got a model, you’re going to need an image to test it on.

The SDK’s data directory contains several sample images. For this example, we’ll use dog.jpg. To make it accessible to your Android application, you’ll first need to add it as an asset. In Android Studio:

  1. From the menu bar, pick File ‣ New ‣ Folder ‣ Assets Folder, then click Finish. A new assets directory should appear in the project navigator on the left.

  2. Open the project’s folder in your system’s file manager, then copy dog.jpg into the app/src/main/assets folder.

  3. Check that Android Studio now shows dog.jpg under assets.

Next, you’ll load that dog.jpg into memory. You can do that with Android’s AssetManager:

// Requires the imports java.io.IOException, java.io.InputStream,
// and org.apache.commons.io.IOUtils.
byte[] dogJpeg;
try (InputStream inputStream = getAssets().open("dog.jpg")) {
    dogJpeg = IOUtils.toByteArray(inputStream);
} catch (IOException ioe) {
    throw new RuntimeException("Could not read dog JPEG", ioe);
}

We currently require all input data to be in ByteBuffers, so put the bytes in one (the buffer is created with allocateDirect, which lets native code access its memory without an extra copy):

ByteBuffer dogJpegBuffer = ByteBuffer.allocateDirect(dogJpeg.length);
dogJpegBuffer.put(dogJpeg);
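
If you end up converting more than one byte array this way, a small helper keeps the call sites tidy. This helper is our own convenience sketch, not part of the Xnor API; the rewind resets the buffer’s position to zero in case a consumer reads from the current position:

// Convenience helper (not part of the Xnor API): copy a byte[] into a
// direct ByteBuffer and reset the position to the start.
private static ByteBuffer toDirectBuffer(byte[] data) {
    ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
    buffer.put(data);
    buffer.rewind();
    return buffer;
}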

Then you can create an Input from the buffer:

try (Input input = Input.jpegImage(dogJpegBuffer)) {
    // ...
}

The Xnor library can accept inputs in several forms, including:

  • JPEG images

  • Raw RGB data

  • Raw YUV data in various formats

For this example we pass the undecoded JPEG bytes for simplicity, but see the reference for the other possibilities.
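
For instance, feeding raw pixels instead of a JPEG would only change how the Input is constructed; everything downstream stays the same. The sketch below is purely illustrative: the factory name rgbImage and its parameter order are our assumptions, so check the reference for the actual signature:

// Hypothetical sketch: verify the factory method's name and parameters
// in the reference before using. Assumes packed 8-bit RGB pixel data.
void detectFromRawRgb(ByteBuffer rgbPixels, int width, int height) {
    try (Input input = Input.rgbImage(rgbPixels, width, height)) {
        // ... evaluate the input as shown in the next section ...
    }
}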

Evaluating

Once you’ve got a model and an image, you can tell the model to look at the image and tell you what it sees:

try (EvaluationResult result = model.evaluate(input)) {
    // ...
}

At this point, you have an EvaluationResult object, but you’ll still need to get data out of it. The model we’re using, a person-pet-vehicle detector, always produces bounding boxes, so we’ll want to use EvaluationResult.getBoundingBoxes():

List<BoundingBox> boundingBoxes = result.getBoundingBoxes();

Other types of models produce different kinds of results; for example, classification models produce class labels. See the reference for more information.
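
To make that concrete, here’s a minimal sketch of the classification case. The accessor name getClassLabels is our assumption (ClassLabel is the same type that BoundingBox.getClassLabel returns in the loop below), so verify it against the reference:

// Sketch for a classification model (accessor name assumed; see the
// reference). Each ClassLabel names something the model saw in the image.
for (ClassLabel label : result.getClassLabels()) {
    Log.d("ClassificationResults", "Saw a " + label.getLabel());
}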

In any case, now that you’ve got the list of bounding boxes, it’s plain old data from here. You can print it, display it, etc., however you want. For simplicity we’ll just print to the Android log:

final String LOG_TAG = "DetectionResults";
Log.d(LOG_TAG, String.format("Found %d bounding boxes", boundingBoxes.size()));
for (BoundingBox box : boundingBoxes) {
    Log.d(LOG_TAG, String.format(
            "Found %s near x=%f, y=%f",
            box.getClassLabel().getLabel(),
            box.getRectangle().getX(),
            box.getRectangle().getY()));
}

Try it Out

Rev it up by clicking the Play button in Android Studio’s toolbar. You’ll be prompted to select a device; if you don’t have a real Android development device handy, you can create a virtual one. The build may take a few moments, and the emulator, if you’re using one, might take a little while to start up, but before long your application should pop up. While it’s still running, pop back into Android Studio and find the Logcat tab near the status bar. Open it, then search for DetectionResults in the search box. You should be greeted by:

D/DetectionResults: Found 3 bounding boxes
D/DetectionResults: Found pet near x=0.111459, y=0.315427
D/DetectionResults: Found vehicle near x=0.597268, y=0.138666
D/DetectionResults: Found vehicle near x=0.082852, y=0.290486

What’s Next?

  • Try using a classification model, which tells you what’s in the image but not where the objects are located.

  • Try some of the samples to see how to use camera input.

  • Read the reference to see all the functions you can call.

  • Go out and build something, and post it in the showcase!