Object Detector in 10 Minutes

Xnor.ai makes it easy to embed machine learning-powered computer vision into applications on any device. This tutorial shows how to build a simple object detector in C using an Xnor Bundle.

Downloading the SDK

The latest version of the Xnor developer SDK can be found on AI2GO. This SDK includes samples and documentation that support developing applications using Xnor Bundles.

Once you’ve downloaded the SDK, extract it to a convenient location. You can use the unzip tool on the command line for this:

unzip ~/Downloads/<downloaded SDK name>.zip
cd xnor-sdk-<hardware-target>

Using the SDK

The unzipped SDK (and any Xnor Bundle from AI2GO) contains these files, among others:

  • include/xnornet.h: This is the public header for XnorNet. Include this in your code to use the XnorNet library.

  • lib/<model>/libxnornet.so: This is an Xnor Bundle compiled for use as a C library. It’s a standard shared object that exports the XnorNet C API. Link against it to use an Xnor Bundle in your application.

The code that accompanies this tutorial can be found in samples/c/object_detector.c.

Models

The first step is to load a model. A model is a single “brain” with specific capabilities. For example, some models are designed to do object detection for people, pets, and cars, whereas other models might be able to distinguish different types of fish from each other.

Models are loaded using the xnor_model_load_built_in() function:

xnor_model* model;
panic_on_error(xnor_model_load_built_in(NULL, NULL, &model));

panic_on_error is defined in the example code, and will exit the application with an error message in the unlikely case that the XnorNet library fails to load the model. You can keep this behavior, or instead handle errors using existing mechanisms in your application.

Inputs

Now that you’ve got a model, you’re going to need an image to test it on.

The SDK’s samples/test-images directory contains several sample images. For this example, we’ll use dog.jpg. First read the data into memory:

const char* filename = "./samples/test-images/dog.jpg";  /* or elsewhere if desired */
uint8_t* dog_jpeg;
size_t dog_jpeg_length;
read_entire_file(filename, &dog_jpeg, &dog_jpeg_length);

read_entire_file is a function defined in the example code that opens a file and reads all its contents into memory. Now dog_jpeg points to all the JPEG-encoded bytes of the image, and dog_jpeg_length contains the number of bytes of the JPEG.
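For readers who want to see what such a helper might look like, here is a minimal sketch using only the standard C library. It is not the SDK's implementation: the name read_entire_file_sketch, the bool return value, and the 4096-byte starting buffer are assumptions made for illustration. See samples/c/object_detector.c for the real helper.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sample's read_entire_file helper.
   On success, stores a malloc'd buffer in *data_out and its size in
   *length_out, then returns true; the caller must free() the buffer.
   On failure, returns false and leaves both outputs untouched. */
bool read_entire_file_sketch(const char* filename, uint8_t** data_out,
                             size_t* length_out) {
    FILE* file = fopen(filename, "rb");  /* binary mode: JPEG data is not text */
    if (file == NULL) {
        return false;
    }

    size_t capacity = 4096;  /* arbitrary starting size, grows as needed */
    size_t length = 0;
    uint8_t* data = malloc(capacity);
    if (data == NULL) {
        fclose(file);
        return false;
    }

    for (;;) {
        length += fread(data + length, 1, capacity - length, file);
        if (length < capacity) {
            break;  /* short read: end of file or a read error */
        }
        /* The buffer is full, so double it and keep reading. */
        uint8_t* grown = realloc(data, capacity * 2);
        if (grown == NULL) {
            free(data);
            fclose(file);
            return false;
        }
        data = grown;
        capacity *= 2;
    }

    bool ok = !ferror(file);
    fclose(file);
    if (!ok) {
        free(data);
        return false;
    }

    *data_out = data;
    *length_out = length;
    return true;
}
```

Reading in a growing buffer avoids relying on fseek/ftell to learn the file size up front, which also lets the same code work on inputs such as pipes.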

Next, wrap this data in an input object for Xnor:

xnor_input* input;
panic_on_error(xnor_input_create_jpeg_image(dog_jpeg, dog_jpeg_length, &input));

Note that xnor_input_create_jpeg_image() holds a reference to the data you give it, and does not make a copy; it is your responsibility to make sure the image data remains valid until after the xnor_input is freed (see Cleaning up below).

You can also use other types of input formats. For example, you may have a camera that outputs a raw array of RGB pixels. There are functions to create xnor_input objects for these other formats as well. See the reference for more information.

Evaluating

Once you’ve got a model and an image, you can ask the model to look at the image and report what it sees:

xnor_evaluation_result* result;
panic_on_error(xnor_model_evaluate(model, input, NULL, &result));

Now all that’s left to do is extract the desired data:

#define MAX_BOXES 10
xnor_bounding_box boxes[MAX_BOXES];
int num_boxes = xnor_evaluation_result_get_bounding_boxes(result, boxes, MAX_BOXES);
if (num_boxes < 0) {
    /* An error occurred! Maybe this wasn't an object detection model? */
    fputs("Error: Not an object detection model\n", stderr);
    return EXIT_FAILURE;
}
if (num_boxes > MAX_BOXES) {
    /* if there are more than MAX_BOXES boxes,
       xnor_evaluation_result_get_bounding_boxes will still return the
       total number of boxes, so we clamp it down to our maximum */
    num_boxes = MAX_BOXES;
}
for (int i = 0; i < num_boxes; ++i) {
    printf("I see a %s\n", boxes[i].class_label.label);
}

Now boxes contains a list of objects that the model found in the image. All results are standard C structures that you can manipulate as you please. (Note that the label string will be freed and become invalid after you free the evaluation results, so make a copy if you plan to use it for longer than that.)
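The count handling and the label lifetime described above can be sketched with two small helpers. Both names, usable_box_count and copy_label, are hypothetical and not part of the XnorNet API; they use only the standard C library and work on a plain count and a plain C string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the value reported by a call that fills at
   most `capacity` entries but reports the total available, return how
   many array entries are safe to read, or -1 if an error was reported. */
int usable_box_count(int reported, int capacity) {
    if (reported < 0) {
        return -1;  /* negative counts signal an error */
    }
    return reported > capacity ? capacity : reported;
}

/* Hypothetical helper: return a heap-allocated copy of `label` that stays
   valid after the evaluation result is freed. The caller owns the copy
   and must free() it. Returns NULL if `label` is NULL or malloc fails. */
char* copy_label(const char* label) {
    if (label == NULL) {
        return NULL;
    }
    size_t size = strlen(label) + 1;  /* include the terminating NUL */
    char* copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, label, size);
    }
    return copy;
}
```

In the loop above, you would call something like copy_label(boxes[i].class_label.label) before freeing the evaluation result, and free() each copy once you are done with it.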

Cleaning up

Freeing objects that are no longer in use prevents memory leaks, which can slow down applications. A leak isn’t a serious problem in a tutorial application that only processes a single image, but it will begin to cause problems as you scale up:

xnor_input_free(input);
free(dog_jpeg);  /* Note: dog_jpeg must be freed AFTER input is freed! */
xnor_evaluation_result_free(result);
xnor_model_free(model);

When freeing XnorNet objects, always use the provided xnor_*_free function rather than using free() directly. (dog_jpeg was application-allocated, not an XnorNet object, so the standard free() is used.)

Compiling and Running

The provided Makefile handles compiling and linking the sample application for you. All you need to do is run make from samples/c:

$ cd samples/c
$ make

Once compilation is complete, run the executable:

$ build/object_detector
I see a pet
I see a vehicle

For those curious what the Makefile is doing, here’s what the compiler needs to do to build against an Xnor model:

  • Locate the xnornet.h header file: Pass -Iinclude to GCC or Clang, or the equivalent in a different compiler.

  • Locate the libxnornet.so file: Pass -Llib to GCC or Clang, or the equivalent in a different compiler.

  • Link against libxnornet.so: Pass -lxnornet to GCC or Clang, or the equivalent in a different compiler.

  • Make sure your operating system knows where to find the libxnornet.so file. This can be done a few different ways:

    • Pass -Wl,-rpath,\$ORIGIN (commas, backslash, and dollar sign included) to GCC or Clang. This adds a directive to the executable that allows it to look in its own directory for shared objects (such as libxnornet.so) instead of being limited to system library directories.

    • Set LD_LIBRARY_PATH to the location of libxnornet.so when you run your application.

    • Install libxnornet.so into your system library directory (for example, /usr/lib/x86_64-linux-gnu).

The Makefile constructs something like the following gcc command to produce the final executable:

$ gcc -o object_detector -I../../include object_detector.c -L../../lib -lxnornet -Wl,-rpath,\$ORIGIN/../lib

What’s Next?

  • Try using a classification model, which tells you what’s in the image but not where in the image the objects are located.

  • Try some of the samples, to see how to use camera input.

  • Read the reference, to see all the possible functions you can call.

  • Go out and build something, and post it in the showcase!