C API Reference

Making inferences is the core operation of a machine learning model. An inference requires a model and an input, and produces a result which your application can query for details.

A model is typically loaded only once, while the part of the application that requires inference is initializing. The model can then be reused to run inferences on multiple inputs for as long as that part of the application is running. When that part of the application has finished, free the model to release its run-time resources.

See Object Detector in 10 Minutes for a brief introduction to the XnorNet API.

Model

Models represent a set of learned information that allows inferences to be drawn based on inputs. Models expect certain kinds of input (for example, some models work on images, while others only work on audio) and produce certain kinds of outputs (for example, some models can tell you where objects are in an image, while other models can tell you what is in the image but not where).

In order to use a model in an application, it must first be loaded. Most distributions of the XnorNet library come bundled with one or more models baked into the library itself. In these cases, use xnor_model_load_built_in() to load the model. To retrieve the list of names of models that can be loaded, use xnor_model_enumerate_built_in().

xnor_model

An opaque type representing a model. You should always refer to this with a pointer (xnor_model*), since the full definition of the type is unavailable.

xnor_error* xnor_model_load_built_in(const char* model_name, const xnor_model_load_options* load_options, xnor_model** result)

Loads a built-in model of the given name. See xnor_model_load_options for more information about the options available when loading a model.

Parameters
  • model_name – The name of the model to load, or NULL to load the default when there is only one model in the bundle.

  • load_options – A pointer to an xnor_model_load_options to use when loading this model.

  • result – Pointer to the xnor_model* to place the result in.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

Example: Loading the default model with default load options:

xnor_model* model;
xnor_error* error = xnor_model_load_built_in(NULL, NULL, &model);
if (error != NULL) {
    fprintf(stderr, "Failed to load model: %s\n",
            xnor_error_get_description(error));
    xnor_error_free(error);
    exit(1);
}
/* ... use model ... */
/* Done with model, free it: */
xnor_model_free(model);

int32_t xnor_model_enumerate_built_in(const char** model_names_out, int32_t model_names_out_size)

Copies the names of the built-in models available in the bundle into a buffer. Each name is a valid argument to xnor_model_load_built_in(). If you already know the name of the model you want to load, you do not need this function; pass the name directly to xnor_model_load_built_in().

Parameters
  • model_names_out – Pointer to array to copy model names into. Can be NULL if and only if model_names_out_size is zero.

  • model_names_out_size – Maximum number of elements of model_names_out to fill.

Returns

Total number of model names available, even if this exceeds model_names_out_size.

The strings placed into the array are statically allocated and must not be freed.
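
Example: Because each returned name is a valid argument to xnor_model_load_built_in(), one use for the list is to confirm that the model your application expects is present in the bundle before loading it. The following is only a sketch of that lookup: contains_model_name() is a hypothetical helper, not part of the XnorNet API, and it assumes names has already been filled in by xnor_model_enumerate_built_in().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of XnorNet): returns true if `wanted`
 * appears among the first `count` entries of `names`. */
static bool contains_model_name(const char* const* names, int32_t count,
                                const char* wanted) {
    for (int32_t i = 0; i < count; ++i) {
        if (strcmp(names[i], wanted) == 0) {
            return true;
        }
    }
    return false;
}
```

Pass the number of entries that were actually written as count, which is the smaller of the function's return value and the buffer size.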

Example: If you only care about the first 5 models in the bundle, this is all you need:

#define MIN(a, b) (((a) < (b)) ? (a) : (b))
const char* model_names[5];
int32_t num_model_names = xnor_model_enumerate_built_in(model_names, 5);
for (int32_t i = 0; i < MIN(num_model_names, 5); ++i) {
    printf("Model name: %s\n", model_names[i]);
}

Example: To handle an arbitrary number of model names, one possible strategy is to first query for the total number of model names, and then allocate an array of that size:

int32_t num_model_names = xnor_model_enumerate_built_in(NULL, 0);
const char** model_names = calloc(num_model_names, sizeof(const char*));
if (model_names == NULL) { abort(); }
xnor_model_enumerate_built_in(model_names, num_model_names);
/* ... use model_names ... */
/* Free the array itself, but not the statically allocated strings in it: */
free(model_names);

xnor_error* xnor_model_get_info(xnor_model* model, xnor_model_info* info)

Returns information about a loaded model, including its evaluation result type and set of class names. To use this function, the caller must:

  1. Allocate the xnor_model_info struct (for example by declaring a stack variable),

  2. Set the xnor_model_info.xnor_model_info_size field to the size of the xnor_model_info struct,

  3. Pass a pointer to their struct into this function.

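Parameters
  • model – The loaded model to retrieve information about.

  • info – Pointer to the caller-allocated xnor_model_info to fill in. Its xnor_model_info_size field must already be set.

Returns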

NULL on success; a pointer to an xnor_error on error. (See Error Handling.)

Example:

xnor_model* model;
xnor_error* error = xnor_model_load_built_in(NULL, NULL, &model);
/* ... check error ... */
xnor_model_info info;
info.xnor_model_info_size = sizeof(info);
error = xnor_model_get_info(model, &info);
if (error != NULL) {
    fprintf(stderr, "Couldn't get model info: %s\n",
            xnor_error_get_description(error));
    xnor_error_free(error);
} else {
    printf("Xnor model %s version %s\n", info.name, info.version);
}

xnor_error* xnor_model_evaluate(xnor_model* model, const xnor_input* input, void* reserved, xnor_evaluation_result** result)

Evaluate a model on an input, yielding a result.

Parameters
  • model – The model to use for evaluation, from xnor_model_load_built_in().

  • input – The input to run through the model. (See Input for ways to construct inputs.)

  • reserved – Reserved; must be NULL.

  • result – Pointer to the xnor_evaluation_result* to place the result in.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

Example: Evaluating a JPEG image with a built-in model:

xnor_model* model;
/* panic_on_error is defined in the examples/c directory */
panic_on_error(xnor_model_load_built_in(NULL, NULL, &model));
/* ... */
xnor_input* input;
/* assuming jpeg_data and jpeg_data_length defined earlier */
panic_on_error(xnor_input_create_jpeg_image(jpeg_data, jpeg_data_length, &input));
xnor_evaluation_result* result;
panic_on_error(xnor_model_evaluate(model, input, NULL, &result));
xnor_input_free(input);
/* ... use result (see Evaluation Results section) ... */
xnor_evaluation_result_free(result);
/* ... */
/* Done with model, free it: */
xnor_model_free(model);

void xnor_model_free(xnor_model* model)

Release all the resources of the given model. After calling this function, model may no longer be used.

Parameters
  • model – The model to free. No-op if NULL.

Example:

/* ... create model ... */
/* ... use model ... */
/* Done with model, free it: */
xnor_model_free(model);

Model Information

Information about models can be obtained by using the xnor_model_get_info() API. The following type is the interface for this API. It can be used to retrieve more information about a loaded model without having to evaluate it.

xnor_model_info

A collection of information about an xnor_model. Callers should allocate one of these and fill out the xnor_model_info.xnor_model_info_size field before passing a pointer to the struct to xnor_model_get_info(). The data pointed to by this struct has the same lifetime as the xnor_model, and must not be referenced after xnor_model_free() is called on that model.

Example:

xnor_model_info info;
info.xnor_model_info_size = sizeof(info);
xnor_error* error = xnor_model_get_info(model, &info);
/* ... check error; info is only filled in on success ... */
printf("Xnor model %s version %s\n", info.name, info.version);

size_t xnor_model_info_size

The size of the xnor_model_info struct. This should always be set to sizeof(xnor_model_info). (This is used to aid backwards compatibility between API versions).

const char* name

A friendly name for the model. This is typically the same as the name of the folder that the model’s libxnornet.so was found in.

xnor_evaluation_result_type result_type

The evaluation result type returned when calling xnor_model_evaluate() with this model.

const char* version

A string that can be used to distinguish different versions of the same model. This is mostly useful for debugging and reporting problems to Xnor.

int num_class_labels

The number of class labels in class_labels.

const char* const* class_labels

The set of class label strings that can be returned by this model (for example, as xnor_class_label.label). Only applicable for models with well-defined classes.
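
Example: The xnor_model_info_size field follows a common C idiom for extending a struct without breaking callers compiled against an older version of it. The sketch below illustrates the general idea only; demo_info_v1, demo_info_v2 and demo_get_info() are hypothetical and say nothing about how XnorNet itself is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts for illustration only; these are NOT the real
 * xnor_model_info definitions. */
typedef struct {
    size_t struct_size; /* caller sets this to sizeof its own struct */
    const char* name;
} demo_info_v1;

typedef struct {
    size_t struct_size;
    const char* name;
    const char* version; /* field added in a later API version */
} demo_info_v2;

/* A "newer library" filling in a struct supplied by a caller that may have
 * been compiled against either layout. Returns 0 on success, or -1 if the
 * size field is too small to be a known layout. */
static int demo_get_info(void* out) {
    size_t caller_size;
    memcpy(&caller_size, out, sizeof(caller_size));
    if (caller_size < sizeof(demo_info_v1)) {
        return -1;
    }
    demo_info_v2 full = { caller_size, "demo-model", "1.2.3" };
    /* Copy only as many bytes as the caller's struct can hold. */
    size_t n = caller_size < sizeof(full) ? caller_size : sizeof(full);
    memcpy(out, &full, n);
    return 0;
}
```

A caller built against the older layout never has memory written past the end of its smaller struct, which is why the size field must be set before the call.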

Model Load Options

Model evaluation behavior can be changed by passing a pointer to an xnor_model_load_options structure when loading a model. This type represents a set of configuration options that tell XnorNet how to load and evaluate an xnor_model at run-time.

xnor_model_load_options

An opaque type representing a set of model configuration options. You should always refer to this with a pointer (xnor_model_load_options*), since the full definition of the type is unavailable.

xnor_model_load_options* xnor_model_load_options_create(void)

Create an instance of xnor_model_load_options.

Returns

A pointer to the created instance. The returned object must be freed using xnor_model_load_options_free() when it is no longer needed.

Example: Using the default load options to load a built-in model (this is equivalent to passing in NULL for the load options):

xnor_model* model;
xnor_model_load_options* options = xnor_model_load_options_create();
xnor_error* error = xnor_model_load_built_in(NULL, options, &model);
/* ... check error ...*/
/* ... Done with load options, free it: ... */
xnor_model_load_options_free(options);
/* ... use model ... */

void xnor_model_load_options_set_threading_model(xnor_model_load_options* options, xnor_threading_model threading_model)

Sets the threading model to single or multi-threaded. In single-threaded mode, calling xnor_model_evaluate() will evaluate the model on a single thread (usually the calling thread) and block until evaluation is complete. In multi-threaded mode, calling xnor_model_evaluate() will evaluate the model on multiple threads to take advantage of hardware concurrency.

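Parameters
  • options – The model load options to modify.

  • threading_model – The threading model to use when evaluating; see xnor_threading_model.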

Example: Selecting single-threaded evaluation when loading a built-in model:

xnor_model* model;
xnor_model_load_options* options = xnor_model_load_options_create();
/* Select single-threaded evaluation before loading the model: */
xnor_model_load_options_set_threading_model(
    options, kXnorThreadingModelSingleThreaded);
xnor_error* error = xnor_model_load_built_in(NULL, options, &model);
/* ... check error ...*/
/* ... Done with load options, free it: ... */
xnor_model_load_options_free(options);
/* ... use model ... */

void xnor_model_load_options_free(xnor_model_load_options* options)

Release the model load options. After calling this function, options may no longer be used. This is safe to do after options has been used to load a model.

Parameters
  • options – The set of model load options to free. No-op if NULL.

Example:

/* ... create model load options ... */
/* ... call xnor_model_load_built_in with load options ... */
/* Done with model load options, free it: */
xnor_model_load_options_free(options);

xnor_threading_model

Specifies which threading model to use when evaluating Xnor models:

kXnorThreadingModelMultiThreaded

Evaluate models using multiple threads (if available). This will typically allow XnorNet to leverage the hardware parallelism of the system by creating one thread per logical core.

kXnorThreadingModelSingleThreaded

Evaluate models using only one thread, usually the calling thread. This option is useful if the calling application is managing one or more Xnor models with its own concurrency scheme.

Input

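Before data can be passed to a model for evaluation, it must be wrapped in an xnor_input structure. This allows multiple types of model inputs to be supported without complicating the model evaluation interface. Currently, only image inputs are supported, but multiple image formats are accepted:

  • JPEG, using xnor_input_create_jpeg_image()

  • Raw RGB, using xnor_input_create_rgb_image()

  • YUV422, using xnor_input_create_yuv422_image()

  • YUV420p (planar), using xnor_input_create_yuv420p_image()

  • YUV420sp (semi-planar), using xnor_input_create_yuv420sp_nv12_image() or xnor_input_create_yuv420sp_nv21_image()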

Which one you use depends on the type of data available. For example, JPEG may be most convenient for reading images from files, but if you are interfacing directly with a camera, one of the YUV formats may be more useful.

xnor_input

An opaque type representing an input to a model. You should always refer to this with a pointer (xnor_input*), since the full definition of the type is unavailable.

xnor_error* xnor_input_create_jpeg_image(const uint8_t* data, int32_t data_length, xnor_input** result)

Creates a new xnor_input representing a JPEG image.

Parameters
  • data – Pointer to the first byte of the data of the JPEG image. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • data_length – Number of bytes of data of the JPEG image.

  • result – Pointer to the xnor_input* to place the result into.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

Example: Assuming you have already read a JPEG image from a file into the variables jpeg_data and jpeg_data_length (for example, using the read_file_contents() function from examples/c), you can create an input corresponding to the image:

xnor_input* input;
xnor_error* error = xnor_input_create_jpeg_image(jpeg_data, jpeg_data_length, &input);
if (error != NULL) {
    fprintf(stderr, "Failed to create input: %s\n", xnor_error_get_description(error));
    xnor_error_free(error);
    exit(1);
}
/* ... call xnor_model_evaluate ... */
/* Done with input, free it: */
xnor_input_free(input);

xnor_error* xnor_input_create_rgb_image(int32_t width, int32_t height, const uint8_t* data, xnor_input** result)

Creates an input corresponding to an image specified with raw RGB data.

Parameters
  • width – Width of the image.

  • height – Height of the image.

  • data – Pointer to the first byte of the image data. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • result – Pointer to the xnor_input* to place the result into.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

This function might be used if you have already decompressed an image, or received the image data from a connected device that natively outputs RGB data.

See RGB Images for a detailed description of the RGB image format.

xnor_error* xnor_input_create_yuv422_image(int32_t width, int32_t height, const uint8_t* data, xnor_input** result)

Creates an input corresponding to an image in raw YUV422 format.

Parameters
  • width – Width of the image. Must be even.

  • height – Height of the image.

  • data – Pointer to the first byte of the image data. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • result – Pointer to the xnor_input* to place the result into.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

YUV422 is often used in camera outputs. If your camera natively outputs YUV422, it can be more efficient to pass the YUV422 input directly to XnorNet rather than converting it to a different format first.

See YUV422 Images for a detailed description of the YUV422 format.
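
Example: xnor_input_create_yuv422_image() takes no length argument, so it can be worth checking your buffer before creating the input. This sketch assumes 8-bit samples with no row padding, where every horizontal pair of pixels shares one U and one V sample (which is why the width must be even). The YUV422 Images section remains the authority on the exact layout, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the dimensions are usable for YUV422. */
static bool yuv422_dimensions_ok(int32_t width, int32_t height) {
    return width > 0 && height > 0 && width % 2 == 0;
}

/* Hypothetical helper: bytes needed for a width x height YUV422 image.
 * ASSUMPTION: 8-bit samples, no row padding. Each pair of pixels stores two
 * Y samples plus one shared U and one shared V, i.e. 4 bytes per 2 pixels. */
static size_t yuv422_buffer_size(int32_t width, int32_t height) {
    return ((size_t)width / 2u) * 4u * (size_t)height;
}
```

Under these assumptions a 640 x 480 frame needs 614400 bytes, or two bytes per pixel on average.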

xnor_error* xnor_input_create_yuv420p_image(int32_t width, int32_t height, const uint8_t* y_plane_data, const uint8_t* u_plane_data, const uint8_t* v_plane_data, xnor_input** result)

Creates an input corresponding to an image in raw YUV420p (planar) format.

Parameters
  • width – Width of the image. Must be even.

  • height – Height of the image. Must be even.

  • y_plane_data – Pointer to the first byte of the Y plane of the image. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • u_plane_data – Pointer to the first byte of the U plane of the image. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • v_plane_data – As for u_plane_data, but for the V plane.

  • result – Pointer to the xnor_input* to place the result into.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

YUV420p is often used in camera outputs. If your camera natively outputs YUV420p, it can be more efficient to pass the YUV420p input directly to XnorNet rather than converting it to a different format first.

See YUV420p Images for a detailed description of the YUV420p format. Note that the U and V planes must not be interleaved. If they are, the data is in YUV420sp (semi-planar) format; use xnor_input_create_yuv420sp_nv12_image() or xnor_input_create_yuv420sp_nv21_image() instead.
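
Example: Many capture APIs deliver a planar frame as a single buffer, while xnor_input_create_yuv420p_image() wants one pointer per plane. The sketch below derives the three pointers. It assumes the buffer holds the Y plane, then U, then V, with 8-bit samples and no row padding (the layout often called I420); yuv420p_planes and split_i420() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not part of XnorNet): pointers to the three
 * planes of a YUV420p frame. */
typedef struct {
    const uint8_t* y;
    const uint8_t* u;
    const uint8_t* v;
} yuv420p_planes;

/* ASSUMPTION: `frame` is one contiguous buffer holding the full-resolution
 * Y plane, then the U plane, then the V plane, with no row padding. Sources
 * that emit V before U, or padded rows, need different offsets. */
static yuv420p_planes split_i420(const uint8_t* frame, int32_t width,
                                 int32_t height) {
    size_t y_size = (size_t)width * (size_t)height;
    /* Chroma is subsampled by 2 in both directions, hence the requirement
     * that width and height are even. */
    size_t chroma_size = ((size_t)width / 2u) * ((size_t)height / 2u);
    yuv420p_planes planes;
    planes.y = frame;
    planes.u = frame + y_size;
    planes.v = frame + y_size + chroma_size;
    return planes;
}
```

The resulting y, u and v pointers correspond to the y_plane_data, u_plane_data and v_plane_data parameters, and the frame buffer must stay valid until xnor_input_free() is called.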

xnor_error* xnor_input_create_yuv420sp_nv12_image(int32_t width, int32_t height, const uint8_t* y_plane_data, const uint8_t* uv_plane_data, xnor_input** result)

Creates an input corresponding to an image in raw YUV420sp (semi-planar) format, with U channel first (NV12).

Parameters
  • width – Width of the image. Must be even.

  • height – Height of the image. Must be even.

  • y_plane_data – Pointer to the first byte of the Y plane of the image. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • uv_plane_data – Pointer to the first byte of the UV plane of the image. This pointer must remain valid until xnor_input_free() is called (no copy of the data is made).

  • result – Pointer to the xnor_input* to place the result into.

Returns

NULL on success; a pointer to xnor_error on error. (See Error Handling.)

YUV420sp is often used in camera outputs. If your camera natively outputs YUV420sp, it can be more efficient to pass the YUV420sp input directly to XnorNet rather than converting it to a different format first.

See YUV420sp (NV12 and NV21) Images for a detailed description of the YUV420sp formats.

xnor_error* xnor_input_create_yuv420sp_nv21_image(int32_t width, int32_t height, const uint8_t* y_plane_data, const uint8_t* vu_plane_data, xnor_input** result)

Same as xnor_input_create_yuv420sp_nv12_image(), but with the chroma in VU order (NV21). See YUV420sp (NV12 and NV21) Images for a detailed description of the YUV420sp formats.
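
Example: The two semi-planar variants differ only in the byte order inside the interleaved chroma plane. The sketch below reads the chroma pair that covers pixel (x, y) under either order. It assumes 8-bit samples and no row padding, and chroma_sample and chroma_at() are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the U and V samples shared by a 2x2 block of pixels. */
typedef struct {
    uint8_t u;
    uint8_t v;
} chroma_sample;

/* Hypothetical helper. `vu_order` is false for NV12 (U first) and true for
 * NV21 (V first). ASSUMPTION: each chroma row is exactly `width` bytes. */
static chroma_sample chroma_at(const uint8_t* chroma_plane, int32_t width,
                               int32_t x, int32_t y, bool vu_order) {
    /* One chroma row serves two image rows; one byte pair serves two
     * image columns. */
    size_t offset = ((size_t)y / 2u) * (size_t)width + ((size_t)x / 2u) * 2u;
    chroma_sample sample;
    sample.u = chroma_plane[offset + (vu_order ? 1u : 0u)];
    sample.v = chroma_plane[offset + (vu_order ? 0u : 1u)];
    return sample;
}
```

If the colors of your results look swapped (for example, blue and red tones exchanged), passing NV21 data to the NV12 function or vice versa is a likely cause.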

void xnor_input_free(xnor_input* input)

Free the input structure. Call this when you are done using an input. This does not free the underlying data (e.g., the JPEG image data passed to xnor_input_create_jpeg_image()); that remains your responsibility.

Parameters
  • input – The input to free. No-op if NULL.

Evaluation Results

xnor_evaluation_result

An opaque type representing the output of a model, after evaluating it on an input. You should always refer to this with a pointer (xnor_evaluation_result*), and never try to inspect the contents of the pointer directly.

void xnor_evaluation_result_free(xnor_evaluation_result* result)

Releases the resources associated with the evaluation result. After calling this, you cannot use the evaluation result any further. All pointers within data extracted from the evaluation result (for example, xnor_class_label.label) will also be released and can no longer be used.

Parameters
  • result – The evaluation result to free. No-op if NULL.
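
Example: If a label has to outlive the evaluation result, for instance because it is displayed after the result has been freed, copy the string first. A minimal sketch; copy_label() is a hypothetical helper, not part of the XnorNet API:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of XnorNet): returns a heap-allocated copy
 * of `label`, or NULL if `label` is NULL or allocation fails. The caller
 * owns the copy and releases it with free(). */
static char* copy_label(const char* label) {
    if (label == NULL) {
        return NULL;
    }
    size_t length = strlen(label) + 1; /* include the terminating NUL */
    char* copy = malloc(length);
    if (copy != NULL) {
        memcpy(copy, label, length);
    }
    return copy;
}
```

Make the copy before calling xnor_evaluation_result_free(); afterwards the original pointer can no longer be used.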

xnor_evaluation_result_type xnor_evaluation_result_get_type(xnor_evaluation_result* result)

Determines the type of results included within an evaluation result. For a given model, the result type is fixed, so you only need to use this function if you do not know ahead of time what type of output the model will produce.

Parameters
  • result – The evaluation result whose type is being checked.

Returns

The type of the given evaluation result.

xnor_evaluation_result_type

Describes the type of data contained within an evaluation result. All values not listed below should be treated as if they were kXnorEvaluationResultTypeUnknown, in case new types of evaluation results are added later.

kXnorEvaluationResultTypeClassLabels

The evaluation result contains class labels (the type of object within the image was identified, but not localized to a particular area of the image). xnor_evaluation_result_get_class_labels() can be used to retrieve the class label data.

kXnorEvaluationResultTypeBoundingBoxes

The evaluation result contains bounding boxes (specific identified objects, localized to certain regions of the image). xnor_evaluation_result_get_bounding_boxes() can be used to retrieve the bounding box data.

kXnorEvaluationResultTypeUnknown

The evaluation result does not have a known type.
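
Example: One way to follow the advice above is to give every switch over the result type a default branch that behaves like the unknown case. So that the sketch compiles on its own, it declares a local stand-in for the enum with arbitrary numeric values; real code would include the XnorNet header instead. accessor_for() is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for the enum declared in the XnorNet header so that this sketch
 * is self-contained. The numeric values here are arbitrary. */
typedef enum {
    kXnorEvaluationResultTypeUnknown,
    kXnorEvaluationResultTypeClassLabels,
    kXnorEvaluationResultTypeBoundingBoxes
} xnor_evaluation_result_type;

/* Hypothetical helper: names the accessor to call for a result type, or
 * returns NULL if this code does not know how to read that type. */
static const char* accessor_for(xnor_evaluation_result_type type) {
    switch (type) {
        case kXnorEvaluationResultTypeClassLabels:
            return "xnor_evaluation_result_get_class_labels";
        case kXnorEvaluationResultTypeBoundingBoxes:
            return "xnor_evaluation_result_get_bounding_boxes";
        case kXnorEvaluationResultTypeUnknown:
        default:
            /* Any value this code was not written for lands here. */
            return NULL;
    }
}
```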

Classification

Classification models take an image as the input and try to determine what is in it, but not where within the image it is. Results are returned as a list of guesses sorted by decreasing confidence. For example, given a picture of a Siberian husky, the model might predict with high confidence that it is a husky and with lower confidence that it could be an Alaskan malamute or other type of dog.

int32_t xnor_evaluation_result_get_class_labels(xnor_evaluation_result* result, xnor_class_label* out, int32_t out_size)

Copy (at most) the first out_size class labels from the evaluation result into out, returning the total number of results available.

Parameters
  • result – The evaluation result to extract the class labels from.

  • out – Array of xnor_class_labels to copy into. May be NULL if and only if out_size is zero.

  • out_size – Maximum number of class labels to copy into out.

Returns

Total number of class labels available in this result, regardless of out_size. In case of error (for example, the evaluation result does not contain class labels), returns -1.
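
Example: Because the return value can be larger than out_size, or -1 on error, it is not directly the number of entries that were written to out. A small sketch of that calculation; entries_written() is a hypothetical helper, not part of the XnorNet API:

```c
#include <stdint.h>

/* Hypothetical helper: number of array entries that were actually filled
 * in, given the value returned by the accessor and the out_size passed
 * to it. */
static int32_t entries_written(int32_t total_available, int32_t out_size) {
    if (total_available < 0) {
        return 0; /* -1 signals an error: nothing was copied */
    }
    return total_available < out_size ? total_available : out_size;
}
```

Checking this count before reading out[0] also covers the case where the model returned no labels at all.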

Example: If you only care about the most likely class label, you can create a single variable instead of an array:

xnor_evaluation_result* result;
panic_on_error(xnor_model_evaluate(model, input, NULL, &result));
xnor_class_label class_label;
if (xnor_evaluation_result_get_class_labels(result, &class_label, 1) == -1) {
    fprintf(stderr, "This is not a classification model, cannot return classes\n");
    exit(1);
}
printf("You are most likely looking at a %s\n", class_label.label);
xnor_evaluation_result_free(result);

Example: If you want to retrieve all the possible classes the image could be, you can dynamically allocate an array large enough to hold all of them:

xnor_evaluation_result* result;
panic_on_error(xnor_model_evaluate(model, input, NULL, &result));
int32_t num_class_labels = xnor_evaluation_result_get_class_labels(result, NULL, 0);
if (num_class_labels == -1) {
    fprintf(stderr, "This is not a classification model, cannot return classes\n");
    exit(1);
}
xnor_class_label* class_labels = calloc(num_class_labels, sizeof(xnor_class_label));
if (class_labels == NULL) {
    fprintf(stderr, "Failed to allocate memory\n");
    exit(1);
}
xnor_evaluation_result_get_class_labels(result, class_labels, num_class_labels);
for (int32_t i = 0; i < num_class_labels; ++i) {
  printf("class %d: %s\n", class_labels[i].class_id,
         class_labels[i].label);
}
free(class_labels);
xnor_evaluation_result_free(result);

xnor_class_label

Represents a single prediction about the type of object in the image.

int32_t class_id

An integer representing the class of this object. Values may be arbitrary; however, for a given model, this will always remain consistent with the label text.

const char* label

Null-terminated string representing the type of object detected in the image. This string becomes invalid after calling xnor_evaluation_result_free() on the evaluation result it was obtained from.

uint32_t reserved

This member is reserved for future use.

Object Detection

Unlike classification models, object detection models can find multiple objects at once and identify their locations within the input image.

int32_t xnor_evaluation_result_get_bounding_boxes(xnor_evaluation_result* result, xnor_bounding_box* out, int32_t out_size)

Copy (at most) the first out_size bounding boxes from the evaluation result into out, returning the total number of results available.

Parameters
  • result – The evaluation result to extract the bounding boxes from.

  • out – Array of xnor_bounding_boxes to copy into. May be NULL if and only if out_size is zero.

  • out_size – Maximum number of bounding boxes to copy into out.

Returns

Total number of bounding boxes available in this result, regardless of out_size. In case of error (for example, the evaluation result does not contain bounding boxes), returns -1.

Example: Say you only want to handle up to 10 objects detected within an image and want to know where they are. Then you can use an object detection model like so:

/* ... assuming you've already run xnor_model_evaluate to get an xnor_evaluation_result ... */
#define MAX_BOXES 10
xnor_bounding_box boxes[MAX_BOXES];
int32_t num_boxes = xnor_evaluation_result_get_bounding_boxes(result, boxes, MAX_BOXES);
if (num_boxes == -1) {
    fprintf(stderr, "Not an object detection model, can't get boxes\n");
    exit(1);
}
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
for (int32_t i = 0; i < MIN(num_boxes, MAX_BOXES); ++i) {
    /* Example of how you might process box information: */

    float area = boxes[i].rectangle.width * boxes[i].rectangle.height;
    const char* size_description;
    if (area > 0.8 * 0.8) { size_description = "very big"; }
    else if (area > 0.5 * 0.5) { size_description = "big"; }
    else if (area > 0.3 * 0.3) { size_description = "medium-sized"; }
    else if (area > 0.15 * 0.15) { size_description = "small"; }
    else { size_description = "tiny"; }

    float aspect_ratio = boxes[i].rectangle.width / boxes[i].rectangle.height;
    const char* ratio_description;
    if (aspect_ratio > 2) { ratio_description = "very flat"; }
    else if (aspect_ratio > 1.2) { ratio_description = "slightly wider than tall"; }
    else if (aspect_ratio > 1 / 1.2) { ratio_description = "squareish"; }
    else if (aspect_ratio > 0.5) { ratio_description = "slightly taller than wide"; }
    else { ratio_description = "very tall"; }

    printf("I see a %s, %s, %s\n", size_description, ratio_description,
           boxes[i].class_label.label);
}
/* remember to call xnor_evaluation_result_free when you're done */

Example: It’s also possible to get all the boxes with no predefined limit if you want, by dynamically allocating an array of the right size:

/* ... evaluate model to obtain result ... */
int32_t num_boxes = xnor_evaluation_result_get_bounding_boxes(result, NULL, 0);
if (num_boxes == -1) {
    fprintf(stderr, "Not an object detection model, can't get boxes\n");
    exit(1);
}
xnor_bounding_box* boxes = calloc(num_boxes, sizeof(xnor_bounding_box));
if (boxes == NULL) {
    fprintf(stderr, "Failed to allocate memory\n");
    exit(1);
}
xnor_evaluation_result_get_bounding_boxes(result, boxes, num_boxes);
/* ... use boxes ... */
free(boxes);
xnor_evaluation_result_free(result);

xnor_bounding_box

Represents a single object detected within an image, localized to a particular area of the image.

xnor_class_label class_label

Type of object identified within the image; see xnor_class_label for detail.

xnor_rectangle rectangle

Where within the image the object was located; see xnor_rectangle for detail.

xnor_rectangle

Rectangles identify a portion of the image with floating-point coordinates. Regardless of the size of the input image, the left side is considered to have X coordinate 0 and the right side X coordinate 1. Similarly, the top has a Y coordinate of 0 and the bottom has a Y coordinate of 1.

float x

The X coordinate of the left side of the rectangle, from 0 to 1.

float y

The Y coordinate of the top side of the rectangle, from 0 to 1.

float width

The width of the rectangle as a proportion of the total image width.

float height

The height of the rectangle as a proportion of the total image height.

Example: If an object were detected in the upper-right-hand quadrant of an image, the rectangle would be {x = 0.5, y = 0.0, width = 0.5, height = 0.5}.

Example: To convert an xnor_rectangle to pixel coordinates within the original image, multiply the x and width values by the image’s width in pixels, and the y and height values by the image’s height in pixels.
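
Example: The conversion described above, as a sketch. demo_rectangle is a local stand-in with the same fields as xnor_rectangle so that the code compiles without the XnorNet header. pixel_rect, round_to_pixel() and to_pixels() are hypothetical names, and rounding to the nearest pixel is a choice made for this example rather than anything XnorNet prescribes.

```c
#include <stdint.h>

/* Local stand-in mirroring the fields of xnor_rectangle. */
typedef struct {
    float x;
    float y;
    float width;
    float height;
} demo_rectangle;

/* Hypothetical type: a rectangle in whole pixels. */
typedef struct {
    int32_t left;
    int32_t top;
    int32_t width;
    int32_t height;
} pixel_rect;

/* Rounds a non-negative value to the nearest integer. */
static int32_t round_to_pixel(float value) {
    return (int32_t)(value + 0.5f);
}

/* Scales horizontal values by the image width and vertical values by the
 * image height, as described in the text. */
static pixel_rect to_pixels(demo_rectangle r, int32_t image_width,
                            int32_t image_height) {
    pixel_rect p;
    p.left = round_to_pixel(r.x * (float)image_width);
    p.top = round_to_pixel(r.y * (float)image_height);
    p.width = round_to_pixel(r.width * (float)image_width);
    p.height = round_to_pixel(r.height * (float)image_height);
    return p;
}
```

For the upper-right quadrant rectangle {x = 0.5, y = 0.0, width = 0.5, height = 0.5} and a 640 x 480 image, this yields left = 320, top = 0, width = 320 and height = 240.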

Segmentation

Segmentation models allow you to identify which pixels of an image represent a particular class of object. They can be thought of as an automated version of the Lasso tool in many popular image manipulation programs. The masks they create are more detailed than bounding boxes and allow you to create visualizations of objects or process objects or backgrounds independently of each other.

int32_t xnor_evaluation_result_get_segmentation_masks(xnor_evaluation_result* result, xnor_segmentation_mask* out, int32_t out_size)

Copy (at most) the first out_size segmentation masks from the evaluation result into out, returning the total number of results available.

Parameters
  • result – The evaluation result to extract the segmentation masks from.

  • out – Array of xnor_segmentation_masks to copy into. May be NULL if and only if out_size is zero.

  • out_size – Maximum number of segmentation masks to copy into out.

Returns

Total number of segmentation masks available in this result, regardless of out_size. In case of error (for example, the evaluation result does not contain segmentation masks), returns -1.

Example: Say you want to draw an overlay that colors people in bright green but leaves the background as it is. You can use a segmentation model that identifies people:

/* ... evaluate model to obtain result ... */
xnor_segmentation_mask person_mask;
int32_t num_masks =
    xnor_evaluation_result_get_segmentation_masks(result, &person_mask, 1);
if (num_masks == -1) {
    fprintf(stderr, "Not a segmentation model, can't get masks\n");
    exit(1);
}
#define COLOR(r, g, b, a) \
    ((uint32_t)(r) << 24 | (uint32_t)(g) << 16 | (uint32_t)(b) << 8 | (uint32_t)(a))
uint32_t* overlay =
    calloc(person_mask.bitmap.width * person_mask.bitmap.height,
           sizeof(uint32_t));
if (overlay == NULL) {
    fprintf(stderr, "Failed to allocate memory\n");
    exit(1);
}
for (int32_t y = 0; y < person_mask.bitmap.height; ++y) {
    for (int32_t x = 0; x < person_mask.bitmap.width; ++x) {
        /* Index of (x, y) in the overlay, which has one uint32_t per pixel */
        int32_t index = y * person_mask.bitmap.width + x;
        /* Byte and bit that hold (x, y) in the bit-packed mask */
        int32_t byte_index = y * person_mask.bitmap.stride + x / 8;
        int32_t bit_index = x % 8;
        if (person_mask.bitmap.data[byte_index] >> bit_index & 1) {
            /* This pixel is a person!
             * Blit a fully green pixel to the overlay */
            overlay[index] = COLOR(0, 255, 0, 255);
        } else {
            /* Nope, not a person!
             * Write a transparent pixel to the overlay */
            overlay[index] = COLOR(0, 0, 0, 0);
        }
    }
}
/* ... Alpha-blend the overlay onto the image ...
 * Remember to free(overlay) and to call xnor_evaluation_result_free(result)
 * when you're done */

Example: It’s also possible to get all the masks with no pre-defined limit if you want, by dynamically allocating an array of the right size:

/* ... evaluate model to obtain result ... */
int32_t num_masks =
    xnor_evaluation_result_get_segmentation_masks(result, NULL, 0);
if (num_masks == -1) {
    fprintf(stderr, "Not a segmentation model, can't get masks\n");
    exit(1);
}
xnor_segmentation_mask* masks =
    calloc(num_masks, sizeof(xnor_segmentation_mask));
if (masks == NULL) {
    fprintf(stderr, "Failed to allocate memory\n");
    exit(1);
}
xnor_evaluation_result_get_segmentation_masks(result, masks, num_masks);
/* ... use masks ... */
free(masks);
xnor_evaluation_result_free(result);

xnor_segmentation_mask

Segmentation masks associate arbitrarily shaped regions of an image with a particular class of object.

xnor_class_label class_label

Type of object identified by this mask; see xnor_class_label for detail.

xnor_bitmap bitmap

The map that identifies regions matching the given class. A bit in this bitmap is set if and only if the area of the image corresponding to that bit's (x, y) coordinate belongs to the class given by class_label. The bitmap's dimensions and aspect ratio may differ from those of the input image, so the bitmap must be sampled by translating between the two coordinate systems.

xnor_bitmap

A bitmap is a 2D array of 1-bit values. For efficiency, the bits are packed into bytes.

int32_t width

The width of the populated data, given in bits. The width is not necessarily a multiple of the row alignment (unspecified, but at least 1 byte); when it is not, the unused high-order bits of the last populated byte are padded with 0. Additional padding bytes filled with 0 may be present.

int32_t height

Number of rows in the bitmap. The top row comes first.

int32_t stride

The offset, in bytes, from one row of the mask to the next.

const uint8_t* data

A buffer of size stride * height. Within each byte, moving from the least significant bit to the most significant bit corresponds to advancing by column through the image. That is, the top left of the bitmap is data[0] & 1, the next pixel to the right of that is data[0] & (1 << 1), the next data[0] & (1 << 2), etc. After 8 columns, the next column is data[1] & 1, and so on.

The dimensions of this bitmap may differ from those of the original image, as though it were resized. You will have to sample the bitmap by translating the coordinates of points in the original image to the coordinate system of the bitmap.

Example: Here’s a way to determine whether a class exists at a given point in the input image’s coordinate space:

bool is_class_at_pixel(xnor_segmentation_mask mask,
                       int32_t image_x, int32_t image_y,
                       int32_t image_width, int32_t image_height) {
    // Perform a simple nearest-neighbor resample by first converting to a
    // 0..1 coordinate space and then to the mask space
    float normalized_x = ((float)image_x) / image_width;
    float normalized_y = ((float)image_y) / image_height;
    int32_t mask_x = (int)(normalized_x * mask.bitmap.width);
    int32_t mask_y = (int)(normalized_y * mask.bitmap.height);
    // Now use the rescaled coordinates to sample the mask
    int32_t mask_byte_x = mask_x / 8;
    int32_t mask_bit_x = mask_x % 8;
    uint8_t byte =
        mask.bitmap.data[mask_y * mask.bitmap.stride + mask_byte_x];
    return (byte >> mask_bit_x) & 1;
}
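
The bit layout described for xnor_bitmap can also be tried out without the library. The following is a minimal sketch, assuming a hypothetical stand-in struct named packed_bitmap that mirrors the documented width, height, stride, and data fields. packed_bitmap_get() and packed_bitmap_count_set() are illustrative helpers and are not part of the XnorNet API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the documented xnor_bitmap fields. */
typedef struct {
    int32_t width;       /* populated bits per row */
    int32_t height;      /* number of rows, top row first */
    int32_t stride;      /* bytes from one row to the next */
    const uint8_t* data; /* stride * height bytes */
} packed_bitmap;

/* Returns true if the bit at column x, row y is set. */
static bool packed_bitmap_get(const packed_bitmap* bitmap,
                              int32_t x, int32_t y) {
    assert(x >= 0 && x < bitmap->width);
    assert(y >= 0 && y < bitmap->height);
    /* Each row needs at least ceil(width / 8) bytes. */
    assert(bitmap->stride >= (bitmap->width + 7) / 8);
    /* Rows are stride bytes apart; within a byte, the least significant
     * bit is the leftmost column. */
    const uint8_t byte =
        bitmap->data[(size_t)y * (size_t)bitmap->stride + (size_t)(x / 8)];
    return (byte >> (x % 8)) & 1;
}

/* Counts the set bits, skipping padding bits and padding bytes. */
static int32_t packed_bitmap_count_set(const packed_bitmap* bitmap) {
    int32_t count = 0;
    for (int32_t y = 0; y < bitmap->height; ++y) {
        for (int32_t x = 0; x < bitmap->width; ++x) {
            if (packed_bitmap_get(bitmap, x, y)) {
                ++count;
            }
        }
    }
    return count;
}
```

For instance, a bitmap with width 10, height 2, and stride 2 whose bytes are 0x05, 0x02, 0xFF, 0x01 has bits set at x = 0, 2, and 9 in the top row and at x = 0 through 8 in the second row.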

Error Handling

Most functions in the XnorNet C API have a return type of xnor_error*. In case of success, NULL will be returned. If an error occurs, a non-NULL xnor_error* will be returned that describes the error.

A few functions, notably xnor_model_enumerate_built_in(), xnor_evaluation_result_get_class_labels(), xnor_evaluation_result_get_bounding_boxes(), and xnor_evaluation_result_get_segmentation_masks(), instead signal failure by returning -1. There is no way to retrieve error detail when these functions fail.
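
Since no error detail is available for a -1 return, the caller has to supply its own context when reporting the failure. A minimal sketch, assuming a hypothetical helper named count_or_report() that is not part of the XnorNet API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns true if `count` is a usable result.
 * If `count` is -1, prints the caller-supplied description of the
 * operation that failed and returns false. */
static bool count_or_report(int32_t count, const char* operation) {
    assert(operation != NULL);
    if (count == -1) {
        fprintf(stderr, "XnorNet call failed (no detail available): %s\n",
                operation);
        return false;
    }
    return true;
}
```

A call site could then wrap the returned count, for example count_or_report(xnor_evaluation_result_get_segmentation_masks(result, NULL, 0), "get segmentation masks"), and decide how to recover when it returns false.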

xnor_error

Opaque representation of an error that has occurred. You should always refer to this with a pointer (xnor_error*), and never try to inspect the contents of the pointer directly.

const char* xnor_error_get_description(xnor_error* error)

Get a string description of the error that occurred. The string remains valid until xnor_error_free() is called.

void xnor_error_free(xnor_error* error)

Release the resources associated with the error. After freeing the error, neither it nor the return value of xnor_error_get_description() may be used.

Example: For simple command-line applications where exiting with an error message is an appropriate way of handling errors, a function can be created that detects errors, prints error descriptions, and then exits:

static void panic_on_error(xnor_error* error) {
    if (error != NULL) {
        fprintf(stderr, "Error from XnorNet library: %s\n",
                xnor_error_get_description(error));
        xnor_error_free(error);
        exit(1);
    }
}

This can then be used when calling XnorNet functions:

xnor_model* model;
/* Now no "if" check is required */
panic_on_error(xnor_model_load_built_in(NULL, NULL, &model));

Applications that handle errors in other ways can adapt this code to better fit their existing error handling.
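
As one possible adaptation, the error can be turned into a status value plus a message, leaving the decision about what to do to the caller. This is a minimal sketch under stated assumptions: take_error_message() is a hypothetical helper, and the struct xnor_error definition and the two xnor_error_* function bodies below are stand-ins so that the sketch builds without the library. In a real program they would be removed in favor of the library's own header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for illustration only: the real xnor_error is opaque and the
 * real functions are provided by the XnorNet library. */
typedef struct xnor_error {
    const char* description;
} xnor_error;

const char* xnor_error_get_description(xnor_error* error) {
    return error->description;
}

void xnor_error_free(xnor_error* error) {
    (void)error; /* the stand-in owns no resources */
}

/* Hypothetical helper: returns true on success (error == NULL). On
 * failure, copies the description into `message` (truncating if needed),
 * frees the error, and returns false. */
static bool take_error_message(xnor_error* error, char* message,
                               size_t message_size) {
    assert(message != NULL && message_size > 0);
    if (error == NULL) {
        message[0] = '\0';
        return true;
    }
    /* Copy before freeing: the description is only valid until
     * xnor_error_free() is called. */
    snprintf(message, message_size, "%s",
             xnor_error_get_description(error));
    xnor_error_free(error);
    return false;
}
```

A caller might pass the result of xnor_model_load_built_in(NULL, NULL, &model) together with a local character buffer, then log the message or return its own error code when the helper returns false.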