Python API Reference

The XnorNet library requires Python 3.5 or later.

Model

Models represent a set of learned information that allows inferences to be drawn based on inputs. Models expect certain kinds of input (for example, some models work on images, while others only work on audio) and produce certain kinds of outputs (for example, some models can tell you where objects are in an image, while other models can tell you what is in the image but not where).

In order to use a model in your application, it must first be loaded. Most distributions of the XnorNet library come bundled with one or more models baked into the library itself. In these cases, use Model.load_built_in() to load the model. To retrieve the list of names of models that can be loaded, use Model.enumerate_built_in().

class xnornet.Model
MULTI_THREADED

A constant for use in model loading (load_built_in()). A model loaded with this as its threading_model will run evaluate() using multiple threads (if available). This allows XnorNet to leverage the hardware concurrency of the system.

SINGLE_THREADED

A constant for use in model loading (load_built_in()). A model loaded with this as its threading_model will run evaluate() using only one thread (usually the calling thread). This option is useful if the calling application is managing one or more Xnor models with its own concurrency scheme.
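
Example: Loading the default model in single-threaded mode. Passing None for the name loads the only bundled model, as described under load_built_in():

import xnornet
model = xnornet.Model.load_built_in(None, xnornet.Model.SINGLE_THREADED)
# evaluate() will now run entirely on the calling thread
# ... use model ...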

classmethod load_built_in([name[, threading_model]])

Loads a built-in model.

Parameters
  • name – The name of the model to load (optional). If there is only one model in your inference bundle, the name can be omitted or set to None to load that model.

  • threading_model – Must be either MULTI_THREADED or SINGLE_THREADED. In single-threaded mode, calling evaluate() will evaluate the model on a single thread (usually the calling thread) and block until evaluation is complete. In multi-threaded mode, calling evaluate() will evaluate the model on as many threads as there are available cores.

Returns

The loaded model.

Example: Loading the default model:

import xnornet
model = xnornet.Model.load_built_in()
# ... use model ...

Example: Loading a model by name in multi-threaded mode (which is the default threading mode):

import xnornet
model = xnornet.Model.load_built_in("person-classifier",
                                    xnornet.Model.MULTI_THREADED)
# ... use model ...
classmethod enumerate_built_in()

Returns a list of all the names that can be passed to Model.load_built_in().

Example: Loading all available models:

import xnornet
model_names = xnornet.Model.enumerate_built_in()
models = []
for model_name in model_names:
    models.append(xnornet.Model.load_built_in(model_name))
evaluate(input)

Evaluates the model on an input, yielding a result.

Parameters

input – The input to run through the model. (See Input for ways to construct inputs.)

Returns

Depends on the type of model; see below.

The type of the return value depends on the type of the model:

  • Classification models: A list of ClassLabels is returned.

  • Object detection models: A list of BoundingBoxes is returned.

  • Segmentation models: A list of SegmentationMasks is returned.

Example: Evaluating a JPEG image with a built-in model:

import xnornet
model = xnornet.Model.load_built_in()
jpeg_data = open('./samples/test-images/dog.jpg', 'rb').read()
input = xnornet.Input.jpeg_image(jpeg_data)
result = model.evaluate(input)
# ... use result ...
name

A friendly name for the model. This is typically the same as the name of the folder that the model’s xnornet-*.whl was found in.

result_type

An EvaluationResultType value which indicates the type of result (ClassLabel, BoundingBox, SegmentationMask, etc.) that will be returned by evaluate().

version

A string that can be used to distinguish different versions of the same model. This is mostly useful for debugging and reporting problems to Xnor.

class_labels

The list of class label strings that can be returned by this model (for example, as ClassLabel.label). Only applicable for models with well-defined classes.
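
Example: Checking whether a model knows about a particular class before relying on it ("person" here is just an illustrative label; substitute one from your own model):

import xnornet
model = xnornet.Model.load_built_in()
if "person" in model.class_labels:
    print("This model can recognize people")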

Input

In order to pass data to a model to be evaluated, it must first be wrapped in an Input. This allows multiple types of model inputs to be supported without complicating the model evaluation interface. Currently, only image inputs are supported, but multiple image formats are accepted: JPEG (jpeg_image()), raw RGB (rgb_image()), and several raw YUV layouts (yuv422_image(), yuv420p_image(), yuv420sp_nv12_image(), and yuv420sp_nv21_image()).

Which one you use depends on the type of data available. For example, JPEG may be most convenient for reading images from files, but if you are interfacing directly with a camera, one of the YUV formats may be more useful.

class xnornet.Input
classmethod jpeg_image(data)

Creates a new Input representing a JPEG image.

Parameters

data – Must be a bytes object containing the data of the JPEG image.

Returns

The newly-created Input.

Example: Creating an input corresponding to dog.jpg as provided in the data directory of the XnorNet download:

import xnornet
jpeg_data = open('./test-images/dog.jpg', 'rb').read()
input = xnornet.Input.jpeg_image(jpeg_data)
classmethod rgb_image(size, data)

Creates a new Input representing an image created from raw RGB data.

Parameters
  • size – Tuple of (width, height) integers corresponding to the size of the image.

  • data – Must be a bytes object containing the raw data of the image, in the format described below.

Returns

The newly-created Input.

This function might be used if you have already decompressed an image, or received the image data from a connected device that natively outputs RGB data.

See RGB Images for a detailed description of the RGB image format.

Example: While the XnorNet library does not support reading from PNG files, you can use an external Python library like Pillow to do it for you, and then use Input.rgb_image() to make it available to XnorNet:

import xnornet
import PIL.Image
pil_image = PIL.Image.open('./samples/test-images/dog.png')
# PNGs may carry an alpha channel or a palette; convert to raw RGB first
pil_image = pil_image.convert('RGB')
xnor_input = xnornet.Input.rgb_image(pil_image.size, pil_image.tobytes())
classmethod yuv422_image(size, data)

Creates an input corresponding to an image in raw YUV422 format.

Parameters
  • size – Tuple of (width, height) integers corresponding to the size of the image. The width must be even.

  • data – Must be a bytes object containing the raw data of the image, in the format described below.

Returns

The newly-created Input.

YUV422 is often used in camera outputs. If your camera natively outputs YUV422, it can be more efficient to pass the YUV422 input directly to XnorNet than to convert it to a different format first.

See YUV422 Images for a detailed description of the YUV422 format.
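
Example: A minimal sketch of wrapping a raw camera frame. The all-zero placeholder stands in for real camera data, and the packed two-bytes-per-pixel layout is assumed per YUV422 Images:

import xnornet
width, height = 640, 480  # width must be even
# Placeholder frame; real data would come from your camera API
frame_bytes = bytes(width * height * 2)
input = xnornet.Input.yuv422_image((width, height), frame_bytes)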

classmethod yuv420p_image(size, y_plane_data, u_plane_data, v_plane_data)

Creates an input corresponding to an image in raw YUV420p (planar) format.

Parameters
  • size – Tuple of (width, height) integers corresponding to the size of the image, both of which must be even.

  • y_plane_data – Must be a bytes object containing the raw data of the Y plane of the image, with one byte per pixel in the image.

  • u_plane_data – Must be a bytes object containing the raw data of the U plane of the image, with one byte per 2x2-block of pixels in the image.

  • v_plane_data – As for u_plane_data, but for the V plane.

Returns

The newly-created Input.

YUV420p is often used in camera outputs. If your camera natively outputs YUV420p, it can be more efficient to pass the YUV420p input directly to XnorNet than to convert it to a different format first.

See YUV420p Images for a detailed description of the YUV420p format. Note that the U and V planes must not be interleaved; if they are, that is YUV420sp (semi-planar), see below.
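
Example: A minimal sketch of constructing a YUV420p input from separate planes, using the plane sizes given above (one byte per pixel for Y; one byte per 2x2 block for each of U and V):

import xnornet
width, height = 640, 480  # both must be even
# Placeholder planes; real data would come from a camera or decoder
y_plane = bytes(width * height)
u_plane = bytes((width // 2) * (height // 2))
v_plane = bytes((width // 2) * (height // 2))
input = xnornet.Input.yuv420p_image((width, height), y_plane, u_plane, v_plane)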

classmethod yuv420sp_nv12_image(size, y_plane_data, uv_plane_data)

Creates an input corresponding to an image in raw YUV420sp (semi-planar) format, with U channel first (NV12).

Parameters
  • size – Tuple of (width, height) integers corresponding to the size of the image, both of which must be even.

  • y_plane_data – Must be a bytes object containing the raw data of the Y plane of the image, with one byte per pixel in the image.

  • uv_plane_data – Must be a bytes object containing the raw data of the UV plane of the image, with two bytes per 2x2-block of pixels in the image.

Returns

The newly-created Input.

YUV420sp is often used in camera outputs. If your camera natively outputs YUV420sp, it can be more efficient to pass the YUV420sp input directly to XnorNet than to convert it to a different format first.

See YUV420sp (NV12 and NV21) Images for a detailed description of the YUV420sp formats.
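
Example: A minimal sketch of constructing an NV12 input. Unlike YUV420p, the chroma samples live in a single interleaved UV plane (two bytes per 2x2 block):

import xnornet
width, height = 640, 480  # both must be even
y_plane = bytes(width * height)
# U and V bytes interleaved, two bytes per 2x2 block of pixels
uv_plane = bytes((width // 2) * (height // 2) * 2)
input = xnornet.Input.yuv420sp_nv12_image((width, height), y_plane, uv_plane)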

classmethod yuv420sp_nv21_image(size, y_plane_data, vu_plane_data)

Creates an input corresponding to an image in raw YUV420sp (semi-planar) format, with V channel first (NV21).

Same as above but with the chroma in VU order (NV21). See YUV420sp (NV12 and NV21) Images for a detailed description of the YUV420sp formats.

Evaluation Results

Depending on what model is in use, the type of the results from Model.evaluate() will differ.

class xnornet.EvaluationResultType

An enumeration type indicating what type of results a Model will return. Obtained via Model.result_type.

CLASS_LABELS

Indicates that Model.evaluate() will return a list of ClassLabels.

BOUNDING_BOXES

Indicates that Model.evaluate() will return a list of BoundingBoxes.

SEGMENTATION_MASKS

Indicates that Model.evaluate() will return a list of SegmentationMasks.
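
Example: A sketch of dispatching on Model.result_type so the same code can handle whichever kind of model is bundled:

import xnornet
model = xnornet.Model.load_built_in()
jpeg_data = open('./samples/test-images/dog.jpg', 'rb').read()
result = model.evaluate(xnornet.Input.jpeg_image(jpeg_data))
if model.result_type == xnornet.EvaluationResultType.CLASS_LABELS:
    print([item.label for item in result])
elif model.result_type == xnornet.EvaluationResultType.BOUNDING_BOXES:
    print([box.class_label.label for box in result])
elif model.result_type == xnornet.EvaluationResultType.SEGMENTATION_MASKS:
    print([mask.class_label.label for mask in result])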

Classification

Classification models take an image as the input and try to determine what is in it, but not where within the image it is. Results are returned as a list of guesses sorted by decreasing confidence. For example, given a picture of a Siberian husky, the model might predict with high confidence that it is a husky and with lower confidence that it could be an Alaskan malamute or other type of dog.

Example: With a classification model, you could see what is in the dog.jpg image like this:

import xnornet
model = xnornet.Model.load_built_in()
dog_jpeg = open('./samples/test-images/dog.jpg', 'rb').read()
input = xnornet.Input.jpeg_image(dog_jpeg)
result = model.evaluate(input)
print(result)

You’ll get a list of ClassLabels back. It can be processed like any other list, e.g., to print out just the label strings while ignoring the other properties, you might use:

print([element.label for element in result])

Example: Say you had a folder of images transferred from a phone, and you wanted to sort them into other folders (like “cat”, “dog”, “person”) based on the main object in each image. With XnorNet, that’s easy:

import os
import xnornet

model = xnornet.Model.load_built_in()
source_directory = "from_phone"

for filename in os.listdir(source_directory):
    source_path = os.path.join(source_directory, filename)
    if not os.path.isfile(source_path):
        print("Skipping", source_path, "because it is not a file")
        continue
    _, extension = os.path.splitext(filename)
    if extension.upper() not in [".JPG", ".JPEG"]:
        print("Skipping", source_path, "because it's not a JPEG")
        continue

    jpeg_data = open(source_path, 'rb').read()
    input = xnornet.Input.jpeg_image(jpeg_data)
    results = model.evaluate(input)
    if len(results) == 0:
        print("Skipping", source_path, "because no objects were found in it")
        continue
    category = results[0].label

    print("Moving", source_path, "to the", category, "folder")
    os.makedirs(category, exist_ok=True)
    os.rename(source_path, os.path.join(category, filename))

To run this, make sure you’re running XnorNet with a classification model (it can also be used with an object-detection model, if you adjust results[0].label to results[0].class_label.label), then put all your pictures in from_phone, and run it. You’ll find all your photos conveniently sorted into other folders, like cat and dog.

class xnornet.ClassLabel(class_id, label)

Represents a single prediction about the type of object in the image.

class_id

An integer representing the class of this object. Values may be arbitrary; however, for a given model, this will always remain consistent with the label text.

label

String representing the type of object detected in the image, for example “vehicle”.

Object Detection

Unlike classification models, object detection models can find multiple objects at once and identify their locations within the input image.

Example: Here’s how you might write some code that looks at an image and determines the size and aspect ratio of each object in the image:

import xnornet
model = xnornet.Model.load_built_in()

def describe_area(area):
    if area > 0.8 * 0.8:
        return "very big"
    elif area > 0.5 * 0.5:
        return "big"
    elif area > 0.3 * 0.3:
        return "medium-sized"
    elif area > 0.15 * 0.15:
        return "small"
    else:
        return "tiny"

def describe_aspect_ratio(aspect_ratio):
    if aspect_ratio > 2:
        return "very flat"
    elif aspect_ratio > 1.2:
        return "slightly wider than tall"
    elif aspect_ratio > 1 / 1.2:
        return "squareish"
    elif aspect_ratio > 0.5:
        return "slightly taller than wide"
    else:
        return "very tall"

# Try it yourself on other images!
dog_jpeg = open('./samples/test-images/dog.jpg', 'rb').read()
input = xnornet.Input.jpeg_image(dog_jpeg)
boxes = model.evaluate(input)

for box in boxes:
    area = box.rectangle.width * box.rectangle.height
    aspect_ratio = box.rectangle.width / box.rectangle.height
    print("This {object} that I see is {size} and {aspect_ratio}"
        .format(object=box.class_label.label,
                size=describe_area(area),
                aspect_ratio=describe_aspect_ratio(aspect_ratio)))
class xnornet.BoundingBox(class_label, rectangle)

Represents a single object detected within an image, localized to a particular area of the image.

class_label

Type of object identified within the image; see ClassLabel for detail.

rectangle

Where within the image the object was located; see Rectangle for detail.

class xnornet.Rectangle(x, y, width, height)

Rectangles identify a portion of an image with floating-point coordinates. Regardless of the size of the input image, the left side is considered to have X coordinate 0 and the right side X coordinate 1. Similarly, the top has a Y coordinate of 0 and the bottom has a Y coordinate of 1.

x

The X coordinate of the left side of the rectangle, from 0 to 1.

y

The Y coordinate of the top side of the rectangle, from 0 to 1.

width

The width of the rectangle as a proportion of the total image width.

height

The height of the rectangle as a proportion of the total image height.

Example: If an object were detected in the upper-right-hand quadrant of an image, the rectangle would be {x = 0.5, y = 0.0, width = 0.5, height = 0.5}.

Example: To convert a Rectangle to pixel coordinates within the original image, you would multiply the x and width values by the image’s width in pixels, and the y and height values by the image’s height in pixels.
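
Example: A small helper implementing that conversion (a sketch; the name to_pixel_rect is ours, not part of the library):

def to_pixel_rect(rectangle, image_width, image_height):
    # Scale the normalized coordinates by the image dimensions
    left = int(rectangle.x * image_width)
    top = int(rectangle.y * image_height)
    width = int(rectangle.width * image_width)
    height = int(rectangle.height * image_height)
    return left, top, width, height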

Segmentation

Segmentation models allow you to identify which pixels of an image represent a particular class of object. They can be thought of as an automated version of the Lasso tool in many popular image manipulation programs. The masks they create are more detailed than bounding boxes and allow you to create visualizations of objects or process objects or backgrounds independently of each other.

Example: Say you want to blur the background of an image but leave the people alone. You can use a segmentation model that identifies people:

# This example uses PIL. To install a conformant implementation:
# pip install pillow
from PIL import Image, ImageFilter
import io
import os.path

import xnornet
model = xnornet.Model.load_built_in()

# Try it yourself on other images!
person_jpeg = open(os.path.expanduser('~/Pictures/selfie.jpg'), 'rb').read()
input = xnornet.Input.jpeg_image(person_jpeg)
masks = model.evaluate(input)

person_image = Image.open(io.BytesIO(person_jpeg))
person_blurred = person_image.filter(ImageFilter.GaussianBlur(radius=5))

# Convert the segmentation mask into an alpha mask
person_mask = masks[0]
person_mask_data = bytes([
    255 if person_mask[x, y] else 0
    for y in range(person_mask.height)
    for x in range(person_mask.width)
])
person_mask_image = Image.frombytes('L',
    (person_mask.width, person_mask.height), person_mask_data)
# Resize the mask to fit the image, since the size of the segmentation mask
# might be different (smaller or larger) than the image we passed in
person_mask_image = person_mask_image.resize(person_image.size)

# Use the alpha mask to control whether we sample from the blurred image
# (background) or the original image (person)
masked_image = Image.composite(person_image, person_blurred, person_mask_image)
masked_image.show()
class xnornet.SegmentationMask

Segmentation masks associate arbitrarily shaped regions of an image with a particular class of object. A segmentation mask is a bitmap, that is, a 2D map where each pixel is either 1 (class is present) or 0 (class is absent). To determine whether a certain pixel contains a certain object class, sample the bitmap by translating the coordinates of points in the original image to the coordinate system of the bitmap (see __getitem__()).

class_label

Type of object identified by this mask; see ClassLabel for detail.

width

The width of the mask. May be smaller or larger than the width of the image that was passed into the model (see to_bytes()).

height

The height of the mask. May be smaller or larger than the height of the image that was passed into the model (see to_bytes()).

__getitem__((x, y))

Samples the bitmap at the given point and returns a boolean indicating whether the class given by class_label is present at the given point in the input image. Internally, this method samples the data returned by to_bytes() by referencing the bit corresponding to the given x and y coordinates, using _stride to determine the row and the byte within the row to sample. The corresponding C version of this is described in more detail at xnor_bitmap.

Example:

# ...obtain model and input as shown above
masks = model.evaluate(input)
person_mask = masks[0]
if person_mask[164, 144]:
    print("There's a person near the middle!")

The sample point coordinates are relative to the mask, which may be a different size and shape than the original image. Here’s how you might check if a class exists at a particular coordinate in the input image:

def is_class_at_pixel(mask, image_x, image_y,
                      image_width, image_height):
    # Perform a simple nearest-neighbor resample by first converting to a
    # 0..1 coordinate space and then to the mask space
    normalized_x = image_x / image_width
    normalized_y = image_y / image_height
    mask_x = int(normalized_x * mask.width)
    mask_y = int(normalized_y * mask.height)
    # Now use the rescaled coordinates to sample the mask
    return mask[mask_x, mask_y]
_stride

The stride of the underlying bitmap data array. This member is only relevant if you need to sample the raw bytes directly (see to_bytes()).

to_bytes()

Returns the underlying bitmap data array. See xnor_bitmap for a description of how to sample this data at particular coordinates.
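
Example: A sketch of sampling the raw data directly. It assumes _stride is the number of bytes per bitmap row and that bits are packed least-significant-bit first within each byte; consult xnor_bitmap for the authoritative layout:

def sample_raw(mask, x, y):
    data = mask.to_bytes()
    # Each row occupies _stride bytes; eight mask pixels are packed per byte
    row_byte = data[y * mask._stride + x // 8]
    # Assumed least-significant-bit-first packing within each byte
    return bool(row_byte & (1 << (x % 8)))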

Error Handling

exception xnornet.Error

Raised on any error within XnorNet.
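
Example: Catching errors from model loading (this sketch assumes load_built_in() raises xnornet.Error for an unknown model name):

import xnornet
try:
    model = xnornet.Model.load_built_in("no-such-model")
except xnornet.Error as error:
    print("XnorNet error:", error)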