Swift API Reference

Model

Models represent a set of learned information that allows inferences to be drawn based on inputs. Models expect certain kinds of input (for example, some models work on images, while others only work on audio) and produce certain kinds of outputs (for example, some models can tell you where objects are in an image, while other models can tell you what is in the image but not where).

In order to use a model in your application, it must first be loaded with Model.init. To retrieve the list of names of models that can be loaded, use Model.enumerateBuiltIn().

class Model
func init(builtIn: String?, threadingModel: ThreadingModel? = nil) throws

Loads a built-in model. builtIn (if present) allows selecting between multiple models present in the bundle; nil selects the default. threadingModel allows adjusting how the model uses threads; see Advanced Loading Options below for details.

Note

AI2GO currently only supports bundles with one model, so builtIn can safely always be passed as nil. When multi-model bundles are supported, builtIn will allow distinguishing between the bundled models.

class func enumerateBuiltIn() → [String]

Returns a list of all the names that can be passed to init. For single-model bundles, this array will only contain one item.
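A minimal loading sketch (error handling with try is required because init can throw; the printed output is illustrative):

// List the models available in the bundle.
let names = Model.enumerateBuiltIn()
print("Available models: \(names)")

// Load the default model; builtIn: nil selects it automatically.
let model = try Model(builtIn: nil)
print("Loaded model: \(model.name), version \(model.version)")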

var name : String

A friendly name for the model. This is typically the same as the name of the folder that the model’s XnorNet.framework was found in.

var resultType : EvaluationResult

Indicates the type of result (ClassLabels, BoundingBoxes, SegmentationMasks, etc.) that will be returned by evaluate.

var version : String

A string that can be used to distinguish different versions of the same model. This is mostly useful for debugging and reporting problems to Xnor.

var classLabels : [String]

The list of class label strings that can be returned by this model (for example, as ClassLabel.label). Only applicable for models with well-defined classes.

func evaluate(input: Input) throws → EvaluationResult

Evaluate a model on an input, yielding a result. The type of the return value depends on the type of the model:

  • Classification models: A ClassLabels object is returned.

  • Object detection models: A BoundingBoxes object is returned.

  • Segmentation models: A SegmentationMasks object is returned.

Advanced Loading Options

Most usages of XnorNet should be fine with the defaults, but some applications that run their own thread pools or have other specialized requirements may want to control how XnorNet uses threads. To do so, pass a value of ThreadingModel to Model.init(builtIn:threadingModel:).

enum ThreadingModel
singleThreaded

The model should be run on only one thread if possible.

multiThreaded

The model should utilize as many threads as are necessary to maximize performance.
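For example, an application that manages its own thread pool might restrict XnorNet to a single thread (a hedged sketch; for most applications the default is preferable):

// Constrain inference to one thread where possible.
let model = try Model(builtIn: nil, threadingModel: .singleThreaded)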

Input

In order to pass data to a model to be evaluated, it must first be wrapped in an Input. This allows multiple types of model inputs to be supported without complicating the model evaluation interface. Currently, only image inputs are supported, but multiple image formats are accepted. Which one you use depends on the type of data available. For example, JPEG may be most convenient for reading images from files, but if you have already-decoded data, RGB may be more useful.

class Input
func init(fromJpegImage: Data) throws

Creates a new Input representing a JPEG image.

func init(fromRgbImage: Data, width: Int, height: Int) throws

Creates a new Input representing an image created from raw RGB data.
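The two initializers can be sketched as follows (the file path and image dimensions are hypothetical, and the raw RGB data is assumed to be packed at 3 bytes per pixel, row-major):

// From an encoded JPEG file on disk.
let jpegData = try Data(contentsOf: URL(fileURLWithPath: "photo.jpg"))
let jpegInput = try Input(fromJpegImage: jpegData)

// From already-decoded RGB pixels (640 x 480, 3 bytes per pixel).
let rgbData = Data(count: 640 * 480 * 3) // a blank image, for illustration
let rgbInput = try Input(fromRgbImage: rgbData, width: 640, height: 480)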

Evaluation Results

Depending on what model is in use, the type of the results from Model.evaluate will differ.

  • Classification models produce ClassLabels encapsulated in a ClassLabels object

  • Object detection models produce BoundingBoxes encapsulated in a BoundingBoxes object

  • Segmentation models produce SegmentationMasks encapsulated in a SegmentationMasks object

Each of these “encapsulating” objects conforms to the EvaluationResult protocol:

protocol EvaluationResult

This protocol has no methods, but all evaluation result wrapper types conform to this interface. You can extract individual types by using a Swift switch:

// ...
let results = try model.evaluate(input: input)
switch results {
  case let boundingBoxes as BoundingBoxes:
    print("Detected \(boundingBoxes.value.count) boxes")
  case let classLabels as ClassLabels:
    print("I see a \(classLabels.value[0].label)")
  default:
    break
}

Classification

Classification models take an image as the input and try to determine what is in it, but not where within the image it is. Results are returned as a list of guesses sorted by decreasing confidence. For example, given a picture of a Siberian husky, the model might predict with high confidence that it is a husky and with lower confidence that it could be an Alaskan malamute or other type of dog.

class ClassLabels : EvaluationResult

This exists solely to wrap an array of ClassLabel objects in a class so that it can conform to the EvaluationResult protocol.

var value : [ClassLabel]

Array of possible predictions of classes for the image, ordered descending by confidence.

class ClassLabel

Represents a single prediction about the type of object in the image.

var label : String

String representing the type of object detected in the image, for example “vehicle”.

var classId : Int

An integer representing the class of this object. Values are arbitrary, but will remain consistent with the label text within a single model.
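Putting the pieces together, a classification result can be inspected like this (a sketch assuming model and input were created as shown in the earlier sections):

let result = try model.evaluate(input: input)
if let labels = result as? ClassLabels, let top = labels.value.first {
    // value is ordered by descending confidence, so first is the best guess.
    print("Top prediction: \(top.label) (class id \(top.classId))")
}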

Object Detection

Unlike classification models, object detection models can find multiple objects at once and identify their locations within the input image.

class BoundingBoxes : EvaluationResult

This exists solely to wrap an array of BoundingBox objects in a class so that it can conform to the EvaluationResult protocol.

var value : [BoundingBox]

Array of boxes detected in the image.

class BoundingBox

Represents a single object detected within an image, localized to a particular area of the image.

var classLabel : ClassLabel

Type of object identified within the image; see ClassLabel for detail.

var rectangle : Rectangle

Where within the image the object was located; see Rectangle for detail.

class Rectangle

Rectangles identify a portion of an image with floating-point coordinates. Regardless of the size of the input image, the left side is considered to have X coordinate 0 and the right side X coordinate 1. Similarly, the top has a Y coordinate of 0 and the bottom has a Y coordinate of 1.

var x : Float

The X coordinate of the left side of the rectangle, from 0 to 1.

var y : Float

The Y coordinate of the top side of the rectangle, from 0 to 1.

var width : Float

The width of the rectangle as a proportion of the total image width.

var height : Float

The height of the rectangle as a proportion of the total image height.
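To draw a bounding box, the normalized coordinates must be scaled back to pixels. A hypothetical helper (pixelRect is not part of the API):

// Convert a normalized Rectangle into integer pixel coordinates
// for an image of the given size.
func pixelRect(_ r: Rectangle, imageWidth: Int, imageHeight: Int)
    -> (x: Int, y: Int, width: Int, height: Int) {
    return (x: Int(r.x * Float(imageWidth)),
            y: Int(r.y * Float(imageHeight)),
            width: Int(r.width * Float(imageWidth)),
            height: Int(r.height * Float(imageHeight)))
}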

Segmentation

Segmentation models allow you to identify which pixels of an image represent a particular class of object. They can be thought of as an automated version of the Lasso tool in many popular image manipulation programs. The masks they create are more detailed than bounding boxes and allow you to create visualizations of objects or process objects or backgrounds independently of each other.

class SegmentationMasks : EvaluationResult

This exists solely to wrap an array of SegmentationMask objects in a class so that it can conform to the EvaluationResult protocol.

var value : [SegmentationMask]

Array of segmentation masks, one for each possible class the model can detect.

class SegmentationMask

Segmentation masks associate arbitrarily shaped regions of an image with a particular class of object. A segmentation mask is a bitmap, that is, a 2D map where each pixel is either 1 (class is present) or 0 (class is absent). To determine whether a certain pixel contains a certain object class, sample the bitmap by translating the coordinates of points in the original image to the coordinate system of the bitmap.

var classLabel : ClassLabel

Type of object identified by this mask; see ClassLabel for detail.

var width : Int

The width of the mask. Masks may be smaller or larger than the width of the image that was passed into the model; if so, you will need to scale any queries into the coordinate system of the mask before using subscript.

var height : Int

The height of the mask. Masks may be smaller or larger than the height of the image that was passed into the model; if so, you will need to scale any queries into the coordinate system of the mask before using subscript.

subscript(x: Int, y: Int) → Bool

Samples the bitmap at the given point and returns a boolean indicating whether the class given by classLabel is present at the given point in the input image.
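The coordinate translation described above can be sketched as a hypothetical helper (maskCovers is not part of the API):

// Returns whether the mask's class is present at a pixel given in
// image coordinates, scaling into the mask's coordinate system first.
func maskCovers(_ mask: SegmentationMask, imageX: Int, imageY: Int,
                imageWidth: Int, imageHeight: Int) -> Bool {
    let maskX = imageX * mask.width / imageWidth
    let maskY = imageY * mask.height / imageHeight
    return mask[maskX, maskY]
}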

Error Handling

enum Error

Thrown on any error within XnorNet.

error(message: String)

Generic error from the inference engine, either due to an internal problem or usage error. message describes the error.

wrongSizeDataBuffer

Input was given an image whose stated dimensions conflict with the number of bytes of data actually received.

unknownEvaluationResultType

The inference engine returned a type of evaluation result that is not known to the Swift bindings. This should not occur, but is included for completeness.
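A typical do/catch around loading and evaluation might look like this (a sketch assuming input was created as shown in the Input section; in a real project the enum may need qualification to avoid clashing with Swift.Error):

do {
    let model = try Model(builtIn: nil)
    let result = try model.evaluate(input: input)
    print("Got result of type \(type(of: result))")
} catch Error.error(let message) {
    print("Inference engine error: \(message)")
} catch Error.wrongSizeDataBuffer {
    print("Image dimensions conflict with the data buffer size")
} catch {
    print("Unexpected error: \(error)")
}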