Android API Reference

Model

Models represent a set of learned information that allows inferences to be drawn based on inputs. Models expect certain kinds of input (for example, some models work on images, while others only work on audio) and produce certain kinds of outputs (for example, some models can tell you where objects are in an image, while other models can tell you what is in the image but not where).

Before you can use a model in your application, you must load it. Most distributions of the XnorNet library come bundled with one or more models baked into the library itself. In these cases, use Model.loadBuiltIn to load the model. To retrieve the list of names of models that can be loaded, use Model.enumerateBuiltIn.

public final class Model implements AutoCloseable
public static Model loadBuiltIn(Context context, String modelName)

Load a built-in model by name. Please see the documentation you received with this library to determine what model names are available.

Parameters
  • context – Your Android application context to load resources from. Often this will be your Activity.

  • modelName – The name of the model to load, or null to load the default model.

Throws
  • XnorException – If an error occurs while loading the model (e.g. model name is not recognized, etc.).

Returns

The loaded model.

public static Model loadBuiltIn(Context context, String modelName, ModelLoadOptions modelLoadOptions)

As above, but with additional options; see ModelLoadOptions for details.

public static String[] enumerateBuiltIn()

Lists the names of the models built into this library.

public EvaluationResult evaluate(Input input)

Run an Input through the model, producing an EvaluationResult.

Parameters
  • input – The input to run through the model.

Throws
Returns

The predictions produced by the model.

public Info getInfo()

Retrieve optional additional information about a model. This is not needed by most applications; it should only be needed by applications that need to deal generically with multiple models, or for diagnostic purposes.

public void close()

Release the native resources associated with this model. After calling this, most methods will throw an exception.

Tip

In many cases, you can use Java’s try-with-resources construct to avoid needing to explicitly call this:

try (Model model = Model.loadBuiltIn(context, "person-pet-vehicle")) {
    // ... use model ...
}
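
Model, Input, and EvaluationResult all implement AutoCloseable, so several of them can share one try-with-resources statement. The sketch below illustrates only the Java language behavior (resources are closed in reverse order of declaration); NativeHandle is a hypothetical stand-in class, not part of the XnorNet API.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    // Hypothetical stand-in for any AutoCloseable that owns native resources.
    static final class NativeHandle implements AutoCloseable {
        private final String name;
        private final List<String> log;

        NativeHandle(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Records the order in which the body and the close() calls run.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (NativeHandle model = new NativeHandle("model", log);
             NativeHandle input = new NativeHandle("input", log)) {
            log.add("evaluate");
        }
        return log;
    }
}
```

Declaring the longest-lived resource first means it is released last, after everything that was created from it.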

public final class ModelLoadOptions

When you need additional control over how a model is loaded, you can use the extended variant of Model.loadBuiltIn that takes a ModelLoadOptions. You can instantiate a ModelLoadOptions with no parameters, and then set anything you want to change from the defaults. You can then pass this options object into Model.loadBuiltIn.

public ModelLoadOptions setThreadingModel(ThreadingModel threadingModel)

Lets you adjust the threading model used by XnorNet; see below.

public enum ThreadingModel
public static final ThreadingModel MULTI_THREADED

A constant for use in model loading (Model.loadBuiltIn). A model loaded with this as its threading model will run Model.evaluate using multiple threads (if available). This allows XnorNet to leverage the hardware concurrency of the system.

public static final ThreadingModel SINGLE_THREADED

A constant for use in model loading (Model.loadBuiltIn). A model loaded with this as its threading model will run Model.evaluate using only one thread (usually the calling thread). This option is useful if the calling application is managing one or more Xnor models with its own concurrency scheme.

public static final class Info

Contains metadata about the model. None of it is needed to use the model effectively; it is provided so that your application can adapt to different types of models or retrieve diagnostic information.

public String getName()

Retrieve a friendly name for the model. This should usually match the name you received in the model’s documentation. This is intended primarily for diagnostic purposes; it is not guaranteed to uniquely identify the model, may change across releases, and is not guaranteed to have any particular format.

public String getVersion()

Retrieve the version of XB you are using. This is intended solely for diagnostic purposes and no attempt should be made to parse it. If you are having problems with a model, the version returned by this method can help us to track down specifically which model and version of the model is at fault.

public EvaluationResult.Type getResultType()

Determine the type of evaluation results the model will produce.

Tip

This is provided for convenience if you need to handle different kinds of models generically, but for typical applications where you know what kind of result you expect, you should not need to use this. It is better to just call the appropriate get* method of EvaluationResult without checking the type first, since we may add new types in the future that cause getType to return a different value, but retain compatibility with the old get* method. Explicitly checking the type makes your application more fragile to these types of changes in the future.

public String[] getClassLabels()

Retrieve a list of all possible classes the model can produce.

Input

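Before data can be passed to a model for evaluation, it must be wrapped in an Input. This allows multiple types of model inputs to be supported without complicating the model evaluation interface. Currently, only image inputs are supported, but multiple image formats are accepted:

  • JPEG (Input.jpegImage)

  • RGB triples (Input.rgbImage)

  • YUV420P (Input.yuv420pImage)

  • YUV420SP NV12 (Input.yuv420spNv12Image)

  • YUV420SP NV21 (Input.yuv420spNv21Image)

  • YUV422 (YUYV) (Input.yuv422Image)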

Which one you use depends on the type of data available. For example, JPEG may be most convenient for reading images from files, but if you are interfacing directly with a camera, one of the YUV formats may be more useful.

public final class Input implements AutoCloseable

Input to a model for inference.

public static Input jpegImage(ByteBuffer data)

Create an Input representing an image provided in JPEG format.

Parameters
Throws
Returns

An Input representing the image.

public static Input rgbImage(int width, int height, ByteBuffer data)

Create an Input representing an image provided as RGB triples. The data is arranged line-by-line, where each line is pixel-by-pixel, and each pixel is represented by three bytes (red, green, and blue intensities).

Parameters
  • width – The width of the image.

  • height – The height of the image.

  • data – A direct (see ByteBuffer.isDirect()) ByteBuffer containing the RGB data of the image as described above.

Throws
Returns

An Input representing the image.
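
The RGB layout described above can be produced from packed ARGB integers (the form returned by, for example, android.graphics.Bitmap.getPixels). The following is a minimal sketch using only java.nio; the class and method names are illustrative, and rewinding the buffer to position 0 before handing it over is an assumption, not documented library behavior.

```java
import java.nio.ByteBuffer;

final class RgbPacking {
    // Packs ARGB ints (0xAARRGGBB) into a direct buffer of R, G, B byte triples,
    // line-by-line and pixel-by-pixel. The alpha channel is discarded.
    static ByteBuffer packRgb(int[] argbPixels, int width, int height) {
        if (argbPixels.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match width * height");
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(width * height * 3);
        for (int p : argbPixels) {
            buf.put((byte) ((p >> 16) & 0xFF)); // red
            buf.put((byte) ((p >> 8) & 0xFF));  // green
            buf.put((byte) (p & 0xFF));         // blue
        }
        buf.rewind(); // assumption: consumers read from position 0
        return buf;
    }
}
```

The resulting buffer is direct and holds width * height * 3 bytes, which matches the data parameter described above.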

public static Input yuv420pImage(int width, int height, ByteBuffer y_plane_data, ByteBuffer u_plane_data, ByteBuffer v_plane_data)

Create an Input representing an image provided in YUV420P format. Each of the data buffers is arranged line-by-line, pixel-by-pixel, except that the U and V buffers are each one-quarter of the size (1/2 width, 1/2 height), and each pixel in the U and V buffers corresponds to four pixels in the image.

Parameters
Throws
Returns

An Input representing the image.
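
The plane sizes implied by this description can be worked out as follows. This is a sketch that assumes even dimensions and no row padding; the helper names are illustrative and not part of the library.

```java
final class Yuv420pLayout {
    // Size in bytes of the full-resolution Y plane.
    static int lumaSize(int width, int height) {
        return width * height;
    }

    // Size in bytes of the U plane (the V plane has the same size):
    // half the width times half the height.
    static int chromaSize(int width, int height) {
        return (width / 2) * (height / 2);
    }

    // Index into the U (or V) plane of the sample shared by image pixel (x, y).
    // Each chroma sample covers a 2x2 block of image pixels.
    static int chromaIndex(int x, int y, int width) {
        return (y / 2) * (width / 2) + (x / 2);
    }
}
```

For a 640x480 image this gives a 307200-byte Y plane and 76800 bytes each for U and V.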

public static Input yuv420spNv12Image(int width, int height, ByteBuffer y_plane_data, ByteBuffer uv_plane_data)

Create an Input representing an image provided in YUV420SP NV12 format. The Y plane is arranged line-by-line, pixel-by-pixel, where each pixel is one byte. The UV plane is half the width and half the height of the image, and is arranged line-by-line, pixel-by-pixel, where each pixel is two bytes (U channel then V channel), and where each pixel in the UV plane corresponds to a 2x2 block of pixels in the Y plane.

Parameters
  • width – The width of the image, which must be a multiple of 2.

  • height – The height of the image, which must be a multiple of 2.

  • y_plane_data – The Y plane of data for the image, as a direct (see ByteBuffer.isDirect()) ByteBuffer.

  • uv_plane_data – The UV plane of data for the image, as a direct (see ByteBuffer.isDirect()) ByteBuffer.

Throws
Returns

An Input representing the image.
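
To make the interleaved UV plane concrete, the sketch below computes where the U and V bytes for a given image pixel sit in an NV12 UV plane. It assumes no row padding; the class and method names are hypothetical helpers, not library API.

```java
final class Nv12Layout {
    // Total bytes in the UV plane: (width/2) x (height/2) samples, two bytes each.
    static int uvPlaneSize(int width, int height) {
        return (width / 2) * (height / 2) * 2;
    }

    // Byte offset of the U value covering image pixel (x, y).
    // One UV row holds width/2 two-byte samples, i.e. `width` bytes.
    static int uOffset(int x, int y, int width) {
        return (y / 2) * width + (x / 2) * 2;
    }

    // In NV12 the V byte immediately follows its U byte.
    static int vOffset(int x, int y, int width) {
        return uOffset(x, y, width) + 1;
    }
}
```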

public static Input yuv420spNv21Image(int width, int height, ByteBuffer y_plane_data, ByteBuffer vu_plane_data)

Create an Input representing an image provided in YUV420SP NV21 format. The Y plane is arranged line-by-line, pixel-by-pixel, where each pixel is one byte. The VU plane is half the width and half the height of the image, and is arranged line-by-line, pixel-by-pixel, where each pixel is two bytes (V channel then U channel), and where each pixel in the VU plane corresponds to a 2x2 block of pixels in the Y plane.

Parameters
  • width – The width of the image, which must be a multiple of 2.

  • height – The height of the image, which must be a multiple of 2.

  • y_plane_data – The Y plane of data for the image, as a direct (see ByteBuffer.isDirect()) ByteBuffer.

  • vu_plane_data – The VU plane of data for the image, as a direct (see ByteBuffer.isDirect()) ByteBuffer.

Throws
Returns

An Input representing the image.
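
Some camera APIs deliver an NV21 frame as one contiguous byte array rather than as separate planes. The sketch below splits such an array into the two direct buffers described above. It assumes the Y plane comes first, immediately followed by the VU plane, with no row padding; the class name is illustrative.

```java
import java.nio.ByteBuffer;

final class Nv21Split {
    // Returns {yPlane, vuPlane} as direct buffers copied from a contiguous NV21 frame.
    static ByteBuffer[] split(byte[] frame, int width, int height) {
        int ySize = width * height;
        int vuSize = ySize / 2; // (width/2) * (height/2) samples, two bytes each
        if (frame.length < ySize + vuSize) {
            throw new IllegalArgumentException("frame is too small for the given dimensions");
        }
        ByteBuffer y = ByteBuffer.allocateDirect(ySize);
        y.put(frame, 0, ySize);
        y.rewind();
        ByteBuffer vu = ByteBuffer.allocateDirect(vuSize);
        vu.put(frame, ySize, vuSize);
        vu.rewind();
        return new ByteBuffer[] {y, vu};
    }
}
```

The copy is needed because a ByteBuffer that wraps a Java array is not direct.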

public static Input yuv422Image(int width, int height, ByteBuffer data)

Create an Input representing an image provided in YUV422 (YUYV) format. The data is arranged line-by-line, where each line is consecutive pairs of pixels. Each pair of pixels is represented by four bytes: the Y value of the first pixel, the shared U value of both pixels, the Y value of the second pixel, and the shared V value of both pixels.

Parameters
  • width – The width of the image, which must be a multiple of 2.

  • height – The height of the image.

  • data – A direct (see ByteBuffer.isDirect()) ByteBuffer containing the YUV422 data of the image as described above.

Throws
Returns

An Input representing the image.
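
The four-byte pair layout can be illustrated by reading one pixel's Y, U, and V values back out of a YUYV buffer. This is a sketch assuming no row padding; the helper is hypothetical and only uses java.nio.

```java
import java.nio.ByteBuffer;

final class YuyvLayout {
    // Returns {Y, U, V} (each 0..255) for image pixel (x, y) in a YUYV buffer.
    static int[] sample(ByteBuffer data, int width, int x, int y) {
        // Each row is width * 2 bytes; each pair of pixels occupies 4 bytes: Y0 U Y1 V.
        int base = y * width * 2 + (x / 2) * 4;
        int luma = data.get(base + (x % 2) * 2) & 0xFF; // Y0 for even x, Y1 for odd x
        int u = data.get(base + 1) & 0xFF;              // shared by both pixels
        int v = data.get(base + 3) & 0xFF;              // shared by both pixels
        return new int[] {luma, u, v};
    }
}
```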

public void close()

Dispose of the underlying native resources of this input.

Evaluation Results

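All models produce EvaluationResults, but depending on which model is in use, the kind of data inside that object may differ:

  • Classification models produce ClassLabels (see getClassLabels).

  • Object detection models produce BoundingBoxes (see getBoundingBoxes).

  • Segmentation models produce SegmentationMasks (see getSegmentationMasks).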

public final class EvaluationResult implements AutoCloseable

Result of evaluating a Model on an Input.

public List<BoundingBox> getBoundingBoxes()

Returns the immutable list of bounding boxes contained by the result.

Throws
public List<ClassLabel> getClassLabels()

Returns the immutable list of class labels contained by the result.

Throws
public List<SegmentationMask> getSegmentationMasks()

Returns the immutable list of segmentation masks contained by the result.

Throws
public Type getType()

Returns the type of result contained within the instance.

Throws
public void close()

Release the native resources associated with the evaluation result.

public enum Type

Type of result generated by evaluating a Model on an Input.

public static final EvaluationResult.Type BOUNDING_BOXES

Result contains BoundingBoxes.

public static final EvaluationResult.Type CLASS_LABELS

Result contains ClassLabels.

public static final EvaluationResult.Type SEGMENTATION_MASKS

Result contains SegmentationMasks.

public static final EvaluationResult.Type UNKNOWN

Unknown type of result. This should not occur in practice; if new result types are added, new Type enumerants will also be added.

Classification

Classification models take an image as the input and try to determine what is in it, but not where within the image it is. Results are returned as a list of guesses sorted by decreasing confidence. For example, given a picture of a Siberian husky, the model might predict with high confidence that it is a husky and with lower confidence that it could be an Alaskan malamute or other type of dog.

public final class ClassLabel

A single detected class within an image, not localized to a particular area. You can get these from EvaluationResult.getClassLabels().

public ClassLabel(int classId, String label)

Create a new ClassLabel with the given parameters.

public final int getClassId()

Returns the class ID of the class label. The class ID represents the label of the class in a more machine-comprehensible form than getLabel() returns.

public final String getLabel()

Returns the human-interpretable label of the detected class.

Object Detection

Unlike classification models, object detection models can find multiple objects at once and identify their locations within the input image.

public final class BoundingBox

A bounding box is an object identified within an image, localized to a specific part of the image. You can get these from EvaluationResult.getBoundingBoxes().

public BoundingBox(ClassLabel classLabel, Rectangle rectangle)

Create a new BoundingBox with the given parameters.

public final ClassLabel getClassLabel()

Returns information identifying what kind of object was detected.

public final Rectangle getRectangle()

Returns information localizing the detected object to a specific region of the image. The coordinates on both the X and Y axes range from 0 (left or top) to 1 (right or bottom) of the input image, not pixel values.

public final class Rectangle

A rectangle within the input image. Coordinates are floating point and range from 0 (left or top edge) to 1 (right or bottom edge).

public Rectangle(float x, float y, float width, float height)

Create a rectangle with the provided dimensions.

public final float getHeight()

Returns the height of the rectangle, from 0 to 1.

public final float getWidth()

Returns the width of the rectangle, from 0 to 1.

public final float getX()

Returns the X coordinate of the rectangle, from 0 to 1.

public final float getY()

Returns the Y coordinate of the rectangle, from 0 to 1.
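
Because rectangle coordinates are normalized, they usually need to be scaled to the image size before drawing. The sketch below takes the four values as plain floats (as returned by getX, getY, getWidth, and getHeight) and assumes that X and Y denote the top-left corner, which the text above does not state explicitly. The class name is illustrative.

```java
final class RectToPixels {
    // Converts a normalized rectangle to {left, top, right, bottom} in pixels,
    // clamped to the image bounds.
    static int[] toPixels(float x, float y, float w, float h,
                          int imageWidth, int imageHeight) {
        int left = clamp(Math.round(x * imageWidth), imageWidth);
        int top = clamp(Math.round(y * imageHeight), imageHeight);
        int right = clamp(Math.round((x + w) * imageWidth), imageWidth);
        int bottom = clamp(Math.round((y + h) * imageHeight), imageHeight);
        return new int[] {left, top, right, bottom};
    }

    // Restricts a value to the range 0..max.
    private static int clamp(int value, int max) {
        return Math.max(0, Math.min(max, value));
    }
}
```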

Segmentation

Segmentation models allow you to identify which pixels of an image represent a particular class of object. They can be thought of as an automated version of the Lasso tool in many popular image manipulation programs. The masks they create are more detailed than bounding boxes and allow you to create visualizations of objects or process objects or backgrounds independently of each other.

public final class SegmentationMask

A mask representing whether each pixel belongs to a particular class. You can get these from EvaluationResult.getSegmentationMasks().

public SegmentationMask(ClassLabel classLabel, int maskWidth, int maskHeight, int maskStride, byte[] maskData)

Create a new SegmentationMask with the given parameters.

public final ClassLabel getClassLabel()

Returns information identifying what kind of object was detected.

public final int getMaskHeight()

Returns the height of the mask, in pixels.

public final byte getMaskValue(int x, int y)

Returns the value of the mask at the pixel with coordinates (x, y). The pixel value is either 0 or 1.

public final int getMaskWidth()

Returns the width of the mask, in pixels.

public final byte[] getRawMaskData()

Returns the raw bitmap data. Returns a byte array whose length is the mask stride times the mask height. Within each byte, moving from the least significant bit to the most significant bit corresponds to advancing by column through the image.

public final int getRawMaskDataStride()

Returns the stride of the mask, in bytes. May not be a multiple of the row alignment, in which case the high-order bits of the last byte will be “padded” with 0 and additional padding bytes filled with 0 may be present.
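
The raw layout described above (rows of stride bytes, with columns advancing from the least significant bit to the most significant bit of each byte) can be decoded by hand as in the sketch below. The helper name is hypothetical; in most cases getMaskValue is the simpler choice.

```java
final class MaskBits {
    // Returns 0 or 1 for the mask pixel at column x, row y.
    // raw and stride correspond to getRawMaskData() and getRawMaskDataStride().
    static int bitAt(byte[] raw, int stride, int x, int y) {
        int b = raw[y * stride + (x >> 3)]; // byte holding column x of row y
        return (b >> (x & 7)) & 1;          // bit 0 is the leftmost column of the byte
    }
}
```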

Error Handling

public final class XnorException extends RuntimeException

Thrown for errors occurring within the XnorNet library.