Input types

Overview

The XnorNet API accepts several different input formats. Picking the right format may make your application faster by reducing unnecessary conversions. This page describes the input formats in detail; for information about using the relevant API functions, see:

How To Read This Page

Each input format is described at a high level, with an example of where an input in that format could be obtained. A diagram of the Memory Layout of that format is then given, using the following conventions:

  • Pixel: one final composited picture element as it would appear on a display.

  • Pixel block: a grouping of multiple pixels.

  • Address: an offset into a region of data (or, equivalently, an index into an array of bytes).

  • W: width of the image, in pixels.

  • H: height of the image, in pixels.

JPEG Images

XnorNet API functions:

This input method allows passing the contents of most typical JPEG files directly to XnorNet. If application inputs are already stored as JPEG files, the application will not have to decode the images.

Example: An image file downloaded from the Internet.

Memory Layout: See the JPEG standard for a description of the JPEG/JFIF data format.
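As a rough illustration of passing encoded data without decoding it, the sketch below reads a file's raw bytes and only checks that they begin with the JPEG start-of-image marker (the two bytes 0xFF 0xD8). The function names and the file path argument are hypothetical and are not part of the XnorNet API; handing the buffer to XnorNet is left out because the relevant function is not named on this page.

```python
def starts_like_jpeg(data: bytes) -> bool:
    """Return True if the buffer begins with the JPEG start-of-image marker."""
    return data[:2] == b"\xff\xd8"


def load_encoded_jpeg(path: str) -> bytes:
    """Read a JPEG file as-is; the pixel data is never decoded here."""
    with open(path, "rb") as f:
        data = f.read()
    if not starts_like_jpeg(data):
        raise ValueError("buffer does not start with a JPEG SOI marker")
    return data
```

The marker check is only a cheap sanity test; it does not prove that the rest of the file is a valid JPEG stream.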

RGB Images

XnorNet API functions:

Since raw RGB image data is a very common interchange format, this is effectively the “lowest common denominator” input format. If no other API function supports an image's format, the image data should be converted to RGB and passed to this function.

Example: A decoded image provided by a file handling library (e.g. libpng).

Memory Layout: The red, green, and blue channels are interleaved in R, G, B order (1 byte per channel) as memory addresses increase.

          | Pixel 0     | Pixel 1     |
Channel   | R   G   B   | R   G   B   |
Address   | 0   1   2   | 3   4   5   |

Pixel 0 is the top left of the image, and pixels appear in order first from left to right, then top to bottom as memory addresses increase.

Pixel (x,y) | (0,0) | (1,0) | (2,0) | (0,1) | (1,1) | (2,1) |
Address     | 0     | 3     | 6     | 3W    | 3W+3  | 3W+6  |
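The addressing rule in the two tables above can be written as a small helper. This is a minimal sketch: the function names are illustrative and are not part of the XnorNet API.

```python
def rgb_offset(x: int, y: int, width: int, channel: int = 0) -> int:
    """Byte offset of one channel (0=R, 1=G, 2=B) of pixel (x, y)."""
    return 3 * (y * width + x) + channel


def rgb_pixel(data: bytes, x: int, y: int, width: int) -> tuple:
    """Return the (R, G, B) bytes of pixel (x, y) from a packed RGB buffer."""
    base = rgb_offset(x, y, width)
    return tuple(data[base:base + 3])


# A 3x2 image: the top row is red, green, blue; the bottom row is black.
W, H = 3, 2
image = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255] + [0] * 9)
```

With W = 3, pixel (0,1) starts at address 3W = 9, matching the table.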

YUV422 Images

XnorNet API functions:

YUV is a family of image formats that represent color as a luminance (Y) and two chroma (U, V) components. YUV is generally more compressible than RGB and is often the native output format of image sensing hardware.

YUV422 is an interleaved variant of the YUV format: the channel values alternate within a single buffer. It stores half as many bytes for each chroma channel (U and V) as for the luminance (Y) channel.

Example: An image returned by a mobile phone camera module.

Memory Layout: The Y, U, and V channels are interleaved in YUYV order. That is, each block of 4 bytes represents 2 pixels of the image, where the U and V chroma channel values are shared between the two pixels. As such, image width must be a multiple of 2.

          | Pixel 0/1       | Pixel 2/3       |
Channel   | Y   U   Y   V   | Y   U   Y   V   |
Address   | 0   1   2   3   | 4   5   6   7   |

Pixel 0 is the top left of the image, and pixels appear in order first from left to right, then top to bottom as memory addresses increase. There must be an even number of pixels per row, but there may be an odd number of rows.
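The YUYV layout described above can be sketched as an offset calculation. The function name is hypothetical and not part of the XnorNet API; the sketch assumes rows are tightly packed with no padding, so each row occupies 2W bytes.

```python
def yuyv_offsets(x: int, y: int, width: int) -> tuple:
    """Return (y_off, u_off, v_off) for pixel (x, y) in a packed YUYV buffer."""
    if width % 2:
        raise ValueError("YUV422 image width must be a multiple of 2")
    # Each 4-byte block [Y, U, Y, V] covers two horizontally adjacent pixels.
    block = 4 * ((y * width + x) // 2)
    y_off = block + 2 * (x % 2)  # first Y for even x, second Y for odd x
    return y_off, block + 1, block + 3
```

Pixels 0 and 1 return the same U and V offsets (1 and 3), which reflects the chroma sharing within a block.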

YUV420p Images

XnorNet API functions:

YUV is a family of image formats that represent color as a luminance (Y) and two chroma (U, V) components. YUV is generally more compressible than RGB and is often the native output format of image sensing hardware.

YUV420p is a planar variant of the YUV format: each channel is stored in its own contiguous region rather than interleaved. It halves the resolution of the chroma (U and V) channels in both dimensions, so each byte of U or V data applies to a 2x2 block of pixels in the image. As such, the image width and height must both be multiples of 2. Unlike YUV422, the Y, U, and V channel data are stored separately; the three regions may be concatenated one after the other or may be in different buffers.

Example: An image returned by a camera peripheral attached to an embedded device.

Memory Layout: The image is broken into three regions, or “planes”: the Y plane, the U plane, and the V plane. There are \(W \times H\) bytes of Y channel data, but only \(W/2 \times H/2 = (W \times H)/4\) bytes each of U and V channel data. Within a channel, pixel values are tightly packed, 1 per byte.

Luminance (Y channel) data

          | Pixel 0 | Pixel 1 | Pixel 2 | Pixel 3 |
Channel   | Y       | Y       | Y       | Y       |
Address   | 0       | 1       | 2       | 3       |

Chroma U data

          | Pixel block 1 | Pixel block 2 |
Channel   | U             | U             |
Address   | 0             | 1             |

Chroma V data

          | Pixel block 1 | Pixel block 2 |
Channel   | V             | V             |
Address   | 0             | 1             |

Pixel 0 is the top left of the image, and pixels appear in order first from left to right, then top to bottom as memory addresses increase. There must be an even number of rows and an even number of pixels per row.
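The plane sizes and per-plane addressing above can be sketched as follows. All function names are illustrative, not XnorNet API. Two assumptions are made: the chroma planes are laid out left to right, then top to bottom, like the Y plane; and, for the splitting helper only, the three planes are concatenated in Y, U, V order.

```python
def yuv420p_plane_sizes(width: int, height: int) -> tuple:
    """Return the byte sizes of the (Y, U, V) planes."""
    if width % 2 or height % 2:
        raise ValueError("YUV420p width and height must be multiples of 2")
    chroma = (width // 2) * (height // 2)
    return width * height, chroma, chroma


def yuv420p_offsets(x: int, y: int, width: int) -> tuple:
    """Return (offset in the Y plane, offset in the U or V plane) for pixel (x, y)."""
    y_off = y * width + x
    # Every 2x2 block of pixels shares one byte in each chroma plane.
    c_off = (y // 2) * (width // 2) + x // 2
    return y_off, c_off


def split_concatenated_yuv420p(buf: bytes, width: int, height: int) -> tuple:
    """Split one buffer into planes, ASSUMING Y, U, V concatenation order."""
    y_size, u_size, v_size = yuv420p_plane_sizes(width, height)
    if len(buf) != y_size + u_size + v_size:
        raise ValueError("buffer length does not match the image dimensions")
    return (buf[:y_size],
            buf[y_size:y_size + u_size],
            buf[y_size + u_size:])
```

For a 4x2 image this gives 8 bytes of Y data and 2 bytes each of U and V data, 12 bytes in total.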

YUV420sp (NV12 and NV21) Images

XnorNet API functions:

YUV is a family of image formats that represent color as a luminance (Y) and two chroma (U, V) components. YUV is generally more compressible than RGB and is often the native output format of image sensing hardware.

YUV420sp is a semi-planar variant of the YUV format: the Y channel has its own plane, while the U and V channels are interleaved in a second plane. It halves the resolution of the chroma (U and V) channels in both dimensions, so each byte of U or V channel data applies to a 2x2 block of pixels in the image. As such, the image width and height must both be multiples of 2. Unlike YUV420p, the U and V channel data are interleaved within a single buffer.

In the NV12 variant of YUV420sp, U comes before V in the chroma buffer. In NV21, V comes before U.

Example: An image returned by the Android camera API.

Memory Layout: The image is broken into two regions, or “planes”: the Y plane, and the UV plane. There are \(W \times H\) bytes of Y channel data, but only \(W/2 \times H/2 \times 2 = (W \times H)/2\) bytes of U and V channel data.

The Y channel is the same for both NV12 and NV21 variants of YUV420sp:

Luminance (Y channel) data

          | Pixel 0 | Pixel 1 | Pixel 2 | Pixel 3 | …
Channel   | Y       | Y       | Y       | Y       | …
Address   | 0       | 1       | 2       | 3       | …

In NV12 format, the UV plane is interleaved in U, V order as memory addresses increase:

NV12 Chroma (UV) channel data

          | Pixel block 1 | Pixel block 2 |
Channel   | U     V       | U     V       |
Address   | 0     1       | 2     3       |

In NV21 format, the UV (VU) plane is interleaved in V, U order as memory addresses increase:

NV21 Chroma (VU) channel data

          | Pixel block 1 | Pixel block 2 |
Channel   | V     U       | V     U       |
Address   | 0     1       | 2     3       |

Pixel 0 is the top left of the image, and pixels appear in order first from left to right, then top to bottom as memory addresses increase. There must be an even number of rows and an even number of pixels per row.
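The difference between the two variants comes down to which byte of each chroma pair is U. The sketch below uses hypothetical helper names that are not part of the XnorNet API, and it assumes the chroma plane is laid out left to right, then top to bottom, with no row padding.

```python
def yuv420sp_chroma_offsets(x: int, y: int, width: int,
                            variant: str = "NV12") -> tuple:
    """Return (u_off, v_off) within the chroma plane for pixel (x, y)."""
    # Each 2x2 block of pixels owns one 2-byte pair in the chroma plane.
    pair = 2 * ((y // 2) * (width // 2) + x // 2)
    if variant == "NV12":
        return pair, pair + 1  # U first, then V
    if variant == "NV21":
        return pair + 1, pair  # V first, then U
    raise ValueError("variant must be 'NV12' or 'NV21'")


def swap_chroma_order(chroma: bytes) -> bytes:
    """Convert an NV12 chroma plane to NV21 order, or the reverse."""
    if len(chroma) % 2:
        raise ValueError("chroma plane must contain whole U/V pairs")
    out = bytearray(len(chroma))
    out[0::2] = chroma[1::2]
    out[1::2] = chroma[0::2]
    return bytes(out)
```

Because the Y plane is identical in both variants, swapping the bytes of each chroma pair is enough to convert one variant into the other.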