KWCOCO Spaces

Recall that kwcoco has a list of image dictionaries. Each image may contain a “video_id” that references a video sequence it is part of. Each image can contain multiple auxiliary-images.

In the common case, images do not contain any auxiliary files. Auxiliary files are mainly used in multispectral or multimodal datasets. If an image has auxiliary files, these will be stored as a list of auxiliary dictionaries in the image dictionary.

For example a multispectral video with may look like this:

"videos": [
    {
        "name": "TheVideoName",
        "width": 300,
        "height": 400
    },
 ]

"images": [
    {
        "name": "TheImageName",
        "width": 600,
        "height": 800,
        "video_id": 1,
        "frame_index": 0,
        "warp_img_to_vid": {"scale": 0.5},
        "auxiliary": [
            {
                "file_name": "band1.tif",
                "warp_aux_to_img": {"scale": 2.0},
                "width": 300,
                "height": 400
                "channels": "band1",
                "num_bands": 1,
            },
            {
                "file_name": "band2.tif",
                "warp_aux_to_img": {"scale": 1.0},
                "channels": "band2",
                "num_bands": 1,
            },
        ],
    },
]

A rgb-video with no auxiliary images with may look like this:

"videos": [
    {
        "name": "TheVideoName",
        "width": 300,
        "height": 400
    },
 ]

"images": [
    {
        "name": "TheImageName",
        "width": 600,
        "height": 800,
        "video_id": 1,
        "frame_index": 0,
        "warp_img_to_vid": {"scale": 0.5},
        "file_name": "images/the_image.jpg"
        "channels": "r|g|b",
        "num_bands": 3,
    },
]

There are 3 coordinate spaces that we are concerned about in kwcoco.

  • Auxiliary Space - all the images are in their native on-disk resolutions.

  • Image Space - coordinates that belong to the base image or the coordinate that all “auxiliary” images are aligned to.

  • Video Spce - the coordinates that all images in a video sequence are aligned to.

Each “auxiliary” dictionary, if they exist, contains a warp_aux_to_img transform that aligns the auxiliary image into the common “image-space”.

Each “image” dictionary in a video sequence contains a warp_img_to_vid transform that aligns anything in image space into video space such that all frames in the video are aligned.

If these transforms are unspecified they are assumed to be the identity transform (i.e. no change), so they are only needed if your on-disk data is not already aligned (which can be desirable to save disk-space and reduce resampling artifacts).

The transform dictionaries can be in any format coercable by kwimage.Affine.

The kwcoco.CocoDataset also exposes the kwcoco.CocoDataset.delayed_load method, which can be used to access image information in image or video space.

import kwcoco
import ubelt as ub
gid = 1
self = kwcoco.CocoDataset.demo('vidshapes8-multispectral')

# Show the structure of the image and auxiliary dictionaries
print(ub.repr2(self.index.imgs[1], nl=-1, sort=0))

# The delayed object is a pointer to the image files that contains
# appropriate transformation. Additional transformations can be
# specified. These are all fused together to reduce resampling
# artifacts.
img_delayed = self.delayed_load(gid, space='image')
# Execture all transforms
img_final = img_delayed.finalize()

#
vid_delayed = self.delayed_load(gid, space='image')
# Execture all transforms
vid_final = vid_delayed.finalize()

Currently the warp_img_to_vid transform in the demo vidshapes8-multispectral dataset is the identity, but if it was different, then “vid_final” and “img_final” would be returned in different coordinate systems. (TODO: update demo data with an option such that the video and image space are different)