KWCOCO Views / Spaces

NOTE:

We are working on improving termonology and data structures when discussing this topic. Not all documentation may be up to date. There are several things to be aware of.

  • Moving forward we may replace the word “Space” with “View”

  • The old term is “Auxiliary”, which is being replaced with “Asset”

  • “Assets” are currently (as of 0.7.5) stored as part of the Image table, but we will likely move them to their own Asset table in the future.

Recall that kwcoco has a list of image dictionaries. Each image may contain a “video_id” that references a video sequence it is part of. Each image can contain multiple auxiliary-images (i.e. assets).

In the common case, images do not contain any auxiliary / asset files. Assets (or auxiliary files) are mainly used in multispectral or multimodal datasets. If an image has auxiliary files, these will be stored as a list of auxiliary dictionaries in the image dictionary.

For example a multispectral video with may look like this:

"videos": [
    {
        "name": "TheVideoName",
        "width": 300,
        "height": 400
    },
 ]

"images": [
    {
        "name": "TheImageName",
        "width": 600,
        "height": 800,
        "video_id": 1,
        "frame_index": 0,
        "warp_img_to_vid": {"scale": 0.5},
        "assets": [
            {
                "file_name": "band1.tif",
                "warp_aux_to_img": {"scale": 2.0},
                "width": 300,
                "height": 400
                "channels": "band1",
                "num_bands": 1,
            },
            {
                "file_name": "band2.tif",
                "warp_aux_to_img": {"scale": 1.0},
                "channels": "band2",
                "num_bands": 1,
            },
        ],
    },
]

A rgb-video with no assets with may look like this:

"videos": [
    {
        "name": "TheVideoName",
        "width": 300,
        "height": 400
    },
 ]

"images": [
    {
        "name": "TheImageName",
        "width": 600,
        "height": 800,
        "video_id": 1,
        "frame_index": 0,
        "warp_img_to_vid": {"scale": 0.5},
        "file_name": "images/the_image.jpg"
        "channels": "r|g|b",
        "num_bands": 3,
    },
]

There are 3 coordinate spaces that we are concerned about in kwcoco.

  • Asset Space - all the images are in their native on-disk resolutions.

  • Image Space - coordinates that belong to the base image or the coordinate that all “auxiliary” images are aligned to.

  • Video Spce - the coordinates that all images in a video sequence are aligned to.

The following visualizes these key spaces:

https://i.imgur.com/QuiSJwR.png

NOTE: we use the terms “auxiliary” and “asset” interchangably. The code was originally written using “auxiliary”, but we later switched to “asset”. Termonology will vary based on when something was written, eventually we will deprecate all instances of “auxiliary” and only refer to “assets”.

Each “asset” / “auxiliary” dictionary, if they exist, contains a warp_aux_to_img transform that aligns the auxiliary image into the common “image-space”.

Each “image” dictionary in a video sequence contains a warp_img_to_vid transform that aligns anything in image space into video space such that all frames in the video are aligned.

If these transforms are unspecified they are assumed to be the identity transform (i.e. no change), so they are only needed if your on-disk data is not already aligned (which can be desirable to save disk-space and reduce resampling artifacts).

The transform dictionaries can be in any format coercable by kwimage.Affine.

The kwcoco.CocoDataset also exposes the kwcoco.CocoDataset.delayed_load method, which can be used to access image information in image or video space.

import kwcoco
import ubelt as ub
gid = 1
self = kwcoco.CocoDataset.demo('vidshapes8-multispectral')

# Show the structure of the image and auxiliary dictionaries
print(ub.urepr(self.index.imgs[1], nl=-1, sort=0))

# The delayed object is a pointer to the image files that contains
# appropriate transformation. Additional transformations can be
# specified. These are all fused together to reduce resampling
# artifacts.
img_delayed = self.delayed_load(gid, space='image')
# Execture all transforms
img_final = img_delayed.finalize()

#
vid_delayed = self.delayed_load(gid, space='image')
# Execture all transforms
vid_final = vid_delayed.finalize()

Currently the warp_img_to_vid transform in the demo vidshapes8-multispectral dataset is the identity, but if it was different, then “vid_final” and “img_final” would be returned in different coordinate systems. (TODO: update demo data with an option such that the video and image space are different)