Getting Started With KW-COCO

This document is a work in progress and still needs to be updated and refactored.

FAQ

Q: What is kwcoco? A: An extension of the MS-COCO data format for storing a “manifest” of categories, images, and annotations.

Q: Why yet another data format? A: MS-COCO does not support video or multimodal imagery. These are important problems in computer vision, and it seems reasonable (although challenging) that a single data format could serve as an interchange format for almost all vision problems.

Q: Why extend MS-COCO and not create something else? A: To draw on the existing adoption of the MS-COCO format.

Q: What’s so great about MS-COCO? A: It has an intuitive data structure that’s simple to interface with.

Q: Why not pycocotools? A: That module doesn’t allow you to edit the dataset programmatically, and it requires a C backend. kwcoco allows dynamic addition, modification, and removal of images, categories, annotations, and videos, and it goes beyond the functionality of pycocotools in other places as well. For example, it has a much more configurable and expressive way of computing and recording object detection metrics. If we are using an MS-COCO-compliant dataset (which can be verified or coerced with the kwcoco conform CLI tool), then we do call pycocotools for functionality not directly implemented here.

Q: Would you ever extend kwcoco to go beyond computer vision? A: Maybe, but it would be something new that only uses kwcoco as inspiration. If I extended it past computer vision, I would want to go back and rename and reorganize the spec.

Examples

These Python files contain a few example use cases of kwcoco.

Design Goals

Always be a strict superset of the original MS-COCO format.
Extend the scope of MS-COCO to broader computer-vision domains.
Have a fast pure-Python API to perform lower-level tasks (allowing optional C backends for features that need speed boosts).
Have an easy-to-use command line interface to perform higher-level tasks.

Use cases

KWCOCO has been designed to work with the following tasks and image modalities.

Tasks

Captioning
Classification
Segmentation
Keypoint Detection / Pose Estimation
Object Detection

Modalities

Single Image
Video
Multispectral Imagery
Images with auxiliary information (2.5D, flow, disparity, stereo)
Combinations of the above.

KWCOCO Spec

A high-level description of the kwcoco spec is given in kwcoco.coco_dataset.
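For a concrete picture of the spec, here is a small hand-written manifest expressed as a plain Python dictionary. It is an illustrative sketch rather than output produced by kwcoco: the category, video, file, and channel names are made up, and only fields that appear in the schema (videos, frame_index, multi-file assets, track_id, bbox) are used. The check_manifest helper is hypothetical and spot-checks only a few constraints; it is not a substitute for validating against kwcoco.coco_schema.

```python
import json
import re

# Hypothetical, hand-written kwcoco-style manifest. Names, paths, and channel
# codes are illustrative; only fields that appear in the schema are used.
manifest = {
    "info": [],
    "licenses": [],
    "categories": [
        {"id": 1, "name": "car", "supercategory": "vehicle"},
    ],
    "videos": [
        {"id": 1, "name": "drive_01"},
    ],
    "images": [
        # A frame stored in a single file.
        {"id": 1, "file_name": "drive_01/frame_000.png", "video_id": 1,
         "frame_index": 0, "width": 640, "height": 480},
        # A multispectral frame whose channels are split across two assets:
        # file_name is null, so a name is required instead.
        {"id": 2, "file_name": None, "name": "drive_01_frame_001",
         "video_id": 1, "frame_index": 1, "width": 640, "height": 480,
         "assets": [
             {"file_name": "drive_01/frame_001_rgb.tif",
              "channels": "red|green|blue", "width": 640, "height": 480},
             {"file_name": "drive_01/frame_001_nir.tif",
              "channels": "nir", "width": 640, "height": 480},
         ]},
    ],
    "annotations": [
        # The same car in consecutive frames shares a track_id.
        {"id": 1, "image_id": 1, "category_id": 1, "track_id": 7,
         "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 2, "category_id": 1, "track_id": 7,
         "bbox": [14, 21, 100, 50]},
    ],
}

# The schema gives names the pattern [^/]+. JSON Schema patterns are not
# anchored, so fullmatch here is a deliberately stricter "no slashes" reading.
NAME_PATTERN = re.compile(r"[^/]+")


def check_manifest(data):
    """Spot-check a few schema constraints and return a list of problems."""
    problems = []
    for item in data["categories"] + data["videos"]:
        if not NAME_PATTERN.fullmatch(item["name"]):
            problems.append(f"bad name: {item['name']!r}")
    for img in data["images"]:
        if img.get("file_name") is None:
            # Without a main file, an image needs a name and its assets.
            if not img.get("name") or not (img.get("assets") or img.get("auxiliary")):
                problems.append(f"image {img['id']}: needs a name and assets")
    for ann in data["annotations"]:
        if len(ann["bbox"]) != 4:
            problems.append(f"annotation {ann['id']}: bbox must have 4 numbers")
    return problems


# The manifest is plain JSON, so it survives a round trip unchanged.
assert json.loads(json.dumps(manifest)) == manifest
print(check_manifest(manifest))  # → []
```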

A formal json-schema is defined in kwcoco.coco_schema and is shown here:

KWCOCO_SCHEMA

The formal kwcoco schema
type	object
properties
info
licenses
categories	type	array
	items	CATEGORY
		High level information about an annotation category
		type	object
		properties
		id	A unique internal category id
			type	integer
		name	A unique external category name or identifier
			type	string
			pattern	[^/]+
		alias	A list of alternate names that should be resolved to this category
			type	array
			items	type	string
				pattern	[^/]+
		supercategory	anyOf	A coarser category name
				type	string
				pattern	[^/]+
				type	null
		parents	Used for multiple inheritance
			type	array
			items	type	string
				pattern	[^/]+
		keypoints	deprecated
		skeleton	deprecated
keypoint_categories	type	array
	items	KEYPOINT_CATEGORY
		High level information about an annotation category
		type	object
		properties
		name	The name of the keypoint category
			type	string
			pattern	[^/]+
		id	type	integer
		supercategory	anyOf	type	string
				pattern	[^/]+
				type	null
		reflection_id	The keypoint category this should change to if the image is horizontally flipped
			anyOf	type	integer
				type	null
videos	type	array
	items	VIDEO
		High level information about a group of temporally ordered images
		type	object
		properties
		id	An internal video identifier
			type	integer
		name	A unique name for this video
			type	string
			pattern	[^/]+
		caption	A video level text caption
			type	string
		resolution	a unit representing the size of a pixel in video space
			anyOf	type	number
				type	string
				type	null
images	type	array
	items	IMAGE
		High level information about an image file or a collection of image files corresponding to a single point in (or small interval of) time
		type	object
		properties
		id	a unique internal image identifier
			type	integer
		file_name	anyOf	A relative or absolute path to the main image file. If this file_name is unspecified, then a name and auxiliary items or assets must be specified. Likewise this should be null if assets are used.
				type	string
				type	null
		name	anyOf	A unique name for the image. If unspecified the file_name should be used as the default value for the name property. Required if assets / auxiliary are specified.
				type	string
				pattern	[^/]+
				type	null
		width	The width of the image in image space pixels
			type	integer
		height	The height of the image in image space pixels
			type	integer
		video_id	The video this image belongs to
			type	integer
		timestamp	anyOf	An ISO-8601 timestamp
				type	string
				A UNIX timestamp
				type	number
		frame_index	Used to temporally order the images in a video
			type	integer
		channels	anyOf	CHANNEL_SPEC
				A human readable channel name. Must be compatible with kwcoco.ChannelSpec
				type	string
				pattern	[^/]*
				type	null
		resolution	a unit representing the size of a pixel in image space
			anyOf	type	number
				type	string
				type	null
		auxiliary	This will be deprecated for assets in the future
			type	array
			items	ASSET
				Information about a single file belonging to an image
				type	object
				properties
				file_name	type	string
				channels	CHANNEL_SPEC
					A human readable channel name. Must be compatible with kwcoco.ChannelSpec
					type	string
					pattern	[^/]*
				width	The width in asset-space pixels
					type	integer
				height	The height in asset-space pixels
					type	integer
		assets	A list of assets belonging to this image, used when image channels are split across multiple files
			type	array
			items	ASSET
				Information about a single file belonging to an image
				type	object
				properties
				file_name	type	string
				channels	CHANNEL_SPEC
					A human readable channel name. Must be compatible with kwcoco.ChannelSpec
					type	string
					pattern	[^/]*
				width	The width in asset-space pixels
					type	integer
				height	The height in asset-space pixels
					type	integer
annotations	type	array
	items	ANNOTATION
		Metadata about some semantic attribute of an image.
		type	object
		properties
		id	A unique internal id for this annotation
			type	integer
		image_id	The image id this annotation belongs to
			type	integer
		bbox	BBOX
			[top-left x, top-left-y, width, height] in image-space pixels
			type	array
			items	type	number
			maxItems	4
			minItems	4
		category_id	The category id of this annotation
			type	integer
		track_id	An identifier used to group annotations belonging to the same object over multiple frames in a video
			anyOf	type	integer
				type	string
				type	string
		segmentation	A polygon or mask specifying the pixels in this annotation in image-space
			anyOf	anyOf	KWCOCO_POLYGON
					A new-style polygon format that supports holes
					type	object
					properties
					exterior	counter-clockwise xy exterior points
						type	array
						items	type	array
							items	type	number
							maxItems	2
							minItems	2
					interiors	type	array
						items	clockwise xy hole
							type	array
							items	type	array
								items	type	number
								maxItems	2
								minItems	2
					type	array
					items	KWCOCO_POLYGON
						A new-style polygon format that supports holes
						type	object
						properties
						exterior	counter-clockwise xy exterior points
							type	array
							items	type	array
								items	type	number
								maxItems	2
								minItems	2
						interiors	type	array
							items	clockwise xy hole
								type	array
								items	type	array
									items	type	number
									maxItems	2
									minItems	2
					MSCOCO_POLYGON
					an old-style polygon [x1,y1,v1,…,xk,yk,vk]
					type	array
					items	type	number
					type	array
					items	MSCOCO_POLYGON
						an old-style polygon [x1,y1,v1,…,xk,yk,vk]
						type	array
						items	type	number
				A run-length-encoding mask format read by pycocotools
				type	string
		keypoints	A set of categorized points belonging to this annotation in image space
			anyOf	MSCOCO_KEYPOINTS
				An old-style set of keypoints (x1,y1,v1,…,xk,yk,vk)
				type	array
				items	type	integer
				type	array
				items	KWCOCO_KEYPOINT
					type	object
					properties
					xy	<x1, y1> in pixels
						type	array
						items	type	number
						maxItems	2
						minItems	2
					visible	choice(0, 1, 2)
						type	integer
					keypoint_category_id	type	integer
					keypoint_category	only to be used as a hint
						type	string
		prob	This needs to be in the same order as categories. The probability order currently needs to be known a priori, typically in order of the classes, but it’s hard to always keep that consistent. This SPEC is subject to change in the future.
			type	array
			items	type	number
		score	Typically assigned to predicted annotations
			type	number
		weight	Typically given to truth annotations to indicate quality.
			type	number
		iscrowd	A legacy mscoco field used to indicate if an annotation contains multiple objects
			anyOf	type	integer
				type	boolean
		caption	An annotation-level text caption
			type	string
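The two keypoint encodings in the schema carry the same information in different shapes. As a sketch, the hypothetical helper below converts an old-style MSCOCO_KEYPOINTS list into a list of KWCOCO_KEYPOINT dictionaries. It assumes the caller supplies one keypoint category id per (x, y, v) triple, in the same order as the flat list; how those ids are looked up in keypoint_categories is left to the caller.

```python
def mscoco_to_kwcoco_keypoints(flat, keypoint_category_ids):
    """Convert [x1, y1, v1, ..., xk, yk, vk] into KWCOCO_KEYPOINT dicts.

    Hypothetical helper: keypoint_category_ids[i] is assumed to be the
    keypoint category id of the i-th (x, y, v) triple.
    """
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values (x, y, v triples)")
    triples = [flat[i:i + 3] for i in range(0, len(flat), 3)]
    if len(triples) != len(keypoint_category_ids):
        raise ValueError("need one keypoint category id per triple")
    return [
        {"xy": [x, y], "visible": int(v), "keypoint_category_id": kpcid}
        for (x, y, v), kpcid in zip(triples, keypoint_category_ids)
    ]


# Two keypoints with (hypothetical) keypoint category ids 1 and 2.
print(mscoco_to_kwcoco_keypoints([10, 20, 2, 30, 20, 1], [1, 2]))
# → [{'xy': [10, 20], 'visible': 2, 'keypoint_category_id': 1}, {'xy': [30, 20], 'visible': 1, 'keypoint_category_id': 2}]
```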

The Python API

Creating a dataset

The Python API can be used to load an existing dataset or initialize an empty dataset. In both cases the dataset can be modified by adding/removing/editing categories, videos, images, and annotations.

You can load an existing dataset as follows:

import kwcoco
dset = kwcoco.CocoDataset('path/to/data.kwcoco.json')

You can initialize an empty dataset as follows:

import kwcoco
dset = kwcoco.CocoDataset()

In both cases you can add and remove data items. When you add an item, it returns the internal integer primary id used to refer to that item.

cid = dset.add_category(name='cat')

gid = dset.add_image(file_name='/path/to/limecat.jpg')

aid = dset.add_annotation(image_id=gid, category_id=cid, bbox=[0, 0, 100, 100])

The CocoDataset class has an instance variable, dset.dataset, which holds the loaded JSON data structure. You can interact with this structure directly.

# Loop over all categories, images, and annotations

for cat in dset.dataset['categories']:
    print(cat)

for img in dset.dataset['images']:
    print(img)

for ann in dset.dataset['annotations']:
    print(ann)

For the above example, this will result in:

OrderedDict([('id', 1), ('name', 'cat')])
OrderedDict([('id', 1), ('file_name', '/path/to/limecat.jpg')])
OrderedDict([('id', 1), ('image_id', 1), ('category_id', 1), ('bbox', [0, 0, 100, 100])])

For the above example, you can display the underlying dataset structure as follows:

print(dset.dumps(indent='    ', newlines=True))

This results in:

{
"info": [],
"licenses": [],
"categories": [
    {"id": 1, "name": "cat"}
],
"videos": [],
"images": [
    {"id": 1, "file_name": "/path/to/limecat.jpg"}
],
"annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]}
]
}

In addition to accessing dset.dataset directly, the CocoDataset object maintains an index that allows the user to quickly look up objects by primary or secondary keys. The available indexes are:

dset.index.anns    # a mapping from annotation-ids to annotation dictionaries
dset.index.imgs    # a mapping from image-ids to image dictionaries
dset.index.videos  # a mapping from video-ids to video dictionaries
dset.index.cats    # a mapping from category-ids to category dictionaries

dset.index.gid_to_aids    # a mapping from an image id to the annotation ids contained in the image
dset.index.cid_to_aids    # a mapping from a category id to the annotation ids with that category
dset.index.vidid_to_gids  # a mapping from a video id to the image ids contained in the video

dset.index.name_to_video  # a mapping from a video name to the video dictionary
dset.index.name_to_cat    # a mapping from a category name to the category dictionary
dset.index.name_to_img    # a mapping from an image name to the image dictionary
dset.index.file_name_to_img  # a mapping from an image file name to the image dictionary

These indexes are dynamically updated when items are added or removed.
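The lookup tables listed above are reverse mappings over dset.dataset. The pure-Python sketch below shows one way a few of them could be derived from a dataset dictionary. build_index is a hypothetical name and this is not kwcoco's implementation, which, as noted above, updates its indexes dynamically rather than rebuilding them; ordering each video's images by frame_index is a choice made for this sketch.

```python
def build_index(dataset):
    """Derive a few kwcoco-style lookup tables from a dataset dictionary."""
    anns = {ann["id"]: ann for ann in dataset.get("annotations", [])}
    imgs = {img["id"]: img for img in dataset.get("images", [])}
    cats = {cat["id"]: cat for cat in dataset.get("categories", [])}
    videos = {vid["id"]: vid for vid in dataset.get("videos", [])}

    # Reverse lookups start with an empty entry for every known parent id.
    gid_to_aids = {gid: set() for gid in imgs}
    cid_to_aids = {cid: set() for cid in cats}
    vidid_to_gids = {vidid: [] for vidid in videos}

    for aid, ann in anns.items():
        gid_to_aids.setdefault(ann["image_id"], set()).add(aid)
        cid = ann.get("category_id")
        if cid is not None:
            cid_to_aids.setdefault(cid, set()).add(aid)

    for gid, img in imgs.items():
        vidid = img.get("video_id")
        if vidid is not None:
            vidid_to_gids.setdefault(vidid, []).append(gid)
    # Keep frames in temporal order within each video.
    for gids in vidid_to_gids.values():
        gids.sort(key=lambda g: imgs[g].get("frame_index", 0))

    return {
        "anns": anns, "imgs": imgs, "cats": cats, "videos": videos,
        "gid_to_aids": gid_to_aids, "cid_to_aids": cid_to_aids,
        "vidid_to_gids": vidid_to_gids,
        "name_to_cat": {cat["name"]: cat for cat in cats.values()},
        "name_to_video": {vid["name"]: vid for vid in videos.values()},
    }


# The same data as the add_category / add_image / add_annotation example.
dataset = {
    "categories": [{"id": 1, "name": "cat"}],
    "images": [{"id": 1, "file_name": "/path/to/limecat.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]}],
}
index = build_index(dataset)
print(index["gid_to_aids"])  # → {1: {1}}
print(index["cid_to_aids"])  # → {1: {1}}
```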

Using kwcoco to write a torch dataset

The easiest way to write a torch dataset with kwcoco is to combine it with ndsampler.

Examples of kwcoco + ndsampler being used to write torch datasets for training deep networks can be found in netharn’s examples for detection, classification, and segmentation.

(Note: netharn is deprecated in favor of pytorch-lightning, but the dataset examples still hold)
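ndsampler remains the suggested route, but the overall shape can be sketched without either library installed. PyTorch map-style datasets implement __getitem__ and __len__, so a minimal dataset over a kwcoco dictionary only needs to map an integer index to an image and its annotations. KwcocoBoxDataset is a hypothetical name; it returns file names and xywh boxes rather than pixels, leaving image loading (the part ndsampler would handle) out of scope. In real training code it would typically subclass torch.utils.data.Dataset.

```python
class KwcocoBoxDataset:
    """Hypothetical map-style dataset over a kwcoco-style dataset dictionary.

    Each item is one image: its file name plus the xywh boxes and category
    ids of its annotations. Pixel loading is deliberately left out.
    """

    def __init__(self, dataset):
        self.images = list(dataset.get("images", []))
        # Group annotation dictionaries by the image they belong to.
        self.anns_by_gid = {img["id"]: [] for img in self.images}
        for ann in dataset.get("annotations", []):
            self.anns_by_gid.setdefault(ann["image_id"], []).append(ann)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img = self.images[index]
        # Only annotations that actually carry a box are returned.
        anns = [ann for ann in self.anns_by_gid.get(img["id"], []) if "bbox" in ann]
        return {
            "file_name": img.get("file_name"),
            "boxes_xywh": [list(ann["bbox"]) for ann in anns],
            "category_ids": [ann.get("category_id") for ann in anns],
        }


# Toy data mirroring the add_category / add_image / add_annotation example.
dataset = {
    "categories": [{"id": 1, "name": "cat"}],
    "images": [{"id": 1, "file_name": "/path/to/limecat.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]}],
}
ds = KwcocoBoxDataset(dataset)
print(len(ds))  # → 1
print(ds[0])
# → {'file_name': '/path/to/limecat.jpg', 'boxes_xywh': [[0, 0, 100, 100]], 'category_ids': [1]}
```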

Technical Debt

Based on design decisions made in the original MS-COCO and in KW-COCO, there are a few oddities:

The “bbox” field gives no indication that it is in xywh format.
We can’t use “vid” as a variable name for “video-id” because “vid” is also an abbreviation for “video”. Hence, while category, image, and annotation ids all have a nice 1-letter prefix in the standard variable names I use (i.e. cid, gid, aid), I have to use vidid to refer to “video-ids”.
I’m not in love with the way “keypoint_categories” are handled.
Are “images” always “images”? Are “videos” always “videos”?
Would we benefit from using JSON-LD?
The “prob” field needs to be better defined.
The name “video” might be confusing. It’s just a temporally ordered group of images.
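Because the bbox field does not announce its format, consumers have to remember that it is [top-left x, top-left y, width, height]. The two hypothetical helpers below (not part of kwcoco) make the conversion to and from corner format explicit, treating coordinates as continuous so that x2 = x + w with no off-by-one adjustment.

```python
def xywh_to_xyxy(bbox):
    """Convert a kwcoco / MS-COCO bbox [x, y, w, h] to corners [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert corners [x1, y1, x2, y2] back to a bbox [x, y, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


print(xywh_to_xyxy([10, 20, 30, 40]))  # → [10, 20, 40, 60]
print(xyxy_to_xywh([10, 20, 40, 60]))  # → [10, 20, 30, 40]
```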

Code Examples

See the README and the doctests.