Getting Started With KW-COCO¶
This document is a work in progress and needs to be updated and refactored.
FAQ¶
Q: What is kwcoco? A: An extension of the MS-COCO data format for storing a “manifest” of categories, images, and annotations.
Q: Why yet another data format? A: MS-COCO did not have support for video and multimodal imagery. These are important problems in computer vision and it seems reasonable (although challenging) that there could be a data format that could be used as an interchange for almost all vision problems.
Q: Why extend MS-COCO and not create something else? A: To draw on the existing adoption of the MS-COCO format.
Q: What’s so great about MS-COCO? A: It has an intuitive data structure that’s simple to interface with.
Q: Why not pycocotools? A: That module doesn’t allow you to edit the dataset programmatically, and it requires a C backend. This module allows dynamic modification, addition, and removal of images / categories / annotations / videos, and it goes beyond the functionality of the pycocotools module in other places as well. We have a much more configurable / expressive way of computing and recording object detection metrics. If we are using an mscoco-compliant database (which can be verified / coerced with the kwcoco conform CLI tool), then we do call pycocotools for functionality not directly implemented here.
Q: Would you ever extend kwcoco to go beyond computer vision? A: Maybe, but it would be something new that only uses kwcoco as an inspiration. If extending past computer vision I would want to go back and rename / reorganize the spec.
Examples¶
These Python files have a few example use cases of kwcoco.
Design Goals¶
Always be a strict superset of the original MS-COCO format
Extend the scope of MS-COCO to broader computer-vision domains.
Have a fast pure-Python API to perform lower level tasks. (Allow optional C backends for features that need speed boosts)
Have an easy-to-use command line interface to perform higher level tasks.
Use cases¶
KWCoco has been designed to work with these tasks in these image modalities.
Tasks¶
Captioning
Classification
Segmentation
Keypoint Detection / Pose Estimation
Object Detection
Modalities¶
Single Image
Video
Multispectral Imagery
Images with auxiliary information (2.5d, flow, disparity, stereo)
Combinations of the above.
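As a rough illustration of how these modalities can combine, the sketch below uses plain Python dictionaries shaped like kwcoco video and image entries: a two-frame video whose frames store their bands in separate asset files. The field names (video_id, frame_index, assets, channels) are assumed from the kwcoco schema; the file names, channel names, and the helpers `frames_in_order` and `channels_of` are hypothetical and are not part of kwcoco.

```python
# A video is a named, temporally ordered group of images.
video = {"id": 1, "name": "demo_video"}

# Each image is one point in time. When channels are split across several
# files, file_name is null and each file is listed as an asset.
# All file names and channel names here are made up for illustration.
images = [
    {
        "id": 11, "name": "frame_late", "file_name": None,
        "video_id": 1, "frame_index": 1, "width": 64, "height": 64,
        "assets": [
            {"file_name": "late_rgb.tif", "channels": "r|g|b", "width": 64, "height": 64},
            {"file_name": "late_nir.tif", "channels": "nir", "width": 32, "height": 32},
        ],
    },
    {
        "id": 10, "name": "frame_early", "file_name": None,
        "video_id": 1, "frame_index": 0, "width": 64, "height": 64,
        "assets": [
            {"file_name": "early_rgb.tif", "channels": "r|g|b", "width": 64, "height": 64},
            {"file_name": "early_nir.tif", "channels": "nir", "width": 32, "height": 32},
        ],
    },
]


def frames_in_order(images, video_id):
    """Hypothetical helper: ids of the images in one video, sorted by frame_index."""
    frames = [img for img in images if img.get("video_id") == video_id]
    frames.sort(key=lambda img: img["frame_index"])
    return [img["id"] for img in frames]


def channels_of(image):
    """Hypothetical helper: the channel spec string of every asset in an image."""
    return [asset["channels"] for asset in image.get("assets", [])]


print(frames_in_order(images, video_id=1))  # [10, 11]
print(channels_of(images[0]))               # ['r|g|b', 'nir']
```

Note that the near-infrared asset is stored at a lower resolution than the image itself, which is why asset width and height are recorded separately from the image width and height.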
KWCOCO Spec¶
A high level description of the kwcoco spec is given in kwcoco.coco_dataset.
A formal json-schema is defined in kwcoco.coco_schema and is shown here:
KWCOCO_SCHEMA¶
The formal kwcoco schema. The top level is an object with the following properties:

info: no type constraint
licenses: no type constraint
categories: array of CATEGORY
keypoint_categories: array of KEYPOINT_CATEGORY
videos: array of VIDEO
images: array of IMAGE
annotations: array of ANNOTATION

CATEGORY (object): High level information about an annotation category
    id (integer): A unique internal category id
    name (string, pattern [^/]+): A unique external category name or identifier
    alias (array of string, pattern [^/]+): A list of alternate names that should be resolved to this category
    supercategory (string with pattern [^/]+, or null): A coarser category name
    parents (array of string, pattern [^/]+): Used for multiple inheritance
    keypoints: deprecated
    skeleton: deprecated

KEYPOINT_CATEGORY (object): High level information about an annotation category
    name (string, pattern [^/]+): The name of the keypoint category
    id (integer)
    supercategory (string with pattern [^/]+, or null)
    reflection_id (integer or null): The keypoint category this should change to if the image is horizontally flipped

VIDEO (object): High level information about a group of temporally ordered images
    id (integer): An internal video identifier
    name (string, pattern [^/]+): A unique name for this video
    caption (string): A video level text caption
    resolution (number, string, or null): a unit representing the size of a pixel in video space

IMAGE (object): High level information about an image file or a collection of image files corresponding to a single point in (or small interval of) time
    id (integer): a unique internal image identifier
    file_name (string or null): A relative or absolute path to the main image file. If this file_name is unspecified, then a name and auxiliary items or assets must be specified. Likewise this should be null if assets are used.
    name (string with pattern [^/]+, or null): A unique name for the image. If unspecified the file_name should be used as the default value for the name property. Required if assets / auxiliary are specified.
    width (integer): The width of the image in image space pixels
    height (integer): The height of the image in image space pixels
    video_id (integer): The video this image belongs to
    timestamp (string or number): An ISO-8601 timestamp (string) or a UNIX timestamp (number)
    frame_index (integer): Used to temporally order the images in a video
    channels (CHANNEL_SPEC or null)
    resolution (number, string, or null): a unit representing the size of a pixel in image space
    auxiliary (array of ASSET): This will be deprecated for assets in the future
    assets (array of ASSET): A list of assets belonging to this image, used when image channels are split across multiple files

CHANNEL_SPEC (string, pattern [^/]*): A human readable channel name. Must be compatible with kwcoco.ChannelSpec

ASSET (object): Information about a single file belonging to an image
    file_name (string)
    channels (CHANNEL_SPEC)
    width (integer): The width in asset-space pixels
    height (integer): The height in asset-space pixels

ANNOTATION (object): Metadata about some semantic attribute of an image.
    id (integer): A unique internal id for this annotation
    image_id (integer): The image id this annotation belongs to
    bbox (BBOX)
    category_id (integer): The category id of this annotation
    track_id (integer or string): An identifier used to group annotations belonging to the same object over multiple frames in a video
    segmentation: A polygon or mask specifying the pixels in this annotation in image-space. Any of: a KWCOCO_POLYGON, an array of KWCOCO_POLYGON, an MSCOCO_POLYGON, an array of MSCOCO_POLYGON, or a string holding a run-length-encoding mask in the format read by pycocotools
    keypoints: A set of categorized points belonging to this annotation in image space. Either MSCOCO_KEYPOINTS or an array of KWCOCO_KEYPOINT
    prob (array of number): This needs to be in the same order as categories. The probability order currently needs to be known a-priori, typically in order of the classes, but it's hard to always keep that consistent. This SPEC is subject to change in the future.
    score (number): Typically assigned to predicted annotations
    weight (number): Typically given to truth annotations to indicate quality.
    iscrowd (integer or boolean): A legacy mscoco field used to indicate if an annotation contains multiple objects
    caption (string): An annotation-level text caption

BBOX (array of exactly 4 numbers): [top-left x, top-left-y, width, height] in image-space pixels

KWCOCO_POLYGON (object): A new-style polygon format that supports holes
    exterior (array of [x, y] number pairs): counter-clockwise xy exterior points
    interiors (array of holes): each hole is an array of [x, y] number pairs giving a clockwise xy hole

MSCOCO_POLYGON (array of number): an old-style polygon [x1,y1,v1,…,xk,yk,vk]

MSCOCO_KEYPOINTS (array of integer): An old-style set of keypoints (x1,y1,v1,…,xk,yk,vk)

KWCOCO_KEYPOINT (object)
    xy (array of exactly 2 numbers): <x1, y1> in pixels
    visible (integer): choice(0, 1, 2)
    keypoint_category_id (integer)
    keypoint_category (string): only to be used as a hint
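To make the schema a little more concrete, here is a small sketch of a manifest written as a plain Python dictionary, together with a hand-written check of a few of the constraints described above (integer ids and four-number bounding boxes). The function `check_manifest` is hypothetical and covers only a small fraction of the schema; it is not how kwcoco itself validates data. The check that every annotation points at an existing image is an extra sanity check and is not something the json-schema expresses.

```python
import json

manifest = {
    "categories": [{"id": 1, "name": "cat"}],
    "videos": [],
    "images": [{"id": 1, "file_name": "/path/to/limecat.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]},
    ],
}


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_manifest(data):
    """Hypothetical partial check; returns a list of human readable problems."""
    problems = []
    for table in ("categories", "videos", "images", "annotations"):
        for row in data.get(table, []):
            if not isinstance(row.get("id"), int) or isinstance(row.get("id"), bool):
                problems.append(f"{table}: id must be an integer, got {row.get('id')!r}")
    image_ids = {img.get("id") for img in data.get("images", [])}
    for ann in data.get("annotations", []):
        bbox = ann.get("bbox")
        if bbox is not None:
            if len(bbox) != 4 or not all(is_number(v) for v in bbox):
                problems.append(f"annotation {ann.get('id')}: bbox must be 4 numbers")
        # Extra sanity check, not part of the json-schema.
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann.get('id')}: unknown image_id")
    return problems


print(check_manifest(manifest))  # []
print(json.dumps(manifest["annotations"][0]))
```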
The Python API¶
Creating a dataset¶
The Python API can be used to load an existing dataset or initialize an empty dataset. In both cases the dataset can be modified by adding/removing/editing categories, videos, images, and annotations.
You can load an existing dataset as such:
import kwcoco
dset = kwcoco.CocoDataset('path/to/data.kwcoco.json')
You can initialize an empty dataset as such:
import kwcoco
dset = kwcoco.CocoDataset()
In both cases you can add and remove data items. When you add an item, it returns the internal integer primary id used to refer to that item.
cid = dset.add_category(name='cat')
gid = dset.add_image(file_name='/path/to/limecat.jpg')
aid = dset.add_annotation(image_id=gid, category_id=cid, bbox=[0, 0, 100, 100])
The CocoDataset class has an instance variable dset.dataset which is the loaded JSON data structure. This dataset can be interacted with directly.
# Loop over all categories, images, and annotations
for cat in dset.dataset['categories']:
    print(cat)
for img in dset.dataset['images']:
    print(img)
for ann in dset.dataset['annotations']:
    print(ann)
For the above example, this will result in:
OrderedDict([('id', 1), ('name', 'cat')])
OrderedDict([('id', 1), ('file_name', '/path/to/limecat.jpg')])
OrderedDict([('id', 1), ('image_id', 1), ('category_id', 1), ('bbox', [0, 0, 100, 100])])
In the above example, you can display the underlying dataset structure as such:
print(dset.dumps(indent=' ', newlines=True))
This results in
{
    "info": [],
    "licenses": [],
    "categories": [
        {"id": 1, "name": "cat"}
    ],
    "videos": [],
    "images": [
        {"id": 1, "file_name": "/path/to/limecat.jpg"}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 100, 100]}
    ]
}
In addition to accessing dset.dataset directly, the CocoDataset object maintains an index which allows the user to quickly look up objects by primary or secondary keys. The available indexes are:
dset.index.anns # a mapping from annotation-ids to annotation dictionaries
dset.index.imgs # a mapping from image-ids to image dictionaries
dset.index.videos # a mapping from video-ids to video dictionaries
dset.index.cats # a mapping from category-ids to category dictionaries
dset.index.gid_to_aids # a mapping from an image id to annotation ids contained in the image
dset.index.cid_to_aids # a mapping from a category id to annotation ids with that category
dset.index.vidid_to_gids # a mapping from a video id to image ids contained in the video
dset.index.name_to_video # a mapping from a video name to the video dictionary
dset.index.name_to_cat # a mapping from a category name to the category dictionary
dset.index.name_to_img # a mapping from an image name to the image dictionary
dset.index.file_name_to_img # a mapping from an image file name to the image dictionary
These indexes are dynamically updated when items are added or removed.
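To illustrate what such lookup tables contain, the following sketch builds a few of them from a raw dataset dictionary using only the standard library. The function `build_index` is hypothetical; it only mirrors the names listed above, does not show how kwcoco implements or updates its index, and the choice of sets for the id groups is an assumption made for this example.

```python
from collections import defaultdict

dataset = {
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 10]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [5, 5, 10, 10]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [1, 1, 4, 4]},
    ],
}


def build_index(dataset):
    """Hypothetical helper: primary and secondary lookup tables for a dataset."""
    index = {
        # Primary keys: id -> dictionary
        "anns": {ann["id"]: ann for ann in dataset["annotations"]},
        "imgs": {img["id"]: img for img in dataset["images"]},
        "cats": {cat["id"]: cat for cat in dataset["categories"]},
        # Secondary keys: id or name -> related items
        "gid_to_aids": defaultdict(set),
        "cid_to_aids": defaultdict(set),
        "name_to_cat": {cat["name"]: cat for cat in dataset["categories"]},
    }
    for aid, ann in index["anns"].items():
        index["gid_to_aids"][ann["image_id"]].add(aid)
        index["cid_to_aids"][ann["category_id"]].add(aid)
    return index


index = build_index(dataset)
print(sorted(index["gid_to_aids"][1]))    # [1, 2]
print(sorted(index["cid_to_aids"][1]))    # [1, 3]
print(index["name_to_cat"]["dog"]["id"])  # 2
```

With tables like these, a question such as "which annotations are in image 1?" becomes a dictionary lookup rather than a scan over every annotation.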
Using kwcoco to write a torch dataset¶
The easiest way to write a torch dataset with kwcoco is to combine it with ndsampler.
Examples of kwcoco + ndsampler being used to write torch datasets to train deep networks can be found in netharn’s examples for: detection, classification, and segmentation
(Note: netharn is deprecated in favor of pytorch-lightning, but the dataset examples still hold)
Technical Debt¶
Based on design decisions made in the original MS-COCO and KW-COCO, there are a few weird things:
The “bbox” field gives no indication it should be xywh format.
We can’t use “vid” as a variable name for “video-id” because “vid” is also an abbreviation for “video”. Hence, while category, image, and annotation all have a nice 1-letter prefix to their id in the standard variable names I use (i.e. cid, gid, aid), I have to use vidid to refer to “video-ids”.
I’m not in love with the way “keypoint_categories” are handled.
Are “images” always “images”? Are “videos” always “videos”?
Would we benefit from using JSON-LD?
The “prob” field needs to be better defined.
The name “video” might be confusing. It's just a temporally ordered group of images.
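Regarding the “bbox” item above: because the field name does not say so, code that consumes boxes has to remember that they are stored as [top-left x, top-left y, width, height]. A minimal sketch of converting to and from corner coordinates follows; the function names `xywh_to_ltrb` and `ltrb_to_xywh` are hypothetical helpers written for this example and are not kwcoco functions.

```python
def xywh_to_ltrb(bbox):
    """Convert [x, y, width, height] to [left, top, right, bottom]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def ltrb_to_xywh(box):
    """Convert [left, top, right, bottom] back to [x, y, width, height]."""
    left, top, right, bottom = box
    return [left, top, right - left, bottom - top]


bbox = [10, 20, 30, 40]
corners = xywh_to_ltrb(bbox)
print(corners)                # [10, 20, 40, 60]
print(ltrb_to_xywh(corners))  # [10, 20, 30, 40]
```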
Code Examples¶
See the README and the doctests.