`kwcoco`¶

The Kitware COCO module defines a variant of the Microsoft COCO format, originally developed for the “Common Objects in Context” object detection challenge. We are backwards compatible with the original module, but we also have improved implementations in several places, including segmentations, keypoints, annotation tracks, multi-spectral images, and videos (which represent a generic sequence of images).

A kwcoco file is a “manifest” that serves as a single reference that points to all images, categories, and annotations in a computer vision dataset. Thus, when applying an algorithm to a dataset, it is sufficient to have the algorithm take one dataset parameter: the path to the kwcoco file. Generally a kwcoco file will live in a “bundle” directory along with the data that it references, and paths in the kwcoco file will be relative to the location of the kwcoco file itself.
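As a rough illustration of the bundle layout described above, the following sketch uses only the standard library to write a minimal manifest and resolve an image path relative to it. The file names (`data.kwcoco.json`, `images/astro.png`) and the helper `resolve_image_path` are hypothetical and are not part of the kwcoco API.

```python
import json
import tempfile
from pathlib import Path

# A minimal COCO-style manifest: three top-level lists.
manifest = {
    'categories': [{'id': 1, 'name': 'astronaut'}],
    'images': [{'id': 1, 'file_name': 'images/astro.png'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1, 'bbox': [10, 10, 360, 490]},
    ],
}


def resolve_image_path(manifest_fpath, img):
    """Hypothetical helper: image paths are relative to the manifest's directory."""
    return Path(manifest_fpath).parent / img['file_name']


# Write the manifest into a temporary "bundle" directory.
bundle_dpath = Path(tempfile.mkdtemp())
manifest_fpath = bundle_dpath / 'data.kwcoco.json'
manifest_fpath.write_text(json.dumps(manifest))

# An algorithm only needs the manifest path to locate everything else.
loaded = json.loads(manifest_fpath.read_text())
image_fpath = resolve_image_path(manifest_fpath, loaded['images'][0])
print(image_fpath)
```

Because every path is relative to the manifest, the whole bundle directory can be moved without rewriting the file.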

The main data structure in this module is largely based on the implementation in https://github.com/cocodataset/cocoapi. It uses the same efficient core indexing data structures, but in our implementation the indexing can optionally be turned off, and functions are silent by default (with the exception of long-running processes, which show progress by default, though this is optional). We support helper functions that add and remove images, categories, and annotations.
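To make the idea of an index concrete, here is a minimal sketch of id-based lookup tables built over a COCO-style dictionary. The function `build_index` is hypothetical; the key names mirror the attributes used in this document (`imgs`, `cats`, `anns`, `gid_to_aids`, `cid_to_aids`), but this is not the library's implementation.

```python
from collections import defaultdict


def build_index(dataset):
    """Build id-based lookup tables over a COCO-style dictionary (sketch)."""
    index = {
        'imgs': {img['id']: img for img in dataset.get('images', [])},
        'cats': {cat['id']: cat for cat in dataset.get('categories', [])},
        'anns': {ann['id']: ann for ann in dataset.get('annotations', [])},
        'gid_to_aids': defaultdict(list),
        'cid_to_aids': defaultdict(list),
    }
    # Reverse lookups: which annotations belong to each image / category.
    for ann in dataset.get('annotations', []):
        index['gid_to_aids'][ann['image_id']].append(ann['id'])
        index['cid_to_aids'][ann['category_id']].append(ann['id'])
    return index


dataset = {
    'categories': [{'id': 1, 'name': 'astronaut'}, {'id': 2, 'name': 'rocket'}],
    'images': [{'id': 1, 'file_name': 'astro.png'}, {'id': 2, 'file_name': 'carl.png'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},
        {'id': 2, 'image_id': 1, 'category_id': 2},
        {'id': 3, 'image_id': 2, 'category_id': 1},
    ],
}
index = build_index(dataset)
print(index['gid_to_aids'][1])
```

The index stores references to the same dictionaries held in the raw lists, so lookups by id avoid scanning the lists.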

The kwcoco.CocoDataset class is capable of dynamic addition and removal of categories, images, and annotations. It has better support for keypoints and segmentation formats than the original COCO format. Despite being written in Python, this data structure is reasonably efficient.

>>> import kwcoco
>>> import json
>>> import ubelt as ub
>>> # Create demo data
>>> demo = kwcoco.CocoDataset.demo()
>>> # could also use demo.dump / demo.dumps, but this is more explicit
>>> text = json.dumps(demo.dataset)
>>> with open('demo.json', 'w') as file:
>>>     file.write(text)

>>> # Read from disk
>>> self = kwcoco.CocoDataset('demo.json')

>>> # Add data
>>> cid = self.add_category('Cat')
>>> gid = self.add_image('new-img.jpg')
>>> aid = self.add_annotation(image_id=gid, category_id=cid, bbox=[0, 0, 100, 100])

>>> # Remove data
>>> self.remove_annotations([aid])
>>> self.remove_images([gid])
>>> self.remove_categories([cid])

>>> # Look at data
>>> print(ub.repr2(self.basic_stats(), nl=1))
>>> print(ub.repr2(self.extended_stats(), nl=2))
>>> print(ub.repr2(self.boxsize_stats(), nl=3))
>>> print(ub.repr2(self.category_annotation_frequency()))


>>> # Inspect data
>>> import kwplot
>>> kwplot.autompl()
>>> self.show_image(gid=1)

>>> # Access single-item data via imgs, cats, anns
>>> cid = 1
>>> self.cats[cid]
{'id': 1, 'name': 'astronaut', 'supercategory': 'human'}

>>> gid = 1
>>> self.imgs[gid]
{'id': 1, 'file_name': 'astro.png', 'url': 'https://i.imgur.com/KXhKM72.png'}

>>> aid = 3
>>> self.anns[aid]
{'id': 3, 'image_id': 1, 'category_id': 3, 'line': [326, 369, 500, 500]}

>>> # Access multi-item data via the annots and images helper objects
>>> aids = self.index.gid_to_aids[2]
>>> annots = self.annots(aids)

>>> print('annots = {}'.format(ub.repr2(annots, nl=1, sv=1)))
annots = <Annots(num=2)>

>>> annots.lookup('category_id')
[6, 4]

>>> annots.lookup('bbox')
[[37, 6, 230, 240], [124, 96, 45, 18]]

>>> # built in conversions to efficient kwimage array DataStructures
>>> print(ub.repr2(annots.detections.data))
{
    'boxes': <Boxes(xywh,
                 array([[ 37.,   6., 230., 240.],
                        [124.,  96.,  45.,  18.]], dtype=float32))>,
    'class_idxs': np.array([5, 3], dtype=np.int64),
    'keypoints': <PointsList(n=2) at 0x7f07eda33220>,
    'segmentations': <PolygonList(n=2) at 0x7f086365aa60>,
}

>>> gids = list(self.imgs.keys())
>>> images = self.images(gids)
>>> print('images = {}'.format(ub.repr2(images, nl=1, sv=1)))
images = <Images(num=3)>

>>> images.lookup('file_name')
['astro.png', 'carl.png', 'stars.png']

>>> print('images.annots = {}'.format(images.annots))
images.annots = <AnnotGroups(n=3, m=3.7, s=3.9)>

>>> print('images.annots.cids = {!r}'.format(images.annots.cids))
images.annots.cids = [[1, 2, 3, 4, 5, 5, 5, 5, 5], [6, 4], []]

CocoDataset API¶

The following is a logical grouping of the public kwcoco.CocoDataset API attributes and methods. See the in-code documentation for further details.

CocoDataset classmethods (via MixinCocoExtras)¶

kwcoco.CocoDataset.coerce - Attempt to transform the input into the intended CocoDataset.

kwcoco.CocoDataset.demo - Create a toy coco dataset for testing and demo purposes

kwcoco.CocoDataset.random - Creates a random CocoDataset according to distribution parameters

CocoDataset classmethods (via CocoDataset)¶

kwcoco.CocoDataset.from_coco_paths - Constructor from multiple coco file paths.

kwcoco.CocoDataset.from_data - Constructor from a json dictionary

kwcoco.CocoDataset.from_image_paths - Constructor from a list of image paths.

CocoDataset slots¶

kwcoco.CocoDataset.index -

kwcoco.CocoDataset.hashid -

kwcoco.CocoDataset.hashid_parts -

kwcoco.CocoDataset.tag -

kwcoco.CocoDataset.dataset -

kwcoco.CocoDataset.bundle_dpath -

kwcoco.CocoDataset.assets_dpath -

kwcoco.CocoDataset.cache_dpath -

CocoDataset properties¶

kwcoco.CocoDataset.anns -

kwcoco.CocoDataset.cats -

kwcoco.CocoDataset.cid_to_aids -

kwcoco.CocoDataset.data_fpath -

kwcoco.CocoDataset.data_root -

kwcoco.CocoDataset.fpath -

kwcoco.CocoDataset.gid_to_aids -

kwcoco.CocoDataset.img_root -

kwcoco.CocoDataset.imgs -

kwcoco.CocoDataset.n_annots -

kwcoco.CocoDataset.n_cats -

kwcoco.CocoDataset.n_images -

kwcoco.CocoDataset.n_videos -

kwcoco.CocoDataset.name_to_cat -

CocoDataset methods (via MixinCocoAddRemove)¶

kwcoco.CocoDataset.add_annotation - Add an annotation to the dataset (dynamically updates the index)

kwcoco.CocoDataset.add_annotations - Faster less-safe multi-item alternative to add_annotation.

kwcoco.CocoDataset.add_category - Adds a category

kwcoco.CocoDataset.add_image - Add an image to the dataset (dynamically updates the index)

kwcoco.CocoDataset.add_images - Faster less-safe multi-item alternative

kwcoco.CocoDataset.add_video - Add a video to the dataset (dynamically updates the index)

kwcoco.CocoDataset.clear_annotations - Removes all annotations (but not images and categories)

kwcoco.CocoDataset.clear_images - Removes all images and annotations (but not categories)

kwcoco.CocoDataset.ensure_category - Like add_category(), but returns the existing category id if it already exists instead of failing. In this case all metadata is ignored.

kwcoco.CocoDataset.ensure_image - Like add_image(), but returns the existing image id if it already exists instead of failing. In this case all metadata is ignored.

kwcoco.CocoDataset.remove_annotation - Remove a single annotation from the dataset

kwcoco.CocoDataset.remove_annotation_keypoints - Removes all keypoints with a particular category

kwcoco.CocoDataset.remove_annotations - Remove multiple annotations from the dataset.

kwcoco.CocoDataset.remove_categories - Remove categories and all annotations in those categories. Currently does not change any hierarchy information

kwcoco.CocoDataset.remove_images - Remove images and any annotations contained by them

kwcoco.CocoDataset.remove_keypoint_categories - Removes all keypoints of a particular category as well as all annotation keypoints with those ids.

kwcoco.CocoDataset.remove_videos - Remove videos and any images / annotations contained by them

kwcoco.CocoDataset.set_annotation_category - Sets the category of a single annotation

CocoDataset methods (via MixinCocoObjects)¶

kwcoco.CocoDataset.annots - Return vectorized annotation objects

kwcoco.CocoDataset.categories - Return vectorized category objects

kwcoco.CocoDataset.images - Return vectorized image objects

kwcoco.CocoDataset.videos - Return vectorized video objects

CocoDataset methods (via MixinCocoStats)¶

kwcoco.CocoDataset.basic_stats - Reports number of images, annotations, and categories.

kwcoco.CocoDataset.boxsize_stats - Compute statistics about bounding box sizes.

kwcoco.CocoDataset.category_annotation_frequency - Reports the number of annotations of each category

kwcoco.CocoDataset.category_annotation_type_frequency - Reports the number of annotations of each type for each category

kwcoco.CocoDataset.conform - Make the COCO file conform to a stricter spec; infers attributes where possible.

kwcoco.CocoDataset.extended_stats - Reports number of images, annotations, and categories.

kwcoco.CocoDataset.find_representative_images - Find images that have a wide array of categories. Attempt to find the fewest images that cover all categories using images that contain both a large and small number of annotations.

kwcoco.CocoDataset.keypoint_annotation_frequency -

kwcoco.CocoDataset.stats - This function corresponds to the kwcoco.cli.coco_stats module.

kwcoco.CocoDataset.validate - Performs checks on this coco dataset.

CocoDataset methods (via MixinCocoAccessors)¶

kwcoco.CocoDataset.category_graph - Construct a networkx category hierarchy

kwcoco.CocoDataset.delayed_load - Experimental method

kwcoco.CocoDataset.get_auxiliary_fpath - Returns the full path to auxiliary data for an image

kwcoco.CocoDataset.get_image_fpath - Returns the full path to the image

kwcoco.CocoDataset.keypoint_categories - Construct a consistent CategoryTree representation of keypoint classes

kwcoco.CocoDataset.load_annot_sample - Reads the chip of an annotation. Note this is much less efficient than using a sampler, but it doesn’t require disk cache.

kwcoco.CocoDataset.load_image - Reads an image from disk and

kwcoco.CocoDataset.object_categories - Construct a consistent CategoryTree representation of object classes

CocoDataset methods (via CocoDataset)¶

kwcoco.CocoDataset.copy - Deep copies this object

kwcoco.CocoDataset.dump - Writes the dataset out to the json format

kwcoco.CocoDataset.dumps - Writes the dataset out to the json format

kwcoco.CocoDataset.subset - Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.

kwcoco.CocoDataset.union - Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.

kwcoco.CocoDataset.view_sql - Create a cached SQL interface to this dataset suitable for large scale multiprocessing use cases.

CocoDataset methods (via MixinCocoExtras)¶

kwcoco.CocoDataset.corrupted_images - Check for images that don’t exist or can’t be opened

kwcoco.CocoDataset.missing_images - Check for images that don’t exist

kwcoco.CocoDataset.rename_categories - Rename categories with a potentially coarser categorization.

kwcoco.CocoDataset.reroot - Rebase image/data paths onto a new image/data root.

CocoDataset methods (via MixinCocoDraw)¶

kwcoco.CocoDataset.draw_image - Use kwimage to draw all annotations on an image and return the pixels as a numpy array.

kwcoco.CocoDataset.imread - Loads a particular image

kwcoco.CocoDataset.show_image - Use matplotlib to show an image with annotations overlaid

Subpackages¶

Submodules¶

Package Contents¶

Classes¶

AbstractCocoDataset

This is a common base for all variants of the Coco Dataset

CategoryTree

Wrapper that maintains flat or hierarchical category information.

CocoDataset

Notes

class kwcoco.AbstractCocoDataset[source]¶

Bases: abc.ABC

This is a common base for all variants of the Coco Dataset

At the time of writing there is kwcoco.CocoDataset (which is the dictionary-based backend), and the kwcoco.coco_sql_dataset.CocoSqlDataset, which is experimental.

class kwcoco.CategoryTree(graph=None)[source]¶

Bases: ubelt.NiceRepr

Wrapper that maintains flat or hierarchical category information.

Helps compute softmaxes and probabilities for tree-based categories where a directed edge (A, B) represents that A is a superclass of B.

Notes

There are three basic properties that this object maintains:

node:: Alphanumeric string names that should be generally descriptive. Using spaces and special characters in these names is discouraged, but can be done. This is the COCO category “name” attribute. For categories this may be denoted as (name, node, cname, catname).
id:: The integer id of a category should ideally remain consistent. These are often given by a dataset (e.g. a COCO dataset). This is the COCO category “id” attribute. For categories this is often denoted as (id, cid).
index:: Contiguous zero-based indices that index the list of categories. These should be used for the fastest access in backend computation tasks. Typically corresponds to the ordering of the channels in the final linear layer in an associated model. For categories this is often denoted as (index, cidx, idx, or cx).
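The relationship between the three properties can be sketched with plain dictionaries. This is only an illustration: the category list is made up, and the index order used here (list order) is an assumption, since the ordering CategoryTree actually uses is not described in this section.

```python
# Categories as they might appear in a COCO file; ids need not be contiguous.
categories = [
    {'id': 7, 'name': 'cat'},
    {'id': 3, 'name': 'dog'},
    {'id': 12, 'name': 'zebra'},
]

# index -> node and node -> index (contiguous, zero-based)
idx_to_node = [cat['name'] for cat in categories]
node_to_idx = {node: idx for idx, node in enumerate(idx_to_node)}

# node <-> id (ids come from the dataset)
node_to_id = {cat['name']: cat['id'] for cat in categories}
id_to_node = {cid: node for node, cid in node_to_id.items()}

# id <-> index, composed from the mappings above
id_to_idx = {cid: node_to_idx[node] for cid, node in id_to_node.items()}
idx_to_id = [node_to_id[node] for node in idx_to_node]

print(id_to_idx)
print(idx_to_id)
```

Dataset code typically speaks in ids, while model code speaks in indices, so the composed `id_to_idx` and `idx_to_id` tables are the ones that bridge the two.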

Variables

idx_to_node (List[str]) – a list of class names. Implicitly maps from index to category name.
id_to_node (Dict[int, str]) – maps integer ids to category names
node_to_id (Dict[str, int]) – maps category names to ids
node_to_idx (Dict[str, int]) – maps category names to indexes
graph (nx.Graph) – a Graph that stores any hierarchy information. For standard mutually exclusive classes, this graph is edgeless. Nodes in this graph can maintain category attributes / properties.
idx_groups (List[List[int]]) – groups of category indices that share the same parent category.
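One way to picture idx_groups is as sibling groups over which scores can be normalized separately. The sketch below is an assumption about how such groups might be used, not the library's code; the hierarchy, the `parent_of` mapping, and `grouped_softmax` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical hierarchy: child -> parent (None marks a root).
parent_of = {
    'background': None,
    'animal': None,
    'dog': 'animal',
    'cat': 'animal',
}
idx_to_node = list(parent_of)
node_to_idx = {node: idx for idx, node in enumerate(idx_to_node)}

# Group the indices of nodes that share the same parent.
groups = defaultdict(list)
for node, parent in parent_of.items():
    groups[parent].append(node_to_idx[node])
idx_groups = list(groups.values())


def grouped_softmax(scores, idx_groups):
    """Normalize scores separately within each sibling group."""
    probs = [0.0] * len(scores)
    for group in idx_groups:
        exps = [math.exp(scores[i]) for i in group]
        total = sum(exps)
        for i, e in zip(group, exps):
            probs[i] = e / total
    return probs


probs = grouped_softmax([0.0, 0.0, 1.0, 1.0], idx_groups)
print(idx_groups)
print(probs)
```

Each resulting value can be read as a probability conditioned on the parent category, which is why siblings sum to one within their group.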

Example

>>> from kwcoco.category_tree import *
>>> graph = nx.from_dict_of_lists({
>>>     'background': [],
>>>     'foreground': ['animal'],
>>>     'animal': ['mammal', 'fish', 'insect', 'reptile'],
>>>     'mammal': ['dog', 'cat', 'human', 'zebra'],
>>>     'zebra': ['grevys', 'plains'],
>>>     'grevys': ['fred'],
>>>     'dog': ['boxer', 'beagle', 'golden'],
>>>     'cat': ['maine coon', 'persian', 'sphynx'],
>>>     'reptile': ['bearded dragon', 't-rex'],
>>> }, nx.DiGraph)
>>> self = CategoryTree(graph)
>>> print(self)
<CategoryTree(nNodes=22, maxDepth=6, maxBreadth=4...)>

Example

>>> # The coerce classmethod is the easiest way to create an instance
>>> import kwcoco
>>> kwcoco.CategoryTree.coerce(['a', 'b', 'c'])
<CategoryTree(nNodes=3, nodes=['a', 'b', 'c']) ...
>>> kwcoco.CategoryTree.coerce(4)
<CategoryTree(nNodes=4, nodes=['class_1', 'class_2', 'class_3', ...

copy(self)¶

classmethod from_mutex(cls, nodes, bg_hack=True)¶

Parameters: nodes (List[str]) – a list of class names, all of which are assumed to be mutually exclusive

Example

>>> print(CategoryTree.from_mutex(['a', 'b', 'c']))
<CategoryTree(nNodes=3, ...)>

classmethod from_json(cls, state)¶

Parameters: state (Dict) – see __getstate__ / __json__ for details

classmethod from_coco(cls, categories)¶

Create a CategoryTree object from coco categories

Parameters: categories (List[Dict]) – list of coco-style categories

classmethod coerce(cls, data, **kw)¶

Attempt to coerce data as a CategoryTree object.

This is primarily useful for when the software stack depends on categories being represent

This will work if the input data is a specially formatted json dict, a list of mutually exclusive classes, or if it is already a CategoryTree. Otherwise an error will be thrown.

Parameters

data (object) – a known representation of a category tree.
**kwargs – input type specific arguments

Returns

self

Return type

CategoryTree

Raises

TypeError – if the input format is unknown
ValueError – if kwargs are not compatible with the input format

Example

>>> import kwcoco
>>> classes1 = kwcoco.CategoryTree.coerce(3)  # integer
>>> classes2 = kwcoco.CategoryTree.coerce(classes1.__json__())  # graph dict
>>> classes3 = kwcoco.CategoryTree.coerce(['class_1', 'class_2', 'class_3'])  # mutex list
>>> classes4 = kwcoco.CategoryTree.coerce(classes1.graph)  # nx Graph
>>> classes5 = kwcoco.CategoryTree.coerce(classes1)  # cls
>>> # xdoctest: +REQUIRES(module:ndsampler)
>>> import ndsampler
>>> classes6 = ndsampler.CategoryTree.coerce(3)
>>> classes7 = ndsampler.CategoryTree.coerce(classes1)
>>> classes8 = kwcoco.CategoryTree.coerce(classes6)

classmethod demo(cls, key='coco', **kwargs)¶

Parameters

key (str) – specify which demo dataset to use. Can be ‘coco’ (which uses the default coco demo data). Can be ‘btree’, which creates a binary tree and accepts kwargs ‘r’ and ‘h’ for branching-factor and height. Can be ‘btree2’, which is the same as btree but returns strings.

CommandLine:: xdoctest -m ~/code/kwcoco/kwcoco/category_tree.py CategoryTree.demo

Example

>>> from kwcoco.category_tree import *
>>> self = CategoryTree.demo()
>>> print('self = {}'.format(self))
self = <CategoryTree(nNodes=10, maxDepth=2, maxBreadth=4...)>

to_coco(self)¶

Converts to a coco-style data structure

Yields: Dict – coco category dictionaries

id_to_idx(self)¶

Example

>>> import kwcoco
>>> self = kwcoco.CategoryTree.demo()
>>> self.id_to_idx[1]

idx_to_id(self)¶

Example

>>> import kwcoco
>>> self = kwcoco.CategoryTree.demo()
>>> self.idx_to_id[0]

idx_to_ancestor_idxs(self, include_self=True)¶

Mapping from a class index to its ancestors

Parameters: include_self (bool, default=True) – if True includes each node as its own ancestor.

idx_to_descendants_idxs(self, include_self=False)¶

Mapping from a class index to its descendants (optionally including itself)

Parameters: include_self (bool, default=False) – if True includes each node as its own descendant.
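The ancestor and descendant mappings, including the effect of include_self, can be sketched over a small hand-written hierarchy. The functions below share names with the methods for readability but are standalone illustrations; the `children` dictionary and the use of sets as return values are assumptions.

```python
# Hypothetical hierarchy: parent -> children, listed in index order.
children = {
    'animal': ['mammal', 'fish'],
    'mammal': ['dog', 'cat'],
    'fish': [],
    'dog': [],
    'cat': [],
}
idx_to_node = list(children)
node_to_idx = {node: idx for idx, node in enumerate(idx_to_node)}


def descendants(node):
    """All nodes reachable from ``node`` by following child edges."""
    found = []
    stack = list(children[node])
    while stack:
        child = stack.pop()
        found.append(child)
        stack.extend(children[child])
    return found


def idx_to_descendants_idxs(include_self=False):
    mapping = {}
    for node, idx in node_to_idx.items():
        idxs = {node_to_idx[d] for d in descendants(node)}
        if include_self:
            idxs.add(idx)
        mapping[idx] = idxs
    return mapping


def idx_to_ancestor_idxs(include_self=True):
    # Invert the strict descendant relation, then optionally add each node itself.
    mapping = {idx: set() for idx in node_to_idx.values()}
    for idx, desc in idx_to_descendants_idxs(include_self=False).items():
        for d in desc:
            mapping[d].add(idx)
    if include_self:
        for idx in mapping:
            mapping[idx].add(idx)
    return mapping


print(idx_to_ancestor_idxs()[node_to_idx['dog']])
print(idx_to_descendants_idxs()[node_to_idx['mammal']])
```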

idx_pairwise_distance(self)¶

Get a matrix encoding the distance from one class to another.

Distances

from parents to children are positive (descendants),
from children to parents are negative (ancestors),
between unreachable nodes (with respect to the forward and reverse graph) are nan.
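The sign convention can be illustrated with a breadth-first search over a small hand-written hierarchy. This is a sketch under stated assumptions: the `children` dictionary is made up, the matrix is a plain list of lists, and setting the diagonal to 0 is an assumption rather than documented behavior.

```python
import math
from collections import deque

# Hypothetical directed hierarchy: parent -> children.
children = {
    'animal': ['dog', 'cat'],
    'dog': ['beagle'],
    'cat': [],
    'beagle': [],
    'background': [],
}
nodes = list(children)


def forward_distances(start):
    """Breadth-first search along parent -> child edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in dist:
                dist[child] = dist[node] + 1
                queue.append(child)
    return dist


n = len(nodes)
# Start with nan everywhere: unreachable in both directions.
distance = [[math.nan] * n for _ in range(n)]
for i, src in enumerate(nodes):
    for dst, d in forward_distances(src).items():
        j = nodes.index(dst)
        distance[i][j] = d    # ancestor -> descendant: positive
        distance[j][i] = -d   # descendant -> ancestor: negative

print(distance[nodes.index('animal')][nodes.index('beagle')])
print(distance[nodes.index('beagle')][nodes.index('animal')])
```

Note that siblings such as `dog` and `cat` remain nan in this sketch, because neither is reachable from the other in the forward or the reverse graph.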

__len__(self)¶

__iter__(self)¶

__getitem__(self, index)¶

__contains__(self, node)¶

__json__(self)¶

Example

>>> import kwcoco
>>> self = kwcoco.CategoryTree.demo()
>>> print('self = {!r}'.format(self.__json__()))

__getstate__(self)¶

Serializes information in this class

Example

>>> from kwcoco.category_tree import *
>>> import pickle
>>> self = CategoryTree.demo()
>>> state = self.__getstate__()
>>> serialization = pickle.dumps(self)
>>> recon = pickle.loads(serialization)
>>> assert recon.__json__() == self.__json__()

__setstate__(self, state)¶

__nice__(self)¶

is_mutex(self)¶

Returns True if all categories are mutually exclusive (i.e. flat)

If true, then the classes may be represented as a simple list of class names without any loss of information, otherwise the underlying category graph is necessary to preserve all knowledge.

Todo

[ ] what happens when we have a dummy root?

property num_classes(self)¶

property class_names(self)¶

property category_names(self)¶

property cats(self)¶

Returns a mapping from category names to category attributes.

If this category tree was constructed from a coco-dataset, then this will contain the coco category attributes.

Returns: Dict[str, Dict[str, object]]

Example

>>> from kwcoco.category_tree import *
>>> self = CategoryTree.demo()
>>> print('self.cats = {!r}'.format(self.cats))

index(self, node)¶: Return the index that corresponds to the category name

_build_index(self)¶: construct lookup tables

show(self)¶

Ignore:

>>> import kwplot
>>> kwplot.autompl()
>>> from kwcoco import category_tree
>>> self = category_tree.CategoryTree.demo()
>>> self.show()

python -c "import kwplot, kwcoco, graphid; kwplot.autompl(); graphid.util.show_nx(kwcoco.category_tree.CategoryTree.demo().graph); kwplot.show_if_requested()" --show

class kwcoco.CocoDataset(data=None, tag=None, bundle_dpath=None, img_root=None, fname=None, autobuild=True)[source]¶

Bases: kwcoco.abstract_coco_dataset.AbstractCocoDataset, MixinCocoAddRemove, MixinCocoStats, MixinCocoObjects, MixinCocoDraw, MixinCocoAccessors, MixinCocoExtras, MixinCocoIndex, MixinCocoDepricate, ubelt.NiceRepr

Notes

A keypoint annotation

    {
        "image_id" : int,
        "category_id" : int,
        "keypoints" : [x1,y1,v1,…,xk,yk,vk],
        "score" : float,
    }

Note that v[i] is a visibility flag, where v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible.

A bounding box annotation

    {
        "image_id" : int,
        "category_id" : int,
        "bbox" : [x,y,width,height],
        "score" : float,
    }

We also define a non-standard “line” annotation (which our fixup scripts will interpret as the diameter of a circle to convert into a bounding box).

A line annotation (note this is a non-standard field)

    {
        "image_id" : int,
        "category_id" : int,
        "line" : [x1,y1,x2,y2],
        "score" : float,
    }

Lastly, note that our datasets will sometimes specify multiple bbox, line, and/or keypoints fields. In this case we may also specify a field roi_shape, which denotes which field is the “main” annotation type.
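The annotation fields described in these notes can be unpacked with a few small helpers. The functions `parse_keypoints`, `visible_keypoints`, and `line_to_bbox` are hypothetical; in particular, `line_to_bbox` is only one plausible reading of "diameter of a circle", and the actual fixup scripts may behave differently.

```python
import math


def parse_keypoints(flat):
    """Split [x1,y1,v1,...,xk,yk,vk] into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError('keypoints length must be a multiple of 3')
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def visible_keypoints(flat):
    """Keep only keypoints flagged v=2 (labeled and visible)."""
    return [(x, y) for x, y, v in parse_keypoints(flat) if v == 2]


def line_to_bbox(line):
    """Treat [x1,y1,x2,y2] as a circle diameter; return [x, y, width, height]."""
    x1, y1, x2, y2 = line
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    diameter = math.hypot(x2 - x1, y2 - y1)
    radius = diameter / 2
    # Axis-aligned box that encloses the circle.
    return [cx - radius, cy - radius, diameter, diameter]


ann = {'image_id': 1, 'category_id': 3,
       'keypoints': [10, 20, 2, 0, 0, 0, 30, 40, 1]}
print(visible_keypoints(ann['keypoints']))
print(line_to_bbox([0, 0, 6, 8]))
```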

Variables

dataset (Dict) – raw json data structure. This is the base dictionary that contains {‘annotations’: List, ‘images’: List, ‘categories’: List}
index (CocoIndex) – an efficient lookup index into the coco data structure. The index defines its own attributes like anns, cats, imgs, etc. See CocoIndex for more details on which attributes are available.
fpath (PathLike | None) – if known, this stores the filepath the dataset was loaded from
tag (str) – A tag indicating the name of the dataset.
bundle_dpath (PathLike | None) – If known, this is the root path that all image file names are relative to. This can also be manually overwritten by the user.
hashid (str | None) – If computed, this will be a hash uniquely identifying the dataset. To ensure this is computed see _build_hashid().

References

http://cocodataset.org/#format http://cocodataset.org/#download

CommandLine:: python -m kwcoco.coco_dataset CocoDataset --show

Example

>>> from kwcoco.coco_dataset import *
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> # xdoctest: +REQUIRES(--show)
>>> self.show_image(gid=2)
>>> from matplotlib import pyplot as plt
>>> plt.show()

property fpath(self)¶: In the future we will deprecate img_root for bundle_dpath

_infer_dirs(self)¶

Example

self = dset

classmethod from_data(CocoDataset, data, bundle_dpath=None, img_root=None)¶: Constructor from a json dictionary

classmethod from_image_paths(CocoDataset, gpaths, bundle_dpath=None, img_root=None)¶

Constructor from a list of image paths.

This is a convenience method.

Parameters: gpaths (List[str]) – list of image paths

Example

>>> coco_dset = CocoDataset.from_image_paths(['a.png', 'b.png'])
>>> assert coco_dset.n_images == 2

classmethod from_coco_paths(CocoDataset, fpaths, max_workers=0, verbose=1, mode='thread', union='try')¶

Constructor from multiple coco file paths.

Loads multiple coco datasets and unions the result

Notes

If the union operation fails, the list of individually loaded files is returned instead.

Parameters

fpaths (List[str]) – list of paths to multiple coco files to be loaded and unioned.
max_workers (int, default=0) – number of worker threads / processes
verbose (int) – verbosity level
mode (str) – thread, process, or serial
union (str | bool, default=’try’) – If True, unions the result datasets after loading. If False, just returns the result list. If ‘try’, then try to perform the union, but return the result list if it fails.
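The load-then-union control flow, including the ‘try’ fallback, can be sketched with the standard library. Everything here is hypothetical: `load_json`, `simple_union`, and `load_many` are stand-ins, and `simple_union` fails on duplicate image ids purely to trigger the fallback (the real union reassigns ids rather than failing on them).

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_json(fpath):
    return json.loads(Path(fpath).read_text())


def simple_union(datasets):
    """Hypothetical union: concatenates images and rejects duplicate ids."""
    images = [img for dset in datasets for img in dset['images']]
    if len({img['id'] for img in images}) != len(images):
        raise ValueError('duplicate image ids')
    return {'images': images}


def load_many(fpaths, max_workers=0, union='try'):
    if max_workers > 0:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(load_json, fpaths))
    else:
        results = [load_json(p) for p in fpaths]  # serial fallback
    if union is False:
        return results
    try:
        return simple_union(results)
    except ValueError:
        if union == 'try':
            return results  # fall back to the individually loaded datasets
        raise


# Write two tiny files to load.
dpath = Path(tempfile.mkdtemp())
paths = []
for name, gid in [('a.json', 1), ('b.json', 2)]:
    fpath = dpath / name
    fpath.write_text(json.dumps({'images': [{'id': gid}]}))
    paths.append(fpath)

merged = load_many(paths, max_workers=2)
fallback = load_many([paths[0], paths[0]])
print(type(merged).__name__, type(fallback).__name__)
```

A caller using union=‘try’ therefore has to handle two possible return types: a single merged dataset or a list of datasets.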

copy(self)¶

Deep copies this object

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> new = self.copy()
>>> assert new.imgs[1] is new.dataset['images'][0]
>>> assert new.imgs[1] == self.dataset['images'][0]
>>> assert new.imgs[1] is not self.dataset['images'][0]

__nice__(self)¶

dumps(self, indent=None, newlines=False)¶

Writes the dataset out to the json format

Parameters: newlines (bool) – if True, each annotation, image, category gets its own line

Notes

Using newlines=True is similar to print(ub.repr2(dset.dataset, nl=2, trailsep=False)). However, the above may not output valid json if it contains ndarrays.
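The layout that newlines=True aims for, one item per line while remaining valid json, can be approximated with the standard json module. The function `dumps_one_item_per_line` is a hypothetical sketch and its exact whitespace is not claimed to match the library's output.

```python
import json


def dumps_one_item_per_line(dataset):
    """Serialize so each image, annotation, or category sits on its own line."""
    parts = []
    for key, value in dataset.items():
        if isinstance(value, list):
            # One json-encoded item per line inside the list brackets.
            items = ',\n'.join(json.dumps(item) for item in value)
            parts.append('{}: [\n{}\n]'.format(json.dumps(key), items))
        else:
            parts.append('{}: {}'.format(json.dumps(key), json.dumps(value)))
    return '{\n' + ',\n'.join(parts) + '\n}'


dataset = {
    'images': [{'id': 1, 'file_name': 'astro.png'}, {'id': 2, 'file_name': 'carl.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'bbox': [10, 10, 360, 490]}],
    'categories': [],
}
text = dumps_one_item_per_line(dataset)
print(text)
```

Keeping one item per line makes the file friendlier to line-based tools such as diff and grep, at no cost to json validity.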

Example

>>> from kwcoco.coco_dataset import *
>>> import json
>>> self = CocoDataset.demo()
>>> text = self.dumps(newlines=True)
>>> print(text)
>>> self2 = CocoDataset(json.loads(text), tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset


dump(self, file, indent=None, newlines=False)¶

Writes the dataset out to the json format

Parameters

file (PathLike | FileLike) – Where to write the data. Can either be a path to a file or an open file pointer / stream.
newlines (bool) – if True, each annotation, image, category gets its own line.

Example

>>> import json
>>> import tempfile
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset

>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file, newlines=True)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset

_check_json_serializable(self, verbose=1)¶: Debug which part of a coco dataset might not be json serializable

_check_integrity(self)¶: perform all checks

_check_index(self)¶

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> self._check_index()
>>> # Force a failure
>>> self.index.anns.pop(1)
>>> self.index.anns.pop(2)
>>> import pytest
>>> with pytest.raises(AssertionError):
>>>     self._check_index()

_check_pointers(self, verbose=1)¶: Check that all category and image ids referenced by annotations exist

_build_index(self)¶

union(*others, disjoint_tracks=True, **kwargs)¶

Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.

Parameters

*others – a series of CocoDatasets that we will merge. Note, if called as an instance method, the “self” instance will be the first item in the “others” list. But if called like a classmethod, “others” will be empty by default.
disjoint_tracks (bool, default=True) – if True, we will assume track-ids are disjoint and if two datasets share the same track-id, we will disambiguate them. Otherwise they will be copied over as-is.
**kwargs – constructor options for the new merged CocoDataset

Returns

a new merged coco dataset

Return type

CocoDataset
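The statement that names and associations are retained while ids may change can be illustrated on plain dictionaries. The function `union_sketch` is hypothetical; it matches categories by name and assigns fresh sequential ids, which is one simple strategy and not necessarily the one the library uses. Track and video handling is omitted.

```python
def union_sketch(*datasets):
    """Merge COCO-style dicts, matching categories by name and reassigning ids."""
    merged = {'categories': [], 'images': [], 'annotations': []}
    name_to_cid = {}
    for dset in datasets:
        # Map this dataset's category ids onto the merged ids (by name).
        cid_map = {}
        for cat in dset.get('categories', []):
            if cat['name'] not in name_to_cid:
                new_cid = len(name_to_cid) + 1
                name_to_cid[cat['name']] = new_cid
                merged['categories'].append({**cat, 'id': new_cid})
            cid_map[cat['id']] = name_to_cid[cat['name']]
        # Every image gets a fresh id; remember the old -> new mapping.
        gid_map = {}
        for img in dset.get('images', []):
            new_gid = len(merged['images']) + 1
            gid_map[img['id']] = new_gid
            merged['images'].append({**img, 'id': new_gid})
        # Annotations keep their associations via the two maps.
        for ann in dset.get('annotations', []):
            new_aid = len(merged['annotations']) + 1
            merged['annotations'].append({
                **ann,
                'id': new_aid,
                'image_id': gid_map[ann['image_id']],
                'category_id': cid_map[ann['category_id']],
            })
    return merged


dset1 = {
    'categories': [{'id': 1, 'name': 'cat'}],
    'images': [{'id': 1, 'file_name': 'a.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1}],
}
dset2 = {
    'categories': [{'id': 5, 'name': 'cat'}, {'id': 6, 'name': 'dog'}],
    'images': [{'id': 1, 'file_name': 'b.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 6}],
}
merged = union_sketch(dset1, dset2)
print(merged['annotations'])
```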

CommandLine:: xdoctest -m kwcoco.coco_dataset CocoDataset.union

Example

>>> # Test union works with different keypoint categories
>>> dset1 = CocoDataset.demo('shapes1')
>>> dset2 = CocoDataset.demo('shapes2')
>>> dset1.remove_keypoint_categories(['bot_tip', 'mid_tip', 'right_eye'])
>>> dset2.remove_keypoint_categories(['top_tip', 'left_eye'])
>>> dset_12a = CocoDataset.union(dset1, dset2)
>>> dset_12b = dset1.union(dset2)
>>> dset_21 = dset2.union(dset1)
>>> def add_hist(h1, h2):
>>>     return {k: h1.get(k, 0) + h2.get(k, 0) for k in set(h1) | set(h2)}
>>> kpfreq1 = dset1.keypoint_annotation_frequency()
>>> kpfreq2 = dset2.keypoint_annotation_frequency()
>>> kpfreq_want = add_hist(kpfreq1, kpfreq2)
>>> kpfreq_got1 = dset_12a.keypoint_annotation_frequency()
>>> kpfreq_got2 = dset_12b.keypoint_annotation_frequency()
>>> assert kpfreq_want == kpfreq_got1
>>> assert kpfreq_want == kpfreq_got2

>>> # Test disjoint gid datasets
>>> import kwcoco
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> for new_gid, img in enumerate(dset1.dataset['images'], start=10):
>>>     for aid in dset1.gid_to_aids[img['id']]:
>>>         dset1.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset1.index.clear()
>>> dset1._build_index()
>>> # ------
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> for new_gid, img in enumerate(dset2.dataset['images'], start=100):
>>>     for aid in dset2.gid_to_aids[img['id']]:
>>>         dset2.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset2.index.clear()
>>> dset2._build_index()
>>> others = [dset1, dset2]
>>> merged = kwcoco.CocoDataset.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([10, 11, 12, 100, 101]) == set(merged.imgs)

>>> # Test data is not preserved
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> others = (dset1, dset2)
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([1, 2, 3, 4, 5]) == set(merged.imgs)

>>> # Test track-ids are mapped correctly
>>> dset1 = kwcoco.CocoDataset.demo('vidshapes1')
>>> dset2 = kwcoco.CocoDataset.demo('vidshapes2')
>>> dset3 = kwcoco.CocoDataset.demo('vidshapes3')
>>> others = (dset1, dset2, dset3)
>>> for dset in others:
>>>     [a.pop('segmentation', None) for a in dset.index.anns.values()]
>>>     [a.pop('keypoints', None) for a in dset.index.anns.values()]
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others, disjoint_tracks=1)
>>> print('dset1.anns = {}'.format(ub.repr2(dset1.anns, nl=1)))
>>> print('dset2.anns = {}'.format(ub.repr2(dset2.anns, nl=1)))
>>> print('dset3.anns = {}'.format(ub.repr2(dset3.anns, nl=1)))
>>> print('merged.anns = {}'.format(ub.repr2(merged.anns, nl=1)))

Example

>>> import kwcoco
>>> # Test empty union
>>> empty_union = kwcoco.CocoDataset.union()
>>> assert len(empty_union.index.imgs) == 0

Todo

[ ] are supercategories broken?
[ ] reuse image ids where possible
[ ] reuse annotation / category ids where possible
[X] handle case where no inputs are given
[x] disambiguate track-ids
[x] disambiguate video-ids

subset(self, gids, copy=False, autobuild=True)¶

Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.

Parameters

gids (List[int]) – image-ids to copy into a new dataset
copy (bool, default=False) – if True, makes a deep copy of all nested attributes, otherwise makes a shallow copy.
autobuild (bool, default=True) – if True will automatically build the fast lookup index.
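The difference between the shallow and deep variants of a subset can be sketched on a plain dictionary. The function `subset_sketch` is hypothetical, and keeping every category in the result is an assumption made for this illustration.

```python
import copy as copy_module


def subset_sketch(dataset, gids, copy=False):
    """Select the given images and all annotations that belong to them."""
    keep = set(gids)
    images = [img for img in dataset['images'] if img['id'] in keep]
    annots = [ann for ann in dataset['annotations'] if ann['image_id'] in keep]
    sub = {
        'categories': dataset['categories'],  # assumption: categories are all kept
        'images': images,
        'annotations': annots,
    }
    # Shallow: new lists, shared item dictionaries. Deep: fully independent.
    return copy_module.deepcopy(sub) if copy else sub


dataset = {
    'categories': [{'id': 1, 'name': 'astronaut'}],
    'images': [{'id': 1}, {'id': 2}, {'id': 3}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},
        {'id': 2, 'image_id': 2, 'category_id': 1},
    ],
}
shallow = subset_sketch(dataset, [1, 3])
deep = subset_sketch(dataset, [1, 3], copy=True)
print(len(shallow['images']), len(shallow['annotations']))
```

With a shallow copy, editing an image dictionary in the subset also edits the parent dataset, so copy=True is the safer choice when the subset will be modified.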

Example

>>> self = CocoDataset.demo()
>>> gids = [1, 3]
>>> sub_dset = self.subset(gids)
>>> assert len(self.index.gid_to_aids) == 3
>>> assert len(sub_dset.gid_to_aids) == 2

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('vidshapes2')
>>> gids = [1, 2]
>>> sub_dset = self.subset(gids, copy=True)
>>> assert len(sub_dset.index.videos) == 1
>>> assert len(self.index.videos) == 2

Example

>>> self = CocoDataset.demo()
>>> sub1 = self.subset([1])
>>> sub2 = self.subset([2])
>>> sub3 = self.subset([3])
>>> others = [sub1, sub2, sub3]
>>> rejoined = CocoDataset.union(*others)
>>> assert len(sub1.anns) == 9
>>> assert len(sub2.anns) == 2
>>> assert len(sub3.anns) == 0
>>> assert rejoined.basic_stats() == self.basic_stats()

view_sql(self, force_rewrite=False, memory=False)¶

Create a cached SQL interface to this dataset suitable for large scale multiprocessing use cases.

Parameters

force_rewrite (bool, default=False) – if True, forces an update to any existing cache file on disk
memory (bool, default=False) – if True, the database is constructed in memory.

Note

This view cache is experimental and currently depends on the timestamp of the file pointed to by self.fpath. In other words, don’t use this on in-memory datasets.
