kwcoco.coco_dataset
An implementation and extension of the original MS-COCO API [1]_.
Extends the format to also include line annotations.
The following describes pseudo-code for the high-level spec (some of which may
not have full support in the Python API). A formal json-schema is defined in
``kwcoco.coco_schema``.
An informal spec is as follows:
# All object categories are defined here.
category = {
    'id': int,
    'name': str,  # unique name of the category
    'supercategory': str,  # parent category name
}
# Videos are used to manage collections or sequences of images.
# Frames do not necessarily have to be aligned or uniform time steps.
video = {
    'id': int,
    'name': str,  # a unique name for this video.
    'width': int,  # the base width of this video (all associated images must have this width)
    'height': int,  # the base height of this video (all associated images must have this height)
    # In the future this may be extended to allow pointing to video files
}
# Specifies how to find sensor data of a particular scene at a particular
# time. This is usually paths to rgb images, but auxiliary information
# can be used to specify multiple bands / etc...
image = {
    'id': int,
    'name': str,  # an encouraged but optional unique name
    'file_name': str,  # relative path to the "base" image data
    'width': int,  # pixel width of "base" image
    'height': int,  # pixel height of "base" image
    'channels': <ChannelSpec>,  # a string encoding of the channels in the main image
    'auxiliary': [  # information about any auxiliary channels / bands
        {
            'file_name': str,  # relative path to associated file
            'channels': <ChannelSpec>,  # a string encoding
            'width': <int>,  # pixel width of auxiliary image
            'height': <int>,  # pixel height of auxiliary image
            'warp_aux_to_img': <TransformSpec>,  # transform from auxiliary image space to "base" image space (identity if unspecified)
        }, ...
    ],
    'video_id': str,  # if this image is a frame in a video sequence, this id is shared by all frames in that sequence.
    'timestamp': str | int,  # an ISO-format timestamp string or an integer in flicks.
    'frame_index': int,  # ordinal frame index which can be used if timestamp is unknown.
    'warp_img_to_vid': <TransformSpec>,  # a transform from image space to video space (identity if unspecified); can be used for sensor alignment or video stabilization
}
TransformSpec:
    The spec can be anything coercible to a kwimage.Affine object.
    This can be an explicit affine transform matrix like:
        {'type': 'affine', 'matrix': <a 3x3 matrix>},
    But it can also be a concise dict containing one or more of these keys
        {
            'scale': <float|Tuple[float, float]>,
            'offset': <float|Tuple[float, float]>,
            'skew': <float>,
            'theta': <float>,  # radians counter-clockwise
        }
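The concise form can be made concrete with a small sketch. The helper `concise_to_matrix` below is hypothetical and is not part of kwcoco or kwimage. It assumes the components are applied in this order: scale, then skew (treated as a horizontal shear angle in radians), then rotation, then translation. kwimage.Affine may use a different order or shear parameterization, so treat the numbers as illustrative.

```python
import math

import numpy as np


def concise_to_matrix(spec):
    """Build a 3x3 affine matrix from a concise TransformSpec-style dict.

    Hypothetical helper. Assumed composition: T @ R @ H @ S, i.e. scale first,
    then skew (horizontal shear by angle `skew`), then rotation by `theta`
    (radians, counter-clockwise), then translation by `offset`.
    """
    sx, sy = np.broadcast_to(spec.get('scale', 1.0), 2).astype(float)
    tx, ty = np.broadcast_to(spec.get('offset', 0.0), 2).astype(float)
    theta = float(spec.get('theta', 0.0))
    skew = float(spec.get('skew', 0.0))

    S = np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]])
    H = np.array([[1, math.tan(skew), 0], [0, 1, 0], [0, 0, 1]])
    c, s = math.cos(theta), math.sin(theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]])
    return T @ R @ H @ S


def warp_points(matrix, xy):
    """Apply a 3x3 affine matrix to an (N, 2) array of xy points."""
    xy = np.asarray(xy, dtype=float)
    homog = np.hstack([xy, np.ones((len(xy), 1))])
    return (homog @ matrix.T)[:, :2]


A = concise_to_matrix({'scale': 2.0, 'offset': (10, 5)})
print(warp_points(A, [[1, 1]]))  # the point (1, 1) maps to (12, 7)
```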
ChannelSpec:
    This is a string that describes the channel composition of an image.
    For the purposes of kwcoco, separate different channel names with a
    pipe ('|'). If the spec is not specified, methods may fall back on
    grayscale or rgb processing. There are also special strings; for
    instance, 'rgb' will expand into 'r|g|b'. In other applications you can
    "late fuse" inputs by separating them with a "," and "early fuse" them by
    separating with a "|". Early fusion returns a single array/tensor; late
    fusion returns separate arrays/tensors.
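The sketch below shows how the ',' (late fusion) and '|' (early fusion) separators nest. `parse_channel_spec` and its `SPECIAL_CODES` table are hypothetical illustrations, not kwcoco's channel-spec classes. Only the 'rgb' expansion comes from the text above.

```python
# Hypothetical expansion table; only 'rgb' -> 'r|g|b' comes from the text above.
SPECIAL_CODES = {'rgb': ['r', 'g', 'b']}


def parse_channel_spec(spec):
    """Split a channel spec into late-fused groups of early-fused channel names.

    ',' separates late-fused groups (returned as separate lists) and '|'
    separates early-fused channels (which would be stacked into one array).
    """
    groups = []
    for group in spec.split(','):
        names = []
        for part in group.split('|'):
            part = part.strip()
            names.extend(SPECIAL_CODES.get(part, [part]))
        groups.append(names)
    return groups


print(parse_channel_spec('rgb,disparity'))  # [['r', 'g', 'b'], ['disparity']]
print(parse_channel_spec('B1|B8'))          # [['B1', 'B8']]
```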
# Ground truth is specified as annotations, each of which belongs to a spatial
# region in an image. This must reference a subregion of the image in pixel
# coordinates. Additional non-schema properties can be specified to track
# location in other coordinate systems. Annotations can be linked over time
# by specifying track-ids.
annotation = {
    'id': int,
    'image_id': int,
    'category_id': int,
    'track_id': <int | str | uuid>,  # indicates association between annotations across images
    'bbox': [tl_x, tl_y, w, h],  # xywh format
    'score': float,
    'prob': List[float],
    'weight': float,
    'caption': str,  # a text caption for this annotation
    'keypoints': <Keypoints | List[int]>,  # an accepted keypoint format
    'segmentation': <RunLengthEncoding | Polygon | MaskPath | WKT>,  # an accepted segmentation format
}
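Because 'bbox' is stored as [tl_x, tl_y, w, h], converting it to corner coordinates or computing its area is simple arithmetic. The helper names and the example annotation below are illustrative and are not kwcoco functions.

```python
def xywh_to_ltrb(bbox):
    """Convert a COCO [tl_x, tl_y, w, h] box to [x1, y1, x2, y2] corner format."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def bbox_area(bbox):
    """Area of a COCO xywh box in square pixels."""
    _, _, w, h = bbox
    return w * h


ann = {'id': 1, 'image_id': 1, 'category_id': 2, 'bbox': [350, 5, 130, 290]}
print(xywh_to_ltrb(ann['bbox']))  # [350, 5, 480, 295]
print(bbox_area(ann['bbox']))     # 37700
```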
# A dataset bundles a manifest of all aforementioned data into one structure.
dataset = {
    'categories': [category, ...],
    'videos': [video, ...],
    'images': [image, ...],
    'annotations': [annotation, ...],
    'licenses': [],
    'info': [],
}
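Below is a minimal dataset dictionary that follows the informal spec, written with the standard library only. The `check_references` helper is a hypothetical sketch of one simple consistency check: every annotation should point at an existing image and category. kwcoco's own `validate` method performs far more thorough checks.

```python
import json

# A minimal dataset following the informal spec (ids and paths are made up).
dataset = {
    'categories': [{'id': 1, 'name': 'star'}],
    'videos': [],
    'images': [{'id': 1, 'file_name': 'images/img1.png', 'width': 64, 'height': 64}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1, 'bbox': [10, 10, 8, 8]},
    ],
    'licenses': [],
    'info': [],
}


def check_references(dataset):
    """Return messages for annotations that point at a missing image or category."""
    gids = {img['id'] for img in dataset.get('images', [])}
    cids = {cat['id'] for cat in dataset.get('categories', [])}
    problems = []
    for ann in dataset.get('annotations', []):
        if ann['image_id'] not in gids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        cid = ann.get('category_id')
        if cid is not None and cid not in cids:
            problems.append(f"annotation {ann['id']}: unknown category_id {cid}")
    return problems


text = json.dumps(dataset, indent=2)  # the on-disk form is plain JSON
assert json.loads(text) == dataset
print(check_references(dataset))  # []
```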
Polygon:
    A flattened list of xy coordinates.
        [x1, y1, x2, y2, ..., xn, yn]
    or a list of flattened lists of xy coordinates if the connected components (CCs) are disjoint
        [[x1, y1, x2, y2, ..., xn, yn], [x1, y1, ..., xm, ym],]
    Note: the original coco spec does not allow for holes in polygons.
    We also allow a non-standard dictionary encoding of polygons
        {'exterior': [(x1, y1)...],
         'interiors': [[(x1, y1), ...], ...]}
    TODO: Support WKT
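The three polygon encodings can be normalized to one exterior/interiors form. That form also makes it easy to compute an area with the shoelace formula. This is a sketch with hypothetical helper names. It assumes rings are simple (non-self-intersecting) and that interiors lie inside their exterior.

```python
def normalize_polygon(poly):
    """Convert any accepted polygon encoding into a list of
    {'exterior': [(x, y), ...], 'interiors': [[(x, y), ...], ...]} dicts."""
    def pairs(flat):
        return list(zip(flat[0::2], flat[1::2]))

    if isinstance(poly, dict):
        # non-standard dict encoding with optional holes
        return [{
            'exterior': [tuple(p) for p in poly['exterior']],
            'interiors': [[tuple(p) for p in ring] for ring in poly.get('interiors', [])],
        }]
    if poly and isinstance(poly[0], (list, tuple)):
        # list of flattened lists: one entry per disjoint component
        return [{'exterior': pairs(flat), 'interiors': []} for flat in poly]
    # a single flattened list
    return [{'exterior': pairs(poly), 'interiors': []}]


def ring_area(ring):
    """Unsigned shoelace area of a closed ring of (x, y) points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(poly):
    """Sum exterior areas minus hole areas over all components."""
    return sum(
        ring_area(part['exterior']) - sum(ring_area(r) for r in part['interiors'])
        for part in normalize_polygon(poly)
    )


print(polygon_area([0, 0, 4, 0, 4, 4, 0, 4]))  # 16.0
```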
RunLengthEncoding:
    The RLE can be in a special bytes encoding or in a binary array
    encoding. We reuse the original C functions from [2]_ in
    ``kwimage.structs.Mask`` to provide a convenient way to abstract this
    rather esoteric bytes encoding.
    For pure Python implementations, see kwimage:
    converting from an image to RLE can be done via kwimage.run_length_encoding,
    and converting from RLE back to an image can be done via
    kwimage.decode_run_length.
    For compatibility with the COCO specs, ensure the binary flags
    for these functions are set to true.
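For intuition, the uncompressed COCO RLE layout fits in a few lines of NumPy. This sketch assumes the standard pycocotools conventions:

- a `{'size': [h, w], 'counts': [...]}` dict,
- pixels read in column-major (Fortran) order,
- a first count that always measures the leading run of zeros.

It does not produce the compressed string form; for that, rely on the C routines referenced above. The function names are hypothetical.

```python
import numpy as np


def rle_encode(mask):
    """Encode a binary (h, w) mask as uncompressed COCO-style RLE."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    counts = []
    current, run = False, 0  # runs always start with zeros
    for value in mask.ravel(order='F').tolist():  # column-major order
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {'size': [h, w], 'counts': counts}


def rle_decode(rle):
    """Decode uncompressed COCO-style RLE back into a boolean (h, w) mask."""
    h, w = rle['size']
    flat = np.zeros(h * w, dtype=bool)
    pos, value = 0, False
    for count in rle['counts']:
        flat[pos:pos + count] = value
        pos += count
        value = not value
    return flat.reshape((h, w), order='F')


mask = np.array([[0, 1], [1, 1]], dtype=bool)
print(rle_encode(mask))  # {'size': [2, 2], 'counts': [1, 3]}
```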
Keypoints:
    Annotation keypoints may also be specified in this non-standard (but
    ultimately more general) way:
    'annotations': [
        {
            'keypoints': [
                {
                    'xy': <x1, y1>,
                    'visible': <0 or 1 or 2>,
                    'keypoint_category_id': <kp_cid>,
                    'keypoint_category': <kp_name, optional>,  # this can be specified instead of an id
                }, ...
            ]
        }, ...
    ],
    'keypoint_categories': [{
        'name': <str>,
        'id': <int>,  # an id for this keypoint category
        'supercategory': <kp_name>,  # name of coarser parent keypoint class (for hierarchical keypoints)
        'reflection_id': <kp_cid>,  # specify only if the keypoint id would be swapped with another keypoint type
    }, ...
    ]
    In this scheme the "keypoints" property of each annotation (which used
    to be a list of floats) is now specified as a list of dictionaries that
    specify each keypoint's location, id, and visibility explicitly. This
    allows for things like non-unique keypoints and partial keypoint
    annotations. This also removes the ordering requirement, which makes it
    simpler to keep track of each keypoint's class type.
    We also have a new top-level key, 'keypoint_categories', to specify all
    the possible keypoint categories.
    TODO: Support WKT
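Converting a legacy flat keypoint list of (x, y, v) triplets into the list-of-dicts form can be sketched as follows. `legacy_to_new_keypoints` is a hypothetical helper that assumes the caller supplies the keypoint category id for each triplet, in order. It drops triplets with visibility 0 (COCO's "not labeled"), which relies on the new format permitting partial annotations.

```python
def legacy_to_new_keypoints(flat, kp_cids):
    """Convert [x1, y1, v1, x2, y2, v2, ...] into a list of keypoint dicts.

    kp_cids[i] is the keypoint_category_id assumed for the i-th triplet.
    Triplets with visibility 0 (unlabeled) are omitted.
    """
    if len(flat) != 3 * len(kp_cids):
        raise ValueError('expected one (x, y, v) triplet per keypoint category')
    out = []
    for i, kp_cid in enumerate(kp_cids):
        x, y, v = flat[3 * i:3 * i + 3]
        if v == 0:
            continue
        out.append({'xy': [x, y], 'visible': v, 'keypoint_category_id': kp_cid})
    return out


print(legacy_to_new_keypoints([10, 20, 2, 0, 0, 0, 30, 40, 1], [1, 2, 3]))
# [{'xy': [10, 20], 'visible': 2, 'keypoint_category_id': 1},
#  {'xy': [30, 40], 'visible': 1, 'keypoint_category_id': 3}]
```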
Auxiliary Channels:
    For multimodal or multispectral images it is possible to specify
    auxiliary channels in an image dictionary as follows:
    {
        'id': int,
        'file_name': str,  # path to the "base" image (may be None)
        'name': str,  # a unique name for the image (must be given if file_name is None)
        'channels': <spec>,  # a spec code that indicates the layout of the "base" image channels.
        'auxiliary': [  # information about auxiliary channels
            {
                'file_name': str,
                'channels': <spec>
            }, ...  # can have many auxiliary channels with unique specs
        ]
    }
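Finding the auxiliary entry that holds a given channel set is a linear scan over the 'auxiliary' list. The helper, image dict, and paths below are hypothetical. The sketch compares channel specs as plain strings, so 'rgb' and 'r|g|b' would not match. kwcoco's own accessors, such as get_auxiliary_fpath, are documented below.

```python
import os


def find_auxiliary(img, channels):
    """Return the auxiliary dict of `img` whose 'channels' equals `channels`."""
    for aux in img.get('auxiliary') or []:
        if aux.get('channels') == channels:
            return aux
    raise KeyError(f'no auxiliary entry with channels={channels!r}')


img = {
    'id': 1,
    'file_name': None,  # no "base" image, so a name is required
    'name': 'scene1',
    'auxiliary': [
        {'file_name': 'scene1/B1.tif', 'channels': 'B1'},
        {'file_name': 'scene1/B8.tif', 'channels': 'B8'},
    ],
}
bundle_dpath = '/data/my_bundle'  # hypothetical dataset directory
aux = find_auxiliary(img, 'B8')
print(os.path.join(bundle_dpath, aux['file_name']))
```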
Video Sequences:
    For video sequences, we add the following video level index:
    'videos': [
        { 'id': <int>, 'name': <video_name:str> },
    ]
    Note that the videos might be given as encoded mp4/avi/etc. files (in
    which case the name should correspond to a path) or as a series of
    frames, in which case the images should be used to index the extracted
    frames and information in them.
    Then image dictionaries are augmented as follows:
    {
        'video_id': str,  # optional, if this image is a frame in a video sequence, this id is shared by all frames in that sequence.
        'timestamp': int,  # optional, timestamp (ideally in flicks), used to identify the timestamp of the frame. Only applicable to video inputs.
        'frame_index': int,  # optional, ordinal frame index which can be used if timestamp is unknown.
    }
    And annotations are augmented as follows:
    {
        'track_id': <int | str | uuid>,  # optional, indicates association between annotations across frames
    }
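Given the image fields above, the frames of each video can be recovered by grouping on 'video_id' and ordering by 'frame_index'. This is a sketch with a hypothetical helper name. It places frames that lack a 'frame_index' last, in their original order. It does not try to sort by 'timestamp', which may be either a string or an integer.

```python
from collections import defaultdict


def frames_by_video(images):
    """Group image dicts by 'video_id' and order each group by 'frame_index'.

    Images without a video_id are skipped; frames missing a frame_index sort
    last and keep their original relative order (Python's sort is stable).
    """
    groups = defaultdict(list)
    for img in images:
        if img.get('video_id') is not None:
            groups[img['video_id']].append(img)
    return {
        vidid: sorted(frames, key=lambda img: img.get('frame_index', float('inf')))
        for vidid, frames in groups.items()
    }


images = [
    {'id': 3, 'video_id': 1, 'frame_index': 1},
    {'id': 2, 'video_id': 1, 'frame_index': 0},
    {'id': 5, 'video_id': 2, 'frame_index': 0},
    {'id': 9},  # not part of any video
]
for vidid, frames in frames_by_video(images).items():
    print(vidid, [img['id'] for img in frames])
# 1 [2, 3]
# 2 [5]
```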
Note
The main object in this file is CocoDataset, which is composed of
several mixin classes. See the class and method documentation for more
details.
Todo
[ ] Use ijson to lazily load pieces of the dataset in the background or on demand. This will give us faster access to categories / images, whereas we will always have to wait for annotations etc…
[X] Should img_root be changed to bundle_dpath?
[ ] Read video data, return numpy arrays (requires API for images)
[ ] Spec for video URI, and convert to frames @ framerate function.
[ ] Document channel spec
[X] remove videos
References
- [1]
- [2] https://github.com/nightrome/cocostuffapi/blob/master/PythonAPI/pycocotools/mask.py
- [3] https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format
Module Contents
Classes
- MixinCocoDepricate: These functions are marked for deprecation and may be removed at any time
- MixinCocoAccessors: TODO: better name
- MixinCocoExtras: Misc functions for coco
- MixinCocoObjects: Expose methods to construct object lists / groups.
- MixinCocoStats: Methods for getting stats about the dataset
- Helper class to track unused ids for new items
- Helper to recycle ids for unions.
- Helper to ensure names will be unique by appending suffixes
- Matplotlib / display functionality
- Mixin functions to dynamically add / remove annotations images and
- Sorted set is a sorted mutable set.
- Fast lookup index for the COCO dataset with dynamic modification
- Give the dataset top level access to index attributes
- CocoDataset: The main coco dataset class with a json dataset backend.
Functions
- Simple data for testing.
Attributes
- kwcoco.coco_dataset.SPEC_KEYS = ['info', 'licenses', 'categories', 'keypoint_categories', 'videos', 'images', 'annotations']
- class kwcoco.coco_dataset.MixinCocoDepricate
Bases: object
These functions are marked for deprecation and may be removed at any time.
- class kwcoco.coco_dataset.MixinCocoAccessors
Bases: object
TODO: better name
- delayed_load(self, gid, channels=None, space='image')
Experimental method
- Parameters
gid (int) – image id to load
channels (FusedChannelSpec) – specific channels to load. If unspecified, all channels are loaded.
space (str) – can either be “image” for loading in image space, or “video” for loading in video space.
Todo
- [X] Currently can only take all or none of the channels from each
base-image / auxiliary dict. For instance if the main image is r|g|b you can’t just select g|b at the moment.
- [X] The order of the channels in the delayed load should
match the requested channel order.
- [X] TODO: add nans to bands that don’t exist or throw an error
Example
>>> import kwcoco
>>> gid = 1
>>> #
>>> self = kwcoco.CocoDataset.demo('vidshapes8-multispectral')
>>> delayed = self.delayed_load(gid)
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize()))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
>>> #
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> delayed = self.delayed_load(gid)
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize()))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
>>> crop = delayed.delayed_crop((slice(0, 3), slice(0, 3)))
>>> crop.finalize()
>>> crop.finalize(as_xarray=True)
>>> # TODO: should only select the "red" channel
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> delayed = self.delayed_load(gid, channels='r')
>>> import kwcoco
>>> gid = 1
>>> #
>>> self = kwcoco.CocoDataset.demo('vidshapes8-multispectral')
>>> delayed = self.delayed_load(gid, channels='B1|B2', space='image')
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
>>> delayed = self.delayed_load(gid, channels='B1|B2|B11', space='image')
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
>>> delayed = self.delayed_load(gid, channels='B8|B1', space='video')
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
>>> delayed = self.delayed_load(gid, channels='B8|foo|bar|B1', space='video')
>>> print('delayed = {!r}'.format(delayed))
>>> print('delayed.finalize() = {!r}'.format(delayed.finalize(as_xarray=True)))
- load_image(self, gid_or_img, channels=None)
Reads an image from disk and returns it.
- Parameters
gid_or_img (int or dict) – image id or image dict
channels (str | None) – if specified, load data from auxiliary channels instead
- Returns
the image
- Return type
np.ndarray
Todo
- [ ] allow specification of multiple channels - use delayed image
for this.
- get_image_fpath(self, gid_or_img, channels=None)
Returns the full path to the image
- Parameters
gid_or_img (int or dict) – image id or image dict
channels (str, default=None) – if specified, return a path to data containing auxiliary channels instead
- Returns
full path to the image
- Return type
PathLike
- _get_img_auxiliary(self, gid_or_img, channels)
Returns the auxiliary dictionary for a specific channel
- get_auxiliary_fpath(self, gid_or_img, channels)
Returns the full path to auxiliary data for an image
- Parameters
gid_or_img (int | dict) – an image or its id
channels (str) – the auxiliary channel to load (e.g. disparity)
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> self.get_auxiliary_fpath(1, 'disparity')
- load_annot_sample(self, aid_or_ann, image=None, pad=None)
Reads the chip of an annotation. Note this is much less efficient than using a sampler, but it doesn’t require disk cache.
Maybe deprecate?
- Parameters
aid_or_ann (int or dict) – annot id or dict
image (ArrayLike, default=None) – preloaded image (note: this process is inefficient unless image is specified)
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> sample = self.load_annot_sample(2, pad=100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(sample['im'])
>>> kwplot.show_if_requested()
- _resolve_to_cid(self, id_or_name_or_dict)
Ensures output is a category id
Note
this does not resolve aliases (yet), for that see _alias_to_cat
Todo
we could maintain an alias index to make this fast
- _resolve_to_kpcat(self, kp_identifier)
Lookup a keypoint-category dict via its name or id
- Parameters
kp_identifier (int | str | dict) – either the keypoint category name, alias, or its keypoint_category_id.
- Returns
keypoint category dictionary
- Return type
Dict
Example
>>> self = CocoDataset.demo('shapes')
>>> kpcat1 = self._resolve_to_kpcat(1)
>>> kpcat2 = self._resolve_to_kpcat('left_eye')
>>> assert kpcat1 is kpcat2
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     self._resolve_to_cat('human')
- _resolve_to_cat(self, cat_identifier)
Lookup a coco-category dict via its name, alias, or id.
- Parameters
cat_identifier (int | str | dict) – either the category name, alias, or its category_id.
- Raises
KeyError – if the category doesn’t exist.
Note
If the index is not built, the method will work but may be slow.
Example
>>> self = CocoDataset.demo()
>>> cat = self._resolve_to_cat('human')
>>> import pytest
>>> assert self._resolve_to_cat(cat['id']) is cat
>>> assert self._resolve_to_cat(cat) is cat
>>> with pytest.raises(KeyError):
>>>     self._resolve_to_cat(32)
>>> self.index.clear()
>>> assert self._resolve_to_cat(cat['id']) is cat
>>> with pytest.raises(KeyError):
>>>     self._resolve_to_cat(32)
- _alias_to_cat(self, alias_catname)
Lookup a coco-category via its name or an “alias” name. In production code, use _resolve_to_cat() instead.
- Parameters
alias_catname (str) – category name or alias
- Returns
coco category dictionary
- Return type
Example
>>> self = CocoDataset.demo()
>>> cat = self._alias_to_cat('human')
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     self._alias_to_cat('person')
>>> cat['alias'] = ['person']
>>> self._alias_to_cat('person')
>>> cat['alias'] = 'person'
>>> self._alias_to_cat('person')
>>> assert self._alias_to_cat(None) is None
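The doctest above exercises the following lookup behavior:

- an exact name match wins first;
- otherwise, the 'alias' field, which may be a single string or a list, is checked;
- None passes through unchanged;
- unknown names raise KeyError.

The sketch below reproduces that behavior over a plain list of category dicts. It is a hypothetical re-implementation for illustration, not the kwcoco source.

```python
def alias_to_cat(categories, alias_catname):
    """Find a category dict by exact name, then by its 'alias' field."""
    if alias_catname is None:
        return None
    for cat in categories:
        if cat['name'] == alias_catname:
            return cat
    for cat in categories:
        alias = cat.get('alias', [])
        if isinstance(alias, str):  # a single alias may be given as a plain string
            alias = [alias]
        if alias_catname in alias:
            return cat
    raise KeyError(f'unknown category or alias: {alias_catname!r}')


cats = [{'id': 1, 'name': 'human'}, {'id': 2, 'name': 'star'}]
cats[0]['alias'] = ['person']
print(alias_to_cat(cats, 'person')['id'])  # 1
```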
- category_graph(self)
Construct a networkx category hierarchy
- Returns
graph: a directed graph where category names are the nodes, supercategories define edges, and items in each category dict (e.g. category id) are added as node properties.
- Return type
Example
>>> self = CocoDataset.demo()
>>> graph = self.category_graph()
>>> assert 'astronaut' in graph.nodes()
>>> assert 'keypoints' in graph.nodes['human']
- object_categories(self)
Construct a consistent CategoryTree representation of object classes
- Returns
category data structure
- Return type
Example
>>> self = CocoDataset.demo()
>>> classes = self.object_categories()
>>> print('classes = {}'.format(classes))
- keypoint_categories(self)
Construct a consistent CategoryTree representation of keypoint classes
- Returns
category data structure
- Return type
Example
>>> self = CocoDataset.demo()
>>> classes = self.keypoint_categories()
>>> print('classes = {}'.format(classes))
- _keypoint_category_names(self)
Construct keypoint category names.
Uses new-style if possible, otherwise this falls back on old-style.
- Returns
names: list of keypoint category names
- Return type
List[str]
Example
>>> self = CocoDataset.demo()
>>> names = self._keypoint_category_names()
>>> print(names)
- class kwcoco.coco_dataset.MixinCocoExtras
Bases: object
Misc functions for coco
- classmethod coerce(cls, key, **kw)
Attempt to transform the input into the intended CocoDataset.
- Parameters
key – this can either be an instance of a CocoDataset, a string URI pointing to an on-disk dataset, or a special key for creating demodata.
**kw – passed to whatever constructor is chosen (if any)
Example
>>> # test coerce for various input methods
>>> import kwcoco
>>> from kwcoco.coco_sql_dataset import assert_dsets_allclose
>>> dct_dset = kwcoco.CocoDataset.coerce('special:shapes8')
>>> copy1 = kwcoco.CocoDataset.coerce(dct_dset)
>>> copy2 = kwcoco.CocoDataset.coerce(dct_dset.fpath)
>>> assert assert_dsets_allclose(dct_dset, copy1)
>>> assert assert_dsets_allclose(dct_dset, copy2)
>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> sql_dset = dct_dset.view_sql()
>>> copy3 = kwcoco.CocoDataset.coerce(sql_dset)
>>> copy4 = kwcoco.CocoDataset.coerce(sql_dset.fpath)
>>> assert assert_dsets_allclose(dct_dset, sql_dset)
>>> assert assert_dsets_allclose(dct_dset, copy3)
>>> assert assert_dsets_allclose(dct_dset, copy4)
- classmethod demo(cls, key='photos', **kw)
Create a toy coco dataset for testing and demo purposes
- Parameters
key (str) – either ‘photos’, ‘shapes’, or ‘vidshapes’. There are also undocumented suffixes that can control behavior. TODO: better documentation for these demo datasets.
**kw – if key is shapes, these arguments are passed to toydata generation. The Kwargs section of this docstring documents a subset of the available options. For full details, see demodata_toy_dset() and random_video_dset().
- Kwargs:
image_size (Tuple[int, int]): width / height size of the images
dpath (str): path to the output image directory, defaults to using kwcoco cache dir.
aux (bool): if True generates dummy auxiliary channels
rng (int | RandomState, default=0): random number generator or seed
verbose (int, default=3): verbosity mode
Example
>>> print(CocoDataset.demo('photos'))
>>> print(CocoDataset.demo('shapes', verbose=1))
>>> print(CocoDataset.demo('vidshapes', verbose=1))
>>> print(CocoDataset.demo('shapes256', verbose=0))
>>> print(CocoDataset.demo('shapes8', verbose=0))
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5,
>>>                                verbose=0, rng=None)
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5,
>>>                                num_tracks=4, verbose=0, rng=44)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> pnums = kwplot.PlotNums(nSubplots=len(dset.index.imgs))
>>> fnum = 1
>>> for gx, gid in enumerate(dset.index.imgs.keys()):
>>>     canvas = dset.draw_image(gid=gid)
>>>     kwplot.imshow(canvas, pnum=pnums[gx], fnum=fnum)
>>>     #dset.show_image(gid=gid, pnum=pnums[gx])
>>> kwplot.show_if_requested()
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5-aux', num_frames=1,
>>>                                verbose=0, rng=None)
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes1-multispectral', num_frames=5,
>>>                                verbose=0, rng=None)
>>> # This is the first use-case of image names
>>> assert len(dset.index.file_name_to_img) == 0, (
>>>     'the multispectral demo case has no "base" image')
>>> assert len(dset.index.name_to_img) == len(dset.index.imgs) == 5
>>> dset.remove_images([1])
>>> assert len(dset.index.name_to_img) == len(dset.index.imgs) == 4
>>> dset.remove_videos([1])
>>> assert len(dset.index.name_to_img) == len(dset.index.imgs) == 0
- classmethod random(cls, rng=None)
Creates a random CocoDataset according to distribution parameters
Todo
[ ] parameterize
- _build_hashid(self, hash_pixels=False, verbose=0)
Construct a hash that uniquely identifies the state of this dataset.
- Parameters
hash_pixels (bool, default=False) – If False the image data is not included in the hash, which can speed up computation, but is not 100% robust.
verbose (int) – verbosity level
Example
>>> self = CocoDataset.demo()
>>> self._build_hashid(hash_pixels=True, verbose=3)
...
>>> # Note: kwimage has changed the name of carl.png to carl.jpg
>>> # in 0.7.0, so that modifies some of the hash. Once 0.7.0
>>> # is landed, we can update this test to re-check for
>>> # those hashes.
>>> print('self.hashid_parts = ' + ub.repr2(self.hashid_parts))
>>> print('self.hashid = {!r}'.format(self.hashid))
self.hashid_parts = {
    'annotations': {
        'json': 'e573f49da7b76e27d0...',
        'num': 11,
    },
    'images': {
        'pixels': '67d741fefc8...',
        'json': '...',
        'num': 3,
    },
    'categories': {
        'json': '82d22e0079...',
        'num': 8,
    },
}
self.hashid = '...
# Old ‘json’: ‘6a446126490aa…’, self.hashid = ‘4769119614e921…
# New json’: ‘2221c71496a0… self.hashid = ‘77d445f05…
- Doctest:
>>> self = CocoDataset.demo()
>>> self._build_hashid(hash_pixels=True, verbose=3)
>>> self.hashid_parts
>>> # Test that when we modify the dataset only the relevant
>>> # hashid parts are recomputed.
>>> orig = self.hashid_parts['categories']['json']
>>> self.add_category('foobar')
>>> assert 'categories' not in self.hashid_parts
>>> self.hashid_parts
>>> self.hashid_parts['images']['json'] = 'should not change'
>>> self._build_hashid(hash_pixels=True, verbose=3)
>>> assert self.hashid_parts['categories']['json']
>>> assert self.hashid_parts['categories']['json'] != orig
>>> assert self.hashid_parts['images']['json'] == 'should not change'
- _invalidate_hashid(self, parts=None)
Called whenever the coco dataset is modified. It is possible to specify which parts were modified so unmodified parts can be reused next time the hash is constructed.
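The per-part caching that _build_hashid and _invalidate_hashid describe can be illustrated with a small hypothetical class. It assumes SHA-256 over JSON serialized with sorted keys and ignores pixel hashing. kwcoco's actual hashing scheme and part names may differ.

```python
import hashlib
import json


class PartHasher:
    """Hash each top-level list separately so unmodified parts can be reused."""

    def __init__(self, dataset, keys=('categories', 'images', 'annotations')):
        self.dataset = dataset
        self.keys = keys
        self.parts = {}  # cached {'json': hexdigest, 'num': count} per key

    def _hash_part(self, key):
        text = json.dumps(self.dataset.get(key, []), sort_keys=True)
        return hashlib.sha256(text.encode('utf8')).hexdigest()

    def invalidate(self, parts=None):
        # None means "everything may have changed"
        for key in (self.keys if parts is None else parts):
            self.parts.pop(key, None)

    def build(self):
        # only parts missing from the cache are recomputed
        for key in self.keys:
            if key not in self.parts:
                self.parts[key] = {'json': self._hash_part(key),
                                   'num': len(self.dataset.get(key, []))}
        combined = ''.join(self.parts[k]['json'] for k in self.keys)
        return hashlib.sha256(combined.encode('utf8')).hexdigest()


ds = {'categories': [{'id': 1, 'name': 'a'}], 'images': [], 'annotations': []}
hasher = PartHasher(ds)
print(hasher.build())
```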
- _ensure_imgsize(self, workers=0, verbose=1, fail=False)
Populate the imgsize field if it does not exist.
- Parameters
workers (int, default=0) – number of workers for parallel processing.
verbose (int, default=1) – verbosity level
fail (bool, default=False) – if True, raises an exception if any image size fails to load.
- Returns
- a list of “bad” image dictionaries where the size could
not be determined. Typically these are corrupted images and should be removed.
- Return type
List[dict]
Example
>>> # Normal case
>>> self = CocoDataset.demo()
>>> bad_imgs = self._ensure_imgsize()
>>> assert len(bad_imgs) == 0
>>> assert self.imgs[1]['width'] == 512
>>> assert self.imgs[2]['width'] == 300
>>> assert self.imgs[3]['width'] == 256
>>> # Fail cases
>>> self = CocoDataset()
>>> self.add_image('does-not-exist.jpg')
>>> bad_imgs = self._ensure_imgsize()
>>> assert len(bad_imgs) == 1
>>> import pytest
>>> with pytest.raises(Exception):
>>>     self._ensure_imgsize(fail=True)
- _ensure_image_data(self, gids=None, verbose=1)
Download data from “url” fields if specified.
- Parameters
gids (List) – subset of images to download
- corrupted_images(self, check_aux=False, verbose=0)
Check for images that don’t exist or can’t be opened
- rename_categories(self, mapper, rebuild=True, merge_policy='ignore')
Rename categories with a potentially coarser categorization.
- Parameters
mapper (dict or Function) – maps old names to new names. If multiple names are mapped to the same category, those categories will be merged.
merge_policy (str) – How to handle multiple categories that map to the same name. Can be ‘update’ or ‘ignore’.
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> self.rename_categories({'astronomer': 'person',
>>>                         'astronaut': 'person',
>>>                         'mouth': 'person',
>>>                         'helmet': 'hat'})
>>> assert 'hat' in self.name_to_cat
>>> assert 'helmet' not in self.name_to_cat
>>> # Test merge case
>>> self = kwcoco.CocoDataset.demo()
>>> mapper = {
>>>     'helmet': 'rocket',
>>>     'astronomer': 'rocket',
>>>     'human': 'rocket',
>>>     'mouth': 'helmet',
>>>     'star': 'gas'
>>> }
>>> self.rename_categories(mapper)
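Here is a minimal sketch of renaming with merging on a raw dataset dict. The function is hypothetical and has these limits:

- it accepts only a dict mapper, not a function;
- the first category that claims a new name keeps its id and other fields;
- annotations of merged categories are re-pointed to that first category;
- 'supercategory' references are not updated, and kwcoco's merge_policy options are not implemented.

```python
def rename_categories(dataset, mapper):
    """Rename categories in a raw dataset dict, merging any that collide."""
    new_cats = []
    name_to_new = {}
    old_to_new_id = {}
    for cat in dataset['categories']:
        new_name = mapper.get(cat['name'], cat['name'])
        if new_name in name_to_new:
            # collision: merge into the category that already has this name
            old_to_new_id[cat['id']] = name_to_new[new_name]['id']
        else:
            new_cat = dict(cat, name=new_name)
            name_to_new[new_name] = new_cat
            new_cats.append(new_cat)
            old_to_new_id[cat['id']] = cat['id']
    for ann in dataset['annotations']:
        if ann.get('category_id') is not None:
            ann['category_id'] = old_to_new_id[ann['category_id']]
    dataset['categories'] = new_cats
    return dataset


ds = {
    'categories': [{'id': 1, 'name': 'astronaut'}, {'id': 2, 'name': 'astronomer'},
                   {'id': 3, 'name': 'helmet'}],
    'annotations': [{'id': 1, 'category_id': 2}, {'id': 2, 'category_id': 3}],
}
rename_categories(ds, {'astronaut': 'person', 'astronomer': 'person', 'helmet': 'hat'})
print([c['name'] for c in ds['categories']])  # ['person', 'hat']
```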
- _aspycoco(self)
Converts to the official pycocotools.coco.COCO object
Todo
[ ] Maybe expose as a public API?
- reroot(self, new_root=None, old_prefix=None, new_prefix=None, absolute=False, check=True, safe=True, smart=False)
Modify the prefix of the image/data paths onto a new image/data root.
- Parameters
new_root (str, default=None) – New image root. If unspecified the current self.bundle_dpath is used. If old_prefix and new_prefix are unspecified, they will attempt to be determined based on the current root (which assumes the file paths exist at that root) and this new root.
old_prefix (str, default=None) – If specified, removes this prefix from file names. This also prevents any inferences that might be made via “new_root”.
new_prefix (str, default=None) – If specified, adds this prefix to the file names. This also prevents any inferences that might be made via “new_root”.
absolute (bool, default=False) – if True, file names are stored as absolute paths, otherwise they are relative to the new image root.
check (bool, default=True) – if True, checks that the images all exist.
safe (bool, default=True) – if True, does not overwrite values until all checks pass
smart (bool, default=False) – If True, we can try different reroot strategies and choose the one that works. Note, always be wary when algorithms try to be smart. NOT IMPLEMENTED. DEPRECATE or TODO?
CommandLine
xdoctest -m kwcoco.coco_dataset MixinCocoExtras.reroot
Todo
[ ] Incorporate maximum ordered subtree embedding?
Example
>>> import kwcoco
>>> import ubelt as ub
>>> from os.path import exists
>>> def report(dset, name):
>>>     gid = 1
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     rel_fpath = dset.index.imgs[gid]['file_name']
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('strategy_name = {!r}'.format(name))
>>>     print(ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>>     print('rel_fpath = {!r}'.format(rel_fpath))
>>> dset = self = kwcoco.CocoDataset.demo()
>>> # Change base relative directory
>>> bundle_dpath = ub.expandpath('~')
>>> print('ORIG self.imgs = {!r}'.format(self.imgs))
>>> print('ORIG dset.bundle_dpath = {!r}'.format(dset.bundle_dpath))
>>> print('NEW bundle_dpath = {!r}'.format(bundle_dpath))
>>> self.reroot(bundle_dpath)
>>> report(self, 'self')
>>> print('NEW self.imgs = {!r}'.format(self.imgs))
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> # Use absolute paths
>>> self.reroot(absolute=True)
>>> assert self.imgs[1]['file_name'].startswith(bundle_dpath)
>>> # Switch back to relative paths
>>> self.reroot()
>>> assert self.imgs[1]['file_name'].startswith('.cache')
Example
>>> # demo with auxiliary data
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> bundle_dpath = ub.expandpath('~')
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxiliary'][0]['file_name'])
>>> self.reroot(new_root=bundle_dpath)
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxiliary'][0]['file_name'])
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> assert self.imgs[1]['auxiliary'][0]['file_name'].startswith('.cache')
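The old_prefix / new_prefix behavior can be pictured as plain string surgery on every image and auxiliary file_name. This hypothetical helper does no filesystem checks, unlike reroot's check and safe options. It also matches prefixes as raw strings, so include a trailing separator in old_prefix to avoid matching a partial directory name.

```python
def swap_prefix(dataset, old_prefix, new_prefix):
    """Replace a leading path prefix on image and auxiliary file names."""
    def fix(path):
        if path is not None and path.startswith(old_prefix):
            return new_prefix + path[len(old_prefix):]
        return path  # paths without the prefix are left alone

    for img in dataset['images']:
        if 'file_name' in img:
            img['file_name'] = fix(img['file_name'])
        for aux in img.get('auxiliary') or []:
            if 'file_name' in aux:
                aux['file_name'] = fix(aux['file_name'])
    return dataset


ds = {'images': [
    {'id': 1, 'file_name': '/old/root/a.png',
     'auxiliary': [{'file_name': '/old/root/a_B1.tif'}]},
    {'id': 2, 'file_name': 'rel/b.png'},
]}
swap_prefix(ds, '/old/root/', '')
print(ds['images'][0]['file_name'])  # a.png
```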
- class kwcoco.coco_dataset.MixinCocoObjects
Bases: object
Expose methods to construct object lists / groups.
This is an alternative vectorized ORM-like interface to the coco dataset.
- annots(self, aids=None, gid=None, trackid=None)
Return vectorized annotation objects
- Parameters
aids (List[int]) – annotation ids to reference, if unspecified all annotations are returned.
gid (int) – return all annotations that belong to this image id. mutually exclusive with other arguments.
trackid (int) – return all annotations that belong to this track. mutually exclusive with other arguments.
- Returns
vectorized annotation object
- Return type
Example
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo()
>>> annots = self.annots()
>>> print(annots)
<Annots(num=11)>
>>> sub_annots = annots.take([1, 2, 3])
>>> print(sub_annots)
<Annots(num=3)>
>>> print(ub.repr2(sub_annots.get('bbox', None)))
[
    [350, 5, 130, 290],
    None,
    None,
]
- images(self, gids=None, vidid=None)
Return vectorized image objects
- Parameters
gids (List[int]) – image ids to reference, if unspecified all images are returned.
vidid (int) – returns all images that belong to this video id. mutually exclusive with gids arg.
- Returns
vectorized image object
- Return type
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> images = self.images()
>>> print(images)
<Images(num=3)>
>>> self = kwcoco.CocoDataset.demo('vidshapes2')
>>> vidid = 1
>>> images = self.images(vidid=vidid)
>>> assert all(v == vidid for v in images.lookup('video_id'))
>>> print(images)
<Images(num=2)>
- categories(self, cids=None)
Return vectorized category objects
- Parameters
cids (List[int]) – category ids to reference, if unspecified all categories are returned.
- Returns
vectorized category object
- Return type
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> categories = self.categories()
>>> print(categories)
<Categories(num=8)>
- videos(self, vidids=None)
Return vectorized video objects
- Parameters
vidids (List[int]) – video ids to reference, if unspecified all videos are returned.
- Returns
vectorized video object
- Return type
Todo
- [ ] This conflicts with what should be the property that
should redirect to ``index.videos``; we should resolve this somehow. E.g. all other main members of the index (anns, imgs, cats) have a top-level dataset property, but we don’t have one for videos because the name we would pick conflicts with this method.
Example
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo('vidshapes2')
>>> videos = self.videos()
>>> print(videos)
>>> videos.lookup('name')
>>> videos.lookup('id')
>>> print('videos.objs = {}'.format(ub.repr2(videos.objs[0:2], nl=1)))
- class kwcoco.coco_dataset.MixinCocoStats
Bases: object
Methods for getting stats about the dataset
- keypoint_annotation_frequency(self)
DEPRECATED
Example
>>> from kwcoco.coco_dataset import *
>>> import ubelt as ub
>>> self = CocoDataset.demo('shapes', rng=0)
>>> hist = self.keypoint_annotation_frequency()
>>> hist = ub.odict(sorted(hist.items()))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> print(ub.repr2(hist))  # xdoc: +IGNORE_WANT
{
    'bot_tip': 6,
    'left_eye': 14,
    'mid_tip': 6,
    'right_eye': 14,
    'top_tip': 6,
}
- category_annotation_frequency(self)
Reports the number of annotations of each category
Example
>>> from kwcoco.coco_dataset import *
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_frequency()
>>> print(ub.repr2(hist))
{
    'astroturf': 0,
    'human': 0,
    'astronaut': 1,
    'astronomer': 1,
    'helmet': 1,
    'rocket': 1,
    'mouth': 2,
    'star': 5,
}
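The sketch below computes the same kind of histogram from a raw dataset dict with collections.Counter. The helper is hypothetical. It reports zero counts for unused categories and orders entries from least to most frequent. That ordering is assumed from the doctest output above, not taken from kwcoco's source.

```python
from collections import Counter


def category_annotation_frequency(dataset):
    """Count annotations per category name, including zero-count categories."""
    cid_to_name = {cat['id']: cat['name'] for cat in dataset['categories']}
    counts = Counter(ann['category_id'] for ann in dataset['annotations']
                     if ann.get('category_id') is not None)
    hist = {name: counts.get(cid, 0) for cid, name in cid_to_name.items()}
    # stable sort: ties keep the original category order
    return dict(sorted(hist.items(), key=lambda kv: kv[1]))


ds = {
    'categories': [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}, {'id': 3, 'name': 'c'}],
    'annotations': [{'id': 1, 'category_id': 2}, {'id': 2, 'category_id': 2},
                    {'id': 3, 'category_id': 3}],
}
print(category_annotation_frequency(ds))  # {'a': 0, 'c': 1, 'b': 2}
```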
- category_annotation_type_frequency(self)
DEPRECATED
Reports the number of annotations of each type for each category
Example
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_type_frequency()
>>> print(ub.repr2(hist))
- conform(self, **config)[source]¶
Make the COCO file conform to a stricter spec, inferring attributes where possible.
Corresponds to the
kwcoco conform
CLI tool.
- KWArgs:
- **config :
pycocotools_info (default=True): returns info required by pycocotools
ensure_imgsize (default=True): ensure image size is populated
legacy (default=False): if True, tries to convert data structures to items compatible with the original pycocotools spec
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('shapes8')
>>> dset.index.imgs[1].pop('width')
>>> dset.conform(legacy=True)
>>> assert 'width' in dset.index.imgs[1]
>>> assert 'area' in dset.index.anns[1]
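The example above checks that conform fills in a missing `area`. A minimal sketch of that kind of inference, assuming the area is approximated by the xywh box area (a real conform step may prefer the segmentation area); `infer_missing_areas` is a hypothetical helper:

```python
def infer_missing_areas(dataset):
    """Fill in ``area`` for annotations that have an xywh ``bbox`` but no area.

    Assumption: area is approximated as box width * height.
    Returns the number of annotations that were updated.
    """
    num_filled = 0
    for ann in dataset.get('annotations', []):
        bbox = ann.get('bbox')
        if 'area' not in ann and bbox is not None:
            _, _, w, h = bbox
            ann['area'] = w * h
            num_filled += 1
    return num_filled


dataset = {'annotations': [
    {'id': 1, 'bbox': [10, 10, 20, 30]},
    {'id': 2, 'bbox': [0, 0, 5, 5], 'area': 12.5},  # existing area is kept
    {'id': 3},                                       # no bbox, nothing to infer
]}
print(infer_missing_areas(dataset))
```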
- validate(self, **config)[source]¶
Performs checks on this coco dataset.
Corresponds to the
kwcoco validate
CLI tool.
- KWArgs:
- **config :
schema (default=True): validates the json-schema
unique (default=True): validates unique secondary keys
missing (default=True): validates registered files exist
corrupted (default=False): validates data in registered files
verbose (default=1): verbosity flag
fastfail (default=False): if True, raise errors immediately
- Returns
- result containing keys:
status (bool): False if any errors occurred
errors (List[str]): list of all error messages
missing (List): list of any missing images
corrupted (List): list of any corrupted images
- Return type
dict
- SeeAlso:
_check_integrity()
- performs internal checks
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> import pytest
>>> with pytest.warns(UserWarning):
>>>     result = self.validate()
>>> assert not result['errors']
>>> assert result['warnings']
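The `unique` option checks that secondary keys (such as names) are not duplicated. A rough sketch of such a check on a raw dictionary; the helper `find_duplicate_names` and its default table list are assumptions, not kwcoco's implementation:

```python
from collections import Counter


def find_duplicate_names(dataset, key='name', tables=('categories', 'videos', 'images')):
    """Return, per table, the sorted values of ``key`` that occur more than once."""
    errors = {}
    for table in tables:
        values = [item[key] for item in dataset.get(table, []) if key in item]
        dups = sorted(value for value, count in Counter(values).items() if count > 1)
        if dups:
            errors[table] = dups
    return errors


dataset = {
    'categories': [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'},
                   {'id': 3, 'name': 'cat'}],
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'a.png'}],
}
print(find_duplicate_names(dataset))
# The same helper can check image file names
print(find_duplicate_names(dataset, key='file_name', tables=('images',)))
```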
- stats(self, **kwargs)[source]¶
Compute summary statistics to describe the dataset at a high level
This function corresponds to
kwcoco.cli.coco_stats.
- KWargs:
basic (bool, default=True): return basic stats
extended (bool, default=True): return extended stats
catfreq (bool, default=True): return category frequency stats
boxes (bool, default=False): return bounding box stats
annot_attrs (bool, default=True): return annotation attribute information
image_attrs (bool, default=True): return image attribute information
- Returns
info
- Return type
- basic_stats(self)[source]¶
Reports number of images, annotations, and categories.
- SeeAlso:
kwcoco.coco_dataset.MixinCocoStats.basic_stats()
kwcoco.coco_dataset.MixinCocoStats.extended_stats()
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> print(ub.repr2(self.basic_stats()))
{
    'n_anns': 11,
    'n_imgs': 3,
    'n_videos': 0,
    'n_cats': 8,
}
>>> from kwcoco.demo.toydata import *  # NOQA
>>> dset = random_video_dset(render=True, num_frames=2, num_tracks=10, rng=0)
>>> print(ub.repr2(dset.basic_stats()))
{
    'n_anns': 20,
    'n_imgs': 2,
    'n_videos': 1,
    'n_cats': 3,
}
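These counts are simply the lengths of the top-level tables. A minimal sketch over a raw COCO-style dictionary (the helper name `count_tables` is hypothetical); a dataset without a `videos` key reports zero videos, as in the first example above:

```python
def count_tables(dataset):
    """Count the top-level items of a raw COCO-style dictionary."""
    return {
        'n_anns': len(dataset.get('annotations', [])),
        'n_imgs': len(dataset.get('images', [])),
        'n_videos': len(dataset.get('videos', [])),  # missing table -> 0
        'n_cats': len(dataset.get('categories', [])),
    }


dataset = {
    'images': [{'id': 1}, {'id': 2}],
    'annotations': [{'id': 1, 'image_id': 1}],
    'categories': [{'id': 1, 'name': 'star'}],
}
print(count_tables(dataset))
```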
- extended_stats(self)[source]¶
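Reports distribution statistics, such as the number of annotations per image, images per category, categories per image, annotations per category, and images per video.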
- SeeAlso:
kwcoco.coco_dataset.MixinCocoStats.basic_stats()
kwcoco.coco_dataset.MixinCocoStats.extended_stats()
Example
>>> self = CocoDataset.demo()
>>> print(ub.repr2(self.extended_stats()))
- boxsize_stats(self, anchors=None, perclass=True, gids=None, aids=None, verbose=0, clusterkw={}, statskw={})[source]¶
Compute statistics about bounding box sizes.
Also computes anchor boxes using k-means if
anchors
is specified.
- Parameters
anchors (int) – if specified, also computes box anchors via KMeans clustering
perclass (bool) – if True, also computes stats for each category
gids (List[int], default=None) – if specified, only compute stats for these image ids.
aids (List[int], default=None) – if specified, only compute stats for these annotation ids.
verbose (int) – verbosity level
clusterkw (dict) – kwargs for
sklearn.cluster.KMeans
used if computing anchors.
statskw (dict) – kwargs for
kwarray.stats_dict()
- Returns
Stats are returned in width-height format.
- Return type
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes32')
>>> infos = self.boxsize_stats(anchors=4, perclass=False)
>>> print(ub.repr2(infos, nl=-1, precision=2))
>>> infos = self.boxsize_stats(gids=[1], statskw=dict(median=True))
>>> print(ub.repr2(infos, nl=-1, precision=2))
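Since the stats are reported in width-height format, only the last two entries of each xywh box matter. A minimal sketch of summarizing box sizes with the standard library; the helper `box_wh_stats` is hypothetical, and the choice of population standard deviation is an assumption (kwarray's stats may differ):

```python
import statistics


def box_wh_stats(bboxes):
    """Summarize the widths and heights of a list of xywh boxes."""
    widths = [w for _, _, w, _ in bboxes]
    heights = [h for _, _, _, h in bboxes]

    def summarize(values):
        return {
            'mean': statistics.fmean(values),
            'std': statistics.pstdev(values),  # population std (assumption)
            'min': min(values),
            'max': max(values),
        }

    return {'width': summarize(widths), 'height': summarize(heights)}


boxes = [[0, 0, 10, 20], [5, 5, 30, 40]]
print(box_wh_stats(boxes))
```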
- find_representative_images(self, gids=None)[source]¶
Find images that have a wide array of categories.
Attempts to find the fewest images that cover all categories, using images that contain both large and small numbers of annotations.
- Parameters
gids (None | List) – Subset of image ids to consider when finding representative images. Uses all images if unspecified.
- Returns
list of image ids determined to be representative
- Return type
List
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> gids = self.find_representative_images([3])
>>> print('gids = {!r}'.format(gids))
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> valid = {7, 1}
>>> gids = self.find_representative_images(valid)
>>> assert valid.issuperset(gids)
>>> print('gids = {!r}'.format(gids))
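Covering all categories with few images is a set-cover problem. The sketch below uses a plain greedy set-cover heuristic; this is a swapped-in technique for illustration and ignores the annotation-count criterion that kwcoco also considers. `greedy_category_cover` is a hypothetical helper:

```python
def greedy_category_cover(gid_to_cids):
    """Greedily pick images until every category id is covered.

    gid_to_cids maps an image id to the set of category ids it contains.
    At each step the image adding the most uncovered categories wins;
    ties go to the smaller image id.
    """
    uncovered = set().union(*gid_to_cids.values())
    chosen = []
    while uncovered:
        best_gid = max(sorted(gid_to_cids),
                       key=lambda gid: len(gid_to_cids[gid] & uncovered))
        gain = gid_to_cids[best_gid] & uncovered
        if not gain:
            break
        chosen.append(best_gid)
        uncovered -= gain
    return chosen


gid_to_cids = {1: {1, 2}, 2: {2, 3, 4}, 3: {4}, 4: set()}
print(greedy_category_cover(gid_to_cids))
```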
- class kwcoco.coco_dataset._NextId(parent)[source]¶
Bases:
object
Helper class to track unused ids for new items.
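One simple way to track unused ids is to hand out numbers larger than any id already present in a table. This is a hedged sketch of that idea; `NextIdTracker` is hypothetical and may not match how _NextId actually chooses ids:

```python
class NextIdTracker:
    """Hand out per-table ids larger than any id already in the dataset."""

    def __init__(self, dataset):
        self._dataset = dataset
        self._next = {}  # table name -> next candidate id

    def get(self, table):
        if table not in self._next:
            ids = [item['id'] for item in self._dataset.get(table, [])]
            self._next[table] = max(ids, default=0) + 1
        new_id = self._next[table]
        self._next[table] += 1
        return new_id


dataset = {'images': [{'id': 1}, {'id': 4}]}
tracker = NextIdTracker(dataset)
new_ids = [tracker.get('images'), tracker.get('images'), tracker.get('annotations')]
print(new_ids)
```

Note that this sketch caches the maximum, so items added to the dataset without going through the tracker are not seen.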
- class kwcoco.coco_dataset._ID_Remapper(reuse=False)[source]¶
Bases:
object
Helper to recycle ids for unions.
For each dataset we create a mapping between each old id and a new id. If possible and reuse=True we allow the new id to match the old id. After each dataset is finished we mark all those ids as used and subsequent new-ids cannot be chosen from that pool.
- Parameters
reuse (bool) – if True we are allowed to reuse ids as long as they haven’t been used before.
Example
>>> video_trackids = [[1, 1, 3, 3, 200, 4], [204, 1, 2, 3, 3, 4, 5, 9]]
>>> self = _ID_Remapper(reuse=True)
>>> for tids in video_trackids:
>>>     new_tids = [self.remap(old_tid) for old_tid in tids]
>>>     self.block_seen()
>>>     print('new_tids = {!r}'.format(new_tids))
new_tids = [1, 1, 3, 3, 200, 4]
new_tids = [204, 205, 2, 206, 206, 207, 5, 9]
>>> #
>>> self = _ID_Remapper(reuse=False)
>>> for tids in video_trackids:
>>>     new_tids = [self.remap(old_tid) for old_tid in tids]
>>>     self.block_seen()
>>>     print('new_tids = {!r}'.format(new_tids))
new_tids = [0, 0, 1, 1, 2, 3]
new_tids = [4, 5, 6, 7, 7, 8, 9, 10]
- remap(self, old_id)[source]¶
Convert an old id into a new id. If self.reuse is True, the same id is returned as long as it has not been blocked yet.
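The remapping rules described above can be sketched in a few lines. This is not kwcoco's code; `IDRemapperSketch` is a hypothetical class written so that it reproduces the outputs in the _ID_Remapper example. New ids are always larger than every id handed out so far, and `block_seen` forgets the per-dataset mapping while keeping all assigned ids reserved.

```python
class IDRemapperSketch:
    """Remap ids from several datasets so they never collide."""

    def __init__(self, reuse=False):
        self.reuse = reuse
        self._mapping = {}  # old id -> new id for the current dataset
        self._used = set()  # every new id handed out so far
        self._next_id = 0   # larger than every id in self._used

    def remap(self, old_id):
        if old_id in self._mapping:
            return self._mapping[old_id]
        if self.reuse and old_id not in self._used:
            new_id = old_id  # keep the original id when it is still free
        else:
            new_id = self._next_id
        self._mapping[old_id] = new_id
        self._used.add(new_id)
        self._next_id = max(self._next_id, new_id + 1)
        return new_id

    def block_seen(self):
        # Ids stay in self._used, so later datasets cannot reuse them
        self._mapping = {}


video_trackids = [[1, 1, 3, 3, 200, 4], [204, 1, 2, 3, 3, 4, 5, 9]]
results = {}
for reuse in (True, False):
    remapper = IDRemapperSketch(reuse=reuse)
    results[reuse] = []
    for tids in video_trackids:
        results[reuse].append([remapper.remap(tid) for tid in tids])
        remapper.block_seen()
    print(reuse, results[reuse])
```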
- class kwcoco.coco_dataset.UniqueNameRemapper[source]¶
Bases:
object
Helper to ensure names will be unique by appending suffixes.
Example
>>> from kwcoco.coco_dataset import *  # NOQA
>>> self = UniqueNameRemapper()
>>> assert self.remap('foo') == 'foo'
>>> assert self.remap('foo') == 'foo_v001'
>>> assert self.remap('foo') == 'foo_v002'
>>> assert self.remap('foo_v001') == 'foo_v003'
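A minimal sketch of suffix-based deduplication that reproduces the example above. `UniqueNameSketch` is hypothetical; it assumes an existing `_vNNN` suffix is stripped to recover the base name before a new suffix is chosen.

```python
import re

_VERSION_SUFFIX = re.compile(r'^(.*)_v\d+$')


class UniqueNameSketch:
    """Make names unique by appending _v001, _v002, ... to repeated names."""

    def __init__(self):
        self._used = set()
        self._last_version = {}  # base name -> highest suffix number handed out

    def remap(self, name):
        match = _VERSION_SUFFIX.match(name)
        base = match.group(1) if match else name
        new_name = name
        if name in self._used:
            num = self._last_version.get(base, 0)
            while new_name in self._used:
                num += 1
                new_name = f'{base}_v{num:03d}'
            self._last_version[base] = num
        self._used.add(new_name)
        return new_name


remapper = UniqueNameSketch()
names = [remapper.remap(n) for n in ['foo', 'foo', 'foo', 'foo_v001']]
print(names)
```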
- class kwcoco.coco_dataset.MixinCocoDraw[source]¶
Bases:
object
Matplotlib / display functionality
- draw_image(self, gid, channels=None)[source]¶
Use kwimage to draw all annotations on an image and return the pixels as a numpy array.
- Parameters
gid (int) – image id to draw
channels (ChannelSpec) – the channel to draw on
- Returns
canvas
- Return type
ndarray
- SeeAlso
kwcoco.coco_dataset.MixinCocoDraw.draw_image()
kwcoco.coco_dataset.MixinCocoDraw.show_image()
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> canvas = self.draw_image(1)
>>> # Now you can dump the annotated image to disk / whatever
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
- show_image(self, gid=None, aids=None, aid=None, channels=None, **kwargs)[source]¶
Use matplotlib to show an image with annotations overlaid
- Parameters
gid (int) – image to show
aids (list) – aids to highlight within the image
aid (int) – a specific aid to focus on. If gid is not given, look up gid based on this aid.
**kwargs – show_annots, show_aid, show_catname, show_kpname, show_segmentation, title, show_gid, show_filename, show_boxes
- class kwcoco.coco_dataset.MixinCocoAddRemove[source]¶
Bases:
object
Mixin functions to dynamically add / remove annotations images and categories while maintaining lookup indexes.
- add_video(self, name, id=None, **kw)[source]¶
Register a new video with the dataset
- Parameters
name (str) – Unique name for this video.
id (None or int) – ADVANCED. Force using this video id.
**kw – stores arbitrary key/value pairs in this new video
- Returns
the video id assigned to the new video
- Return type
int
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset()
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> vidid1 = self.add_video('foo', id=3)
>>> vidid2 = self.add_video('bar')
>>> vidid3 = self.add_video('baz')
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> gid1 = self.add_image('foo1.jpg', video_id=vidid1, frame_index=0)
>>> gid2 = self.add_image('foo2.jpg', video_id=vidid1, frame_index=1)
>>> gid3 = self.add_image('foo3.jpg', video_id=vidid1, frame_index=2)
>>> gid4 = self.add_image('bar1.jpg', video_id=vidid2, frame_index=0)
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> self.remove_images([gid2])
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
- add_image(self, file_name=None, id=None, **kw)[source]¶
Register a new image with the dataset
- Parameters
file_name (str) – relative or absolute path to image
id (None or int) – ADVANCED. Force using this image id.
name (str) – a unique key to identify this image
width (int) – base width of the image
height (int) – base height of the image
channels (ChannelSpec) – specification of base channels
auxiliary (List[Dict]) – specification of auxiliary information
video_id (int) – parent video, if applicable
frame_index (int) – frame index in parent video
timestamp (number | str) – timestamp of frame index
**kw – stores arbitrary key/value pairs in this new image
- Returns
the image id assigned to the new image
- Return type
int
- SeeAlso:
Example
>>> self = CocoDataset.demo()
>>> import kwimage
>>> gname = kwimage.grab_test_image_fpath('paraview')
>>> gid = self.add_image(gname)
>>> assert self.imgs[gid]['file_name'] == gname
- add_annotation(self, image_id, category_id=None, bbox=ub.NoParam, segmentation=ub.NoParam, keypoints=ub.NoParam, id=None, **kw)[source]¶
Register a new annotation with the dataset
- Parameters
image_id (int) – image_id the annotation is added to.
category_id (int | None) – category_id for the new annotation
bbox (list | kwimage.Boxes) – bounding box in xywh format
segmentation (MaskLike | MultiPolygonLike) – segmentation in some accepted format, see
kwimage.Mask.to_coco()
and
kwimage.MultiPolygon.to_coco().
keypoints (KeypointsLike) – keypoints in some accepted format, see
kwimage.Keypoints.to_coco().
id (None | int) – Force using this annotation id. Typically you should NOT specify this. A new unused id will be chosen and returned.
**kw – stores arbitrary key/value pairs in this new annotation. Common respected key/values include but are not limited to the following:
- track_id (int | str): some value used to associate annotations that belong to the same “track”.
- score (float)
- prob (List[float])
- weight (float): a weight, usually used to indicate if a ground truth annotation is difficult / important. This generalizes standard “is_hard” or “ignore” attributes in other formats.
- caption (str): a text caption for this annotation
- Returns
the annotation id assigned to the new annotation
- Return type
int
- SeeAlso:
kwcoco.coco_dataset.MixinCocoAddRemove.add_annotation()
kwcoco.coco_dataset.MixinCocoAddRemove.add_annotations()
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> cid = 1
>>> bbox = [10, 10, 20, 20]
>>> aid = self.add_annotation(image_id, cid, bbox)
>>> assert self.anns[aid]['bbox'] == bbox
Example
>>> import kwimage
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> new_det = kwimage.Detections.random(1, segmentations=True, keypoints=True)
>>> # kwimage datastructures have methods to convert to coco recognized formats
>>> new_ann_data = list(new_det.to_coco(style='new'))[0]
>>> image_id = 1
>>> aid = self.add_annotation(image_id, **new_ann_data)
>>> # Lookup the annotation we just added
>>> ann = self.index.anns[aid]
>>> print('ann = {}'.format(ub.repr2(ann, nl=-2)))
Example
>>> # Attempt to add annot without a category or bbox
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> aid = self.add_annotation(image_id)
>>> assert None in self.index.cid_to_aids
Example
>>> # Attempt to add annot using various styles of kwimage structures
>>> import kwcoco
>>> import kwimage
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> #--
>>> kw = {}
>>> kw['segmentation'] = kwimage.Polygon.random()
>>> kw['keypoints'] = kwimage.Points.random()
>>> aid = self.add_annotation(image_id, **kw)
>>> ann = self.index.anns[aid]
>>> print('ann = {}'.format(ub.repr2(ann, nl=2)))
>>> #--
>>> kw = {}
>>> kw['segmentation'] = kwimage.Mask.random()
>>> aid = self.add_annotation(image_id, **kw)
>>> ann = self.index.anns[aid]
>>> assert ann.get('segmentation', None) is not None
>>> print('ann = {}'.format(ub.repr2(ann, nl=2)))
>>> #--
>>> kw = {}
>>> kw['segmentation'] = kwimage.Mask.random().to_array_rle()
>>> aid = self.add_annotation(image_id, **kw)
>>> ann = self.index.anns[aid]
>>> assert ann.get('segmentation', None) is not None
>>> print('ann = {}'.format(ub.repr2(ann, nl=2)))
>>> #--
>>> kw = {}
>>> kw['segmentation'] = kwimage.Polygon.random().to_coco()
>>> kw['keypoints'] = kwimage.Points.random().to_coco()
>>> aid = self.add_annotation(image_id, **kw)
>>> ann = self.index.anns[aid]
>>> assert ann.get('segmentation', None) is not None
>>> assert ann.get('keypoints', None) is not None
>>> print('ann = {}'.format(ub.repr2(ann, nl=2)))
- add_category(self, name, supercategory=None, id=None, **kw)[source]¶
Register a new category with the dataset
- Parameters
name (str) – name of the new category
supercategory (str, optional) – parent of this category
id (int, optional) – use this category id, if it was not taken
**kw – stores arbitrary key/value pairs in this new category
- Returns
the category id assigned to the new category
- Return type
int
- SeeAlso:
kwcoco.coco_dataset.MixinCocoAddRemove.add_category()
kwcoco.coco_dataset.MixinCocoAddRemove.ensure_category()
Example
>>> self = CocoDataset.demo()
>>> prev_n_cats = self.n_cats
>>> cid = self.add_category('dog', supercategory='object')
>>> assert self.cats[cid]['name'] == 'dog'
>>> assert self.n_cats == prev_n_cats + 1
>>> import pytest
>>> with pytest.raises(ValueError):
>>>     self.add_category('dog', supercategory='object')
- ensure_image(self, file_name, id=None, **kw)[source]¶
Register an image if it is new, or return the existing id.
Like kwcoco.coco_dataset.MixinCocoAddRemove.add_image(), but returns the existing image id if it already exists instead of failing. In this case all metadata is ignored.
- Parameters
file_name (str) – relative or absolute path to image
id (None or int) – ADVANCED. Force using this image id.
**kw – stores arbitrary key/value pairs in this new image
- Returns
the existing or new image id
- Return type
int
- ensure_category(self, name, supercategory=None, id=None, **kw)[source]¶
Register a category if it is new, or return the existing id.
Like kwcoco.coco_dataset.MixinCocoAddRemove.add_category(), but returns the existing category id if it already exists instead of failing. In this case all metadata is ignored.
- Returns
the existing or new category id
- Return type
int
- add_annotations(self, anns)[source]¶
Faster, less safe multi-item alternative to add_annotation.
We assume the annotations are well formatted in kwcoco compliant dictionaries, including the “id” field. No validation checks are made when calling this function.
- Parameters
anns (List[Dict]) – list of annotation dictionaries
- SeeAlso:
Example
>>> self = CocoDataset.demo()
>>> anns = [self.anns[aid] for aid in [2, 3, 5, 7]]
>>> self.remove_annotations(anns)
>>> assert self.n_annots == 7 and self._check_index()
>>> self.add_annotations(anns)
>>> assert self.n_annots == 11 and self._check_index()
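Skipping validation means a batch add only has to append the dictionaries and update the lookup tables. A minimal sketch against a hypothetical plain-dict index (`anns`, `gid_to_aids`, `cid_to_aids`); duplicate or dangling ids are not detected, mirroring the "no validation checks" caveat above:

```python
def add_annotations_unchecked(dataset, index, anns):
    """Append pre-built annotation dicts and update a minimal reverse index.

    No validation: ids are assumed unique and image/category ids valid.
    """
    dataset.setdefault('annotations', []).extend(anns)
    for ann in anns:
        aid = ann['id']
        index['anns'][aid] = ann
        index['gid_to_aids'].setdefault(ann['image_id'], set()).add(aid)
        # Annotations without a category are grouped under None
        index['cid_to_aids'].setdefault(ann.get('category_id'), set()).add(aid)


dataset = {'annotations': []}
index = {'anns': {}, 'gid_to_aids': {}, 'cid_to_aids': {}}
new_anns = [
    {'id': 10, 'image_id': 1, 'category_id': 3},
    {'id': 11, 'image_id': 1, 'category_id': 4},
    {'id': 12, 'image_id': 2},
]
add_annotations_unchecked(dataset, index, new_anns)
print(index['gid_to_aids'], index['cid_to_aids'])
```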
- add_images(self, imgs)[source]¶
Faster, less safe multi-item alternative to add_image.
We assume the images are well formatted in kwcoco compliant dictionaries, including the “id” field. No validation checks are made when calling this function.
Note
THIS FUNCTION WAS DESIGNED FOR SPEED, AS SUCH IT DOES NOT CHECK IF THE IMAGE-IDs or FILE_NAMES ARE DUPLICATED AND WILL BLINDLY ADD DATA EVEN IF IT IS BAD. THE SINGLE IMAGE VERSION IS SLOWER BUT SAFER.
- Parameters
imgs (List[Dict]) – list of image dictionaries
- SeeAlso:
kwcoco.coco_dataset.MixinCocoAddRemove.add_image()
kwcoco.coco_dataset.MixinCocoAddRemove.add_images()
kwcoco.coco_dataset.MixinCocoAddRemove.ensure_image()
Example
>>> imgs = CocoDataset.demo().dataset['images']
>>> self = CocoDataset()
>>> self.add_images(imgs)
>>> assert self.n_images == 3 and self._check_index()
- clear_images(self)[source]¶
Removes all images and annotations (but not categories)
Example
>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8
- clear_annotations(self)[source]¶
Removes all annotations (but not images and categories)
Example
>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8
- remove_annotation(self, aid_or_ann)[source]¶
Remove a single annotation from the dataset
If you have multiple annotations to remove, it is more efficient to remove them in batch with
kwcoco.coco_dataset.MixinCocoAddRemove.remove_annotations()
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> prev_n_annots = self.n_annots
>>> self.remove_annotation(self.anns[2])
>>> assert self.n_annots == prev_n_annots - 1
>>> self._check_index()
- remove_annotations(self, aids_or_anns, verbose=0, safe=True)[source]¶
Remove multiple annotations from the dataset.
- Parameters
aids_or_anns (List) – list of annotation dicts or ids
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
- Returns
num_removed: information on the number of items removed
- Return type
Dict
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> prev_n_annots = self.n_annots
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)  # xdoc: +IGNORE_WANT
{'annotations': 4}
>>> assert len(self.dataset['annotations']) == prev_n_annots - 4
>>> self._check_index()
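The `safe` flag amounts to normalizing the mixed list of dicts and ids, then dropping duplicates and unknown ids before removing anything. A hedged sketch of that idea (`remove_annotations_sketch` and its `anns_index` argument are hypothetical stand-ins for the dataset and `index.anns`):

```python
def remove_annotations_sketch(dataset, anns_index, aids_or_anns, safe=True):
    """Remove annotations given a mix of annotation ids and annotation dicts.

    With safe=True, duplicates and ids missing from anns_index are dropped
    first; with safe=False they raise a KeyError at the ``del`` step.
    """
    aids = [item['id'] if isinstance(item, dict) else item for item in aids_or_anns]
    if safe:
        # dict.fromkeys removes duplicates while keeping first-seen order
        aids = [aid for aid in dict.fromkeys(aids) if aid in anns_index]
    remove = set(aids)
    dataset['annotations'] = [ann for ann in dataset['annotations']
                              if ann['id'] not in remove]
    for aid in aids:
        del anns_index[aid]
    return {'annotations': len(aids)}


dataset = {'annotations': [{'id': i} for i in range(1, 6)]}
anns_index = {ann['id']: ann for ann in dataset['annotations']}
info = remove_annotations_sketch(dataset, anns_index, [2, {'id': 3}, 3, 99])
print(info)
```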
- remove_categories(self, cat_identifiers, keep_annots=False, verbose=0, safe=True)[source]¶
Remove categories and all annotations in those categories.
Currently does not change any hierarchy information
- Parameters
cat_identifiers (List) – list of category dicts, names, or ids
keep_annots (bool, default=False) – if True, keeps annotations, but removes category labels.
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
- Returns
num_removed: information on the number of items removed
- Return type
Dict
Example
>>> self = CocoDataset.demo()
>>> cat_identifiers = [self.cats[1], 'rocket', 3]
>>> self.remove_categories(cat_identifiers)
>>> assert len(self.dataset['categories']) == 5
>>> self._check_index()
- remove_images(self, gids_or_imgs, verbose=0, safe=True)[source]¶
Remove images and any annotations contained by them
- Parameters
gids_or_imgs (List) – list of image dicts, names, or ids
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
- Returns
num_removed: information on the number of items removed
- Return type
Dict
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> assert len(self.dataset['images']) == 3
>>> gids_or_imgs = [self.imgs[2], 'astro.png']
>>> self.remove_images(gids_or_imgs)  # xdoc: +IGNORE_WANT
{'annotations': 11, 'images': 2}
>>> assert len(self.dataset['images']) == 1
>>> self._check_index()
>>> gids_or_imgs = [3]
>>> self.remove_images(gids_or_imgs)
>>> assert len(self.dataset['images']) == 0
>>> self._check_index()
- remove_videos(self, vidids_or_videos, verbose=0, safe=True)[source]¶
Remove videos and any images / annotations contained by them
- Parameters
vidids_or_videos (List) – list of video dicts, names, or ids
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
- Returns
num_removed: information on the number of items removed
- Return type
Dict
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo('vidshapes8')
>>> assert len(self.dataset['videos']) == 8
>>> vidids_or_videos = [self.dataset['videos'][0]['id']]
>>> self.remove_videos(vidids_or_videos)  # xdoc: +IGNORE_WANT
{'annotations': 4, 'images': 2, 'videos': 1}
>>> assert len(self.dataset['videos']) == 7
>>> self._check_index()
- remove_annotation_keypoints(self, kp_identifiers)[source]¶
Removes all keypoints with a particular category
- Parameters
kp_identifiers (List) – list of keypoint category dicts, names, or ids
- Returns
num_removed: information on the number of items removed
- Return type
Dict
- remove_keypoint_categories(self, kp_identifiers)[source]¶
Removes all keypoints of a particular category as well as all annotation keypoints with those ids.
- Parameters
kp_identifiers (List) – list of keypoint category dicts, names, or ids
- Returns
num_removed: information on the number of items removed
- Return type
Dict
Example
>>> self = CocoDataset.demo('shapes', rng=0)
>>> kp_identifiers = ['left_eye', 'mid_tip']
>>> remove_info = self.remove_keypoint_categories(kp_identifiers)
>>> print('remove_info = {!r}'.format(remove_info))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> # assert remove_info == {'keypoint_categories': 2, 'annotation_keypoints': 16, 'reflection_ids': 1}
>>> assert self._resolve_to_kpcat('right_eye')['reflection_id'] is None
- set_annotation_category(self, aid_or_ann, cid_or_cat)[source]¶
Sets the category of a single annotation
- Parameters
aid_or_ann (dict | int) – annotation dict or id
cid_or_cat (dict | int) – category dict or id
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> old_freq = self.category_annotation_frequency()
>>> aid_or_ann = aid = 2
>>> cid_or_cat = new_cid = self.ensure_category('kitten')
>>> self.set_annotation_category(aid, new_cid)
>>> new_freq = self.category_annotation_frequency()
>>> print('new_freq = {}'.format(ub.repr2(new_freq, nl=1)))
>>> print('old_freq = {}'.format(ub.repr2(old_freq, nl=1)))
>>> assert sum(new_freq.values()) == sum(old_freq.values())
>>> assert new_freq['kitten'] == 1
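Changing an annotation's category also has to move its id between entries of the category-to-annotation lookup, which is why the total frequency in the example is unchanged. A minimal sketch with plain dicts (`set_annotation_category_sketch` is hypothetical):

```python
def set_annotation_category_sketch(anns, cid_to_aids, aid, new_cid):
    """Reassign an annotation's category and keep cid_to_aids consistent."""
    ann = anns[aid]
    old_cid = ann.get('category_id')
    # Move the annotation id from the old category bucket to the new one
    cid_to_aids.get(old_cid, set()).discard(aid)
    cid_to_aids.setdefault(new_cid, set()).add(aid)
    ann['category_id'] = new_cid


anns = {1: {'id': 1, 'category_id': 5}, 2: {'id': 2, 'category_id': 5}}
cid_to_aids = {5: {1, 2}}
set_annotation_category_sketch(anns, cid_to_aids, aid=2, new_cid=9)
print(cid_to_aids)
```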
- class kwcoco.coco_dataset.SortedSetQuiet(iterable=None, key=None)[source]¶
Bases:
sortedcontainers.SortedSet
Sorted set is a sorted mutable set.
Sorted set values are maintained in sorted order. The design of sorted set is simple: sorted set uses a set for set-operations and maintains a sorted list of values.
Sorted set values must be hashable and comparable. The hash and total ordering of values must not change while they are stored in the sorted set.
Mutable set methods:
SortedSet.__contains__()
SortedSet.__iter__()
SortedSet.__len__()
SortedSet.add()
SortedSet.discard()
Sequence methods:
SortedSet.__getitem__()
SortedSet.__delitem__()
SortedSet.__reversed__()
Methods for removing values:
SortedSet.clear()
SortedSet.pop()
SortedSet.remove()
Set-operation methods:
SortedSet.difference()
SortedSet.difference_update()
SortedSet.intersection()
SortedSet.intersection_update()
SortedSet.symmetric_difference()
SortedSet.symmetric_difference_update()
SortedSet.union()
SortedSet.update()
Methods for miscellany:
SortedSet.copy()
SortedSet.count()
SortedSet.__repr__()
SortedSet._check()
Sorted list methods available:
SortedList.bisect_left()
SortedList.bisect_right()
SortedList.index()
SortedList.irange()
SortedList.islice()
SortedList._reset()
Additional sorted list methods available, if key-function used:
SortedKeyList.bisect_key_left()
SortedKeyList.bisect_key_right()
SortedKeyList.irange_key()
Sorted set comparisons use subset and superset relations. Two sorted sets are equal if and only if every element of each sorted set is contained in the other (each is a subset of the other). A sorted set is less than another sorted set if and only if the first sorted set is a proper subset of the second sorted set (is a subset, but is not equal). A sorted set is greater than another sorted set if and only if the first sorted set is a proper superset of the second sorted set (is a superset, but is not equal).
- class kwcoco.coco_dataset.CocoIndex(index)[source]¶
Bases:
object
Fast lookup index for the COCO dataset with dynamic modification
- Variables
imgs (Dict[int, dict]) – mapping between image ids and the image dictionaries
anns (Dict[int, dict]) – mapping between annotation ids and the annotation dictionaries
cats (Dict[int, dict]) – mapping between category ids and the category dictionaries
kpcats (Dict[int, dict]) – mapping between keypoint category ids and keypoint category dictionaries
gid_to_aids (Dict[int, List[int]]) – mapping between an image-id and annotation-ids that belong to it
cid_to_aids (Dict[int, List[int]]) – mapping between a category-id and annotation-ids that belong to it
cid_to_gids (Dict[int, List[int]]) – mapping between a category-id and image-ids that contain at least one annotation with this category id.
trackid_to_aids (Dict[int, List[int]]) – mapping between a track-id and annotation-ids that belong to it
vidid_to_gids (Dict[int, List[int]]) – mapping between a video-id and image-ids that belong to it
name_to_video (Dict[str, dict]) – mapping between a video name and the video dictionary.
name_to_cat (Dict[str, dict]) – mapping between a category name and the category dictionary.
name_to_img (Dict[str, dict]) – mapping between an image name and the image dictionary.
file_name_to_img (Dict[str, dict]) – mapping between an image file_name and the image dictionary.
- _set_sorted_by_frame_index(index, gids=None)[source]¶
Helper for ensuring that vidid_to_gids returns image ids ordered by frame index.
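Ordering a video's image ids by frame index is a keyed sort over the image dictionaries. A hedged sketch; `sorted_video_gids` is hypothetical, and placing images without a `frame_index` last (then ordering by id) is an assumption of this sketch:

```python
def sorted_video_gids(imgs, vidid_to_gids):
    """Return vidid -> list of gids ordered by each image's frame_index."""
    def sort_key(gid):
        frame_index = imgs[gid].get('frame_index')
        # Images without a frame_index sort after all indexed frames
        return (frame_index is None, frame_index if frame_index is not None else 0, gid)

    return {vidid: sorted(gids, key=sort_key) for vidid, gids in vidid_to_gids.items()}


imgs = {
    1: {'id': 1, 'frame_index': 2},
    2: {'id': 2, 'frame_index': 0},
    3: {'id': 3, 'frame_index': 1},
    4: {'id': 4},
}
print(sorted_video_gids(imgs, {1: {1, 2, 3, 4}}))
```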
- property cid_to_gids(index)[source]¶
Example
>>> import kwcoco
>>> self = dset = kwcoco.CocoDataset()
>>> self.index.cid_to_gids
- _add_image(index, gid, img)[source]¶
Example
>>> # Test adding image to video that doesn't exist
>>> import kwcoco
>>> self = dset = kwcoco.CocoDataset()
>>> dset.add_image(file_name='frame1', video_id=1, frame_index=0)
>>> dset.add_image(file_name='frame2', video_id=1, frame_index=0)
>>> dset._check_pointers()
>>> dset._check_index()
>>> print('dset.index.vidid_to_gids = {!r}'.format(dset.index.vidid_to_gids))
>>> assert len(dset.index.vidid_to_gids) == 1
>>> dset.add_video(name='foo-vid', id=1)
>>> assert len(dset.index.vidid_to_gids) == 1
>>> dset._check_pointers()
>>> dset._check_index()
- _add_images(index, imgs)[source]¶
See ../dev/bench/bench_add_image_check.py
Note
THIS FUNCTION WAS DESIGNED FOR SPEED, AS SUCH IT DOES NOT CHECK IF THE IMAGE-IDs or FILE_NAMES ARE DUPLICATED AND WILL BLINDLY ADD DATA EVEN IF IT IS BAD. THE SINGLE IMAGE VERSION IS SLOWER BUT SAFER.
- build(index, parent)[source]¶
Build all id-to-obj reverse indexes from scratch.
- Parameters
parent (CocoDataset) – the dataset to index
- Notation:
aid - Annotation ID
gid - imaGe ID
cid - Category ID
vidid - Video ID
Example
>>> from kwcoco.demo.toydata import *  # NOQA
>>> parent = CocoDataset.demo('vidshapes1', num_frames=4, rng=1)
>>> index = parent.index
>>> index.build(parent)
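Building the index from scratch is a single pass over each table. The sketch below builds a subset of the CocoIndex lookups listed above from a raw dictionary. It is not the kwcoco implementation: `build_index` is hypothetical and uses plain sets rather than the sorted containers kwcoco may use.

```python
from collections import defaultdict


def build_index(dataset):
    """Build id-to-object and reverse lookup tables from a raw COCO dict."""
    imgs = {img['id']: img for img in dataset.get('images', [])}
    anns = {ann['id']: ann for ann in dataset.get('annotations', [])}
    cats = {cat['id']: cat for cat in dataset.get('categories', [])}
    videos = {vid['id']: vid for vid in dataset.get('videos', [])}

    # Every image / category / video gets an entry, even if nothing points at it
    gid_to_aids = {gid: set() for gid in imgs}
    cid_to_aids = {cid: set() for cid in cats}
    vidid_to_gids = {vidid: set() for vidid in videos}
    trackid_to_aids = defaultdict(set)

    for aid, ann in anns.items():
        gid_to_aids.setdefault(ann['image_id'], set()).add(aid)
        cid_to_aids.setdefault(ann.get('category_id'), set()).add(aid)
        if 'track_id' in ann:
            trackid_to_aids[ann['track_id']].add(aid)

    for gid, img in imgs.items():
        if img.get('video_id') is not None:
            vidid_to_gids.setdefault(img['video_id'], set()).add(gid)

    return {
        'imgs': imgs, 'anns': anns, 'cats': cats, 'videos': videos,
        'gid_to_aids': gid_to_aids, 'cid_to_aids': cid_to_aids,
        'vidid_to_gids': vidid_to_gids, 'trackid_to_aids': dict(trackid_to_aids),
        'name_to_cat': {cat['name']: cat for cat in cats.values()},
        'file_name_to_img': {img['file_name']: img for img in imgs.values()
                             if 'file_name' in img},
    }


dataset = {
    'videos': [{'id': 1, 'name': 'vid1'}],
    'images': [{'id': 1, 'file_name': 'a.png', 'video_id': 1},
               {'id': 2, 'file_name': 'b.png'}],
    'categories': [{'id': 1, 'name': 'star'}, {'id': 2, 'name': 'rocket'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1, 'track_id': 7},
        {'id': 2, 'image_id': 1, 'category_id': 2},
        {'id': 3, 'image_id': 2, 'category_id': 1, 'track_id': 7},
    ],
}
index = build_index(dataset)
print(index['gid_to_aids'], index['trackid_to_aids'])
```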
- class kwcoco.coco_dataset.MixinCocoIndex[source]¶
Bases:
object
Give the dataset top level access to index attributes
- class kwcoco.coco_dataset.CocoDataset(data=None, tag=None, bundle_dpath=None, img_root=None, fname=None, autobuild=True)[source]¶
Bases:
kwcoco.abstract_coco_dataset.AbstractCocoDataset,
MixinCocoAddRemove,
MixinCocoStats,
MixinCocoObjects,
MixinCocoDraw,
MixinCocoAccessors,
MixinCocoExtras,
MixinCocoIndex,
MixinCocoDepricate,
ubelt.NiceRepr
The main coco dataset class with a json dataset backend.
- Variables
dataset (Dict) – raw json data structure. This is the base dictionary that contains {‘annotations’: List, ‘images’: List, ‘categories’: List}
index (CocoIndex) – an efficient lookup index into the coco data structure. The index defines its own attributes like anns, cats, imgs, gid_to_aids, file_name_to_img, etc. See CocoIndex for more details on which attributes are available.
fpath (PathLike | None) – if known, this stores the filepath the dataset was loaded from
tag (str) – A tag indicating the name of the dataset.
bundle_dpath (PathLike | None) – If known, this is the root path that all image file names are relative to. This can also be manually overwritten by the user.
hashid (str | None) – If computed, this will be a hash uniquely identifying the dataset. To ensure this is computed see
kwcoco.coco_dataset.MixinCocoExtras._build_hashid().
References
http://cocodataset.org/#format
http://cocodataset.org/#download
CommandLine
python -m kwcoco.coco_dataset CocoDataset --show
Example
>>> from kwcoco.coco_dataset import demo_coco_data
>>> import kwcoco
>>> import ubelt as ub
>>> # Returns a coco json structure
>>> dataset = demo_coco_data()
>>> # Pass the coco json structure to the API
>>> self = kwcoco.CocoDataset(dataset, tag='demo')
>>> # Now you can access the data using the index and helper methods
>>> #
>>> # Start by looking up an image by its COCO id.
>>> image_id = 1
>>> img = self.index.imgs[image_id]
>>> print(ub.repr2(img, nl=1))
{
    'file_name': 'astro.png',
    'id': 1,
    'url': 'https://i.imgur.com/KXhKM72.png',
}
>>> #
>>> # Use the (gid_to_aids) index to lookup annotations in the image
>>> annotation_id = sorted(self.index.gid_to_aids[image_id])[0]
>>> ann = self.index.anns[annotation_id]
>>> print(ub.repr2(ub.dict_diff(ann, {'segmentation'}), nl=1))
{
    'bbox': [10, 10, 360, 490],
    'category_id': 1,
    'id': 1,
    'image_id': 1,
    'keypoints': [247, 101, 2, 202, 100, 2],
}
>>> #
>>> # Use annotation category id to look up that information
>>> category_id = ann['category_id']
>>> cat = self.index.cats[category_id]
>>> print('cat = {}'.format(ub.repr2(cat, nl=1)))
cat = {
    'id': 1,
    'name': 'astronaut',
    'supercategory': 'human',
}
>>> #
>>> # Now play with some helper functions, like extended statistics
>>> extended_stats = self.extended_stats()
>>> print('extended_stats = {}'.format(ub.repr2(extended_stats, nl=1, precision=2)))
extended_stats = {
    'annots_per_img': {'mean': 3.67, 'std': 3.86, 'min': 0.00, 'max': 9.00, 'nMin': 1, 'nMax': 1, 'shape': (3,)},
    'imgs_per_cat': {'mean': 0.88, 'std': 0.60, 'min': 0.00, 'max': 2.00, 'nMin': 2, 'nMax': 1, 'shape': (8,)},
    'cats_per_img': {'mean': 2.33, 'std': 2.05, 'min': 0.00, 'max': 5.00, 'nMin': 1, 'nMax': 1, 'shape': (3,)},
    'annots_per_cat': {'mean': 1.38, 'std': 1.49, 'min': 0.00, 'max': 5.00, 'nMin': 2, 'nMax': 1, 'shape': (8,)},
    'imgs_per_video': {'empty_list': True},
}
>>> # You can "draw" a raster of the annotated image with cv2
>>> canvas = self.draw_image(2)
>>> # Or if you have matplotlib you can "show" the image with mpl objects
>>> # xdoctest: +REQUIRES(--show)
>>> from matplotlib import pyplot as plt
>>> fig = plt.figure()
>>> ax1 = fig.add_subplot(1, 2, 1)
>>> self.show_image(gid=2)
>>> ax2 = fig.add_subplot(1, 2, 2)
>>> ax2.imshow(canvas)
>>> ax1.set_title('show with matplotlib')
>>> ax2.set_title('draw with cv2')
>>> plt.show()
- classmethod from_data(CocoDataset, data, bundle_dpath=None, img_root=None)[source]¶
Constructor from a json dictionary
- classmethod from_image_paths(CocoDataset, gpaths, bundle_dpath=None, img_root=None)[source]¶
Constructor from a list of image paths.
This is a convenience method.
- Parameters
gpaths (List[str]) – list of image paths
Example
>>> coco_dset = CocoDataset.from_image_paths(['a.png', 'b.png'])
>>> assert coco_dset.n_images == 2
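Conceptually, a constructor like from_image_paths only needs to wrap each path in an image entry. The stdlib-only sketch below shows that idea on a plain dict; coco_dict_from_image_paths is a hypothetical helper, and the 1-based ids and the absence of width/height fields are assumptions, not a description of what the real method fills in.

```python
import json


def coco_dict_from_image_paths(gpaths):
    """Hypothetical helper: build a minimal COCO-style dict from image paths.

    Each path becomes one image entry with a 1-based integer id; no
    annotations or categories are created.
    """
    images = [
        {'id': gid, 'file_name': gpath}
        for gid, gpath in enumerate(gpaths, start=1)
    ]
    return {'images': images, 'annotations': [], 'categories': []}


data = coco_dict_from_image_paths(['a.png', 'b.png'])
print(json.dumps(data))
```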
- classmethod from_coco_paths(CocoDataset, fpaths, max_workers=0, verbose=1, mode='thread', union='try')[source]¶
Constructor from multiple coco file paths.
Loads multiple coco datasets and unions the result.
Note
If union='try' and the union operation fails, the list of individually loaded datasets is returned instead.
- Parameters
fpaths (List[str]) – list of paths to multiple coco files to be loaded and unioned.
max_workers (int, default=0) – number of worker threads / processes
verbose (int) – verbosity level
mode (str) – thread, process, or serial
union (str | bool, default=’try’) – If True, unions the result datasets after loading. If False, just returns the result list. If ‘try’, then try to perform the union, but return the result list if it fails.
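The three union policies can be illustrated with a small control-flow sketch. Everything here is hypothetical: load_all, the toy loader and the toy merge functions stand in for reading coco files and calling CocoDataset.union, and only the thread and serial modes are sketched (mode='process' is omitted).

```python
from concurrent.futures import ThreadPoolExecutor


def load_all(fpaths, load_fn, union_fn, max_workers=0, union='try'):
    """Hypothetical sketch of the ``union`` policy described above.

    load_fn(fpath) loads one dataset; union_fn(*dsets) merges them.
    """
    if max_workers > 0:
        # pool.map returns results in the same order as fpaths
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(load_fn, fpaths))
    else:
        # max_workers=0: load serially in the calling thread
        results = [load_fn(fpath) for fpath in fpaths]

    if union is True:
        # Strict mode: any merge error propagates to the caller
        return union_fn(*results)
    if union == 'try':
        try:
            return union_fn(*results)
        except Exception:
            # Fall back to the individually loaded results
            return results
    # union is False: never merge
    return results


def toy_load(fpath):
    # Stand-in for reading one coco file from disk
    return {'source': fpath}


def toy_union_ok(*dsets):
    return {'sources': [d['source'] for d in dsets]}


def toy_union_fails(*dsets):
    raise ValueError('datasets could not be merged')


merged = load_all(['a.json', 'b.json'], toy_load, toy_union_ok, max_workers=2)
fallback = load_all(['a.json', 'b.json'], toy_load, toy_union_fails)
no_union = load_all(['a.json'], toy_load, toy_union_ok, union=False)
```

With union='try' a failed merge degrades to the list of loaded datasets instead of raising, which is why callers using the default should be prepared to receive either type.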
- copy(self)[source]¶
Deep copies this object
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> new = self.copy()
>>> assert new.imgs[1] is new.dataset['images'][0]
>>> assert new.imgs[1] == self.dataset['images'][0]
>>> assert new.imgs[1] is not self.dataset['images'][0]
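The copy doctest checks that, inside the new object, the index entry and the dataset list entry are the same object, while neither is shared with the original. The plain-dict sketch below shows that copy.deepcopy gives exactly this property, because its memo copies each shared object once. This is one way to obtain the behavior; whether CocoDataset.copy deep-copies its index or rebuilds it is not stated here.

```python
import copy

images = [{'id': 1, 'file_name': 'astro.png'}]
# The "index" aliases the same dict objects that live in the dataset list
state = {
    'dataset': {'images': images},
    'imgs': {img['id']: img for img in images},
}

new_state = copy.deepcopy(state)

# Aliasing is preserved inside the copy ...
print(new_state['imgs'][1] is new_state['dataset']['images'][0])  # True
# ... the contents are equal ...
print(new_state['imgs'][1] == state['dataset']['images'][0])      # True
# ... but the copy shares no objects with the original
print(new_state['imgs'][1] is not state['dataset']['images'][0])  # True
```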
- dumps(self, indent=None, newlines=False)[source]¶
Serializes the dataset to a JSON-formatted string
- Parameters
newlines (bool) – if True, each annotation, image, category gets its own line
Note
- Using newlines=True is similar to:
    print(ub.repr2(dset.dataset, nl=2, trailsep=False))
  However, the above may not output valid JSON if the dataset contains ndarrays.
Example
>>> from kwcoco.coco_dataset import *
>>> import json
>>> self = CocoDataset.demo()
>>> text = self.dumps(newlines=True)
>>> print(text)
>>> self2 = CocoDataset(json.loads(text), tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
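To see why a one-item-per-line layout can still be valid JSON (unlike the ub.repr2 output mentioned in the note), the sketch below writes each element of each top-level list on its own line using only json.dumps. dumps_one_item_per_line is a hypothetical helper, not kwcoco's serializer, and it assumes all top-level keys are strings, as they are in COCO files.

```python
import json


def dumps_one_item_per_line(dataset):
    """Hypothetical helper: serialize a COCO dict with one item per line.

    Each top-level list (images, annotations, ...) is written with one
    JSON-encoded element per line; the result is still valid JSON.
    Assumes every top-level key is a string.
    """
    parts = []
    for key, value in dataset.items():
        if isinstance(value, list):
            items = ',\n'.join('    ' + json.dumps(item) for item in value)
            body = '[\n' + items + '\n]' if value else '[]'
        else:
            body = json.dumps(value)
        parts.append(json.dumps(key) + ': ' + body)
    return '{\n' + ',\n'.join(parts) + '\n}'


dataset = {
    'info': {'description': 'toy'},
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'annotations': [],
}
text = dumps_one_item_per_line(dataset)
print(text)
```

Because every element goes through json.dumps, a numpy array in the data raises a TypeError here instead of silently producing text that json.loads cannot read back.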
- dump(self, file, indent=None, newlines=False)[source]¶
Writes the dataset out to the json format
- Parameters
file (PathLike | FileLike) – Where to write the data. Can either be a path to a file or an open file pointer / stream.
newlines (bool) – if True, each annotation, image, category gets its own line.
Example
>>> import json
>>> import tempfile
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file, newlines=True)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
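The file parameter accepts either a path or an open stream. A common way to support both is to dispatch on the argument type, as in the sketch below; dump_json is a hypothetical helper, and treating anything that is not a str or os.PathLike as a writable text stream is an assumption about how such dispatch could work, not a statement about kwcoco's code.

```python
import io
import json
import os
import tempfile


def dump_json(data, file, indent=None):
    """Hypothetical helper: write JSON to a path or an open text stream."""
    if isinstance(file, (str, os.PathLike)):
        # A path: open (and close) the file ourselves
        with open(file, 'w') as fp:
            json.dump(data, fp, indent=indent)
    else:
        # Assume a file-like object with a .write() method; leave it open
        json.dump(data, file, indent=indent)


data = {'images': [{'id': 1, 'file_name': 'a.png'}]}

# Writing to an in-memory stream
stream = io.StringIO()
dump_json(data, stream)
stream.seek(0)
loaded_from_stream = json.load(stream)

# Writing to a path
with tempfile.TemporaryDirectory() as dpath:
    fpath = os.path.join(dpath, 'data.kwcoco.json')
    dump_json(data, fpath, indent=2)
    with open(fpath) as fp:
        loaded_from_path = json.load(fp)
```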
- _check_json_serializable(self, verbose=1)[source]¶
Debug which part of a coco dataset might not be json serializable
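- _check_index(self)[source]¶
Check that the fast lookup index is consistent with the underlying dataset; an AssertionError is raised if it is not.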
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> self._check_index()
>>> # Force a failure
>>> self.index.anns.pop(1)
>>> self.index.anns.pop(2)
>>> import pytest
>>> with pytest.raises(AssertionError):
>>>     self._check_index()
- _check_pointers(self, verbose=1)[source]¶
Check that all category and image ids referenced by annotations exist
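A pointer check of this kind amounts to collecting the valid ids and scanning every annotation. The sketch below does that on a plain COCO dict and reports each dangling reference; find_dangling_pointers is a hypothetical helper, and treating a missing or None category_id as acceptable is an assumption of this sketch.

```python
def find_dangling_pointers(dataset):
    """Hypothetical helper: report annotations whose image_id or
    category_id does not refer to an existing image / category.

    Returns a list of (annotation id, field name, bad value) tuples.
    """
    gids = {img['id'] for img in dataset.get('images', [])}
    cids = {cat['id'] for cat in dataset.get('categories', [])}
    problems = []
    for ann in dataset.get('annotations', []):
        if ann['image_id'] not in gids:
            problems.append((ann['id'], 'image_id', ann['image_id']))
        # Assumption: a missing / None category_id is allowed
        cid = ann.get('category_id')
        if cid is not None and cid not in cids:
            problems.append((ann['id'], 'category_id', cid))
    return problems


dataset = {
    'images': [{'id': 1}],
    'categories': [{'id': 1, 'name': 'star'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},   # valid
        {'id': 2, 'image_id': 99, 'category_id': 1},  # unknown image
        {'id': 3, 'image_id': 1, 'category_id': 7},   # unknown category
    ],
}
problems = find_dangling_pointers(dataset)
print(problems)
```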
- union(*others, disjoint_tracks=True, **kwargs)[source]¶
Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.
- Parameters
*others – a series of CocoDatasets that we will merge. Note, if called as an instance method, the “self” instance will be the first item in the “others” list. But if called like a classmethod, “others” will be empty by default.
disjoint_tracks (bool, default=True) – if True, we will assume track-ids are disjoint and if two datasets share the same track-id, we will disambiguate them. Otherwise they will be copied over as-is.
**kwargs – constructor options for the new merged CocoDataset
- Returns
a new merged coco dataset
- Return type
CocoDataset
CommandLine
xdoctest -m kwcoco.coco_dataset CocoDataset.union
Example
>>> # Test union works with different keypoint categories
>>> dset1 = CocoDataset.demo('shapes1')
>>> dset2 = CocoDataset.demo('shapes2')
>>> dset1.remove_keypoint_categories(['bot_tip', 'mid_tip', 'right_eye'])
>>> dset2.remove_keypoint_categories(['top_tip', 'left_eye'])
>>> dset_12a = CocoDataset.union(dset1, dset2)
>>> dset_12b = dset1.union(dset2)
>>> dset_21 = dset2.union(dset1)
>>> def add_hist(h1, h2):
>>>     return {k: h1.get(k, 0) + h2.get(k, 0) for k in set(h1) | set(h2)}
>>> kpfreq1 = dset1.keypoint_annotation_frequency()
>>> kpfreq2 = dset2.keypoint_annotation_frequency()
>>> kpfreq_want = add_hist(kpfreq1, kpfreq2)
>>> kpfreq_got1 = dset_12a.keypoint_annotation_frequency()
>>> kpfreq_got2 = dset_12b.keypoint_annotation_frequency()
>>> assert kpfreq_want == kpfreq_got1
>>> assert kpfreq_want == kpfreq_got2
>>> # Test disjoint gid datasets
>>> import kwcoco
>>> import ubelt as ub
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> for new_gid, img in enumerate(dset1.dataset['images'], start=10):
>>>     for aid in dset1.gid_to_aids[img['id']]:
>>>         dset1.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset1.index.clear()
>>> dset1._build_index()
>>> # ------
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> for new_gid, img in enumerate(dset2.dataset['images'], start=100):
>>>     for aid in dset2.gid_to_aids[img['id']]:
>>>         dset2.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset2.index.clear()
>>> dset2._build_index()
>>> others = [dset1, dset2]
>>> merged = kwcoco.CocoDataset.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([10, 11, 12, 100, 101]) == set(merged.imgs)
>>> # Test data is not preserved
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> others = (dset1, dset2)
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([1, 2, 3, 4, 5]) == set(merged.imgs)
>>> # Test track-ids are mapped correctly
>>> dset1 = kwcoco.CocoDataset.demo('vidshapes1')
>>> dset2 = kwcoco.CocoDataset.demo('vidshapes2')
>>> dset3 = kwcoco.CocoDataset.demo('vidshapes3')
>>> others = (dset1, dset2, dset3)
>>> for dset in others:
>>>     [a.pop('segmentation', None) for a in dset.index.anns.values()]
>>>     [a.pop('keypoints', None) for a in dset.index.anns.values()]
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others, disjoint_tracks=1)
>>> print('dset1.anns = {}'.format(ub.repr2(dset1.anns, nl=1)))
>>> print('dset2.anns = {}'.format(ub.repr2(dset2.anns, nl=1)))
>>> print('dset3.anns = {}'.format(ub.repr2(dset3.anns, nl=1)))
>>> print('merged.anns = {}'.format(ub.repr2(merged.anns, nl=1)))
Example
>>> import kwcoco
>>> # Test empty union
>>> empty_union = kwcoco.CocoDataset.union()
>>> assert len(empty_union.index.imgs) == 0
Todo
[ ] are supercategories broken?
[ ] reuse image ids where possible
[ ] reuse annotation / category ids where possible
[X] handle case where no inputs are given
[x] disambiguate track-ids
[x] disambiguate video-ids
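The core of a union that "retains names and associations, but not ids" is id remapping: each input gets fresh ids, and every pointer (image_id, category_id, video_id, track_id) is rewritten through a per-input map. The stdlib-only sketch below illustrates that on plain dicts. union_datasets is hypothetical; matching categories by name, always issuing new image / annotation ids, and the track renumbering scheme are assumptions of this sketch rather than a description of CocoDataset.union, which also handles keypoint categories and other fields not modeled here.

```python
def union_datasets(*datasets, disjoint_tracks=True):
    """Hypothetical sketch: merge COCO-style dicts, reassigning ids.

    Categories are matched by name; videos, images and annotations get
    fresh ids. With disjoint_tracks=True, track ids from different inputs
    are kept apart even if they collide numerically; otherwise they are
    copied as-is.
    """
    merged = {'categories': [], 'videos': [], 'images': [], 'annotations': []}
    cat_name_to_id = {}
    next_track = 1
    for dset in datasets:
        # Map old category ids to merged ids via the category name
        cid_map = {}
        for cat in dset.get('categories', []):
            if cat['name'] not in cat_name_to_id:
                new_cat = dict(cat, id=len(merged['categories']) + 1)
                merged['categories'].append(new_cat)
                cat_name_to_id[cat['name']] = new_cat['id']
            cid_map[cat['id']] = cat_name_to_id[cat['name']]

        vid_map = {}
        for video in dset.get('videos', []):
            new_video = dict(video, id=len(merged['videos']) + 1)
            merged['videos'].append(new_video)
            vid_map[video['id']] = new_video['id']

        gid_map = {}
        for img in dset.get('images', []):
            new_img = dict(img, id=len(merged['images']) + 1)
            if img.get('video_id') is not None:
                new_img['video_id'] = vid_map[img['video_id']]
            merged['images'].append(new_img)
            gid_map[img['id']] = new_img['id']

        track_map = {}  # old track id -> new track id, per input dataset
        for ann in dset.get('annotations', []):
            new_ann = dict(ann, id=len(merged['annotations']) + 1,
                           image_id=gid_map[ann['image_id']])
            if ann.get('category_id') is not None:
                new_ann['category_id'] = cid_map[ann['category_id']]
            if disjoint_tracks and ann.get('track_id') is not None:
                if ann['track_id'] not in track_map:
                    track_map[ann['track_id']] = next_track
                    next_track += 1
                new_ann['track_id'] = track_map[ann['track_id']]
            merged['annotations'].append(new_ann)
    return merged


d1 = {
    'categories': [{'id': 1, 'name': 'star'}],
    'images': [{'id': 1, 'name': 'img_a'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1, 'track_id': 1}],
}
d2 = {
    'categories': [{'id': 5, 'name': 'star'}, {'id': 6, 'name': 'circle'}],
    'videos': [{'id': 3, 'name': 'vid_b'}],
    'images': [{'id': 1, 'name': 'img_b', 'video_id': 3}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 6, 'track_id': 1}],
}
merged = union_datasets(d1, d2)
shared_tracks = union_datasets(d1, d2, disjoint_tracks=False)
empty = union_datasets()
```

Both inputs use track_id 1, but they describe different objects, so the disjoint mode gives them distinct merged ids while the non-disjoint mode keeps them identical.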
- subset(self, gids, copy=False, autobuild=True)[source]¶
Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.
- Parameters
gids (List[int]) – image-ids to copy into a new dataset
copy (bool, default=False) – if True, makes a deep copy of all nested attributes, otherwise makes a shallow copy.
autobuild (bool, default=True) – if True will automatically build the fast lookup index.
Example
>>> self = CocoDataset.demo()
>>> gids = [1, 3]
>>> sub_dset = self.subset(gids)
>>> assert len(self.index.gid_to_aids) == 3
>>> assert len(sub_dset.gid_to_aids) == 2
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('vidshapes2')
>>> gids = [1, 2]
>>> sub_dset = self.subset(gids, copy=True)
>>> assert len(sub_dset.index.videos) == 1
>>> assert len(self.index.videos) == 2
Example
>>> self = CocoDataset.demo()
>>> sub1 = self.subset([1])
>>> sub2 = self.subset([2])
>>> sub3 = self.subset([3])
>>> others = [sub1, sub2, sub3]
>>> rejoined = CocoDataset.union(*others)
>>> assert len(sub1.anns) == 9
>>> assert len(sub2.anns) == 2
>>> assert len(sub3.anns) == 0
>>> assert rejoined.basic_stats() == self.basic_stats()
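The subset operation selects images by id and carries along every annotation that points at them. The plain-dict sketch below shows that filtering; subset_dataset is hypothetical, and keeping all categories while keeping only the videos referenced by a kept image are assumptions chosen to mirror the doctests above (one of two videos survives). Like copy=False, it shares the item dicts with the input rather than copying them.

```python
def subset_dataset(dataset, gids):
    """Hypothetical sketch: keep only the given images and their annotations.

    Categories are kept as-is; videos are kept only if one of the kept
    images belongs to them. Items are shared (shallow), not copied.
    """
    keep_gids = set(gids)
    images = [img for img in dataset.get('images', []) if img['id'] in keep_gids]
    vids = {img.get('video_id') for img in images} - {None}
    return {
        'categories': list(dataset.get('categories', [])),
        'videos': [v for v in dataset.get('videos', []) if v['id'] in vids],
        'images': images,
        'annotations': [ann for ann in dataset.get('annotations', [])
                        if ann['image_id'] in keep_gids],
    }


dataset = {
    'categories': [{'id': 1, 'name': 'star'}],
    'videos': [{'id': 1, 'name': 'v1'}, {'id': 2, 'name': 'v2'}],
    'images': [{'id': 1, 'video_id': 1}, {'id': 2, 'video_id': 1}, {'id': 3, 'video_id': 2}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},
        {'id': 2, 'image_id': 3, 'category_id': 1},
    ],
}
sub = subset_dataset(dataset, [1, 2])
```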
- view_sql(self, force_rewrite=False, memory=False)[source]¶
Create a cached SQL interface to this dataset suitable for large scale multiprocessing use cases.
- Parameters
force_rewrite (bool, default=False) – if True, forces an update to any existing cache file on disk
memory (bool, default=False) – if True, the database is constructed in memory.
Note
This view cache is experimental and currently depends on the timestamp of the file pointed to by self.fpath. In other words, don't use this on in-memory datasets.
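A timestamp-based cache is usually invalidated by comparing modification times, which is also why it needs a real file behind self.fpath. The sketch below shows such a staleness check with os.path.getmtime; cache_is_stale and the file names are hypothetical, and this is not kwcoco's actual invalidation logic. For a dataset with no file on disk, getmtime on the source path would raise FileNotFoundError.

```python
import os
import tempfile


def cache_is_stale(src_fpath, cache_fpath, force_rewrite=False):
    """Hypothetical helper: decide whether a derived cache must be rebuilt.

    The cache is stale if it is missing, if a rewrite is forced, or if
    the source file was modified after the cache was written.
    """
    if force_rewrite or not os.path.exists(cache_fpath):
        return True
    return os.path.getmtime(src_fpath) > os.path.getmtime(cache_fpath)


with tempfile.TemporaryDirectory() as dpath:
    src = os.path.join(dpath, 'data.kwcoco.json')
    cache = os.path.join(dpath, 'data.sqlite')
    with open(src, 'w'):
        pass
    missing = cache_is_stale(src, cache)       # no cache file yet
    with open(cache, 'w'):
        pass
    os.utime(src, (100, 100))                  # source older than cache
    os.utime(cache, (200, 200))
    fresh = cache_is_stale(src, cache)
    forced = cache_is_stale(src, cache, force_rewrite=True)
    os.utime(src, (300, 300))                  # source edited after cache
    stale = cache_is_stale(src, cache)
```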
- kwcoco.coco_dataset._delitems(items, remove_idxs, thresh=750)[source]¶
Delete the items at the given indices from a list, in place.
- Parameters
items (List) – list which will be modified
remove_idxs (List[int]) – integers to remove (MUST BE UNIQUE)
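Deleting several indices from a list in place has one pitfall: each del shifts the positions of everything after it. The sketch below avoids that in two ways, deleting from the highest index down for a few removals, and rebuilding the list in one pass for many. The thresh cutoff mirrors the default in the signature, but treating it as the switch between these two strategies is an assumption of this sketch; delitems is a hypothetical stdlib-only stand-in, and it assumes non-negative indices.

```python
def delitems(items, remove_idxs, thresh=750):
    """Remove the entries at ``remove_idxs`` from ``items`` in place.

    Sketch of a two-strategy approach (indices must be unique and
    non-negative): for many removals, rebuild the list in one pass;
    for few, delete from the highest index down so earlier deletions
    do not shift the indices still to be removed.
    """
    if len(remove_idxs) > thresh:
        remove_set = set(remove_idxs)
        keep = [item for idx, item in enumerate(items) if idx not in remove_set]
        items[:] = keep  # slice assignment mutates the caller's list
    else:
        for idx in sorted(remove_idxs, reverse=True):
            del items[idx]


letters = list('abcdefg')
delitems(letters, [0, 3, 6])

numbers = list(range(10))
delitems(numbers, [1, 2, 3], thresh=2)  # forces the rebuild strategy
```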
- kwcoco.coco_dataset.demo_coco_data()[source]¶
Simple data for testing.
This contains several non-standard fields, which help ensure robustness of functions tested with this data. For more compliant demodata see the kwcoco.demodata submodule.
Example
>>> # xdoctest: +REQUIRES(--show)
>>> from kwcoco.coco_dataset import demo_coco_data, CocoDataset
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> import kwplot
>>> kwplot.autompl()
>>> self.show_image(gid=1)
>>> kwplot.show_if_requested()