kwcoco.coco_dataset module¶

An implementation and extension of the original MS-COCO API [1].

Extends the format to also include line annotations.

Dataset Spec:

Note: a formal spec has been defined in

category = {
    'id': int,
    'name': str,
    'supercategory': Optional[str],
    'keypoints': Optional[List[str]],
    'skeleton': Optional[List[Tuple[int, int]]],
}

image = {
    'id': int,
    'file_name': str
}

dataset = {
    # these are object level categories
    'categories': [category],
    'images': [image],
    'annotations': [
        {
            'id': Int,
            'image_id': Int,
            'category_id': Int,
            'track_id': Optional[Int],

            'bbox': [tl_x, tl_y, w, h],  # optional (xywh format)
            "score" : float,  # optional
            "prob" : List[float],  # optional
            "weight" : float,  # optional

            "caption": str,  # an optional text caption for this annotation
            "iscrowd" : <0 or 1>,  # denotes if the annotation covers a single object (0) or multiple objects (1)
            "keypoints" : [x1,y1,v1,...,xk,yk,vk], # or new dict-based format
            'segmentation': <RunLengthEncoding | Polygon>,  # formats are defined below
        },
        ...
    ],
    'licenses': [],
    'info': [],
}
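
As a rough illustration of the spec above, the following sketch builds a minimal dataset as a plain Python dictionary, round-trips it through JSON, and checks that every annotation refers to an existing image and category. The file name, category names, and numeric values are made up for the example; only the keys come from the spec.

```python
import json

# Hypothetical values; only the keys follow the spec above.
dataset = {
    'categories': [
        {'id': 1, 'name': 'cat', 'supercategory': 'animal'},
    ],
    'images': [
        {'id': 1, 'file_name': 'images/example.png'},
    ],
    'annotations': [
        {
            'id': 1,
            'image_id': 1,
            'category_id': 1,
            'bbox': [10, 20, 30, 40],  # [tl_x, tl_y, w, h]
        },
    ],
    'licenses': [],
    'info': [],
}

# The structure is plain JSON, so it survives a dump/load round trip.
text = json.dumps(dataset)
recon = json.loads(text)

# Collect annotations that point at a missing image or category.
image_ids = {img['id'] for img in recon['images']}
category_ids = {cat['id'] for cat in recon['categories']}
dangling = [
    ann['id'] for ann in recon['annotations']
    if ann['image_id'] not in image_ids
    or ann['category_id'] not in category_ids
]
```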

Polygon:
    A flattened list of xy coordinates.
    [x1, y1, x2, y2, ..., xn, yn]

    or a list of flattened lists of xy coordinates if the connected components (CCs) are disjoint
    [[x1, y1, x2, y2, ..., xn, yn], [x1, y1, ..., xm, ym],]

    Note: the original coco spec does not allow for holes in polygons.

    We also allow a non-standard dictionary encoding of polygons
        {'exterior': [(x1, y1)...],
         'interiors': [[(x1, y1), ...], ...]}
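
To make the relationship between the two polygon encodings concrete, here is a small sketch that converts flattened coordinate lists into the dictionary encoding. The helper names `flat_to_pairs` and `to_dict_encoding` are hypothetical and are not part of kwcoco or kwimage.

```python
def flat_to_pairs(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError('expected an even number of coordinates')
    return list(zip(flat[0::2], flat[1::2]))


def to_dict_encoding(exterior_flat, interiors_flat=()):
    """Build the non-standard dictionary encoding from flat lists."""
    return {
        'exterior': flat_to_pairs(exterior_flat),
        'interiors': [flat_to_pairs(hole) for hole in interiors_flat],
    }


# A 10x10 square with a 2x2 square hole (made-up coordinates).
square = [0, 0, 10, 0, 10, 10, 0, 10]
hole = [4, 4, 6, 4, 6, 6, 4, 6]
poly = to_dict_encoding(square, [hole])
```

The flat encoding has no place to store the hole, which is why the dictionary form is needed for polygons with interiors.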

RunLengthEncoding:
    The RLE can be in a special bytes encoding or in a binary array
    encoding. We reuse the original C functions from [2] in
    ``kwimage.structs.Mask`` to provide a convenient way to abstract this
    rather esoteric bytes encoding.

    For pure python implementations see kwimage:
        Converting from an image to RLE can be done via kwimage.run_length_encoding
        Converting from RLE back to an image can be done via:
            kwimage.decode_run_length

        For compatibility with the COCO specs ensure the binary flags
        for these functions are set to true.
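
For intuition, the binary array encoding can be sketched in a few lines of pure Python. This is not the kwimage implementation: it works on an already flattened sequence of 0/1 values, ignores the 2D flattening order (the COCO tools flatten masks in column-major order), and does not produce the compressed bytes form. The function names are hypothetical.

```python
def encode_binary_rle(flat):
    """Encode a flat sequence of 0/1 values as alternating run lengths.

    The first count always refers to zeros, so a sequence that starts
    with 1 produces a leading count of 0.
    """
    counts = []
    current = 0
    run = 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current = value
            run = 1
    counts.append(run)
    return counts


def decode_binary_rle(counts):
    """Invert encode_binary_rle."""
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    return flat


mask = [0, 0, 1, 1, 1, 0, 1]
counts = encode_binary_rle(mask)
restored = decode_binary_rle(counts)
```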

Keypoints:
    Annotation keypoints may also be specified in this non-standard (but
    ultimately more general) way:

    'annotations': [
        {
            'keypoints': [
                {
                    'xy': <x1, y1>,
                    'visible': <0 or 1 or 2>,
                    'keypoint_category_id': <kp_cid>,
                    'keypoint_category': <kp_name, optional>,  # this can be specified instead of an id
                }, ...
            ]
        }, ...
    ],
    'keypoint_categories': [{
        'name': <str>,
        'id': <int>,  # an id for this keypoint category
        'supercategory': <kp_name>,  # name of coarser parent keypoint class (for hierarchical keypoints)
        'reflection_id': <kp_cid>,  # specify only if the keypoint id would be swapped with another keypoint type
    },...
    ]

    In this scheme the "keypoints" property of each annotation (which used
    to be a list of floats) is now specified as a list of dictionaries that
    specify each keypoint's location, id, and visibility explicitly. This
    allows for things like non-unique keypoints and partial keypoint
    annotations. This also removes the ordering requirement, which makes it
    simpler to keep track of each keypoint's class type.

    We also have a new top-level dictionary to specify all the possible
    keypoint categories.
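
The sketch below shows one way the original flat keypoint list could be converted into the dict-based format. The helper `flat_keypoints_to_dicts` and the keypoint names are assumptions made for illustration. Skipping triples with visibility 0 is a design choice of this sketch, made possible because the dict-based format no longer depends on ordering.

```python
def flat_keypoints_to_dicts(flat, kp_names):
    """Convert [x1, y1, v1, ...] into the dict-based keypoint format.

    kp_names gives the keypoint category name for each triple, in order.
    Triples with visibility 0 (not labeled) are skipped.
    """
    if len(flat) != 3 * len(kp_names):
        raise ValueError('expected one (x, y, v) triple per name')
    keypoints = []
    for idx, name in enumerate(kp_names):
        x, y, v = flat[3 * idx: 3 * idx + 3]
        if v == 0:
            continue
        keypoints.append({
            'xy': (x, y),
            'visible': v,
            'keypoint_category': name,
        })
    return keypoints


# Hypothetical keypoint names and coordinates.
names = ['left_eye', 'right_eye', 'nose']
flat = [10, 20, 2, 0, 0, 0, 15, 30, 1]
converted = flat_keypoints_to_dicts(flat, names)
```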

Auxiliary Channels:
    For multimodal or multispectral images it is possible to specify
    auxiliary channels in an image dictionary as follows. Note that the
    dictionary key itself is spelled 'auxillary'.

    {
        'id': int, 'file_name': str,
        'channels': <spec>,  # a spec code that indicates the layout of these channels.
        'auxillary': [  # information about auxiliary channels
            {
                'file_name': str,
                'channels': <spec>
            }, ... # can have many auxiliary channels with unique specs
        ]
    }
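
As a hedged illustration of how such an image dictionary might be queried, the sketch below looks up the file that stores a requested channel spec. The helper `find_channel_file`, the file names, and the `'rgb'` spec code are assumptions for the example; the library's own accessor for this is `get_auxillary_fpath`.

```python
def find_channel_file(img, channels):
    """Return the file name that stores the requested channel spec."""
    if img.get('channels') == channels:
        return img['file_name']
    for aux in img.get('auxillary', []):  # key spelled as in the spec
        if aux['channels'] == channels:
            return aux['file_name']
    raise KeyError(channels)


# Made-up image entry with one extra channel file.
img = {
    'id': 1,
    'file_name': 'rgb/frame1.png',
    'channels': 'rgb',
    'auxillary': [
        {'file_name': 'disparity/frame1.png', 'channels': 'disparity'},
    ],
}
found = find_channel_file(img, 'disparity')
```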

Video Sequences:
    For video sequences, we add the following video level index:

    "videos": [
        { "id": <int>, "name": <video_name:str> },
    ]

    Note that the videos might be given as encoded mp4/avi/etc. files (in
    which case the name should correspond to a path) or as a series of
    frames, in which case the images should be used to index the extracted
    frames and the information in them.

    Then image dictionaries are augmented as follows:

    {
        'video_id': str,  # optional, if this image is a frame in a video sequence, this id is shared by all frames in that sequence.
        'timestamp': int,  # optional, timestamp (ideally in flicks), used to identify the timestamp of the frame. Only applicable to video inputs.
        'frame_index': int,  # optional, ordinal frame index which can be used if timestamp is unknown.
    }

    And annotations are augmented as follows:

    {
        "track_id": <int | str | uuid>  # optional, indicates association between annotations across frames
    }
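
A minimal sketch of how the video fields can be used: group image dictionaries by `video_id` and order each group by `frame_index`. The helper `frames_by_video` and the sample records are hypothetical, and the sketch assumes every video frame carries a `frame_index`.

```python
from collections import defaultdict


def frames_by_video(images):
    """Map each video id to its image ids, ordered by frame_index."""
    groups = defaultdict(list)
    for img in images:
        if 'video_id' in img:  # still images have no video_id
            groups[img['video_id']].append(img)
    return {
        vidid: [img['id'] for img in
                sorted(frames, key=lambda img: img['frame_index'])]
        for vidid, frames in groups.items()
    }


# Made-up image records; the last one is not part of any video.
images = [
    {'id': 1, 'file_name': 'a_2.png', 'video_id': 1, 'frame_index': 2},
    {'id': 2, 'file_name': 'a_0.png', 'video_id': 1, 'frame_index': 0},
    {'id': 3, 'file_name': 'a_1.png', 'video_id': 1, 'frame_index': 1},
    {'id': 4, 'file_name': 'b_0.png', 'video_id': 2, 'frame_index': 0},
    {'id': 5, 'file_name': 'still.png'},
]
vidid_to_gids = frames_by_video(images)
```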

Notes

The main object in this file is the CocoDataset class, which is composed of several mixin classes. See the class and method documentation for more details.

Todo

[ ] Use ijson to lazily load pieces of the dataset in the background or on demand. This will give us faster access to categories / images, whereas we will always have to wait for annotations, etc.
[ ] Should img_root be changed to data root?
[ ] Read video data, return numpy arrays (requires API for images)
[ ] Spec for video URI, and convert to frames @ framerate function.
[ ] remove videos

References

[1]	http://cocodataset.org/#format-data

[2]	https://github.com/nightrome/cocostuffapi/blob/master/PythonAPI/pycocotools/mask.py

[3]	https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format

class kwcoco.coco_dataset.ObjectList1D(ids, dset, key)[source]¶

Bases: ubelt.util_mixins.NiceRepr

Vectorized access to lists of dictionary objects

Lightweight reference to a set of objects (e.g. annotations, images) that allows for convenient property access.

Parameters:

ids (List[int]) – list of ids
dset (CocoDataset) – parent dataset
key (str) – main object name (e.g. ‘images’, ‘annotations’)

Types:

ObjT = Ann | Img | Cat  # can be one of these types
ObjectList1D gives us access to a List[ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> # Both annots and images are object lists
>>> self = dset.annots()
>>> self = dset.images()
>>> # can call with a list of ids or not, for everything
>>> self = dset.annots([1, 2, 11])
>>> self = dset.images([1, 2, 3])
>>> self.lookup('id')
>>> self.lookup(['id'])

objs¶

Returns:	all object dictionaries
Return type:	List

take(idxs)[source]¶

Take a subset by index

Example

>>> self = CocoDataset.demo().annots()
>>> assert len(self.take([0, 2, 3])) == 3

compress(flags)[source]¶

Take a subset by flags

Example

>>> self = CocoDataset.demo().images()
>>> assert len(self.compress([True, False, True])) == 2
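
The semantics of `take` and `compress` can be sketched on ordinary lists. This is only an illustration of the idea, not the library's code: `take` selects by position (not by object id), and `compress` keeps the items whose flag is true.

```python
def take(items, idxs):
    """Select items by positional index."""
    return [items[idx] for idx in idxs]


def compress(items, flags):
    """Keep the items whose corresponding flag is truthy."""
    return [item for item, flag in zip(items, flags) if flag]


items = ['a', 'b', 'c', 'd']
taken = take(items, [0, 2, 3])
kept = compress(items[:3], [True, False, True])
```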

peek()[source]¶: Return the first object dictionary

lookup(key, default=NoParam, keepid=False)[source]¶

Lookup a list of object attributes

Parameters:

key (str | Iterable) – name of the property you want to look up. Can also be a list of names, in which case a dict is returned.
default – if specified, uses this value if the key doesn’t exist in an ObjT.
keepid – if True, return a mapping from ids to the property

Returns:	a list of whatever type the object is, or a Dict[str, ObjT] if key is a list of names
Return type:	List[ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.lookup('id')
>>> key = ['id']
>>> default = None
>>> self.lookup(key=['id', 'image_id'])
>>> self.lookup(key=['id', 'image_id'])
>>> self.lookup(key='foo', default=None, keepid=True)
>>> self.lookup(key=['foo'], default=None, keepid=True)
>>> self.lookup(key=['id', 'image_id'], keepid=True)
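
The following pure-Python sketch mimics the behavior described by the parameters above on a plain list of dictionaries. It is inferred from the parameter descriptions only and is not kwcoco's implementation; `_NO_DEFAULT` is a stand-in for the `NoParam` sentinel.

```python
_NO_DEFAULT = object()  # stand-in for the NoParam sentinel


def lookup(objs, key, default=_NO_DEFAULT, keepid=False):
    """Sketch of vectorized attribute lookup over a list of dicts."""
    if not isinstance(key, str):
        # A list of names yields a dict with one column per name.
        return {k: lookup(objs, k, default, keepid) for k in key}
    if default is _NO_DEFAULT:
        # Without a default, a missing key raises KeyError.
        values = [obj[key] for obj in objs]
    else:
        values = [obj.get(key, default) for obj in objs]
    if keepid:
        return {obj['id']: value for obj, value in zip(objs, values)}
    return values


# Made-up annotation dictionaries.
anns = [
    {'id': 1, 'image_id': 1},
    {'id': 2, 'image_id': 1},
]
ids = lookup(anns, 'id')
columns = lookup(anns, ['id', 'image_id'])
missing = lookup(anns, 'foo', default=None, keepid=True)
```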

get(key, default=NoParam, keepid=False)[source]¶

Lookup a list of object attributes

Parameters:

key (str) – name of the property you want to look up
default – if specified, uses this value if the key doesn’t exist in an ObjT.
keepid – if True, return a mapping from ids to the property

Returns:	a list of whatever type the object is, or a mapping from ids to the property if keepid is True
Return type:	List[ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.get('id')
>>> self.get(key='foo', default=None, keepid=True)

set(key, values)[source]¶

Assign a value to each annotation

Parameters:

key (str) – the annotation property to modify
values (Iterable | scalar) – an iterable of values to set for each annot in the dataset. If the item is not iterable, it is assigned to all objects.

Example

>>> import numpy as np
>>> import ubelt as ub
>>> from kwcoco import CocoDataset
>>> dset = CocoDataset.demo()
>>> self = dset.annots()
>>> self.set('my-key1', 'my-scalar-value')
>>> self.set('my-key2', np.random.rand(len(self)))
>>> print('dset.imgs = {}'.format(ub.repr2(dset.imgs, nl=1)))
>>> self.get('my-key2')

class kwcoco.coco_dataset.ObjectGroups(groups, dset)[source]¶

Bases: ubelt.util_mixins.NiceRepr

An object for holding groups of ObjectList1D objects

lookup(key, default=NoParam)[source]¶

class kwcoco.coco_dataset.Categories(ids, dset)[source]¶

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to category attributes

Example

>>> from kwcoco.coco_dataset import Categories  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> ids = list(dset.cats.keys())
>>> self = Categories(ids, dset)
>>> print('self.name = {!r}'.format(self.name))
>>> print('self.supercategory = {!r}'.format(self.supercategory))

cids¶

name¶

supercategory¶

class kwcoco.coco_dataset.Videos(ids, dset)[source]¶

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to video attributes

Example

>>> from kwcoco.coco_dataset import Videos  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5')
>>> ids = list(dset.index.videos.keys())
>>> self = Videos(ids, dset)
>>> print('self = {!r}'.format(self))

class kwcoco.coco_dataset.Images(ids, dset)[source]¶

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to image attributes

gids¶

gname¶

gpath¶

width¶

height¶

size¶

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.size)
[(512, 512), (300, 250), (256, 256)]

area¶

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.area)
[262144, 75000, 65536]

n_annots¶

Example

>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(self.n_annots, nl=0))
[9, 2, 0]

aids¶

Example

>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(list(map(list, self.aids)), nl=0))
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11], []]

annots¶

Example

>>> self = CocoDataset.demo().images()
>>> print(self.annots)
<AnnotGroups(n=3, m=3.7, s=3.9)>

class kwcoco.coco_dataset.Annots(ids, dset)[source]¶

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to annotation attributes

aids¶: The annotation ids of this column of annotations

images¶

Get the column of images

Returns:	Images

image_id¶

category_id¶

gids¶

Get the column of image-ids

Returns:	list of image ids
Return type:	List[int]

cids¶

Get the column of category-ids

Returns:	List[int]

cnames¶

Get the column of category names

Returns:	List[str]

detections¶

Get the kwimage-style detection objects

Returns:	kwimage.Detections

Example

>>> # xdoctest: +REQUIRES(module:kwimage)
>>> from kwcoco.coco_dataset import *  # NOQA
>>> self = CocoDataset.demo('shapes32').annots([1, 2, 11])
>>> dets = self.detections
>>> print('dets.data = {!r}'.format(dets.data))
>>> print('dets.meta = {!r}'.format(dets.meta))

boxes¶

Get the column of kwimage-style bounding boxes

Example

>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.boxes)
<Boxes(xywh,
    array([[ 10,  10, 360, 490],
           [350,   5, 130, 290],
           [124,  96,  45,  18]]))>

xywh¶

Returns raw boxes

Example

>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.xywh)

class kwcoco.coco_dataset.AnnotGroups(groups, dset)[source]¶

Bases: kwcoco.coco_dataset.ObjectGroups

cids¶

class kwcoco.coco_dataset.ImageGroups(groups, dset)[source]¶: Bases: kwcoco.coco_dataset.ObjectGroups

class kwcoco.coco_dataset.MixinCocoDepricate[source]¶

Bases: object

These functions are marked for deprecation and may be removed at any time

lookup_imgs(filename=None)[source]¶

Linear search for images with specific attributes

# DEPRECATE

Ignore:

    filename = '201503.20150525.101841191.573975.png'
    list(self.lookup_imgs(filename))
    gid = 64940
    img = self.imgs[gid]
    img['file_name'] = filename

lookup_anns(has=None)[source]¶

Linear search for annotations with specific attributes

# DEPRECATE

Ignore:

    list(self.lookup_anns(has='radius'))
    gid = 112888
    img = self.imgs[gid]
    img['file_name'] = filename

class kwcoco.coco_dataset.MixinCocoExtras[source]¶

Bases: object

Misc functions for coco

load_image(gid_or_img)[source]¶

Reads an image from disk and returns it

Parameters:	gid_or_img (int or dict) – image id or image dict
Returns:	the image
Return type:	np.ndarray

load_image_fpath(gid_or_img)[source]¶

get_image_fpath(gid_or_img)[source]¶

Returns the full path to the image

Parameters:	gid_or_img (int or dict) – image id or image dict
Returns:	full path to the image
Return type:	PathLike

get_auxillary_fpath(gid_or_img, channels)[source]¶

Returns the full path to auxiliary data for an image

Parameters:

gid_or_img (int | dict) – an image or its id
channels (str) – the auxiliary channel to load (e.g. disparity)

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> self.get_auxillary_fpath(1, 'disparity')

load_annot_sample(aid_or_ann, image=None, pad=None)[source]¶

Reads the chip of an annotation. Note this is much less efficient than using a sampler, but it doesn’t require disk cache.

Parameters:

aid_or_ann (int or dict) – annot id or dict
image (ArrayLike, default=None) – preloaded image (note: this process is inefficient unless image is specified)

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> sample = self.load_annot_sample(2, pad=100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(sample['im'])
>>> kwplot.show_if_requested()

classmethod coerce(key, **kw)[source]¶

classmethod demo(key='photos', **kw)[source]¶

Create a toy coco dataset for testing and demo purposes

Parameters:

key (str) – either photos or shapes
**kw – if key is shapes, these arguments are passed to toydata generation

Example

>>> print(CocoDataset.demo('photos'))
>>> print(CocoDataset.demo('shapes', verbose=0))
>>> print(CocoDataset.demo('shapes256', verbose=0))
>>> print(CocoDataset.demo('shapes8', verbose=0))

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, verbose=0, rng=None)
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, num_tracks=4, verbose=0, rng=44)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> pnums = kwplot.PlotNums(nSubplots=len(dset.imgs))
>>> fnum = 1
>>> for gx, gid in enumerate(dset.imgs.keys()):
>>>     canvas = dset.draw_image(gid=gid)
>>>     kwplot.imshow(canvas, pnum=pnums[gx], fnum=fnum)
>>>     #dset.show_image(gid=gid, pnum=pnums[gx])
>>> kwplot.show_if_requested()

category_graph()[source]¶

Construct a networkx category hierarchy

Returns:	graph: a directed graph where category names are the nodes, supercategories define edges, and items in each category dict (e.g. category id) are added as node properties.
Return type:	networkx.DiGraph

Example

>>> self = CocoDataset.demo()
>>> graph = self.category_graph()
>>> assert 'astronaut' in graph.nodes()
>>> assert 'keypoints' in graph.nodes['human']

import graphid
graphid.util.show_nx(graph)
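
To illustrate how supercategories define edges, the sketch below derives (parent, child) name pairs from a list of category dictionaries using only the standard library. The real method returns a networkx graph; the helper `category_edges`, the parent-to-child edge direction, and the sample categories are assumptions of this sketch.

```python
def category_edges(categories):
    """Return (parent, child) name pairs implied by 'supercategory'."""
    edges = []
    for cat in categories:
        parent = cat.get('supercategory')
        if parent is not None:
            edges.append((parent, cat['name']))
    return edges


# Made-up categories loosely modeled on the demo names used above.
categories = [
    {'id': 1, 'name': 'human'},
    {'id': 2, 'name': 'astronaut', 'supercategory': 'human'},
    {'id': 3, 'name': 'helmet', 'supercategory': None},
]
edges = category_edges(categories)
```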

object_categories()[source]¶

Construct a consistent CategoryTree representation of object classes

Returns:	category data structure
Return type:	kwcoco.CategoryTree

Example

>>> self = CocoDataset.demo()
>>> classes = self.object_categories()
>>> print('classes = {}'.format(classes))

keypoint_categories()[source]¶

Construct a consistent CategoryTree representation of keypoint classes

Returns:	category data structure
Return type:	kwcoco.CategoryTree

Example

>>> self = CocoDataset.demo()
>>> classes = self.keypoint_categories()
>>> print('classes = {}'.format(classes))

missing_images(check_aux=False, verbose=0)[source]¶

Check for images that don’t exist

Parameters:	check_aux (bool, default=False) – if specified also checks auxiliary images
Returns:	bad indexes and paths
Return type:	List[Tuple[int, str]]
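
The idea behind this check can be sketched with the standard library: resolve each file name against a root directory and report the entries that do not exist. The helper `find_missing` and the file names are hypothetical; the sketch ignores auxiliary files and is not the library's implementation.

```python
import os
import tempfile


def find_missing(images, root):
    """Return (index, path) pairs for images whose file does not exist."""
    missing = []
    for index, img in enumerate(images):
        fpath = os.path.join(root, img['file_name'])
        if not os.path.exists(fpath):
            missing.append((index, fpath))
    return missing


with tempfile.TemporaryDirectory() as root:
    # Create one empty file so that exactly one image is present.
    open(os.path.join(root, 'present.png'), 'wb').close()
    images = [
        {'id': 1, 'file_name': 'present.png'},
        {'id': 2, 'file_name': 'absent.png'},
    ]
    missing = find_missing(images, root)
```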

corrupted_images(verbose=0)[source]¶

Check for images that don’t exist or can’t be opened

Returns:	bad indexes and paths
Return type:	List[Tuple[int, str]]

rename_categories(mapper, strict=False, preserve=False, rebuild=True, simple=True, merge_policy='ignore')[source]¶

Create a coarser categorization

Note: this function has been unstable in the past, and has not yet been properly stabilized. Either avoid it or use it with care. Ensuring simple=True should result in newer, saner behavior that will likely be backwards compatible.

Todo

[X] Simple case where we relabel names with no conflicts
[ ] Case where annotation labels need to change to be coarser
- dev note: see internal libraries for work on this
[ ] Other cases

Parameters:

mapper (dict or Function) – maps old names to new names.
strict (bool) – DEPRECATED, IGNORE. If True, fails if mapper doesn’t map all classes.
preserve (bool) – DEPRECATED, IGNORE. If True, preserve old categories as supercategories. Broken.
simple (bool, default=True) – defaults to the new way of doing this. The old way is deprecated.
merge_policy (str) – How to handle multiple categories that map to the same name. Can be update or ignore.

Example

>>> self = CocoDataset.demo()
>>> self.rename_categories({'astronomer': 'person',
>>>                         'astronaut': 'person',
>>>                         'mouth': 'person',
>>>                         'helmet': 'hat'}, preserve=0)
>>> assert 'hat' in self.name_to_cat
>>> assert 'helmet' not in self.name_to_cat
>>> # Test merge case
>>> self = CocoDataset.demo()
>>> mapper = {
>>>     'helmet': 'rocket',
>>>     'astronomer': 'rocket',
>>>     'human': 'rocket',
>>>     'mouth': 'helmet',
>>>     'star': 'gas'
>>> }
>>> self.rename_categories(mapper)

rebase(*args, **kw)[source]¶: Deprecated; use reroot instead

reroot(new_root=None, old_root=None, absolute=False, check=True, safe=True, smart=False)[source]¶

Rebase image/data paths onto a new image/data root.

Parameters:

new_root (str, default=None) – New image root. If unspecified the current self.img_root is used.
old_root (str, default=None) – If specified, removes the root from file names. If unspecified, then the existing paths MUST be relative to new_root.
absolute (bool, default=False) – if True, file names are stored as absolute paths, otherwise they are relative to the new image root.
check (bool, default=True) – if True, checks that the images all exist.
safe (bool, default=True) – if True, does not overwrite values until all checks pass
smart (bool, default=False) – If True, we can try different reroot strategies and choose the one that works. Note, always be wary when algorithms try to be smart.
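
The interplay of new_root, old_root, and absolute can be sketched for a single path. This is a simplified stand-in, not the method's implementation: it assumes POSIX-style paths, assumes both roots are absolute, and skips the check, safe, and smart options. The helper `reroot_path` and the example paths are hypothetical.

```python
import posixpath


def reroot_path(file_name, new_root, old_root=None, absolute=False):
    """Rebase one file name onto new_root (simplified sketch)."""
    if old_root is not None and posixpath.isabs(file_name):
        # Strip the old root so the path becomes relative.
        file_name = posixpath.relpath(file_name, old_root)
    if posixpath.isabs(file_name):
        # Still absolute: express it relative to the new root.
        rel = posixpath.relpath(file_name, new_root)
    else:
        rel = file_name
    return posixpath.join(new_root, rel) if absolute else rel


remote_path = '/remote/path/images/foo.png'
rel = reroot_path(remote_path, '/host/data', old_root='/remote/path')
full = reroot_path(remote_path, '/host/data', old_root='/remote/path',
                   absolute=True)
```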

CommandLine:: xdoctest -m /home/joncrall/code/kwcoco/kwcoco/coco_dataset.py MixinCocoExtras.reroot

Todo

[ ] Incorporate maximum ordered subtree embedding once completed?

Ignore:

>>> # There might not be a way to easily handle the cases that I
>>> # want to here. Might need to discuss this.
>>> import kwcoco
>>> import os
>>> import numpy as np
>>> import ubelt as ub
>>> from os.path import join, dirname, exists
>>> gname = 'images/foo.png'
>>> remote = '/remote/path'
>>> host = ub.ensure_app_cache_dir('kwcoco/tests/reroot')
>>> fpath = join(host, gname)
>>> ub.ensuredir(dirname(fpath))
>>> # In this test the image exists on the host path
>>> import kwimage
>>> kwimage.imwrite(fpath, np.random.rand(8, 8))
>>> #
>>> cases = {}
>>> # * given absolute paths on current machine
>>> cases['abs_curr'] = kwcoco.CocoDataset.from_image_paths([join(host, gname)])
>>> # * given "remote" rooted relative paths on current machine
>>> cases['rel_remoterooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=remote)
>>> # * given "host" rooted relative paths on current machine
>>> cases['rel_hostrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=host)
>>> # * given unrooted relative paths on current machine
>>> cases['rel_unrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname])
>>> # * given absolute paths on another machine
>>> cases['abs_remote'] = kwcoco.CocoDataset.from_image_paths([join(remote, gname)])
>>> def report(dset, name):
>>>     gid = 1
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('   * strategy_name = {!r}'.format(name))
>>>     print('       * rel_fpath = {!r}'.format(rel_fpath))
>>>     print('       * ' + ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>> for key, dset in cases.items():
>>>     print('----')
>>>     print('case key = {!r}'.format(key))
>>>     print('ORIG = {!r}'.format(dset.imgs[1]['file_name']))
>>>     print('dset.img_root = {!r}'.format(dset.img_root))
>>>     print('missing_gids = {!r}'.format(dset.missing_images()))
>>>     print('cwd = {!r}'.format(os.getcwd()))
>>>     print('host = {!r}'.format(host))
>>>     print('remote = {!r}'.format(remote))
>>>     #
>>>     dset_None_rel = dset.copy().reroot(absolute=False, check=0)
>>>     report(dset_None_rel, 'dset_None_rel')
>>>     #
>>>     dset_None_abs = dset.copy().reroot(absolute=True, check=0)
>>>     report(dset_None_abs, 'dset_None_abs')
>>>     #
>>>     dset_host_rel = dset.copy().reroot(host, absolute=False, check=0)
>>>     report(dset_host_rel, 'dset_host_rel')
>>>     #
>>>     dset_host_abs = dset.copy().reroot(host, absolute=True, check=0)
>>>     report(dset_host_abs, 'dset_host_abs')
>>>     #
>>>     dset_remote_rel = dset.copy().reroot(host, old_root=remote, absolute=False, check=0)
>>>     report(dset_remote_rel, 'dset_remote_rel')
>>>     #
>>>     dset_remote_abs = dset.copy().reroot(host, old_root=remote, absolute=True, check=0)
>>>     report(dset_remote_abs, 'dset_remote_abs')

Example

>>> import kwcoco
>>> import ubelt as ub
>>> from os.path import exists
>>> def report(dset, name):
>>>     gid = 1
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('strategy_name = {!r}'.format(name))
>>>     print(ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>>     print('rel_fpath = {!r}'.format(rel_fpath))
>>> dset = self = kwcoco.CocoDataset.demo()
>>> # Change base relative directory
>>> img_root = ub.expandpath('~')
>>> print('ORIG self.imgs = {!r}'.format(self.imgs))
>>> print('ORIG dset.img_root = {!r}'.format(dset.img_root))
>>> print('NEW img_root       = {!r}'.format(img_root))
>>> self.reroot(img_root)
>>> report(self, 'self')
>>> print('NEW self.imgs = {!r}'.format(self.imgs))
>>> assert self.imgs[1]['file_name'].startswith('.cache')

>>> # Use absolute paths
>>> self.reroot(absolute=True)
>>> assert self.imgs[1]['file_name'].startswith(img_root)

>>> # Switch back to relative paths
>>> self.reroot()
>>> assert self.imgs[1]['file_name'].startswith('.cache')

Example

>>> # demo with auxiliary data
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> img_root = ub.expandpath('~')
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> self.reroot(img_root)
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> assert self.imgs[1]['auxillary'][0]['file_name'].startswith('.cache')

data_root¶: In the future we may deprecate img_root for data_root

find_representative_images(gids=None)[source]¶

Find images that have a wide array of categories. Attempt to find the fewest images that cover all categories using images that contain both a large and small number of annotations.

Parameters:	gids (None | List) – Subset of image ids to consider when finding representative images. Uses all images if unspecified.
Returns:	list of image ids determined to be representative
Return type:	List
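
One classic way to approximate "fewest images that cover all categories" is greedy set cover, sketched below. This is an illustrative technique chosen for the example and is not necessarily the algorithm kwcoco uses; the helper `greedy_cover` and the sample mapping are hypothetical.

```python
def greedy_cover(gid_to_cats):
    """Greedy set cover over a mapping of image id -> set of categories.

    Repeatedly picks the image that adds the most uncovered categories.
    Ties are broken in favor of the smallest image id.
    """
    uncovered = set().union(*gid_to_cats.values())
    chosen = []
    while uncovered:
        best = max(
            sorted(gid_to_cats),
            key=lambda gid: len(gid_to_cats[gid] & uncovered),
        )
        chosen.append(best)
        uncovered -= gid_to_cats[best]
    return chosen


# Made-up mapping from image ids to the categories they contain.
gid_to_cats = {
    1: {'star', 'rocket'},
    2: {'star'},
    3: {'helmet', 'rocket', 'mouth'},
}
chosen = greedy_cover(gid_to_cats)
```

Greedy selection does not guarantee a minimum cover, but it is simple and usually close in practice.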

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> gids = self.find_representative_images([3])
>>> print('gids = {!r}'.format(gids))

>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> valid = {7, 1}
>>> gids = self.find_representative_images(valid)
>>> assert valid.issuperset(gids)
>>> print('gids = {!r}'.format(gids))

class kwcoco.coco_dataset.MixinCocoAttrs[source]¶

Bases: object

Expose methods to construct object lists / groups

annots(aids=None, gid=None)[source]¶

Return vectorized annotation objects

Parameters:

aids (List[int]) – annotation ids to reference. If unspecified, all annotations are returned.
gid (int) – return all annotations that belong to this image id. Mutually exclusive with the aids argument.

Returns:	vectorized annotation object
Return type:	Annots

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> annots = self.annots()
>>> print(annots)
<Annots(num=11)>
>>> sub_annots = annots.take([1, 2, 3])
>>> print(sub_annots)
<Annots(num=3)>
>>> print(ub.repr2(sub_annots.get('bbox', None)))
[
    [350, 5, 130, 290],
    None,
    None,
]

images(gids=None)[source]¶

Return vectorized image objects

Parameters:	gids (List[int]) – image ids to reference, if unspecified all images are returned.
Returns:	vectorized images object
Return type:	Images

Example

>>> self = CocoDataset.demo()
>>> images = self.images()
>>> print(images)
<Images(num=3)>

categories(cids=None)[source]¶

Return vectorized category objects

Example

>>> self = CocoDataset.demo()
>>> categories = self.categories()
>>> print(categories)
<Categories(num=8)>

videos(vidids=None)[source]¶

Return vectorized video objects

Example

>>> self = CocoDataset.demo('vidshapes2')
>>> videos = self.videos()
>>> print(videos)
>>> videos.lookup('name')
>>> videos.lookup('id')
>>> print('videos.objs = {}'.format(ub.repr2(videos.objs[0:2], nl=1)))

class kwcoco.coco_dataset.MixinCocoStats[source]¶

Bases: object

Methods for getting stats about the dataset

n_annots¶

n_images¶

n_cats¶

n_videos¶

keypoint_annotation_frequency()[source]¶

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo('shapes', rng=0)
>>> hist = self.keypoint_annotation_frequency()
>>> hist = ub.odict(sorted(hist.items()))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> print(ub.repr2(hist))  # xdoc: +IGNORE_WANT
{
    'bot_tip': 6,
    'left_eye': 14,
    'mid_tip': 6,
    'right_eye': 14,
    'top_tip': 6,
}

category_annotation_frequency()[source]¶

Reports the number of annotations of each category

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_frequency()
>>> print(ub.repr2(hist))
{
    'astroturf': 0,
    'human': 0,
    'astronaut': 1,
    'astronomer': 1,
    'helmet': 1,
    'rocket': 1,
    'mouth': 2,
    'star': 5,
}
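
A histogram like the one above can be sketched with `collections.Counter`. The helper `category_frequency` is hypothetical, and the ordering (by count, then by name) is an assumption of this sketch rather than a documented guarantee of the method. Categories without annotations are reported with a count of zero.

```python
from collections import Counter


def category_frequency(categories, annotations):
    """Count annotations per category name, including empty categories."""
    id_to_name = {cat['id']: cat['name'] for cat in categories}
    counts = Counter(ann['category_id'] for ann in annotations)
    hist = {name: counts.get(cid, 0) for cid, name in id_to_name.items()}
    # Sort by count first, then by name, for a stable display order.
    return dict(sorted(hist.items(), key=lambda item: (item[1], item[0])))


# Made-up categories and annotations.
categories = [
    {'id': 1, 'name': 'star'},
    {'id': 2, 'name': 'rocket'},
    {'id': 3, 'name': 'human'},
]
annotations = [
    {'id': 1, 'category_id': 1},
    {'id': 2, 'category_id': 1},
    {'id': 3, 'category_id': 2},
]
hist = category_frequency(categories, annotations)
```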

category_annotation_type_frequency()[source]¶

Reports the number of annotations of each type for each category

Example

>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_type_frequency()
>>> print(ub.repr2(hist))

basic_stats()[source]¶

Reports number of images, annotations, and categories.

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> print(ub.repr2(self.basic_stats()))
{
    'n_anns': 11,
    'n_imgs': 3,
    'n_videos': 0,
    'n_cats': 8,
}

>>> from kwcoco.demo.toydata import *  # NOQA
>>> dset = random_video_dset(render=True, num_frames=2, num_tracks=10, rng=0)
>>> print(ub.repr2(dset.basic_stats()))
{
    'n_anns': 20,
    'n_imgs': 2,
    'n_videos': 1,
    'n_cats': 3,
}

extended_stats()[source]¶

Reports number of images, annotations, and categories.

Example

>>> self = CocoDataset.demo()
>>> print(ub.repr2(self.extended_stats()))

boxsize_stats(anchors=None, perclass=True, gids=None, aids=None, verbose=0, clusterkw={}, statskw={})[source]¶

Compute statistics about bounding box sizes.

Also computes anchor boxes using kmeans if anchors is specified.

Parameters:

anchors (int) – if specified also computes box anchors
perclass (bool) – if True also computes stats for each category
gids (List[int], default=None) – if specified only compute stats for these image ids.
aids (List[int], default=None) – if specified only compute stats for these annotation ids.
verbose (int) – verbosity level
clusterkw (dict) – kwargs for sklearn.cluster.KMeans used if computing anchors.
statskw (dict) – kwargs for kwarray.stats_dict()

Returns:

Dict[str, Dict[str, Dict | ndarray]]

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes32')
>>> infos = self.boxsize_stats(anchors=4, perclass=False)
>>> print(ub.repr2(infos, nl=-1, precision=2))

>>> infos = self.boxsize_stats(gids=[1], statskw=dict(median=True))
>>> print(ub.repr2(infos, nl=-1, precision=2))

class kwcoco.coco_dataset.MixinCocoDraw[source]¶

Bases: object

Matplotlib / display functionality

imread(gid)[source]¶: Loads a particular image

draw_image(gid)[source]¶

Use kwimage to draw all annotations on an image and return the pixels as a numpy array.

Returns:	canvas
Return type:	ndarray

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> canvas = self.draw_image(1)
>>> # Now you can dump the annotated image to disk / whatever
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)

show_image(gid=None, aids=None, aid=None, **kwargs)[source]¶

Use matplotlib to show an image with annotations overlaid

Parameters:

gid (int) – image to show
aids (list) – aids to highlight within the image
aid (int) – a specific aid to focus on. If gid is not given, look up gid based on this aid.
**kwargs – show_annots, show_aid, show_catname, show_kpname, show_segmentation, title, show_gid, show_filename, show_boxes

Ignore:

    # Programmatically collect the kwargs for docs generation
    import xinspect
    import kwcoco
    kwargs = xinspect.get_kwargs(kwcoco.CocoDataset.show_image)
    print(ub.repr2(list(kwargs.keys()), nl=1, si=1))

class kwcoco.coco_dataset.MixinCocoAddRemove[source]¶

Bases: object

Mixin functions to dynamically add / remove annotations, images, and categories while maintaining lookup indexes.
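
To show what "maintaining lookup indexes" involves, here is a toy sketch that keeps an image-to-annotation index consistent as items are added and removed. The class `TinyIndex` is hypothetical and far simpler than the real mixin; only the method names and argument order mirror the signatures documented below.

```python
class TinyIndex:
    """Toy dataset that keeps gid_to_aids in sync with its contents."""

    def __init__(self):
        self.imgs = {}
        self.anns = {}
        self.gid_to_aids = {}

    def add_image(self, file_name, id=None, **kw):
        if id is None:
            id = max(self.imgs, default=0) + 1  # next unused id
        if id in self.imgs:
            raise ValueError('image id {} already exists'.format(id))
        self.imgs[id] = dict(id=id, file_name=file_name, **kw)
        self.gid_to_aids[id] = set()
        return id

    def add_annotation(self, image_id, category_id=None, bbox=None,
                       id=None, **kw):
        if id is None:
            id = max(self.anns, default=0) + 1
        self.anns[id] = dict(id=id, image_id=image_id,
                             category_id=category_id, bbox=bbox, **kw)
        self.gid_to_aids[image_id].add(id)  # update the index in place
        return id

    def remove_images(self, gids):
        for gid in gids:
            # Removing an image also removes its annotations.
            for aid in self.gid_to_aids.pop(gid):
                del self.anns[aid]
            del self.imgs[gid]


idx = TinyIndex()
gid1 = idx.add_image('foo1.jpg')
gid2 = idx.add_image('foo2.jpg')
aid1 = idx.add_annotation(gid1, bbox=[10, 10, 20, 20])
aid2 = idx.add_annotation(gid2)
idx.remove_images([gid2])
```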

add_video(name, id=None, **kw)[source]¶

Add a video to the dataset (dynamically updates the index)

Parameters:

name (str) – Unique name for this video.
id (None or int) – ADVANCED. Force using this video id.
**kw – stores arbitrary key/value pairs in this new video

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset()
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))

>>> vidid1 = self.add_video('foo', id=3)
>>> vidid2 = self.add_video('bar')
>>> vidid3 = self.add_video('baz')
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))

>>> gid1 = self.add_image('foo1.jpg', video_id=vidid1)
>>> gid2 = self.add_image('foo2.jpg', video_id=vidid1)
>>> gid3 = self.add_image('foo3.jpg', video_id=vidid1)
>>> self.add_image('bar1.jpg', video_id=vidid2)
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))

>>> self.remove_images([gid2])
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))

add_image(file_name, id=None, **kw)[source]¶

Add an image to the dataset (dynamically updates the index)

Parameters:

file_name (str) – relative or absolute path to image
id (None or int) – ADVANCED. Force using this image id.
**kw – stores arbitrary key/value pairs in this new image

Example

>>> self = CocoDataset.demo()
>>> import kwimage
>>> gname = kwimage.grab_test_image_fpath('paraview')
>>> gid = self.add_image(gname)
>>> assert self.imgs[gid]['file_name'] == gname

add_annotation(image_id, category_id=None, bbox=None, id=None, **kw)[source]¶

Add an annotation to the dataset (dynamically updates the index)

Parameters:

image_id (int) – id of the image to add the annotation to
category_id (int) – id of the category to assign to the annotation
bbox (list or kwimage.Boxes) – bounding box in xywh format
id (None or int) – ADVANCED. Force using this annotation id.
**kw – stores arbitrary key/value pairs in this new annotation

Example

>>> self = CocoDataset.demo()
>>> image_id = 1
>>> cid = 1
>>> bbox = [10, 10, 20, 20]
>>> aid = self.add_annotation(image_id, cid, bbox)
>>> assert self.anns[aid]['bbox'] == bbox

Example

>>> # Attempt to add an annotation without a category or bbox
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> aid = self.add_annotation(image_id)
>>> assert None in self.index.cid_to_aids

add_category(name, supercategory=None, id=None, **kw)[source]¶

Adds a category

Parameters:

name (str) – name of the new category
supercategory (str, optional) – parent of this category
id (int, optional) – use this category id, if it was not taken
**kw – stores arbitrary key/value pairs in this new category

Example

>>> self = CocoDataset.demo()
>>> prev_n_cats = self.n_cats
>>> cid = self.add_category('dog', supercategory='object')
>>> assert self.cats[cid]['name'] == 'dog'
>>> assert self.n_cats == prev_n_cats + 1
>>> import pytest
>>> with pytest.raises(ValueError):
>>>     self.add_category('dog', supercategory='object')

ensure_image(file_name, id=None, **kw)[source]¶

Like add_image, but returns the existing image id if it already exists instead of failing. In this case all metadata is ignored.

Parameters:

file_name (str) – relative or absolute path to image
id (None or int) – ADVANCED. Force using this image id.
**kw – stores arbitrary key/value pairs in this new image
Returns:	the existing or new image id
Return type:	int

ensure_category(name, supercategory=None, id=None, **kw)[source]¶

Like add_category, but returns the existing category id if it already exists instead of failing. In this case all metadata is ignored.

Returns:	the existing or new category id
Return type:	int
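
The difference between the add_* and ensure_* variants can be sketched without kwcoco. The snippet below is a hypothetical stand-in (TinyCategories, add, and ensure are illustrative names, not part of the kwcoco API). It mirrors the documented behavior: adding a duplicate name raises ValueError, while ensuring returns the existing id and ignores the new metadata.

```python
class TinyCategories:
    """Hypothetical stand-in for a category table; not kwcoco code."""

    def __init__(self):
        self.cats = {}         # category id -> category dict
        self.name_to_id = {}   # category name -> category id

    def add(self, name, **kw):
        # Like add_category: a duplicate name is an error.
        if name in self.name_to_id:
            raise ValueError('category {!r} already exists'.format(name))
        cid = max(self.cats, default=0) + 1
        self.cats[cid] = dict(id=cid, name=name, **kw)
        self.name_to_id[name] = cid
        return cid

    def ensure(self, name, **kw):
        # Like ensure_category: reuse the existing id and ignore metadata.
        try:
            return self.add(name, **kw)
        except ValueError:
            return self.name_to_id[name]


table = TinyCategories()
first = table.ensure('dog', supercategory='animal')
second = table.ensure('dog', supercategory='ignored')
```

Both calls return the same id, and the stored supercategory remains the one given on the first call.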

add_annotations(anns)[source]¶

Faster but less safe alternative to add_annotation for adding multiple items at once

Parameters:	anns (List[Dict]) – list of annotation dictionaries

Example

>>> self = CocoDataset.demo()
>>> anns = [self.anns[aid] for aid in [2, 3, 5, 7]]
>>> self.remove_annotations(anns)
>>> assert self.n_annots == 7 and self._check_index()
>>> self.add_annotations(anns)
>>> assert self.n_annots == 11 and self._check_index()

add_images(imgs)[source]¶

Faster but less safe alternative to add_image for adding multiple items at once

Note

This function was designed for speed. As such, it does not check if the image ids or file names are duplicated and will blindly add data even if it is bad. The single-image version is slower but safer.

Parameters:	imgs (List[Dict]) – list of image dictionaries

Example

>>> imgs = CocoDataset.demo().dataset['images']
>>> self = CocoDataset()
>>> self.add_images(imgs)
>>> assert self.n_images == 3 and self._check_index()

clear_images()[source]¶

Removes all images and annotations (but not categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8

clear_annotations()[source]¶

Removes all annotations (but not images and categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8

remove_all_images()¶

Removes all images and annotations (but not categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8

remove_all_annotations()¶

Removes all annotations (but not images and categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8

remove_annotation(aid_or_ann)[source]¶

Remove a single annotation from the dataset

If you have multiple annotations to remove, it is more efficient to remove them in a batch with self.remove_annotations

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)
>>> assert len(self.dataset['annotations']) == 7
>>> self._check_index()

remove_annotations(aids_or_anns, verbose=0, safe=True)[source]¶

Remove multiple annotations from the dataset.

Parameters:

aids_or_anns (List) – list of annotation dicts or ids
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:	num_removed: information on the number of items removed
Return type:	Dict

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> prev_n_annots = self.n_annots
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)  # xdoc: +IGNORE_WANT
{'annotations': 4}
>>> assert len(self.dataset['annotations']) == prev_n_annots - 4
>>> self._check_index()
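
The safe=True behavior described above (dropping duplicates and identifiers that do not exist) can be sketched in plain Python. This is a minimal illustration, not kwcoco's implementation; the helper name resolve_aids and its arguments are hypothetical.

```python
def resolve_aids(aids_or_anns, existing_aids, safe=True):
    """Hypothetical helper: normalize a mixed list of dicts / ids to ids."""
    # Annotation dicts are reduced to their 'id'; plain ids pass through.
    aids = [item['id'] if isinstance(item, dict) else item
            for item in aids_or_anns]
    if safe:
        seen = set()
        unique = []
        for aid in aids:
            # Skip ids that are unknown or were already requested.
            if aid in existing_aids and aid not in seen:
                seen.add(aid)
                unique.append(aid)
        aids = unique
    return aids


resolved = resolve_aids([{'id': 2}, 3, 3, 99], existing_aids={1, 2, 3})
unchecked = resolve_aids([{'id': 2}, 3, 3, 99], existing_aids={1, 2, 3}, safe=False)
```

With safe=False the list is passed through unchanged apart from the dict-to-id conversion, which is faster but leaves validation to the caller.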

remove_categories(cat_identifiers, keep_annots=False, verbose=0, safe=True)[source]¶

Remove categories and all annotations in those categories. Currently does not change any hierarchy information.

Parameters:

cat_identifiers (List) – list of category dicts, names, or ids
keep_annots (bool, default=False) – if True, keeps annotations, but removes category labels.
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:	num_removed: information on the number of items removed
Return type:	Dict

Example

>>> self = CocoDataset.demo()
>>> cat_identifiers = [self.cats[1], 'rocket', 3]
>>> self.remove_categories(cat_identifiers)
>>> assert len(self.dataset['categories']) == 5
>>> self._check_index()

remove_images(gids_or_imgs, verbose=0, safe=True)[source]¶

Parameters:

gids_or_imgs (List) – list of image dicts, names, or ids
safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:	num_removed: information on the number of items removed
Return type:	Dict

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> assert len(self.dataset['images']) == 3
>>> gids_or_imgs = [self.imgs[2], 'astro.png']
>>> self.remove_images(gids_or_imgs)  # xdoc: +IGNORE_WANT
{'annotations': 11, 'images': 2}
>>> assert len(self.dataset['images']) == 1
>>> self._check_index()
>>> gids_or_imgs = [3]
>>> self.remove_images(gids_or_imgs)
>>> assert len(self.dataset['images']) == 0
>>> self._check_index()

remove_annotation_keypoints(kp_identifiers)[source]¶

Removes all keypoints with a particular category

Parameters:	kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns:	num_removed: information on the number of items removed
Return type:	Dict

remove_keypoint_categories(kp_identifiers)[source]¶

Removes all keypoints of a particular category as well as all annotation keypoints with those ids.

Parameters:	kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns:	num_removed: information on the number of items removed
Return type:	Dict

Example

>>> self = CocoDataset.demo('shapes', rng=0)
>>> kp_identifiers = ['left_eye', 'mid_tip']
>>> remove_info = self.remove_keypoint_categories(kp_identifiers)
>>> print('remove_info = {!r}'.format(remove_info))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> # assert remove_info == {'keypoint_categories': 2, 'annotation_keypoints': 16, 'reflection_ids': 1}
>>> assert self._resolve_to_kpcat('right_eye')['reflection_id'] is None

set_annotation_category(aid_or_ann, cid_or_cat)[source]¶

Sets the category of a single annotation

Parameters:

aid_or_ann (dict | int) – annotation dict or id
cid_or_cat (dict | int) – category dict or id

Example

>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo()
>>> old_freq = self.category_annotation_frequency()
>>> aid_or_ann = aid = 2
>>> cid_or_cat = new_cid = self.ensure_category('kitten')
>>> self.set_annotation_category(aid, new_cid)
>>> new_freq = self.category_annotation_frequency()
>>> print('new_freq = {}'.format(ub.repr2(new_freq, nl=1)))
>>> print('old_freq = {}'.format(ub.repr2(old_freq, nl=1)))
>>> assert sum(new_freq.values()) == sum(old_freq.values())
>>> assert new_freq['kitten'] == 1

class kwcoco.coco_dataset.CocoIndex[source]¶

Bases: object

Fast lookup index for the COCO dataset with dynamic modification

Variables:

imgs (Dict[int, dict]) – mapping between image ids and the image dictionaries
anns (Dict[int, dict]) – mapping between annotation ids and the annotation dictionaries
cats (Dict[int, dict]) – mapping between category ids and the category dictionaries

cid_to_gids¶

Example

>>> import kwcoco
>>> self = dset = kwcoco.CocoDataset()
>>> self.index.cid_to_gids

clear()[source]¶

build(parent)[source]¶

Build all id-to-obj reverse indexes from scratch.

Parameters:	parent (CocoDataset) – the dataset to index

Notation:

aid - Annotation ID
gid - imaGe ID
cid - Category ID
vidid - Video ID

Example

>>> from kwcoco.demo.toydata import *  # NOQA
>>> parent = CocoDataset.demo('vidshapes1', num_frames=4, rng=1)
>>> index = parent.index
>>> index.build(parent)
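
To make the notation concrete, here is a minimal sketch of how id-to-object and reverse lookups could be built from a raw COCO dictionary. It is not the CocoIndex implementation; build_tiny_index is a hypothetical helper, and grouping uncategorized annotations under None is an assumption based on the add_annotation example in this module.

```python
def build_tiny_index(dataset):
    """Hypothetical sketch of an id-to-object index over a raw COCO dict."""
    anns = {ann['id']: ann for ann in dataset.get('annotations', [])}
    imgs = {img['id']: img for img in dataset.get('images', [])}
    cats = {cat['id']: cat for cat in dataset.get('categories', [])}

    # Every image / category gets an entry, even when it has no annotations.
    gid_to_aids = {gid: set() for gid in imgs}
    cid_to_aids = {cid: set() for cid in cats}
    for aid, ann in anns.items():
        gid_to_aids.setdefault(ann['image_id'], set()).add(aid)
        # Assumption: annotations without a category are grouped under None.
        cid_to_aids.setdefault(ann.get('category_id'), set()).add(aid)

    name_to_cat = {cat['name']: cat for cat in cats.values()}
    return {
        'anns': anns, 'imgs': imgs, 'cats': cats,
        'gid_to_aids': gid_to_aids, 'cid_to_aids': cid_to_aids,
        'name_to_cat': name_to_cat,
    }


demo = {
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'categories': [{'id': 1, 'name': 'star'}],
    'annotations': [
        {'id': 10, 'image_id': 1, 'category_id': 1},
        {'id': 11, 'image_id': 1},
    ],
}
index = build_tiny_index(demo)
```

The index stores references to the original dictionaries rather than copies, so a lookup such as index['imgs'][1] returns the same object found in demo['images'].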

class kwcoco.coco_dataset.MixinCocoIndex[source]¶

Bases: object

Give the dataset top level access to index attributes

anns¶

imgs¶

cats¶

videos¶

gid_to_aids¶

cid_to_aids¶

name_to_cat¶

class kwcoco.coco_dataset.CocoDataset(data=None, tag=None, img_root=None, autobuild=True)[source]¶

Bases: ubelt.util_mixins.NiceRepr, kwcoco.coco_dataset.MixinCocoAddRemove, kwcoco.coco_dataset.MixinCocoStats, kwcoco.coco_dataset.MixinCocoAttrs, kwcoco.coco_dataset.MixinCocoDraw, kwcoco.coco_dataset.MixinCocoExtras, kwcoco.coco_dataset.MixinCocoIndex, kwcoco.coco_dataset.MixinCocoDepricate

Notes

A keypoint annotation

    {
        "image_id" : int,
        "category_id" : int,
        "keypoints" : [x1,y1,v1,...,xk,yk,vk],
        "score" : float,
    }

Note that v[i] is a visibility flag, where v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible.

A bounding box annotation

    {
        "image_id" : int,
        "category_id" : int,
        "bbox" : [x,y,width,height],
        "score" : float,
    }

We also define a non-standard "line" annotation (which our fixup scripts will interpret as the diameter of a circle to convert into a bounding box)

A line* annotation (note this is a non-standard field)

    {
        "image_id" : int,
        "category_id" : int,
        "line" : [x1,y1,x2,y2],
        "score" : float,
    }

Lastly, note that our datasets will sometimes specify multiple bbox, line, and/or keypoints fields. In this case we may also specify a field roi_shape, which denotes which field is the "main" annotation type.
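
The flat keypoint layout and the line-as-diameter convention can be illustrated with a short sketch. The helper names below are hypothetical and the fixup scripts themselves are not shown in this document, so line_to_bbox is only one straightforward geometric reading of "the diameter of a circle": the circle is centered on the midpoint of the line and the box is the square that encloses it, in xywh format.

```python
import math


def split_keypoints(flat):
    """Group a flat [x1, y1, v1, ..., xk, yk, vk] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError('keypoints must contain x, y, v triples')
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def visible_keypoints(flat):
    """Keep only the points flagged v=2 (labeled and visible)."""
    return [(x, y) for x, y, v in split_keypoints(flat) if v == 2]


def line_to_bbox(line):
    """Treat [x1, y1, x2, y2] as a circle diameter; return its xywh box."""
    x1, y1, x2, y2 = line
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    diameter = math.hypot(x2 - x1, y2 - y1)
    radius = diameter / 2
    return [cx - radius, cy - radius, diameter, diameter]


points = visible_keypoints([10, 20, 2, 0, 0, 0, 5, 6, 1])
box = line_to_bbox([0, 0, 6, 8])
```

Here only the first keypoint survives the visibility filter, and the line from (0, 0) to (6, 8) has length 10, giving a 10 by 10 box whose top-left corner is (-2, -1).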

Variables:

dataset (Dict) – raw json data structure. This is the base dictionary that contains {'annotations': List, 'images': List, 'categories': List}
index (CocoIndex) – an efficient lookup index into the coco data structure. The index defines its own attributes like anns, cats, imgs, etc. See CocoIndex for more details on which attributes are available.
fpath (PathLike | None) – if known, this stores the filepath the dataset was loaded from
tag (str) – A tag indicating the name of the dataset.
img_root (PathLike | None) – If known, this is the root path that all image file names are relative to. This can also be manually overwritten by the user.
hashid (str | None) – If computed, this will be a hash uniquely identifying the dataset. To ensure this is computed see _build_hashid().

References

http://cocodataset.org/#format
http://cocodataset.org/#download

CommandLine:

    python -m kwcoco.coco_dataset CocoDataset --show

Example

>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> # xdoctest: +REQUIRES(--show)
>>> self.show_image(gid=2)
>>> from matplotlib import pyplot as plt
>>> plt.show()

classmethod from_data(data, img_root=None)[source]¶

Constructor from a json dictionary

classmethod from_image_paths(gpaths, img_root=None)[source]¶

Constructor from a list of image paths

Example

>>> coco_dset = CocoDataset.from_image_paths(['a.png', 'b.png'])
>>> assert coco_dset.n_images == 2

classmethod from_coco_paths(fpaths, max_workers=0, verbose=1, mode='thread', union='try')[source]¶

Constructor from multiple coco file paths.

Loads multiple coco datasets and unions the result

Notes

If union='try' and the union operation fails, the list of individually loaded datasets is returned instead.

Parameters:

fpaths (List[str]) – list of paths to multiple coco files to be loaded and unioned.
max_workers (int, default=0) – number of worker threads / processes
verbose (int) – verbosity level
mode (str) – thread, process, or serial
union (str | bool, default='try') – If True, unions the result datasets after loading. If False, just returns the result list. If 'try', then try to perform the union, but return the result list if it fails.
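
The three settings of the union parameter can be summarized with a small sketch. This is not the library's code; combine_loaded and the stand-in union functions are hypothetical, and the loaded datasets are replaced by plain integers to keep the example self-contained.

```python
def combine_loaded(results, union_func, union='try'):
    """Hypothetical sketch of the True / False / 'try' union policy."""
    if union is False:
        # Return the individually loaded items untouched.
        return results
    if union is True:
        # Any failure in the union propagates to the caller.
        return union_func(*results)
    if union == 'try':
        try:
            return union_func(*results)
        except Exception:
            # Fall back to the list of individually loaded items.
            return results
    raise ValueError('union must be True, False, or "try"')


def working_union(*items):
    # Stand-in for a union that succeeds.
    return sum(items)


def failing_union(*items):
    # Stand-in for a union that fails.
    raise RuntimeError('cannot union')


merged_ok = combine_loaded([1, 2], working_union, union=True)
fallback = combine_loaded([1, 2], failing_union, union='try')
separate = combine_loaded([1, 2], working_union, union=False)
```

A caller using union='try' therefore has to handle two possible return types: a single merged result or a list.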

copy()[source]¶

Deep copies this object

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> new = self.copy()
>>> assert new.imgs[1] is new.dataset['images'][0]
>>> assert new.imgs[1] == self.dataset['images'][0]
>>> assert new.imgs[1] is not self.dataset['images'][0]

dumps(indent=None, newlines=False)[source]¶

Writes the dataset out to the json format

Parameters:	newlines (bool) – if True, each annotation, image, category gets its own line

Notes

Using newlines=True is similar to:

    print(ub.repr2(dset.dataset, nl=2, trailsep=False))

However, the above may not output valid json if it contains ndarrays.

Example

>>> from kwcoco.coco_dataset import *
>>> import json
>>> self = CocoDataset.demo()
>>> text = self.dumps(newlines=True)
>>> print(text)
>>> self2 = CocoDataset(json.loads(text), tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
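
The idea behind newlines=True (each annotation, image, and category on its own line) can be sketched with the standard json module. This is an illustration only; dumps_one_item_per_line is a hypothetical helper and the exact text layout produced by kwcoco may differ.

```python
import json


def dumps_one_item_per_line(dataset):
    """Hypothetical sketch: valid JSON with one list item per line."""
    parts = []
    for key, value in dataset.items():
        if isinstance(value, list) and value:
            # Serialize every item compactly and put each on its own line.
            items = ',\n'.join(json.dumps(item) for item in value)
            body = '[\n' + items + '\n]'
        else:
            body = json.dumps(value)
        parts.append(json.dumps(key) + ': ' + body)
    return '{\n' + ',\n'.join(parts) + '\n}'


demo = {
    'categories': [{'id': 1, 'name': 'star'}],
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'annotations': [],
}
text = dumps_one_item_per_line(demo)
roundtrip = json.loads(text)
```

Because every item is produced by json.dumps, the result stays parseable while remaining friendly to line-based tools such as diff and grep.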

Ignore:
    for k in self2.dataset:
        if self.dataset[k] == self2.dataset[k]:
            print('YES: k = {!r}'.format(k))
        else:
            print('NO: k = {!r}'.format(k))
    self2.dataset['categories']
    self.dataset['categories']

dump(file, indent=None, newlines=False)[source]¶

Writes the dataset out to the json format

Parameters:

file (PathLike | FileLike) – Where to write the data. Can either be a path to a file or an open file pointer / stream.
newlines (bool) – if True, each annotation, image, category gets its own line.

Example

>>> import json
>>> import tempfile
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset

>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file, newlines=True)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset

union(*others, **kwargs)[source]¶

Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.

Parameters:

self – note that `union()` can be called as an instance method or a class method. If it is a class method, then this is the class type, otherwise the instance will also be unioned with `others`.
*others – a series of CocoDatasets that we will merge
**kwargs – constructor options for the new merged CocoDataset
Returns:	a new merged coco dataset
Return type:	CocoDataset

Example

>>> # Test union works with different keypoint categories
>>> dset1 = CocoDataset.demo('shapes1')
>>> dset2 = CocoDataset.demo('shapes2')
>>> dset1.remove_keypoint_categories(['bot_tip', 'mid_tip', 'right_eye'])
>>> dset2.remove_keypoint_categories(['top_tip', 'left_eye'])
>>> dset_12a = CocoDataset.union(dset1, dset2)
>>> dset_12b = dset1.union(dset2)
>>> dset_21 = dset2.union(dset1)
>>> def add_hist(h1, h2):
>>>     return {k: h1.get(k, 0) + h2.get(k, 0) for k in set(h1) | set(h2)}
>>> kpfreq1 = dset1.keypoint_annotation_frequency()
>>> kpfreq2 = dset2.keypoint_annotation_frequency()
>>> kpfreq_want = add_hist(kpfreq1, kpfreq2)
>>> kpfreq_got1 = dset_12a.keypoint_annotation_frequency()
>>> kpfreq_got2 = dset_12b.keypoint_annotation_frequency()
>>> assert kpfreq_want == kpfreq_got1
>>> assert kpfreq_want == kpfreq_got2

>>> # Test disjoint gid datasets
>>> import kwcoco
>>> import ubelt as ub
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> for new_gid, img in enumerate(dset1.dataset['images'], start=10):
>>>     for aid in dset1.gid_to_aids[img['id']]:
>>>         dset1.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset1.index.clear()
>>> dset1._build_index()
>>> # ------
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> for new_gid, img in enumerate(dset2.dataset['images'], start=100):
>>>     for aid in dset2.gid_to_aids[img['id']]:
>>>         dset2.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset2.index.clear()
>>> dset2._build_index()
>>> others = [dset1, dset2]
>>> merged = kwcoco.CocoDataset.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([10, 11, 12, 100, 101]) == set(merged.imgs)

>>> # Test data is not preserved
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> others = (dset1, dset2)
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([1, 2, 3, 4, 5]) == set(merged.imgs)

Todo

[ ] are supercategories broken?
[ ] reuse image ids where possible
[ ] reuse annotation / category ids where possible
[ ] disambiguate track-ids
[x] disambiguate video-ids
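
The statement that names and associations are retained while ids may change can be illustrated on raw dictionaries. The sketch below is not kwcoco's union: tiny_union is a hypothetical helper that always renumbers every id, whereas the examples above show the real method keeping image ids when they are already disjoint. Matching categories by name is likewise an assumption of this sketch.

```python
def tiny_union(*datasets):
    """Hypothetical sketch: merge raw COCO dicts, renumbering all ids."""
    merged = {'categories': [], 'images': [], 'annotations': []}
    name_to_cid = {}
    for dset in datasets:
        # Categories with the same name collapse into one merged category.
        cid_map = {}
        for cat in dset.get('categories', []):
            name = cat['name']
            if name not in name_to_cid:
                name_to_cid[name] = len(merged['categories']) + 1
                merged['categories'].append({**cat, 'id': name_to_cid[name]})
            cid_map[cat['id']] = name_to_cid[name]
        # Images always receive a fresh id in the merged dataset.
        gid_map = {}
        for img in dset.get('images', []):
            gid_map[img['id']] = len(merged['images']) + 1
            merged['images'].append({**img, 'id': gid_map[img['id']]})
        # Annotations are re-pointed at the new image and category ids.
        for ann in dset.get('annotations', []):
            merged['annotations'].append({
                **ann,
                'id': len(merged['annotations']) + 1,
                'image_id': gid_map[ann['image_id']],
                'category_id': cid_map.get(ann.get('category_id')),
            })
    return merged


dset_a = {
    'categories': [{'id': 5, 'name': 'cat'}],
    'images': [{'id': 1, 'file_name': 'a.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 5}],
}
dset_b = {
    'categories': [{'id': 9, 'name': 'cat'}, {'id': 2, 'name': 'dog'}],
    'images': [{'id': 1, 'file_name': 'b.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 2}],
}
merged = tiny_union(dset_a, dset_b)
```

Both inputs use image id 1 and annotation id 1, so at least one of them has to be renumbered. The annotation from dset_b still points at b.png and at the dog category after the merge.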

subset(gids, copy=False, autobuild=True)[source]¶

Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.

Parameters:

gids (List[int]) – image-ids to copy into a new dataset
copy (bool, default=False) – if True, makes a deep copy of all nested attributes, otherwise makes a shallow copy.
autobuild (bool, default=True) – if True will automatically build the fast lookup index.

Example

>>> self = CocoDataset.demo()
>>> gids = [1, 3]
>>> sub_dset = self.subset(gids)
>>> assert len(self.gid_to_aids) == 3
>>> assert len(sub_dset.gid_to_aids) == 2

Example

>>> self = CocoDataset.demo()
>>> sub1 = self.subset([1])
>>> sub2 = self.subset([2])
>>> sub3 = self.subset([3])
>>> others = [sub1, sub2, sub3]
>>> rejoined = CocoDataset.union(*others)
>>> assert len(sub1.anns) == 9
>>> assert len(sub2.anns) == 2
>>> assert len(sub3.anns) == 0
>>> assert rejoined.basic_stats() == self.basic_stats()
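
The selection rule (keep the requested images and every annotation inside them) can be sketched on a raw dictionary. tiny_subset is a hypothetical helper, not the library method. It mimics the shallow copy=False behavior by sharing the nested dictionaries, and it assumes all categories are carried over unchanged.

```python
def tiny_subset(dataset, gids):
    """Hypothetical sketch: select images by id plus their annotations."""
    keep = set(gids)
    return {
        # Assumption: the category list is carried over in full.
        'categories': list(dataset.get('categories', [])),
        'images': [img for img in dataset.get('images', [])
                   if img['id'] in keep],
        'annotations': [ann for ann in dataset.get('annotations', [])
                        if ann['image_id'] in keep],
    }


full = {
    'categories': [{'id': 1, 'name': 'star'}],
    'images': [{'id': 1, 'file_name': 'a.png'},
               {'id': 2, 'file_name': 'b.png'},
               {'id': 3, 'file_name': 'c.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1},
                    {'id': 2, 'image_id': 2, 'category_id': 1},
                    {'id': 3, 'image_id': 1, 'category_id': 1}],
}
sub = tiny_subset(full, [1, 3])
```

Since the nested dictionaries are shared, editing an image in sub also changes it in full; a deep copy would be needed to decouple them.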

kwcoco.coco_dataset.demo_coco_data()[source]¶

Simple data for testing

Ignore:
    # code for getting a segmentation polygon
    kwimage.grab_test_image_fpath('astro')
    labelme /home/joncrall/.cache/kwimage/demodata/astro.png
    cat /home/joncrall/.cache/kwimage/demodata/astro.json

Example

>>> # xdoctest: +REQUIRES(--show)
>>> from kwcoco.coco_dataset import demo_coco_data, CocoDataset
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> import kwplot
>>> kwplot.autompl()
>>> self.show_image(gid=1)
>>> kwplot.show_if_requested()