kwcoco.coco_dataset module

An implementation and extension of the original MS-COCO API [1].

Extends the format to also include line annotations.

Dataset Spec:

  • Note: a formal spec has been defined in
category = {
    'id': int,
    'name': str,
    'supercategory': Optional[str],
    'keypoints': Optional[List[str]],
    'skeleton': Optional[List[Tuple[int, int]]],
}

image = {
    'id': int,
    'file_name': str
}

dataset = {
    # these are object level categories
    'categories': [category],
    'images': [
        image,
        ...
    ],
    'annotations': [
        {
            'id': int,
            'image_id': int,
            'category_id': int,
            'track_id': Optional[int],

            'bbox': [tl_x, tl_y, w, h],  # optional (xywh format)
            "score" : float,  # optional
            "prob" : List[float],  # optional
            "weight" : float,  # optional

            "caption": str,  # an optional text caption for this annotation
            "iscrowd" : <0 or 1>,  # denotes if the annotation covers a single object (0) or multiple objects (1)
            "keypoints" : [x1,y1,v1,...,xk,yk,vk], # or new dict-based format
            'segmentation': <RunLengthEncoding | Polygon>,  # formats are defined below
        },
        ...
    ],
    'licenses': [],
    'info': [],
}
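
As a rough illustration of the spec above, a minimal dataset can be written as a plain Python dictionary and serialized with the standard json module. The file name, category name, and box values here are made up for the example and are not part of any real dataset.

```python
import json

# A tiny hand-written dataset following the layout described above.
# All concrete values (names, ids, box coordinates) are illustrative only.
dataset = {
    'categories': [
        {'id': 1, 'name': 'cat', 'supercategory': 'animal'},
    ],
    'images': [
        {'id': 1, 'file_name': 'images/example.png'},
    ],
    'annotations': [
        {
            'id': 1,
            'image_id': 1,
            'category_id': 1,
            'bbox': [10, 20, 30, 40],  # [tl_x, tl_y, w, h]
            'iscrowd': 0,
        },
    ],
    'licenses': [],
    'info': [],
}

# The structure is plain JSON, so it round-trips through the json module.
text = json.dumps(dataset, indent=2)
roundtrip = json.loads(text)
```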

Polygon:
    A flattened list of xy coordinates.
    [x1, y1, x2, y2, ..., xn, yn]

    or a list of flattened lists of xy coordinates if the connected
    components are disjoint
    [[x1, y1, x2, y2, ..., xn, yn], [x1, y1, ..., xm, ym],]

    Note: the original coco spec does not allow for holes in polygons.

    We also allow a non-standard dictionary encoding of polygons
        {'exterior': [(x1, y1)...],
         'interiors': [[(x1, y1), ...], ...]}
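
The relationship between the flattened form and the dictionary form can be sketched in plain Python. The helper names below (flat_to_pairs, flat_to_exterior_dict) are hypothetical and exist only for this illustration; they are not part of kwcoco or kwimage.

```python
def flat_to_pairs(flat):
    """Convert [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError('a flattened polygon needs an even number of values')
    return list(zip(flat[0::2], flat[1::2]))


def flat_to_exterior_dict(flat):
    """Wrap one flattened polygon in the dictionary encoding (no holes)."""
    return {'exterior': flat_to_pairs(flat), 'interiors': []}


# A triangle given in the flattened COCO style.
triangle = [0, 0, 4, 0, 4, 3]
poly = flat_to_exterior_dict(triangle)
```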

RunLengthEncoding:
    The RLE can be in a special bytes encoding or in a binary array
    encoding. We reuse the original C functions from [2] in
    ``kwimage.structs.Mask`` to provide a convenient way to abstract this
    rather esoteric bytes encoding.

    For pure python implementations see kwimage:
        Converting from an image to RLE can be done via kwimage.run_length_encoding
        Converting from RLE back to an image can be done via:
            kwimage.decode_run_length

        For compatibility with the COCO specs ensure the binary flags
        for these functions are set to true.
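
To make the idea of run-length encoding concrete, here is a minimal pure-Python sketch of the uncompressed, array-style form. It does not use the kwimage functions named above and does not produce the compressed bytes encoding; the function names, the 'size' / 'counts' dictionary layout, and the column-major traversal are assumptions chosen to resemble the COCO convention.

```python
def encode_rle(mask):
    """Run-length encode a 2D binary mask given as a list of rows.

    Pixels are visited in column-major order and the counts alternate
    between runs of 0s and runs of 1s, starting with a (possibly empty)
    run of 0s.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts = []
    current = 0  # the first run always counts zeros
    run = 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current = value
            run = 1
    counts.append(run)
    return {'size': [height, width], 'counts': counts}


def decode_rle(rle):
    """Invert encode_rle, returning the mask as a list of rows."""
    height, width = rle['size']
    flat = []
    value = 0
    for run in rle['counts']:
        flat.extend([value] * run)
        value = 1 - value
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


mask = [
    [0, 1, 1],
    [0, 1, 0],
]
rle = encode_rle(mask)
```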

Keypoints:
    Annotation keypoints may also be specified in this non-standard (but
    ultimately more general) way:

    'annotations': [
        {
            'keypoints': [
                {
                    'xy': <x1, y1>,
                    'visible': <0 or 1 or 2>,
                    'keypoint_category_id': <kp_cid>,
                    'keypoint_category': <kp_name, optional>,  # this can be specified instead of an id
                }, ...
            ]
        }, ...
    ],
    'keypoint_categories': [{
        'name': <str>,
        'id': <int>,  # an id for this keypoint category
        'supercategory': <kp_name>,  # name of coarser parent keypoint class (for hierarchical keypoints)
        'reflection_id': <kp_cid>,  # specify only if the keypoint id would be swapped with another keypoint type
    },...
    ]

    In this scheme the "keypoints" property of each annotation (which used
    to be a list of floats) is now specified as a list of dictionaries that
    specify each keypoint's location, id, and visibility explicitly. This
    allows for things like non-unique keypoints and partial keypoint
    annotations. It also removes the ordering requirement, which makes it
    simpler to keep track of each keypoint's class type.

    We also have a new top-level dictionary to specify all the possible
    keypoint categories.
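
One way to picture the difference between the two keypoint encodings is a small converter from the flat list to the dictionary form. This is only a sketch: keypoints_to_dicts is a hypothetical helper, and it assumes the keypoint names are supplied in the same order as the (x, y, v) triples.

```python
def keypoints_to_dicts(flat_kpts, keypoint_names):
    """Convert [x1, y1, v1, ...] into the dict-based keypoint format.

    ``keypoint_names`` gives the category name for each (x, y, v) triple,
    mirroring the ordered 'keypoints' list of a category dictionary.
    """
    if len(flat_kpts) != 3 * len(keypoint_names):
        raise ValueError('expected one (x, y, v) triple per keypoint name')
    kpts = []
    for idx, name in enumerate(keypoint_names):
        x, y, v = flat_kpts[3 * idx: 3 * idx + 3]
        kpts.append({'xy': (x, y), 'visible': v, 'keypoint_category': name})
    return kpts


flat = [10, 20, 2, 30, 40, 1]
names = ['left_eye', 'right_eye']
kpts = keypoints_to_dicts(flat, names)
```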

Auxiliary Channels:
    For multimodal or multispectral images it is possible to specify
    auxiliary channels in an image dictionary as follows (note that the
    dictionary key itself is spelled 'auxillary'):

    {
        'id': int, 'file_name': str,
        'channels': <spec>,  # a spec code that indicates the layout of these channels.
        'auxillary': [  # information about auxiliary channels
            {
                'file_name': str,
                'channels': <spec>,
            }, ...  # can have many auxiliary channels with unique specs
        ]
    }

Video Sequences:
    For video sequences, we add the following video level index:

    "videos": [
        { "id": <int>, "name": <video_name:str> },
    ]

    Note that the videos might be given as encoded mp4/avi/etc. files (in
    which case the name should correspond to a path) or as a series of
    frames, in which case the images should be used to index the extracted
    frames and the information in them.

    Then image dictionaries are augmented as follows:

    {
        'video_id': int,  # optional, if this image is a frame in a video sequence, this id is shared by all frames in that sequence.
        'timestamp': int,  # optional, timestamp (ideally in flicks), used to identify the timestamp of the frame. Only applicable to video inputs.
        'frame_index': int,  # optional, ordinal frame index which can be used if timestamp is unknown.
    }

    And annotations are augmented as follows:

    {
        "track_id": <int | str | uuid>  # optional, indicates association between annotations across frames
    }
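
The video fields above are enough to recover the frame order of each sequence from plain image dictionaries. The sketch below groups images by video_id and sorts them by frame_index; frames_by_video is a hypothetical helper and the image records are invented for the example.

```python
from collections import defaultdict


def frames_by_video(images):
    """Map each video_id to its image ids, ordered by frame_index."""
    groups = defaultdict(list)
    for img in images:
        # Images without a video_id are stand-alone and are skipped here.
        if img.get('video_id') is not None:
            groups[img['video_id']].append(img)
    return {
        vidid: [img['id'] for img in sorted(imgs, key=lambda g: g['frame_index'])]
        for vidid, imgs in groups.items()
    }


images = [
    {'id': 1, 'file_name': 'a2.png', 'video_id': 1, 'frame_index': 2},
    {'id': 2, 'file_name': 'a0.png', 'video_id': 1, 'frame_index': 0},
    {'id': 3, 'file_name': 'b0.png', 'video_id': 2, 'frame_index': 0},
    {'id': 4, 'file_name': 'still.png'},
]
vidid_to_gids = frames_by_video(images)
```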

Notes

The main object in this file is CocoDataset, which is composed of several mixin classes. See the class and method documentation for more details.

Todo

  • [ ] Use ijson to lazily load pieces of the dataset in the background or on demand. This will give us faster access to categories / images, whereas we will always have to wait for annotations, etc.
  • [ ] Should img_root be changed to data root?
  • [ ] Read video data, return numpy arrays (requires API for images)
  • [ ] Spec for video URI, and convert to frames @ framerate function.
  • [ ] remove videos

References

[1] http://cocodataset.org/#format-data
[2] https://github.com/nightrome/cocostuffapi/blob/master/PythonAPI/pycocotools/mask.py
[3] https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format
class kwcoco.coco_dataset.ObjectList1D(ids, dset, key)[source]

Bases: ubelt.util_mixins.NiceRepr

Vectorized access to lists of dictionary objects

Lightweight reference to a set of objects (e.g. annotations, images) that allows for convenient property access.

Parameters:
  • ids (List[int]) – list of ids
  • dset (CocoDataset) – parent dataset
  • key (str) – main object name (e.g. ‘images’, ‘annotations’)
Types:
ObjT = Ann | Img | Cat  # can be one of these types
ObjectList1D gives us access to a List[ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> # Both annots and images are object lists
>>> self = dset.annots()
>>> self = dset.images()
>>> # can call with a list of ids or not, for everything
>>> self = dset.annots([1, 2, 11])
>>> self = dset.images([1, 2, 3])
>>> self.lookup('id')
>>> self.lookup(['id'])
objs

Returns:all object dictionaries
Return type:List
take(idxs)[source]

Take a subset by index

Example

>>> self = CocoDataset.demo().annots()
>>> assert len(self.take([0, 2, 3])) == 3
compress(flags)[source]

Take a subset by flags

Example

>>> self = CocoDataset.demo().images()
>>> assert len(self.compress([True, False, True])) == 2
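
The difference between take (select by position) and compress (select by boolean flag) can be shown on a plain list of ids. This is a sketch of the semantics only, not the library implementation; the helper names and ids are invented.

```python
from itertools import compress


def take(items, idxs):
    """Select items by integer position."""
    return [items[idx] for idx in idxs]


def compress_items(items, flags):
    """Select items whose corresponding flag is truthy."""
    return list(compress(items, flags))


aids = [10, 11, 12, 13]
taken = take(aids, [0, 2, 3])
kept = compress_items(aids, [True, False, True, False])
```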
peek()[source]

Return the first object dictionary

lookup(key, default=NoParam, keepid=False)[source]

Lookup a list of object attributes

Parameters:
  • key (str | Iterable) – name of the property you want to look up. Can also be a list of names, in which case we return a dict.
  • default – if specified, uses this value if it doesn’t exist in an ObjT.
  • keepid – if True, return a mapping from ids to the property
Returns:

a list of whatever type the object is, or a dict when key is a list of names or keepid is True

Return type:

List[ObjT] | Dict[str, ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.lookup('id')
>>> self.lookup(key=['id', 'image_id'])
>>> self.lookup(key='foo', default=None, keepid=True)
>>> self.lookup(key=['foo'], default=None, keepid=True)
>>> self.lookup(key=['id', 'image_id'], keepid=True)
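
The effect of the default and keepid arguments can be sketched on plain dictionaries. The function below is a hypothetical stand-in written for this illustration, not the method itself; in particular, raising KeyError when no default is given is an assumption of the sketch.

```python
_NoParam = object()  # sentinel meaning "no default was given"


def lookup_attr(objs, key, default=_NoParam, keepid=False):
    """Collect ``key`` from each dict in ``objs``.

    Raises KeyError for a missing key unless ``default`` is given. With
    ``keepid=True`` the result maps each object's id to its value.
    """
    if default is _NoParam:
        values = [obj[key] for obj in objs]
    else:
        values = [obj.get(key, default) for obj in objs]
    if keepid:
        return {obj['id']: value for obj, value in zip(objs, values)}
    return values


anns = [
    {'id': 1, 'image_id': 1, 'score': 0.9},
    {'id': 2, 'image_id': 1},
]
image_ids = lookup_attr(anns, 'image_id')
scores = lookup_attr(anns, 'score', default=None, keepid=True)
```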
get(key, default=NoParam, keepid=False)[source]

Lookup a list of object attributes

Parameters:
  • key (str) – name of the property you want to lookup
  • default – if specified, uses this value if it doesn’t exist in an ObjT.
  • keepid – if True, return a mapping from ids to the property
Returns:

a list of whatever type the object is, or a mapping from ids to the property when keepid is True

Return type:

List[ObjT] | Dict[str, ObjT]

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.get('id')
>>> self.get(key='foo', default=None, keepid=True)
set(key, values)[source]

Assign a value to each annotation

Parameters:
  • key (str) – the annotation property to modify
  • values (Iterable | scalar) – an iterable of values to set for each annot in the dataset. If the item is not iterable, it is assigned to all objects.

Example

>>> import numpy as np
>>> import ubelt as ub
>>> from kwcoco.coco_dataset import CocoDataset
>>> dset = CocoDataset.demo()
>>> self = dset.annots()
>>> self.set('my-key1', 'my-scalar-value')
>>> self.set('my-key2', np.random.rand(len(self)))
>>> print('dset.imgs = {}'.format(ub.repr2(dset.imgs, nl=1)))
>>> self.get('my-key2')
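
The scalar-versus-iterable behavior described for values can be sketched as follows. This is an illustration on plain dictionaries, not the library code; set_attr is a hypothetical helper, and treating strings as scalars is an assumption made so that a string value is assigned whole rather than character by character.

```python
from collections.abc import Iterable


def set_attr(objs, key, values):
    """Assign ``key`` on every dict, broadcasting a scalar if needed."""
    is_iterable = isinstance(values, Iterable) and not isinstance(values, (str, bytes))
    if is_iterable:
        values = list(values)
        if len(values) != len(objs):
            raise ValueError('need exactly one value per object')
    else:
        # A non-iterable (or string) value is assigned to all objects.
        values = [values] * len(objs)
    for obj, value in zip(objs, values):
        obj[key] = value


anns = [{'id': 1}, {'id': 2}]
set_attr(anns, 'tag', 'reviewed')
set_attr(anns, 'score', [0.5, 0.7])
```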
class kwcoco.coco_dataset.ObjectGroups(groups, dset)[source]

Bases: ubelt.util_mixins.NiceRepr

An object for holding a group of ObjectList1D objects

lookup(key, default=NoParam)[source]
class kwcoco.coco_dataset.Categories(ids, dset)[source]

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to category attributes

Example

>>> from kwcoco.coco_dataset import Categories  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> ids = list(dset.cats.keys())
>>> self = Categories(ids, dset)
>>> print('self.name = {!r}'.format(self.name))
>>> print('self.supercategory = {!r}'.format(self.supercategory))
cids
name
supercategory
class kwcoco.coco_dataset.Videos(ids, dset)[source]

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to video attributes

Example

>>> from kwcoco.coco_dataset import Videos  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5')
>>> ids = list(dset.index.videos.keys())
>>> self = Videos(ids, dset)
>>> print('self = {!r}'.format(self))
class kwcoco.coco_dataset.Images(ids, dset)[source]

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to image attributes

gids
gname
gpath
width
height
size

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.size)
[(512, 512), (300, 250), (256, 256)]
area

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.area)
[262144, 75000, 65536]
n_annots

Example

>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(self.n_annots, nl=0))
[9, 2, 0]
aids

Example

>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(list(map(list, self.aids)), nl=0))
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11], []]
annots

Example

>>> self = CocoDataset.demo().images()
>>> print(self.annots)
<AnnotGroups(n=3, m=3.7, s=3.9)>
class kwcoco.coco_dataset.Annots(ids, dset)[source]

Bases: kwcoco.coco_dataset.ObjectList1D

Vectorized access to annotation attributes

aids

The annotation ids of this column of annotations

images

Get the column of images

Returns:Images
image_id
category_id
gids

Get the column of image-ids

Returns:list of image ids
Return type:List[int]
cids

Get the column of category-ids

Returns:List[int]
cnames

Get the column of category names

Returns:List[str]
detections

Get the kwimage-style detection objects

Returns:kwimage.Detections

Example

>>> # xdoctest: +REQUIRES(module:kwimage)
>>> from kwcoco.coco_dataset import *  # NOQA
>>> self = CocoDataset.demo('shapes32').annots([1, 2, 11])
>>> dets = self.detections
>>> print('dets.data = {!r}'.format(dets.data))
>>> print('dets.meta = {!r}'.format(dets.meta))
boxes

Get the column of kwimage-style bounding boxes

Example

>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.boxes)
<Boxes(xywh,
    array([[ 10,  10, 360, 490],
           [350,   5, 130, 290],
           [124,  96,  45,  18]]))>
xywh

Returns raw boxes

Example

>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.xywh)
class kwcoco.coco_dataset.AnnotGroups(groups, dset)[source]

Bases: kwcoco.coco_dataset.ObjectGroups

cids
class kwcoco.coco_dataset.ImageGroups(groups, dset)[source]

Bases: kwcoco.coco_dataset.ObjectGroups

class kwcoco.coco_dataset.MixinCocoDepricate[source]

Bases: object

These functions are marked for deprecation and may be removed at any time

lookup_imgs(filename=None)[source]

Linear search for images with specific attributes

# DEPRECATED

Ignore:
filename = '201503.20150525.101841191.573975.png'
list(self.lookup_imgs(filename))
gid = 64940
img = self.imgs[gid]
img['file_name'] = filename
lookup_anns(has=None)[source]

Linear search for annotations with specific attributes

# DEPRECATED

Ignore:
list(self.lookup_anns(has='radius'))
gid = 112888
img = self.imgs[gid]
img['file_name'] = filename
class kwcoco.coco_dataset.MixinCocoExtras[source]

Bases: object

Misc functions for coco

load_image(gid_or_img)[source]

Reads an image from disk

Parameters:gid_or_img (int or dict) – image id or image dict
Returns:the image
Return type:np.ndarray
load_image_fpath(gid_or_img)[source]
get_image_fpath(gid_or_img)[source]

Returns the full path to the image

Parameters:gid_or_img (int or dict) – image id or image dict
Returns:full path to the image
Return type:PathLike
get_auxillary_fpath(gid_or_img, channels)[source]

Returns the full path to auxillary data for an image

Parameters:
  • gid_or_img (int | dict) – an image or its id
  • channels (str) – the auxillary channel to load (e.g. disparity)

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> self.get_auxillary_fpath(1, 'disparity')
load_annot_sample(aid_or_ann, image=None, pad=None)[source]

Reads the chip of an annotation. Note this is much less efficient than using a sampler, but it doesn’t require disk cache.

Parameters:
  • aid_or_ann (int or dict) – annot id or dict
  • image (ArrayLike, default=None) – preloaded image (note: this process is inefficient unless image is specified)

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> sample = self.load_annot_sample(2, pad=100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(sample['im'])
>>> kwplot.show_if_requested()
classmethod coerce(key, **kw)[source]
classmethod demo(key='photos', **kw)[source]

Create a toy coco dataset for testing and demo purposes

Parameters:
  • key (str) – either photos or shapes
  • **kw – if key is shapes, these arguments are passed to toydata generation

Example

>>> print(CocoDataset.demo('photos'))
>>> print(CocoDataset.demo('shapes', verbose=0))
>>> print(CocoDataset.demo('shapes256', verbose=0))
>>> print(CocoDataset.demo('shapes8', verbose=0))

Example

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, verbose=0, rng=None)
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, num_tracks=4, verbose=0, rng=44)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> pnums = kwplot.PlotNums(nSubplots=len(dset.imgs))
>>> fnum = 1
>>> for gx, gid in enumerate(dset.imgs.keys()):
>>>     canvas = dset.draw_image(gid=gid)
>>>     kwplot.imshow(canvas, pnum=pnums[gx], fnum=fnum)
>>>     #dset.show_image(gid=gid, pnum=pnums[gx])
>>> kwplot.show_if_requested()
category_graph()[source]

Construct a networkx category hierarchy

Returns:a directed graph where category names are the nodes, supercategories define edges, and items in each category dict (e.g. category id) are added as node properties.
Return type:networkx.DiGraph

Example

>>> self = CocoDataset.demo()
>>> graph = self.category_graph()
>>> assert 'astronaut' in graph.nodes()
>>> assert 'keypoints' in graph.nodes['human']

import graphid
graphid.util.show_nx(graph)
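
The graph structure described above (category names as nodes, supercategory links as edges, category fields as node properties) can be sketched without networkx using plain dictionaries and tuples. The category records below are invented for the example, and category_edges is a hypothetical helper.

```python
def category_edges(categories):
    """Return (supercategory, name) edges for categories that have a parent."""
    return [
        (cat['supercategory'], cat['name'])
        for cat in categories
        if cat.get('supercategory') is not None
    ]


categories = [
    {'id': 1, 'name': 'human'},
    {'id': 2, 'name': 'astronaut', 'supercategory': 'human'},
    {'id': 3, 'name': 'helmet', 'supercategory': None},
]
# Node properties are simply the category dictionaries keyed by name.
nodes = {cat['name']: cat for cat in categories}
edges = category_edges(categories)
```

With networkx available, the same nodes and edges could be loaded into a DiGraph; that step is left out here to keep the sketch dependency-free.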

object_categories()[source]

Construct a consistent CategoryTree representation of object classes

Returns:category data structure
Return type:kwcoco.CategoryTree

Example

>>> self = CocoDataset.demo()
>>> classes = self.object_categories()
>>> print('classes = {}'.format(classes))
keypoint_categories()[source]

Construct a consistent CategoryTree representation of keypoint classes

Returns:category data structure
Return type:kwcoco.CategoryTree

Example

>>> self = CocoDataset.demo()
>>> classes = self.keypoint_categories()
>>> print('classes = {}'.format(classes))
missing_images(check_aux=False, verbose=0)[source]

Check for images that don’t exist

Parameters:check_aux (bool, default=False) – if specified also checks auxiliary images
Returns:bad indexes and paths
Return type:List[Tuple[int, str]]
corrupted_images(verbose=0)[source]

Check for images that don’t exist or can’t be opened

Returns:bad indexes and paths
Return type:List[Tuple[int, str]]
rename_categories(mapper, strict=False, preserve=False, rebuild=True, simple=True, merge_policy='ignore')[source]

Create a coarser categorization

Note: this function has been unstable in the past, and has not yet been properly stabilized. Either avoid it or use it with care. Ensuring simple=True should result in newer, saner behavior that will likely be backwards compatible.

Todo

  • [X] Simple case where we relabel names with no conflicts
  • [ ] Case where annotation labels need to change to be coarser
    • dev note: see internal libraries for work on this
  • [ ] Other cases
Parameters:
  • mapper (dict or Function) – maps old names to new names.
  • strict (bool) – DEPRECATED, IGNORE. If True, fails if mapper doesn’t map all classes.
  • preserve (bool) – DEPRECATED, IGNORE. If True, preserve old categories as supercategories. Broken.
  • simple (bool, default=True) – defaults to the new way of doing this. The old way is deprecated.
  • merge_policy (str) – How to handle multiple categories that map to the same name. Can be update or ignore.

Example

>>> self = CocoDataset.demo()
>>> self.rename_categories({'astronomer': 'person',
>>>                         'astronaut': 'person',
>>>                         'mouth': 'person',
>>>                         'helmet': 'hat'}, preserve=0)
>>> assert 'hat' in self.name_to_cat
>>> assert 'helmet' not in self.name_to_cat
>>> # Test merge case
>>> self = CocoDataset.demo()
>>> mapper = {
>>>     'helmet': 'rocket',
>>>     'astronomer': 'rocket',
>>>     'human': 'rocket',
>>>     'mouth': 'helmet',
>>>     'star': 'gas'
>>> }
>>> self.rename_categories(mapper)
rebase(*args, **kw)[source]

Deprecated. Use reroot instead.

reroot(new_root=None, old_root=None, absolute=False, check=True, safe=True, smart=False)[source]

Rebase image/data paths onto a new image/data root.

Parameters:
  • new_root (str, default=None) – New image root. If unspecified the current self.img_root is used.
  • old_root (str, default=None) – If specified, removes the root from file names. If unspecified, then the existing paths MUST be relative to new_root.
  • absolute (bool, default=False) – if True, file names are stored as absolute paths, otherwise they are relative to the new image root.
  • check (bool, default=True) – if True, checks that the images all exist.
  • safe (bool, default=True) – if True, does not overwrite values until all checks pass
  • smart (bool, default=False) – If True, we can try different reroot strategies and choose the one that works. Note, always be wary when algorithms try to be smart.
CommandLine:
xdoctest -m /home/joncrall/code/kwcoco/kwcoco/coco_dataset.py MixinCocoExtras.reroot

Todo

  • [ ] Incorporate maximum ordered subtree embedding once completed?
Ignore:
>>> # There might not be a way to easily handle the cases that I
>>> # want to here. Might need to discuss this.
>>> import kwcoco
>>> import os
>>> import numpy as np
>>> import ubelt as ub
>>> from os.path import join, dirname, exists
>>> gname = 'images/foo.png'
>>> remote = '/remote/path'
>>> host = ub.ensure_app_cache_dir('kwcoco/tests/reroot')
>>> fpath = join(host, gname)
>>> ub.ensuredir(dirname(fpath))
>>> # In this test the image exists on the host path
>>> import kwimage
>>> kwimage.imwrite(fpath, np.random.rand(8, 8))
>>> #
>>> cases = {}
>>> # * given absolute paths on current machine
>>> cases['abs_curr'] = kwcoco.CocoDataset.from_image_paths([join(host, gname)])
>>> # * given "remote" rooted relative paths on current machine
>>> cases['rel_remoterooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=remote)
>>> # * given "host" rooted relative paths on current machine
>>> cases['rel_hostrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=host)
>>> # * given unrooted relative paths on current machine
>>> cases['rel_unrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname])
>>> # * given absolute paths on another machine
>>> cases['abs_remote'] = kwcoco.CocoDataset.from_image_paths([join(remote, gname)])
>>> def report(dset, name):
>>>     gid = 1
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('   * strategy_name = {!r}'.format(name))
>>>     print('       * rel_fpath = {!r}'.format(rel_fpath))
>>>     print('       * ' + ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>> for key, dset in cases.items():
>>>     print('----')
>>>     print('case key = {!r}'.format(key))
>>>     print('ORIG = {!r}'.format(dset.imgs[1]['file_name']))
>>>     print('dset.img_root = {!r}'.format(dset.img_root))
>>>     print('missing_gids = {!r}'.format(dset.missing_images()))
>>>     print('cwd = {!r}'.format(os.getcwd()))
>>>     print('host = {!r}'.format(host))
>>>     print('remote = {!r}'.format(remote))
>>>     #
>>>     dset_None_rel = dset.copy().reroot(absolute=False, check=0)
>>>     report(dset_None_rel, 'dset_None_rel')
>>>     #
>>>     dset_None_abs = dset.copy().reroot(absolute=True, check=0)
>>>     report(dset_None_abs, 'dset_None_abs')
>>>     #
>>>     dset_host_rel = dset.copy().reroot(host, absolute=False, check=0)
>>>     report(dset_host_rel, 'dset_host_rel')
>>>     #
>>>     dset_host_abs = dset.copy().reroot(host, absolute=True, check=0)
>>>     report(dset_host_abs, 'dset_host_abs')
>>>     #
>>>     dset_remote_rel = dset.copy().reroot(host, old_root=remote, absolute=False, check=0)
>>>     report(dset_remote_rel, 'dset_remote_rel')
>>>     #
>>>     dset_remote_abs = dset.copy().reroot(host, old_root=remote, absolute=True, check=0)
>>>     report(dset_remote_abs, 'dset_remote_abs')

Example

>>> import kwcoco
>>> import ubelt as ub
>>> from os.path import exists
>>> def report(dset, name):
>>>     gid = 1
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('strategy_name = {!r}'.format(name))
>>>     print(ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>>     print('rel_fpath = {!r}'.format(rel_fpath))
>>> dset = self = kwcoco.CocoDataset.demo()
>>> # Change base relative directory
>>> img_root = ub.expandpath('~')
>>> print('ORIG self.imgs = {!r}'.format(self.imgs))
>>> print('ORIG dset.img_root = {!r}'.format(dset.img_root))
>>> print('NEW img_root       = {!r}'.format(img_root))
>>> self.reroot(img_root)
>>> report(self, 'self')
>>> print('NEW self.imgs = {!r}'.format(self.imgs))
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> # Use absolute paths
>>> self.reroot(absolute=True)
>>> assert self.imgs[1]['file_name'].startswith(img_root)
>>> # Switch back to relative paths
>>> self.reroot()
>>> assert self.imgs[1]['file_name'].startswith('.cache')

Example

>>> # demo with auxiliary data
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> img_root = ub.expandpath('~')
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> self.reroot(img_root)
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> assert self.imgs[1]['auxillary'][0]['file_name'].startswith('.cache')
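
The interplay of new_root, old_root, and absolute can be sketched for a single file name using string operations and posixpath. This is a simplified illustration under stated assumptions, not the reroot implementation: reroot_path is a hypothetical helper, it handles POSIX-style paths only, and it performs none of the existence checks or "smart" strategies described above.

```python
import posixpath


def reroot_path(file_name, new_root, old_root=None, absolute=False):
    """Re-express one file name against ``new_root``.

    If ``old_root`` is given and prefixes the path, it is stripped first.
    With ``absolute=True`` the result is joined onto ``new_root``;
    otherwise the path is left relative to it.
    """
    if old_root is not None:
        prefix = old_root.rstrip('/') + '/'
        if file_name.startswith(prefix):
            file_name = file_name[len(prefix):]
    if absolute:
        file_name = posixpath.join(new_root, file_name)
    return file_name


rel_fpath = reroot_path('/remote/path/images/foo.png', '/host/data',
                        old_root='/remote/path')
abs_fpath = reroot_path('images/foo.png', '/host/data', absolute=True)
```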
data_root

In the future we may deprecate img_root for data_root

find_representative_images(gids=None)[source]

Find images that have a wide array of categories. Attempt to find the fewest images that cover all categories using images that contain both a large and small number of annotations.

Parameters:gids (None | List) – Subset of image ids to consider when finding representative images. Uses all images if unspecified.
Returns:list of image ids determined to be representative
Return type:List

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> gids = self.find_representative_images([3])
>>> print('gids = {!r}'.format(gids))
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> valid = {7, 1}
>>> gids = self.find_representative_images(valid)
>>> assert valid.issuperset(gids)
>>> print('gids = {!r}'.format(gids))
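
Covering all categories with few images is an instance of the set cover problem, which a greedy heuristic approximates well in practice. The sketch below is one such heuristic on a plain mapping from image id to category ids; it is not necessarily the algorithm this method uses, and greedy_cover and the sample data are invented for illustration.

```python
def greedy_cover(gid_to_cids):
    """Pick image ids until every category seen in the input is covered."""
    uncovered = set().union(*gid_to_cids.values())
    chosen = []
    while uncovered:
        # Take the image covering the most still-uncovered categories;
        # ties go to the smallest image id so the result is deterministic.
        best = max(sorted(gid_to_cids),
                   key=lambda gid: len(gid_to_cids[gid] & uncovered))
        chosen.append(best)
        uncovered -= gid_to_cids[best]
    return chosen


# Image 4 has no annotations, so it never needs to be chosen.
gid_to_cids = {1: {1, 2, 3}, 2: {3, 4}, 3: {4}, 4: set()}
gids = greedy_cover(gid_to_cids)
```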
class kwcoco.coco_dataset.MixinCocoAttrs[source]

Bases: object

Expose methods to construct object lists / groups

annots(aids=None, gid=None)[source]

Return vectorized annotation objects

Parameters:
  • aids (List[int]) – annotation ids to reference, if unspecified all annotations are returned.
  • gid (int) – return all annotations that belong to this image id. mutually exclusive with aids arg.
Returns:

vectorized annotation object

Return type:

Annots

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> annots = self.annots()
>>> print(annots)
<Annots(num=11)>
>>> sub_annots = annots.take([1, 2, 3])
>>> print(sub_annots)
<Annots(num=3)>
>>> print(ub.repr2(sub_annots.get('bbox', None)))
[
    [350, 5, 130, 290],
    None,
    None,
]
images(gids=None)[source]

Return vectorized image objects

Parameters:gids (List[int]) – image ids to reference, if unspecified all images are returned.
Returns:vectorized images object
Return type:Images

Example

>>> self = CocoDataset.demo()
>>> images = self.images()
>>> print(images)
<Images(num=3)>
categories(cids=None)[source]

Return vectorized category objects

Example

>>> self = CocoDataset.demo()
>>> categories = self.categories()
>>> print(categories)
<Categories(num=8)>
videos(vidids=None)[source]

Return vectorized video objects

Example

>>> import ubelt as ub
>>> from kwcoco.coco_dataset import CocoDataset
>>> self = CocoDataset.demo('vidshapes2')
>>> videos = self.videos()
>>> print(videos)
>>> videos.lookup('name')
>>> videos.lookup('id')
>>> print('videos.objs = {}'.format(ub.repr2(videos.objs[0:2], nl=1)))
class kwcoco.coco_dataset.MixinCocoStats[source]

Bases: object

Methods for getting stats about the dataset

n_annots
n_images
n_cats
n_videos
keypoint_annotation_frequency()[source]

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo('shapes', rng=0)
>>> hist = self.keypoint_annotation_frequency()
>>> hist = ub.odict(sorted(hist.items()))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> print(ub.repr2(hist))  # xdoc: +IGNORE_WANT
{
    'bot_tip': 6,
    'left_eye': 14,
    'mid_tip': 6,
    'right_eye': 14,
    'top_tip': 6,
}
category_annotation_frequency()[source]

Reports the number of annotations of each category

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_frequency()
>>> print(ub.repr2(hist))
{
    'astroturf': 0,
    'human': 0,
    'astronaut': 1,
    'astronomer': 1,
    'helmet': 1,
    'rocket': 1,
    'mouth': 2,
    'star': 5,
}
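
A histogram of this kind can be computed from plain category and annotation dictionaries with collections.Counter. The sketch below is an illustration only: category_frequency is a hypothetical helper, the records are invented, and sorting by count and then name is an assumption chosen to give a stable order.

```python
from collections import Counter


def category_frequency(categories, annotations):
    """Count annotations per category name, including unused categories."""
    cid_to_name = {cat['id']: cat['name'] for cat in categories}
    counts = Counter(cid_to_name[ann['category_id']] for ann in annotations)
    hist = {name: counts.get(name, 0) for name in cid_to_name.values()}
    # Order by ascending count, breaking ties alphabetically.
    return dict(sorted(hist.items(), key=lambda kv: (kv[1], kv[0])))


categories = [
    {'id': 1, 'name': 'star'},
    {'id': 2, 'name': 'rocket'},
    {'id': 3, 'name': 'human'},
]
annotations = [
    {'id': 1, 'image_id': 1, 'category_id': 1},
    {'id': 2, 'image_id': 1, 'category_id': 1},
    {'id': 3, 'image_id': 2, 'category_id': 2},
]
hist = category_frequency(categories, annotations)
```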
category_annotation_type_frequency()[source]

Reports the number of annotations of each type for each category

Example

>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_type_frequency()
>>> print(ub.repr2(hist))
basic_stats()[source]

Reports number of images, annotations, and categories.

Example

>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo()
>>> print(ub.repr2(self.basic_stats()))
{
    'n_anns': 11,
    'n_imgs': 3,
    'n_videos': 0,
    'n_cats': 8,
}
>>> from kwcoco.demo.toydata import *  # NOQA
>>> dset = random_video_dset(render=True, num_frames=2, num_tracks=10, rng=0)
>>> print(ub.repr2(dset.basic_stats()))
{
    'n_anns': 20,
    'n_imgs': 2,
    'n_videos': 1,
    'n_cats': 3,
}
extended_stats()[source]

Reports number of images, annotations, and categories.

Example

>>> self = CocoDataset.demo()
>>> print(ub.repr2(self.extended_stats()))
boxsize_stats(anchors=None, perclass=True, gids=None, aids=None, verbose=0, clusterkw={}, statskw={})[source]

Compute statistics about bounding box sizes.

Also computes anchor boxes using kmeans if anchors is specified.

Parameters:
  • anchors (int) – if specified also computes box anchors
  • perclass (bool) – if True also computes stats for each category
  • gids (List[int], default=None) – if specified only compute stats for these image ids.
  • aids (List[int], default=None) – if specified only compute stats for these annotation ids.
  • verbose (int) – verbosity level
  • clusterkw (dict) – kwargs for sklearn.cluster.KMeans used if computing anchors.
  • statskw (dict) – kwargs for kwarray.stats_dict()
Returns:

Dict[str, Dict[str, Dict | ndarray]]

Example

>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo('shapes32')
>>> infos = self.boxsize_stats(anchors=4, perclass=False)
>>> print(ub.repr2(infos, nl=-1, precision=2))
>>> infos = self.boxsize_stats(gids=[1], statskw=dict(median=True))
>>> print(ub.repr2(infos, nl=-1, precision=2))
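
To give a feel for the kind of summary involved, the sketch below computes simple width and height statistics from xywh boxes using only the standard library. It is not the method's implementation and leaves out anchors, per-class grouping, and the kwarray statistics; box_size_stats and the sample annotations are invented for the example.

```python
import statistics


def box_size_stats(annotations):
    """Summarize bbox widths and heights; annotations without a bbox are skipped.

    Assumes at least one annotation has a bbox (statistics.mean raises on
    empty input).
    """
    boxes = [ann['bbox'] for ann in annotations if ann.get('bbox') is not None]
    stats = {}
    # In xywh format, index 2 is the width and index 3 is the height.
    for name, column in (('width', 2), ('height', 3)):
        values = [box[column] for box in boxes]
        stats[name] = {
            'mean': statistics.mean(values),
            'min': min(values),
            'max': max(values),
        }
    return stats


annotations = [
    {'id': 1, 'bbox': [0, 0, 10, 20]},
    {'id': 2, 'bbox': [5, 5, 30, 40]},
    {'id': 3},  # no bbox, ignored
]
stats = box_size_stats(annotations)
```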
class kwcoco.coco_dataset.MixinCocoDraw[source]

Bases: object

Matplotlib / display functionality

imread(gid)[source]

Loads a particular image

draw_image(gid)[source]

Use kwimage to draw all annotations on an image and return the pixels as a numpy array.

Returns:canvas
Return type:ndarray

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> canvas = self.draw_image(1)
>>> # Now you can dump the annotated image to disk / whatever
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
show_image(gid=None, aids=None, aid=None, **kwargs)[source]

Use matplotlib to show an image with annotations overlaid

Parameters:
  • gid (int) – image to show
  • aids (list) – aids to highlight within the image
  • aid (int) – a specific aid to focus on. If gid is not given, look up gid based on this aid.
  • **kwargs – show_annots, show_aid, show_catname, show_kpname, show_segmentation, title, show_gid, show_filename, show_boxes
Ignore:
# Programmatically collect the kwargs for docs generation
import xinspect
import kwcoco
kwargs = xinspect.get_kwargs(kwcoco.CocoDataset.show_image)
print(ub.repr2(list(kwargs.keys()), nl=1, si=1))
class kwcoco.coco_dataset.MixinCocoAddRemove[source]

Bases: object

Mixin functions to dynamically add / remove annotations, images, and categories while maintaining lookup indexes.

add_video(name, id=None, **kw)[source]

Add a video to the dataset (dynamically updates the index)

Parameters:
  • name (str) – Unique name for this video.
  • id (None or int) – ADVANCED. Force using this video id.
  • **kw – stores arbitrary key/value pairs in this new video

Example

>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset()
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> vidid1 = self.add_video('foo', id=3)
>>> vidid2 = self.add_video('bar')
>>> vidid3 = self.add_video('baz')
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> gid1 = self.add_image('foo1.jpg', video_id=vidid1)
>>> gid2 = self.add_image('foo2.jpg', video_id=vidid1)
>>> gid3 = self.add_image('foo3.jpg', video_id=vidid1)
>>> self.add_image('bar1.jpg', video_id=vidid2)
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> self.remove_images([gid2])
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
add_image(file_name, id=None, **kw)[source]

Add an image to the dataset (dynamically updates the index)

Parameters:
  • file_name (str) – relative or absolute path to image
  • id (None or int) – ADVANCED. Force using this image id.
  • **kw – stores arbitrary key/value pairs in this new image

Example

>>> self = CocoDataset.demo()
>>> import kwimage
>>> gname = kwimage.grab_test_image_fpath('paraview')
>>> gid = self.add_image(gname)
>>> assert self.imgs[gid]['file_name'] == gname
add_annotation(image_id, category_id=None, bbox=None, id=None, **kw)[source]

Add an annotation to the dataset (dynamically updates the index)

Parameters:
  • image_id (int) – image_id to add to
  • category_id (int) – category_id to add to
  • bbox (list or kwimage.Boxes) – bounding box in xywh format
  • id (None or int) – ADVANCED. Force using this annotation id.
  • **kw – stores arbitrary key/value pairs in this new annotation

Example

>>> self = CocoDataset.demo()
>>> image_id = 1
>>> cid = 1
>>> bbox = [10, 10, 20, 20]
>>> aid = self.add_annotation(image_id, cid, bbox)
>>> assert self.anns[aid]['bbox'] == bbox

Example

>>> # Attempt to add an annotation without a category or bbox
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> aid = self.add_annotation(image_id)
>>> assert None in self.index.cid_to_aids
add_category(name, supercategory=None, id=None, **kw)[source]

Adds a category

Parameters:
  • name (str) – name of the new category
  • supercategory (str, optional) – parent of this category
  • id (int, optional) – use this category id, if it is not already taken
  • **kw – stores arbitrary key/value pairs in this new category

Example

>>> self = CocoDataset.demo()
>>> prev_n_cats = self.n_cats
>>> cid = self.add_category('dog', supercategory='object')
>>> assert self.cats[cid]['name'] == 'dog'
>>> assert self.n_cats == prev_n_cats + 1
>>> import pytest
>>> with pytest.raises(ValueError):
>>>     self.add_category('dog', supercategory='object')
ensure_image(file_name, id=None, **kw)[source]

Like add_image, but returns the existing image id if it already exists instead of failing. In this case all metadata is ignored.

Parameters:
  • file_name (str) – relative or absolute path to image
  • id (None or int) – ADVANCED. Force using this image id.
  • **kw – stores arbitrary key/value pairs in this new image
Returns:

the existing or new image id

Return type:

int

ensure_category(name, supercategory=None, id=None, **kw)[source]

Like add_category, but returns the existing category id if it already exists instead of failing. In this case all metadata is ignored.

Returns: the existing or new category id
Return type: int
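
The get-or-create behaviour of the ensure_* methods can be pictured on a plain COCO-style dictionary. The following is a minimal sketch under stated assumptions, not kwcoco's implementation: ensure_category_sketch is a hypothetical helper, and the "next unused integer" id policy is an assumption made for illustration.

```python
def ensure_category_sketch(dataset, name, **kw):
    """Return the id of the category called ``name``, creating it if needed.

    Hypothetical helper operating on a plain COCO-style dict. It only mimics
    the get-or-create behaviour and is not part of kwcoco.
    """
    for cat in dataset['categories']:
        if cat['name'] == name:
            # Already present: any extra metadata in ``kw`` is ignored.
            return cat['id']
    # Assumed id policy: one more than the largest existing id.
    new_id = max((cat['id'] for cat in dataset['categories']), default=0) + 1
    dataset['categories'].append(dict(id=new_id, name=name, **kw))
    return new_id


dataset = {'categories': [{'id': 1, 'name': 'cat'}]}
first = ensure_category_sketch(dataset, 'dog', supercategory='animal')
second = ensure_category_sketch(dataset, 'dog', supercategory='ignored')
```

The second call returns the id created by the first call and leaves the stored supercategory untouched, which mirrors the "all metadata is ignored" rule.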
add_annotations(anns)[source]

Faster, less-safe multi-item alternative to add_annotation

Parameters: anns (List[Dict]) – list of annotation dictionaries

Example

>>> self = CocoDataset.demo()
>>> anns = [self.anns[aid] for aid in [2, 3, 5, 7]]
>>> self.remove_annotations(anns)
>>> assert self.n_annots == 7 and self._check_index()
>>> self.add_annotations(anns)
>>> assert self.n_annots == 11 and self._check_index()
add_images(imgs)[source]

Faster, less-safe multi-item alternative to add_image

Note

THIS FUNCTION WAS DESIGNED FOR SPEED, AS SUCH IT DOES NOT CHECK IF THE IMAGE-IDs or FILE_NAMES ARE DUPLICATED AND WILL BLINDLY ADD DATA EVEN IF IT IS BAD. THE SINGLE IMAGE VERSION IS SLOWER BUT SAFER.

Parameters: imgs (List[Dict]) – list of image dictionaries
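
Because add_images does not check for duplicated image ids or file names, callers may want to validate a batch themselves first. The sketch below is one way to do that with the standard library. find_duplicate_images is a hypothetical helper and not a kwcoco function.

```python
from collections import Counter


def find_duplicate_images(imgs):
    """Return the duplicated ids and file names in a list of image dicts.

    Hypothetical pre-check to run before a bulk add; not part of kwcoco.
    """
    id_counts = Counter(img['id'] for img in imgs)
    name_counts = Counter(img['file_name'] for img in imgs)
    dup_ids = sorted(k for k, n in id_counts.items() if n > 1)
    dup_names = sorted(k for k, n in name_counts.items() if n > 1)
    return dup_ids, dup_names


imgs = [
    {'id': 1, 'file_name': 'a.png'},
    {'id': 2, 'file_name': 'b.png'},
    {'id': 2, 'file_name': 'a.png'},
]
dup_ids, dup_names = find_duplicate_images(imgs)
```

If either returned list is non-empty, the batch is better added one image at a time with the slower single-image method.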

Example

>>> imgs = CocoDataset.demo().dataset['images']
>>> self = CocoDataset()
>>> self.add_images(imgs)
>>> assert self.n_images == 3 and self._check_index()
clear_images()[source]

Removes all images and annotations (but not categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8
clear_annotations()[source]

Removes all annotations (but not images and categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8
remove_all_images()

Removes all images and annotations (but not categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8
remove_all_annotations()

Removes all annotations (but not images and categories)

Example

>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8
remove_annotation(aid_or_ann)[source]

Remove a single annotation from the dataset

If you have multiple annotations to remove, it's more efficient to remove them in batch with self.remove_annotations

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)
>>> assert len(self.dataset['annotations']) == 7
>>> self._check_index()
remove_annotations(aids_or_anns, verbose=0, safe=True)[source]

Remove multiple annotations from the dataset.

Parameters:
  • aids_or_anns (List) – list of annotation dicts or ids
  • safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:

num_removed: information on the number of items removed

Return type:

Dict

Example

>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> prev_n_annots = self.n_annots
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)  # xdoc: +IGNORE_WANT
{'annotations': 4}
>>> assert len(self.dataset['annotations']) == prev_n_annots - 4
>>> self._check_index()
remove_categories(cat_identifiers, keep_annots=False, verbose=0, safe=True)[source]

Remove categories and all annotations in those categories. Currently does not change any hierarchy information.

Parameters:
  • cat_identifiers (List) – list of category dicts, names, or ids
  • keep_annots (bool, default=False) – if True, keeps annotations, but removes category labels.
  • safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:

num_removed: information on the number of items removed

Return type:

Dict
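
The effect of keep_annots can be sketched on a plain COCO-style dictionary. This is an illustration only: remove_categories_sketch is a hypothetical helper, it accepts ids only (not names or dicts), clearing category_id to None is an assumed way of "removing the label", and the shape of the returned count dictionary is modelled on the other remove_* examples in this module.

```python
def remove_categories_sketch(dataset, cids, keep_annots=False):
    """Drop categories by id from a plain COCO-style dict (illustrative only)."""
    cids = set(cids)
    n_cats_before = len(dataset['categories'])
    n_anns_before = len(dataset['annotations'])
    dataset['categories'] = [
        cat for cat in dataset['categories'] if cat['id'] not in cids]
    if keep_annots:
        # Keep the annotations but strip the now-dangling label (assumption).
        for ann in dataset['annotations']:
            if ann.get('category_id') in cids:
                ann['category_id'] = None
    else:
        # Drop every annotation that belonged to a removed category.
        dataset['annotations'] = [
            ann for ann in dataset['annotations']
            if ann.get('category_id') not in cids]
    return {
        'categories': n_cats_before - len(dataset['categories']),
        'annotations': n_anns_before - len(dataset['annotations']),
    }


def make_demo():
    return {
        'categories': [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}],
        'annotations': [
            {'id': 1, 'image_id': 1, 'category_id': 1},
            {'id': 2, 'image_id': 1, 'category_id': 2},
        ],
    }


dropped = make_demo()
info_drop = remove_categories_sketch(dropped, [1])
kept = make_demo()
info_keep = remove_categories_sketch(kept, [1], keep_annots=True)
```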

Example

>>> self = CocoDataset.demo()
>>> cat_identifiers = [self.cats[1], 'rocket', 3]
>>> self.remove_categories(cat_identifiers)
>>> assert len(self.dataset['categories']) == 5
>>> self._check_index()
remove_images(gids_or_imgs, verbose=0, safe=True)[source]
Parameters:
  • gids_or_imgs (List) – list of image dicts, names, or ids
  • safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns:

num_removed: information on the number of items removed

Return type:

Dict

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> assert len(self.dataset['images']) == 3
>>> gids_or_imgs = [self.imgs[2], 'astro.png']
>>> self.remove_images(gids_or_imgs)  # xdoc: +IGNORE_WANT
{'annotations': 11, 'images': 2}
>>> assert len(self.dataset['images']) == 1
>>> self._check_index()
>>> gids_or_imgs = [3]
>>> self.remove_images(gids_or_imgs)
>>> assert len(self.dataset['images']) == 0
>>> self._check_index()
remove_annotation_keypoints(kp_identifiers)[source]

Removes all keypoints with a particular category

Parameters: kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns: num_removed: information on the number of items removed
Return type: Dict
remove_keypoint_categories(kp_identifiers)[source]

Removes all keypoints of a particular category as well as all annotation keypoints with those ids.

Parameters: kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns: num_removed: information on the number of items removed
Return type: Dict

Example

>>> self = CocoDataset.demo('shapes', rng=0)
>>> kp_identifiers = ['left_eye', 'mid_tip']
>>> remove_info = self.remove_keypoint_categories(kp_identifiers)
>>> print('remove_info = {!r}'.format(remove_info))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> # assert remove_info == {'keypoint_categories': 2, 'annotation_keypoints': 16, 'reflection_ids': 1}
>>> assert self._resolve_to_kpcat('right_eye')['reflection_id'] is None
set_annotation_category(aid_or_ann, cid_or_cat)[source]

Sets the category of a single annotation

Parameters:
  • aid_or_ann (dict | int) – annotation dict or id
  • cid_or_cat (dict | int) – category dict or id

Example

>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo()
>>> old_freq = self.category_annotation_frequency()
>>> aid_or_ann = aid = 2
>>> cid_or_cat = new_cid = self.ensure_category('kitten')
>>> self.set_annotation_category(aid, new_cid)
>>> new_freq = self.category_annotation_frequency()
>>> print('new_freq = {}'.format(ub.repr2(new_freq, nl=1)))
>>> print('old_freq = {}'.format(ub.repr2(old_freq, nl=1)))
>>> assert sum(new_freq.values()) == sum(old_freq.values())
>>> assert new_freq['kitten'] == 1
class kwcoco.coco_dataset.CocoIndex[source]

Bases: object

Fast lookup index for the COCO dataset with dynamic modification

Variables:
  • imgs (Dict[int, dict]) – mapping between image ids and the image dictionaries
  • anns (Dict[int, dict]) – mapping between annotation ids and the annotation dictionaries
  • cats (Dict[int, dict]) – mapping between category ids and the category dictionaries
cid_to_gids

Example

>>> import kwcoco
>>> self = dset = kwcoco.CocoDataset()
>>> self.index.cid_to_gids
clear()[source]
build(parent)[source]

Build all id-to-obj reverse indexes from scratch.

Parameters: parent (CocoDataset) – the dataset to index
Notation:
  • aid - Annotation ID
  • gid - imaGe ID
  • cid - Category ID
  • vidid - Video ID
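
To make the notation concrete, the sketch below builds the same kind of lookup tables from a plain COCO-style dictionary. It is a simplified illustration and not the CocoIndex implementation: build_index_sketch is a hypothetical function, and video indexes are omitted.

```python
from collections import defaultdict


def build_index_sketch(dataset):
    """Build id-to-object and reverse lookup tables for a COCO-style dict."""
    imgs = {img['id']: img for img in dataset.get('images', [])}
    cats = {cat['id']: cat for cat in dataset.get('categories', [])}
    anns = {ann['id']: ann for ann in dataset.get('annotations', [])}
    gid_to_aids = defaultdict(set)
    cid_to_aids = defaultdict(set)
    for aid, ann in anns.items():
        gid_to_aids[ann['image_id']].add(aid)
        # Annotations without a category are grouped under the key None.
        cid_to_aids[ann.get('category_id')].add(aid)
    # Images without annotations still get an (empty) entry.
    for gid in imgs:
        gid_to_aids.setdefault(gid, set())
    return {
        'imgs': imgs, 'cats': cats, 'anns': anns,
        'gid_to_aids': dict(gid_to_aids),
        'cid_to_aids': dict(cid_to_aids),
    }


dataset = {
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'categories': [{'id': 1, 'name': 'star'}, {'id': 2, 'name': 'circle'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},
        {'id': 2, 'image_id': 1, 'category_id': 2},
    ],
}
index = build_index_sketch(dataset)
```

The id-to-object tables hold references to the very dictionaries stored in the dataset lists, so a lookup by id returns the same object rather than a copy.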

Example

>>> from kwcoco.demo.toydata import *  # NOQA
>>> parent = CocoDataset.demo('vidshapes1', num_frames=4, rng=1)
>>> index = parent.index
>>> index.build(parent)
class kwcoco.coco_dataset.MixinCocoIndex[source]

Bases: object

Give the dataset top level access to index attributes

anns
imgs
cats
videos
gid_to_aids
cid_to_aids
name_to_cat
class kwcoco.coco_dataset.CocoDataset(data=None, tag=None, img_root=None, autobuild=True)[source]

Bases: ubelt.util_mixins.NiceRepr, kwcoco.coco_dataset.MixinCocoAddRemove, kwcoco.coco_dataset.MixinCocoStats, kwcoco.coco_dataset.MixinCocoAttrs, kwcoco.coco_dataset.MixinCocoDraw, kwcoco.coco_dataset.MixinCocoExtras, kwcoco.coco_dataset.MixinCocoIndex, kwcoco.coco_dataset.MixinCocoDepricate

Notes

A keypoint annotation
    {
        "image_id" : int,
        "category_id" : int,
        "keypoints" : [x1,y1,v1,...,xk,yk,vk],
        "score" : float,
    }
    Note that v[i] is a visibility flag, where v=0: not labeled,
    v=1: labeled but not visible, and v=2: labeled and visible.

A bounding box annotation
    {
        "image_id" : int,
        "category_id" : int,
        "bbox" : [x,y,width,height],
        "score" : float,
    }

We also define a non-standard "line" annotation (which our fixup scripts will interpret as the diameter of a circle to convert into a bounding box)

A line* annotation (note this is a non-standard field)
    {
        "image_id" : int,
        "category_id" : int,
        "line" : [x1,y1,x2,y2],
        "score" : float,
    }

Lastly, note that our datasets will sometimes specify multiple bbox, line, and/or keypoints fields. In this case we may also specify a field roi_shape, which denotes which field is the "main" annotation type.
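
The two less familiar encodings above, the flat keypoints list and the line treated as a circle diameter, can be unpacked with a few lines of standard-library Python. This is a sketch of one plausible geometric reading; the fixup scripts mentioned above are not shown in this document, and line_to_bbox, split_keypoints, and VISIBILITY are hypothetical names.

```python
import math

# Meaning of the visibility flag v, as listed in the notes above.
VISIBILITY = {
    0: 'not labeled',
    1: 'labeled but not visible',
    2: 'labeled and visible',
}


def line_to_bbox(line):
    """Treat [x1, y1, x2, y2] as a circle diameter and return its xywh box."""
    x1, y1, x2, y2 = line
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    radius = math.hypot(x2 - x1, y2 - y1) / 2
    # Axis-aligned box that tightly encloses the circle.
    return [cx - radius, cy - radius, 2 * radius, 2 * radius]


def split_keypoints(flat):
    """Group [x1, y1, v1, ..., xk, yk, vk] into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError('keypoints must come in (x, y, v) triples')
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


bbox = line_to_bbox([0, 0, 6, 8])
triples = split_keypoints([10, 20, 2, 0, 0, 0])
labels = [VISIBILITY[v] for _, _, v in triples]
```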

Variables:
  • dataset (Dict) – raw json data structure. This is the base dictionary that contains {'annotations': List, 'images': List, 'categories': List}
  • index (CocoIndex) – an efficient lookup index into the coco data structure. The index defines its own attributes like anns, cats, imgs, etc. See CocoIndex for more details on which attributes are available.
  • fpath (PathLike | None) – if known, this stores the filepath the dataset was loaded from
  • tag (str) – A tag indicating the name of the dataset.
  • img_root (PathLike | None) – If known, this is the root path that all image file names are relative to. This can also be manually overwritten by the user.
  • hashid (str | None) – If computed, this will be a hash uniquely identifying the dataset. To ensure this is computed see _build_hashid().

References

http://cocodataset.org/#format
http://cocodataset.org/#download

CommandLine:
python -m kwcoco.coco_dataset CocoDataset --show

Example

>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> # xdoctest: +REQUIRES(--show)
>>> self.show_image(gid=2)
>>> from matplotlib import pyplot as plt
>>> plt.show()
classmethod from_data(data, img_root=None)[source]

Constructor from a json dictionary

classmethod from_image_paths(gpaths, img_root=None)[source]

Constructor from a list of image paths

Example

>>> coco_dset = CocoDataset.from_image_paths(['a.png', 'b.png'])
>>> assert coco_dset.n_images == 2
classmethod from_coco_paths(fpaths, max_workers=0, verbose=1, mode='thread', union='try')[source]

Constructor from multiple coco file paths.

Loads multiple coco datasets and unions the result

Notes

If the union operation fails, the list of individually loaded files is returned instead.

Parameters:
  • fpaths (List[str]) – list of paths to multiple coco files to be loaded and unioned.
  • max_workers (int, default=0) – number of worker threads / processes
  • verbose (int) – verbosity level
  • mode (str) – thread, process, or serial
  • union (str | bool, default='try') – If True, unions the result datasets after loading. If False, just returns the result list. If 'try', then try to perform the union, but return the result list if it fails.
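
The control flow implied by max_workers and union can be sketched with the standard library alone. Everything below is a hypothetical stand-in: load_many and naive_union are not kwcoco functions, only the thread and serial paths are shown, and naive_union deliberately fails on clashing image ids so that the 'try' fallback is visible (the real union is documented as reassigning ids rather than failing on them).

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_json(fpath):
    with open(fpath, 'r') as file:
        return json.load(file)


def naive_union(datasets):
    """Placeholder union: concatenates lists, fails on clashing image ids."""
    merged = {'images': [], 'annotations': [], 'categories': []}
    seen_gids = set()
    for dset in datasets:
        for img in dset.get('images', []):
            if img['id'] in seen_gids:
                raise ValueError('duplicate image id {}'.format(img['id']))
            seen_gids.add(img['id'])
        for key in merged:
            merged[key].extend(dset.get(key, []))
    return merged


def load_many(fpaths, max_workers=0, union='try'):
    """Load several json files, then optionally merge them."""
    if max_workers > 0:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(load_json, fpaths))
    else:
        # Serial fallback when no workers are requested.
        results = [load_json(fpath) for fpath in fpaths]
    if union is False:
        return results
    try:
        return naive_union(results)
    except ValueError:
        if union == 'try':
            # Union failed: hand back the individually loaded datasets.
            return results
        raise


# Write three tiny files; the first and third reuse image id 1.
tmpdir = tempfile.mkdtemp()
fpaths = []
for idx, gid in enumerate([1, 2, 1]):
    fpath = os.path.join(tmpdir, 'part{}.json'.format(idx))
    with open(fpath, 'w') as file:
        json.dump({'images': [{'id': gid, 'file_name': '{}.png'.format(idx)}]}, file)
    fpaths.append(fpath)

merged = load_many(fpaths[:2], max_workers=2)  # distinct ids: union succeeds
fallback = load_many(fpaths, union='try')      # repeated id: list is returned
```

Note that with union='try' the caller has to check whether a single dataset or a list came back.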
copy()[source]

Deep copies this object

Example

>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> new = self.copy()
>>> assert new.imgs[1] is new.dataset['images'][0]
>>> assert new.imgs[1] == self.dataset['images'][0]
>>> assert new.imgs[1] is not self.dataset['images'][0]
dumps(indent=None, newlines=False)[source]

Writes the dataset out to the json format

Parameters: newlines (bool) – if True, each annotation, image, category gets its own line

Notes

Using newlines=True is similar to:

    print(ub.repr2(dset.dataset, nl=2, trailsep=False))

However, the above may not output valid json if it contains ndarrays.
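
A one-item-per-line layout that is still valid json can be approximated with the standard library. The sketch below is an assumption about what such output might look like, not the actual dumps implementation; dumps_one_item_per_line is a hypothetical helper.

```python
import json


def dumps_one_item_per_line(dataset):
    """Serialize a COCO-style dict with one list item per line (sketch)."""
    parts = []
    for key, value in dataset.items():
        if isinstance(value, list) and value:
            # Each image / annotation / category is dumped on its own line.
            items = ',\n'.join(json.dumps(item) for item in value)
            parts.append('{}: [\n{}\n]'.format(json.dumps(key), items))
        else:
            parts.append('{}: {}'.format(json.dumps(key), json.dumps(value)))
    return '{\n' + ',\n'.join(parts) + '\n}'


dataset = {
    'images': [
        {'id': 1, 'file_name': 'a.png'},
        {'id': 2, 'file_name': 'b.png'},
    ],
    'annotations': [],
    'info': {},
}
text = dumps_one_item_per_line(dataset)
roundtrip = json.loads(text)
```

Keeping one item per line makes line-based diffs of two dataset files readable while the file still parses with any json loader.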

Example

>>> from kwcoco.coco_dataset import *
>>> import json
>>> self = CocoDataset.demo()
>>> text = self.dumps(newlines=True)
>>> print(text)
>>> self2 = CocoDataset(json.loads(text), tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
Ignore:
for k in self2.dataset:
    if self.dataset[k] == self2.dataset[k]:
        print('YES: k = {!r}'.format(k))
    else:
        print('NO: k = {!r}'.format(k))

self2.dataset['categories']
self.dataset['categories']

dump(file, indent=None, newlines=False)[source]

Writes the dataset out to the json format

Parameters:
  • file (PathLike | FileLike) – Where to write the data. Can either be a path to a file or an open file pointer / stream.
  • newlines (bool) – if True, each annotation, image, category gets its own line.

Example

>>> import tempfile
>>> import json
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file, newlines=True)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
union(*others, **kwargs)[source]

Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.

Parameters:
  • self – note that union() can be called as an instance method or a class method. If it is a class method, then this is the class type, otherwise the instance will also be unioned with others.
  • *others – a series of CocoDatasets that we will merge
  • **kwargs – constructor options for the new merged CocoDataset
Returns:

a new merged coco dataset

Return type:

CocoDataset
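
The statement that names and associations are retained while ids may change can be illustrated on plain dictionaries. This is a simplified sketch and not the kwcoco algorithm: union_sketch is hypothetical, it ignores videos, keypoint categories, and tracks, and it always renumbers ids from 1.

```python
def union_sketch(*datasets):
    """Merge COCO-style dicts, matching categories by name and renumbering ids."""
    merged = {'categories': [], 'images': [], 'annotations': []}
    name_to_cid = {}
    for dset in datasets:
        # Map this dataset's old category ids onto the merged ids.
        cid_map = {}
        for cat in dset.get('categories', []):
            if cat['name'] not in name_to_cid:
                new_cid = len(merged['categories']) + 1
                name_to_cid[cat['name']] = new_cid
                merged['categories'].append(dict(cat, id=new_cid))
            cid_map[cat['id']] = name_to_cid[cat['name']]
        # Every image gets a fresh id, so clashing ids never collide.
        gid_map = {}
        for img in dset.get('images', []):
            new_gid = len(merged['images']) + 1
            gid_map[img['id']] = new_gid
            merged['images'].append(dict(img, id=new_gid))
        # Annotations follow their image and category to the new ids.
        for ann in dset.get('annotations', []):
            new_aid = len(merged['annotations']) + 1
            merged['annotations'].append(dict(
                ann, id=new_aid,
                image_id=gid_map[ann['image_id']],
                category_id=cid_map.get(ann.get('category_id'))))
    return merged


dset_a = {
    'categories': [{'id': 1, 'name': 'star'}],
    'images': [{'id': 1, 'file_name': 'a.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1}],
}
dset_b = {
    'categories': [{'id': 1, 'name': 'circle'}, {'id': 2, 'name': 'star'}],
    'images': [{'id': 1, 'file_name': 'b.png'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 2}],
}
merged = union_sketch(dset_a, dset_b)
```

In this toy input both annotations are labeled "star" under different ids, and after merging they share a single category id while pointing at distinct images.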

Example

>>> # Test union works with different keypoint categories
>>> dset1 = CocoDataset.demo('shapes1')
>>> dset2 = CocoDataset.demo('shapes2')
>>> dset1.remove_keypoint_categories(['bot_tip', 'mid_tip', 'right_eye'])
>>> dset2.remove_keypoint_categories(['top_tip', 'left_eye'])
>>> dset_12a = CocoDataset.union(dset1, dset2)
>>> dset_12b = dset1.union(dset2)
>>> dset_21 = dset2.union(dset1)
>>> def add_hist(h1, h2):
>>>     return {k: h1.get(k, 0) + h2.get(k, 0) for k in set(h1) | set(h2)}
>>> kpfreq1 = dset1.keypoint_annotation_frequency()
>>> kpfreq2 = dset2.keypoint_annotation_frequency()
>>> kpfreq_want = add_hist(kpfreq1, kpfreq2)
>>> kpfreq_got1 = dset_12a.keypoint_annotation_frequency()
>>> kpfreq_got2 = dset_12b.keypoint_annotation_frequency()
>>> assert kpfreq_want == kpfreq_got1
>>> assert kpfreq_want == kpfreq_got2
>>> # Test disjoint gid datasets
>>> import kwcoco
>>> import ubelt as ub
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> for new_gid, img in enumerate(dset1.dataset['images'], start=10):
>>>     for aid in dset1.gid_to_aids[img['id']]:
>>>         dset1.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset1.index.clear()
>>> dset1._build_index()
>>> # ------
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> for new_gid, img in enumerate(dset2.dataset['images'], start=100):
>>>     for aid in dset2.gid_to_aids[img['id']]:
>>>         dset2.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset2.index.clear()
>>> dset2._build_index()
>>> others = [dset1, dset2]
>>> merged = kwcoco.CocoDataset.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([10, 11, 12, 100, 101]) == set(merged.imgs)
>>> # Test data is not preserved
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> others = (dset1, dset2)
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([1, 2, 3, 4, 5]) == set(merged.imgs)

Todo

  • [ ] are supercategories broken?
  • [ ] reuse image ids where possible
  • [ ] reuse annotation / category ids where possible
  • [ ] disambiguate track-ids
  • [x] disambiguate video-ids
subset(gids, copy=False, autobuild=True)[source]

Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.

Parameters:
  • gids (List[int]) – image-ids to copy into a new dataset
  • copy (bool, default=False) – if True, makes a deep copy of all nested attributes, otherwise makes a shallow copy.
  • autobuild (bool, default=True) – if True will automatically build the fast lookup index.
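
The difference between the shallow and deep variants of a subset can be shown on plain dictionaries. This is an illustrative sketch only: subset_sketch is a hypothetical function, and keeping all categories in the subset is an assumption.

```python
import copy as copy_module


def subset_sketch(dataset, gids, copy=False):
    """Keep only the listed images and their annotations (illustrative)."""
    gids = set(gids)
    new = {
        # Assumption: every category is carried over unchanged.
        'categories': list(dataset.get('categories', [])),
        'images': [img for img in dataset.get('images', [])
                   if img['id'] in gids],
        'annotations': [ann for ann in dataset.get('annotations', [])
                        if ann['image_id'] in gids],
    }
    if copy:
        # Deep copy: the subset no longer shares dicts with the parent.
        new = copy_module.deepcopy(new)
    return new


dataset = {
    'categories': [{'id': 1, 'name': 'star'}],
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1},
        {'id': 2, 'image_id': 2, 'category_id': 1},
    ],
}
shallow = subset_sketch(dataset, [1])
deep = subset_sketch(dataset, [1], copy=True)
```

With a shallow subset, editing an image dictionary in the subset also edits the parent dataset, which is cheap but easy to trip over.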

Example

>>> self = CocoDataset.demo()
>>> gids = [1, 3]
>>> sub_dset = self.subset(gids)
>>> assert len(self.gid_to_aids) == 3
>>> assert len(sub_dset.gid_to_aids) == 2

Example

>>> self = CocoDataset.demo()
>>> sub1 = self.subset([1])
>>> sub2 = self.subset([2])
>>> sub3 = self.subset([3])
>>> others = [sub1, sub2, sub3]
>>> rejoined = CocoDataset.union(*others)
>>> assert len(sub1.anns) == 9
>>> assert len(sub2.anns) == 2
>>> assert len(sub3.anns) == 0
>>> assert rejoined.basic_stats() == self.basic_stats()
kwcoco.coco_dataset.demo_coco_data()[source]

Simple data for testing

Ignore:
# code for getting a segmentation polygon
kwimage.grab_test_image_fpath('astro')
labelme /home/joncrall/.cache/kwimage/demodata/astro.png
cat /home/joncrall/.cache/kwimage/demodata/astro.json

Example

>>> # xdoctest: +REQUIRES(--show)
>>> from kwcoco.coco_dataset import demo_coco_data, CocoDataset
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> import kwplot
>>> kwplot.autompl()
>>> self.show_image(gid=1)
>>> kwplot.show_if_requested()