kwcoco.coco_dataset module
An implementation and extension of the original MS-COCO API [1].
Extends the format to also include line annotations.
Dataset Spec:
- Note: a formal spec has been defined in
category = {
    'id': int,
    'name': str,
    'supercategory': Optional[str],
    'keypoints': Optional[List[str]],
    'skeleton': Optional[List[Tuple[int, int]]],
}
image = {
    'id': int,
    'file_name': str,
}
dataset = {
    # these are object level categories
    'categories': [category],
    'images': [image],
    'annotations': [
        {
            'id': int,
            'image_id': int,
            'category_id': int,
            'track_id': Optional[int],
            'bbox': [tl_x, tl_y, w, h],  # optional (xywh format)
            'score': float,  # optional
            'prob': List[float],  # optional
            'weight': float,  # optional
            'caption': str,  # an optional text caption for this annotation
            'iscrowd': <0 or 1>,  # denotes if the annotation covers a single object (0) or multiple objects (1)
            'keypoints': [x1, y1, v1, ..., xk, yk, vk],  # or new dict-based format
            'segmentation': <RunLengthEncoding | Polygon>,  # formats are defined below
        },
        ...
    ],
    'licenses': [],
    'info': [],
}
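As a rough illustration of the spec above, the following sketch builds a minimal dataset dictionary by hand and checks that every annotation points at an existing image and category. The helper `check_references` and all field values are hypothetical and not part of the kwcoco API; kwcoco maintains these relationships in its own index.

```python
import json

# A minimal dataset dictionary following the spec above (values are illustrative).
dataset = {
    'categories': [
        {'id': 1, 'name': 'cat', 'supercategory': 'animal'},
    ],
    'images': [
        {'id': 1, 'file_name': 'images/img1.png'},
    ],
    'annotations': [
        {'id': 1, 'image_id': 1, 'category_id': 1, 'bbox': [10, 20, 30, 40]},
    ],
    'licenses': [],
    'info': [],
}


def check_references(dataset):
    """Return ids of annotations whose image_id or category_id is unknown."""
    gids = {img['id'] for img in dataset['images']}
    cids = {cat['id'] for cat in dataset['categories']}
    bad = []
    for ann in dataset['annotations']:
        # category_id may be missing or None for uncategorized annotations
        cid = ann.get('category_id')
        if ann['image_id'] not in gids or (cid is not None and cid not in cids):
            bad.append(ann['id'])
    return bad


text = json.dumps(dataset)          # the format is plain JSON
assert json.loads(text) == dataset  # and round-trips without loss
print(check_references(dataset))    # []
```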
Polygon:
    A flattened list of xy coordinates:
        [x1, y1, x2, y2, ..., xn, yn]
    or, if the connected components (CCs) are disjoint, a list of flattened lists of xy coordinates:
        [[x1, y1, x2, y2, ..., xn, yn], [x1, y1, ..., xm, ym],]
    Note: the original COCO spec does not allow for holes in polygons.
    We also allow a non-standard dictionary encoding of polygons, in which holes are listed as interiors:
        {'exterior': [(x1, y1)...],
         'interiors': [[(x1, y1), ...], ...]}
RunLengthEncoding:
    The RLE can be in a special bytes encoding or in a binary array
    encoding. We reuse the original C functions in [2] via
    ``kwimage.structs.Mask`` to provide a convenient way to abstract this
    rather esoteric bytes encoding.
    For pure Python implementations, see kwimage:
        Converting from an image to RLE can be done via kwimage.run_length_encoding
        Converting from RLE back to an image can be done via kwimage.decode_run_length
    For compatibility with the COCO specs, ensure the binary flags
    for these functions are set to true.
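For intuition about the uncompressed (list-of-counts) form of RLE, here is a minimal pure-Python sketch. It assumes the COCO conventions that pixels are scanned in column-major order and that the counts always begin with a run of zeros. The function names are hypothetical, and the sketch does not produce the compressed bytes encoding handled by the C functions; for real use, prefer the kwimage functions named above.

```python
def encode_uncompressed_rle(mask):
    """
    Encode a binary mask (a list of rows of 0/1) as an uncompressed
    COCO-style RLE dict. Pixels are scanned column by column and the
    counts always start with a run of zeros (possibly of length 0).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    counts = []
    current = 0  # value of the run being counted; the first run is zeros
    run = 0
    for x in range(w):
        for y in range(h):
            v = 1 if mask[y][x] else 0
            if v != current:
                counts.append(run)
                current = v
                run = 0
            run += 1
    counts.append(run)
    return {'size': [h, w], 'counts': counts}


def decode_uncompressed_rle(rle):
    """Invert encode_uncompressed_rle, returning a list of rows."""
    h, w = rle['size']
    flat = []
    value = 0
    for n in rle['counts']:
        flat.extend([value] * n)
        value = 1 - value
    # flat is column-major: the pixel at (row y, column x) is flat[x * h + y]
    return [[flat[x * h + y] for x in range(w)] for y in range(h)]


mask = [[0, 1],
        [1, 1]]
rle = encode_uncompressed_rle(mask)
print(rle)  # {'size': [2, 2], 'counts': [1, 3]}
```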
Keypoints:
    Annotation keypoints may also be specified in this non-standard (but
    ultimately more general) way:
    'annotations': [
        {
            'keypoints': [
                {
                    'xy': <x1, y1>,
                    'visible': <0 or 1 or 2>,
                    'keypoint_category_id': <kp_cid>,
                    'keypoint_category': <kp_name, optional>,  # this can be specified instead of an id
                }, ...
            ]
        }, ...
    ],
    'keypoint_categories': [{
        'name': <str>,
        'id': <int>,  # an id for this keypoint category
        'supercategory': <kp_name>,  # name of coarser parent keypoint class (for hierarchical keypoints)
        'reflection_id': <kp_cid>,  # specify only if the keypoint id would be swapped with another keypoint type
    }, ...
    ]
    In this scheme the "keypoints" property of each annotation (which used
    to be a list of floats) is now specified as a list of dictionaries that
    specify each keypoint's location, id, and visibility explicitly. This
    allows for things like non-unique keypoints and partial keypoint
    annotations. It also removes the ordering requirement, which makes it
    simpler to keep track of each keypoint's class type.
    We also add a new top-level "keypoint_categories" list that specifies all
    the possible keypoint categories.
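To make the relationship between the two keypoint formats concrete, the sketch below converts the standard flat list into the dict-based format. The helper `flat_to_dict_keypoints` and its `kp_names` / `name_to_kpcid` arguments are hypothetical. Dropping keypoints whose visibility is 0 (COCO's "not labeled" flag) is a design choice of this sketch, not something the format requires; it is one way to express a partial annotation.

```python
def flat_to_dict_keypoints(flat, kp_names, name_to_kpcid):
    """
    Convert standard COCO keypoints [x1, y1, v1, ..., xk, yk, vk] into the
    dict-based format. ``kp_names`` gives the keypoint category name for each
    position, i.e. the ordering that the flat format depends on.
    Keypoints with visibility 0 (not labeled) are dropped.
    """
    if len(flat) != 3 * len(kp_names):
        raise ValueError('expected 3 values (x, y, v) per keypoint')
    out = []
    for i, name in enumerate(kp_names):
        x, y, v = flat[3 * i: 3 * i + 3]
        if v == 0:
            continue
        out.append({
            'xy': [x, y],
            'visible': v,
            'keypoint_category_id': name_to_kpcid[name],
        })
    return out


kps = flat_to_dict_keypoints(
    [10, 20, 2, 0, 0, 0],
    kp_names=['left_eye', 'right_eye'],
    name_to_kpcid={'left_eye': 1, 'right_eye': 2},
)
print(kps)  # [{'xy': [10, 20], 'visible': 2, 'keypoint_category_id': 1}]
```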
Auxiliary Channels:
    For multimodal or multispectral images it is possible to specify
    auxiliary channels in an image dictionary as follows:
    {
        'id': int,
        'file_name': str,
        'channels': <spec>,  # a spec code that indicates the layout of these channels.
        'auxillary': [  # information about auxiliary channels
            {
                'file_name': str,
                'channels': <spec>
            }, ...  # can have many auxiliary channels with unique specs
        ]
    }
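A small sketch of how a consumer might resolve which file holds a given channel spec, using an image dictionary laid out as above. In kwcoco this is the job of get_auxillary_fpath; the helper `find_channel_file` below is hypothetical and compares channel specs by exact string equality, which is a simplifying assumption. Note that the key really is spelled 'auxillary' in the format.

```python
def find_channel_file(img, channels):
    """
    Return the file_name that stores ``channels`` for an image dictionary
    laid out like the spec above. The main image is checked first, then
    each entry in its 'auxillary' list.
    """
    if img.get('channels') == channels:
        return img['file_name']
    for aux in img.get('auxillary', []):
        if aux.get('channels') == channels:
            return aux['file_name']
    raise KeyError('no file provides channels={!r}'.format(channels))


img = {
    'id': 1,
    'file_name': 'img1_rgb.png',
    'channels': 'rgb',
    'auxillary': [
        {'file_name': 'img1_disparity.png', 'channels': 'disparity'},
    ],
}
print(find_channel_file(img, 'disparity'))  # img1_disparity.png
```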
Video Sequences:
    For video sequences, we add the following video-level index:
        "videos": [
            { "id": <int>, "name": <video_name:str> },
        ]
    Note that the videos might be given as encoded mp4/avi/etc. files (in
    which case the name should correspond to a path) or as a series of
    frames, in which case the images should be used to index the extracted
    frames and the information in them.
    Image dictionaries are then augmented as follows:
    {
        'video_id': int,  # optional; if this image is a frame in a video sequence, this id is shared by all frames in that sequence.
        'timestamp': int,  # optional; timestamp (ideally in flicks) used to identify the frame. Only applicable to video inputs.
        'frame_index': int,  # optional; ordinal frame index, which can be used if the timestamp is unknown.
    }
    And annotations are augmented as follows:
    {
        "track_id": <int | str | uuid>  # optional, indicates association between annotations across frames
    }
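Because frames are ordinary images tagged with a video_id, recovering each video's frame order is a grouping-and-sorting step. The sketch below is a hypothetical helper (`frames_by_video`), not a kwcoco function, and it assumes every frame of a video uses the same ordering field (frame_index, or timestamp when no frame index is set).

```python
from collections import defaultdict


def frames_by_video(images):
    """
    Group image dictionaries by 'video_id' and order each group by
    'frame_index', falling back to 'timestamp' when no frame index is set.
    Images without a video_id are skipped.
    """
    groups = defaultdict(list)
    for img in images:
        if img.get('video_id') is not None:
            groups[img['video_id']].append(img)

    def order_key(img):
        if img.get('frame_index') is not None:
            return img['frame_index']
        return img.get('timestamp', 0)

    return {vidid: [img['id'] for img in sorted(imgs, key=order_key)]
            for vidid, imgs in groups.items()}


imgs = [
    {'id': 1, 'video_id': 5, 'frame_index': 2},
    {'id': 2, 'video_id': 5, 'frame_index': 0},
    {'id': 3, 'video_id': 5, 'frame_index': 1},
    {'id': 4},  # a standalone image, not part of any video
]
print(frames_by_video(imgs))  # {5: [2, 3, 1]}
```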
Notes
The main object in this file is the CocoDataset class, which is composed of several mixin classes. See the class and method documentation for more details.
Todo
- [ ] Use ijson to lazily load pieces of the dataset in the background or on demand. This will give us faster access to categories / images, whereas we will always have to wait for annotations etc.
- [ ] Should img_root be changed to data root?
- [ ] Read video data, return numpy arrays (requires API for images)
- [ ] Spec for video URI, and convert to frames @ framerate function.
- [ ] remove videos
References
[1] http://cocodataset.org/#format-data
[2] https://github.com/nightrome/cocostuffapi/blob/master/PythonAPI/pycocotools/mask.py
[3] https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format
class kwcoco.coco_dataset.ObjectList1D(ids, dset, key)
Bases: ubelt.util_mixins.NiceRepr
Vectorized access to lists of dictionary objects
Lightweight reference to a set of objects (e.g. annotations, images) that allows for convenient property access.
Parameters:
- ids (List[int]) – list of ids
- dset (CocoDataset) – parent dataset
- key (str) – main object name (e.g. 'images', 'annotations')
Types:
- ObjT = Ann | Img | Cat  # can be one of these types; ObjectList1D gives us access to a List[ObjT]
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> # Both annots and images are object lists
>>> self = dset.annots()
>>> self = dset.images()
>>> # can call with a list of ids or not, for everything
>>> self = dset.annots([1, 2, 11])
>>> self = dset.images([1, 2, 3])
>>> self.lookup('id')
>>> self.lookup(['id'])
objs
All object dictionaries.
Return type: List
take(idxs)
Take a subset by index.
Example
>>> self = CocoDataset.demo().annots()
>>> assert len(self.take([0, 2, 3])) == 3
compress(flags)
Take a subset by flags.
Example
>>> self = CocoDataset.demo().images()
>>> assert len(self.compress([True, False, True])) == 2
lookup(key, default=NoParam, keepid=False)
Lookup a list of object attributes.
Parameters:
- key (str | Iterable) – name of the property you want to lookup; can also be a list of names, in which case we return a dict
- default – if specified, this value is used when the property doesn't exist in an ObjT
- keepid – if True, return a mapping from ids to the property
Returns: a list of the property values, or a dict if key is a list of names or keepid is True
Return type: List | Dict
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.lookup('id')
>>> key = ['id']
>>> default = None
>>> self.lookup(key=['id', 'image_id'])
>>> self.lookup(key='foo', default=None, keepid=True)
>>> self.lookup(key=['foo'], default=None, keepid=True)
>>> self.lookup(key=['id', 'image_id'], keepid=True)
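The lookup semantics described above (list keys, default, keepid) can be sketched on plain dictionaries. This is an illustrative reimplementation, not kwcoco's code: the function `lookup` and the sentinel `_NOPARAM` (standing in for ubelt's NoParam) are hypothetical, and raising KeyError when no default is given is an assumption consistent with the parameter description.

```python
_NOPARAM = object()  # sentinel meaning "no default given" (stands in for NoParam)


def lookup(objs, key, default=_NOPARAM, keepid=False):
    """
    Look up a property on each object dictionary in ``objs``.

    * A list of keys returns a dict with one result per key.
    * keepid=True returns {obj['id']: value} instead of a list.
    * Without a default, a missing property raises KeyError.
    """
    if not isinstance(key, str):
        return {k: lookup(objs, k, default, keepid) for k in key}

    def get_one(obj):
        if default is _NOPARAM:
            return obj[key]
        return obj.get(key, default)

    if keepid:
        return {obj['id']: get_one(obj) for obj in objs}
    return [get_one(obj) for obj in objs]


anns = [{'id': 1, 'image_id': 1}, {'id': 2, 'image_id': 1, 'foo': 'x'}]
print(lookup(anns, ['id', 'image_id']))  # {'id': [1, 2], 'image_id': [1, 1]}
print(lookup(anns, 'foo', default=None, keepid=True))  # {1: None, 2: 'x'}
```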
get(key, default=NoParam, keepid=False)
Lookup a list of object attributes.
Parameters:
- key (str) – name of the property you want to lookup
- default – if specified, this value is used when the property doesn't exist in an ObjT
- keepid – if True, return a mapping from ids to the property
Returns: a list of the property values, or a dict mapping ids to values if keepid is True
Return type: List | Dict
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> self = dset.annots()
>>> self.get('id')
>>> self.get(key='foo', default=None, keepid=True)
set(key, values)
Assign a value to each object in this list (e.g. each annotation).
Parameters:
- key (str) – the property to modify
- values (Iterable | scalar) – an iterable with one value for each object. If the item is not iterable, it is assigned to all objects.
Example
>>> import numpy as np
>>> import ubelt as ub
>>> dset = CocoDataset.demo()
>>> self = dset.annots()
>>> self.set('my-key1', 'my-scalar-value')
>>> self.set('my-key2', np.random.rand(len(self)))
>>> print('dset.anns = {}'.format(ub.repr2(dset.anns, nl=1)))
>>> self.get('my-key2')
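One subtlety in the broadcasting rule is visible in the example: 'my-scalar-value' is a string, which Python considers iterable, yet it is meant to be assigned to every annotation. The sketch below shows one way to handle that on plain dictionaries; `set_property` is a hypothetical helper, and treating str/bytes as scalars is an assumption about the intended behavior rather than a description of kwcoco's code.

```python
from collections.abc import Iterable


def set_property(objs, key, values):
    """
    Assign ``values`` to ``key`` on every object dictionary in ``objs``.
    Strings, bytes, and non-iterables are treated as scalars and broadcast
    to all objects; any other iterable must supply one value per object.
    """
    if isinstance(values, (str, bytes)) or not isinstance(values, Iterable):
        values = [values] * len(objs)
    else:
        values = list(values)
        if len(values) != len(objs):
            raise ValueError('expected {} values, got {}'.format(
                len(objs), len(values)))
    for obj, value in zip(objs, values):
        obj[key] = value


anns = [{'id': 1}, {'id': 2}, {'id': 3}]
set_property(anns, 'my-key1', 'my-scalar-value')  # broadcast
set_property(anns, 'my-key2', [0.1, 0.2, 0.3])    # one value per object
print(anns[0])  # {'id': 1, 'my-key1': 'my-scalar-value', 'my-key2': 0.1}
```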
class kwcoco.coco_dataset.ObjectGroups(groups, dset)
Bases: ubelt.util_mixins.NiceRepr
An object for holding groups of ObjectList1D objects.
class kwcoco.coco_dataset.Categories(ids, dset)
Bases: kwcoco.coco_dataset.ObjectList1D
Vectorized access to category attributes
Example
>>> from kwcoco.coco_dataset import Categories  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo()
>>> ids = list(dset.cats.keys())
>>> self = Categories(ids, dset)
>>> print('self.name = {!r}'.format(self.name))
>>> print('self.supercategory = {!r}'.format(self.supercategory))
cids
name
supercategory
class kwcoco.coco_dataset.Videos(ids, dset)
Bases: kwcoco.coco_dataset.ObjectList1D
Vectorized access to video attributes
Example
>>> from kwcoco.coco_dataset import Videos  # NOQA
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5')
>>> ids = list(dset.index.videos.keys())
>>> self = Videos(ids, dset)
>>> print('self = {!r}'.format(self))
class kwcoco.coco_dataset.Images(ids, dset)
Bases: kwcoco.coco_dataset.ObjectList1D
Vectorized access to image attributes
gids
gname
gpath
width
height
size
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.size)
[(512, 512), (300, 250), (256, 256)]
area
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo().images()
>>> self._dset._ensure_imgsize()
>>> print(self.area)
[262144, 75000, 65536]
n_annots
Example
>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(self.n_annots, nl=0))
[9, 2, 0]
aids
Example
>>> self = CocoDataset.demo().images()
>>> print(ub.repr2(list(map(list, self.aids)), nl=0))
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11], []]
annots
Example
>>> self = CocoDataset.demo().images()
>>> print(self.annots)
<AnnotGroups(n=3, m=3.7, s=3.9)>
class kwcoco.coco_dataset.Annots(ids, dset)
Bases: kwcoco.coco_dataset.ObjectList1D
Vectorized access to annotation attributes
aids
The annotation ids of this column of annotations.
images
Get the column of images.
Returns: Images
image_id
category_id
cids
Get the column of category ids.
Returns: List[int]
cnames
Get the column of category names.
Returns: List[str]
detections
Get the kwimage-style detection objects.
Returns: kwimage.Detections
Example
>>> # xdoctest: +REQUIRES(module:kwimage)
>>> from kwcoco.coco_dataset import *  # NOQA
>>> self = CocoDataset.demo('shapes32').annots([1, 2, 11])
>>> dets = self.detections
>>> print('dets.data = {!r}'.format(dets.data))
>>> print('dets.meta = {!r}'.format(dets.meta))
boxes
Get the column of kwimage-style bounding boxes.
Example
>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.boxes)
<Boxes(xywh, array([[ 10, 10, 360, 490], [350, 5, 130, 290], [124, 96, 45, 18]]))>
xywh
Returns raw boxes.
Example
>>> self = CocoDataset.demo().annots([1, 2, 11])
>>> print(self.xywh)
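The raw boxes are in xywh format (top-left corner plus width and height). Converting them to corner coordinates is a one-liner; this plain-Python sketch uses a hypothetical helper, `xywh_to_tlbr`, applied to the three boxes printed in the ``boxes`` example above.

```python
def xywh_to_tlbr(boxes):
    """Convert [tl_x, tl_y, w, h] boxes into [tl_x, tl_y, br_x, br_y] boxes."""
    return [[x, y, x + w, y + h] for x, y, w, h in boxes]


# the three xywh boxes printed in the ``boxes`` example above
xywh = [[10, 10, 360, 490], [350, 5, 130, 290], [124, 96, 45, 18]]
print(xywh_to_tlbr(xywh))
# [[10, 10, 370, 500], [350, 5, 480, 295], [124, 96, 169, 114]]
```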
class kwcoco.coco_dataset.AnnotGroups(groups, dset)
Bases: kwcoco.coco_dataset.ObjectGroups
cids
class kwcoco.coco_dataset.MixinCocoDepricate
Bases: object
These functions are marked for deprecation and may be removed at any time.
class kwcoco.coco_dataset.MixinCocoExtras
Bases: object
Misc functions for coco
load_image(gid_or_img)
Reads an image from disk.
Parameters: gid_or_img (int or dict) – image id or image dict
Returns: the image
Return type: np.ndarray
get_image_fpath(gid_or_img)
Returns the full path to the image.
Parameters: gid_or_img (int or dict) – image id or image dict
Returns: full path to the image
Return type: PathLike
get_auxillary_fpath(gid_or_img, channels)
Returns the full path to auxiliary data for an image.
Parameters:
- gid_or_img (int | dict) – an image or its id
- channels (str) – the auxiliary channel to load (e.g. disparity)
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> self.get_auxillary_fpath(1, 'disparity')
load_annot_sample(aid_or_ann, image=None, pad=None)
Reads the chip of an annotation. Note this is much less efficient than using a sampler, but it does not require a disk cache.
Parameters:
- aid_or_ann (int or dict) – annot id or dict
- image (ArrayLike, default=None) – preloaded image (note: this process is inefficient unless image is specified)
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> sample = self.load_annot_sample(2, pad=100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(sample['im'])
>>> kwplot.show_if_requested()
classmethod demo(key='photos', **kw)
Create a toy coco dataset for testing and demo purposes.
Parameters:
- key (str) – either photos or shapes (variants such as shapes8 and vidshapes5 are used in the examples below)
- **kw – if key is shapes, these arguments are passed to toydata generation
Example
>>> print(CocoDataset.demo('photos'))
>>> print(CocoDataset.demo('shapes', verbose=0))
>>> print(CocoDataset.demo('shapes256', verbose=0))
>>> print(CocoDataset.demo('shapes8', verbose=0))
Example
>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, verbose=0, rng=None)
>>> dset = kwcoco.CocoDataset.demo('vidshapes5', num_frames=5, num_tracks=4, verbose=0, rng=44)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> pnums = kwplot.PlotNums(nSubplots=len(dset.imgs))
>>> fnum = 1
>>> for gx, gid in enumerate(dset.imgs.keys()):
>>>     canvas = dset.draw_image(gid=gid)
>>>     kwplot.imshow(canvas, pnum=pnums[gx], fnum=fnum)
>>>     #dset.show_image(gid=gid, pnum=pnums[gx])
>>> kwplot.show_if_requested()
category_graph()
Construct a networkx category hierarchy.
Returns: a directed graph where category names are the nodes, supercategories define edges, and items in each category dict (e.g. category id) are added as node properties.
Return type: networkx.DiGraph
Example
>>> self = CocoDataset.demo()
>>> graph = self.category_graph()
>>> assert 'astronaut' in graph.nodes()
>>> assert 'keypoints' in graph.nodes['human']
Ignore:
import graphid
graphid.util.show_nx(graph)
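The hierarchy that category_graph builds comes entirely from the 'supercategory' field. To show that structure without a networkx dependency, this sketch derives parent/child links and walks ancestors on plain dictionaries. The helpers `category_hierarchy` and `ancestors` and the example categories are hypothetical; the real method returns a networkx.DiGraph instead.

```python
from collections import defaultdict


def category_hierarchy(categories):
    """
    Return (roots, children), where ``children`` maps a supercategory name
    to the names of its direct child categories and ``roots`` lists the
    categories that have no supercategory.
    """
    roots = []
    children = defaultdict(list)
    for cat in categories:
        parent = cat.get('supercategory')
        if parent:
            children[parent].append(cat['name'])
        else:
            roots.append(cat['name'])
    return roots, dict(children)


def ancestors(categories, name):
    """Follow supercategory links upward from ``name`` (assumes no cycles)."""
    parent_of = {cat['name']: cat.get('supercategory') for cat in categories}
    chain = []
    while parent_of.get(name):
        name = parent_of[name]
        chain.append(name)
    return chain


cats = [
    {'id': 1, 'name': 'object'},
    {'id': 2, 'name': 'human', 'supercategory': 'object'},
    {'id': 3, 'name': 'astronaut', 'supercategory': 'human'},
]
print(ancestors(cats, 'astronaut'))  # ['human', 'object']
```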
object_categories()
Construct a consistent CategoryTree representation of object classes.
Returns: category data structure
Return type: kwcoco.CategoryTree
Example
>>> self = CocoDataset.demo()
>>> classes = self.object_categories()
>>> print('classes = {}'.format(classes))
keypoint_categories()
Construct a consistent CategoryTree representation of keypoint classes.
Returns: category data structure
Return type: kwcoco.CategoryTree
Example
>>> self = CocoDataset.demo()
>>> classes = self.keypoint_categories()
>>> print('classes = {}'.format(classes))
missing_images(check_aux=False, verbose=0)
Check for images that don't exist.
Parameters: check_aux (bool, default=False) – if specified, also checks auxiliary images
Returns: bad indexes and paths
Return type: List[Tuple[int, str]]
corrupted_images(verbose=0)
Check for images that don't exist or can't be opened.
Returns: bad indexes and paths
Return type: List[Tuple[int, str]]
rename_categories(mapper, strict=False, preserve=False, rebuild=True, simple=True, merge_policy='ignore')
Create a coarser categorization.
Note: this function has been unstable in the past and has not yet been properly stabilized. Either avoid it or use it with care. Ensuring simple=True should result in newer, saner behavior that will likely be backwards compatible.
Todo
- [X] Simple case where we relabel names with no conflicts
- [ ] Case where annotation labels need to change to be coarser
  - dev note: see internal libraries for work on this
- [ ] Other cases
Parameters:
- mapper (dict or Function) – maps old names to new names.
- strict (bool) – DEPRECATED, IGNORE. If True, fails if mapper doesn't map all classes.
- preserve (bool) – DEPRECATED, IGNORE. If True, preserve old categories as supercategories. Broken.
- simple (bool, default=True) – defaults to the new way of doing this. The old way is deprecated.
- merge_policy (str) – how to handle multiple categories that map to the same name. Can be 'update' or 'ignore'.
Example
>>> self = CocoDataset.demo()
>>> self.rename_categories({'astronomer': 'person',
>>>                         'astronaut': 'person',
>>>                         'mouth': 'person',
>>>                         'helmet': 'hat'}, preserve=0)
>>> assert 'hat' in self.name_to_cat
>>> assert 'helmet' not in self.name_to_cat
>>> # Test merge case
>>> self = CocoDataset.demo()
>>> mapper = {
>>>     'helmet': 'rocket',
>>>     'astronomer': 'rocket',
>>>     'human': 'rocket',
>>>     'mouth': 'helmet',
>>>     'star': 'gas'
>>> }
>>> self.rename_categories(mapper)
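To show what "merging" means when several old names map to one new name, here is a simplified sketch on a plain dataset dict. It is not kwcoco's algorithm: the function `rename_and_merge` is hypothetical, it keeps the first category that claims each new name (roughly in the spirit of an 'ignore' policy for the others' metadata), reassigns annotations of the merged categories, and does not repair supercategory references.

```python
def rename_and_merge(dataset, mapper):
    """
    Rename categories using ``mapper`` (old name -> new name). When several
    categories end up with the same name, the first one keeps its id and
    metadata, and annotations of the others are reassigned to it.
    """
    new_cats = []
    name_to_new_id = {}
    old_to_new_id = {}
    for cat in dataset['categories']:
        new_name = mapper.get(cat['name'], cat['name'])
        if new_name not in name_to_new_id:
            name_to_new_id[new_name] = cat['id']
            new_cats.append(dict(cat, name=new_name))
        old_to_new_id[cat['id']] = name_to_new_id[new_name]
    for ann in dataset['annotations']:
        if ann.get('category_id') is not None:
            ann['category_id'] = old_to_new_id[ann['category_id']]
    dataset['categories'] = new_cats
    return dataset


ds = {
    'categories': [{'id': 1, 'name': 'astronaut'},
                   {'id': 2, 'name': 'astronomer'},
                   {'id': 3, 'name': 'helmet'}],
    'annotations': [{'id': 1, 'category_id': 2}, {'id': 2, 'category_id': 3}],
}
rename_and_merge(ds, {'astronaut': 'person', 'astronomer': 'person', 'helmet': 'hat'})
print([c['name'] for c in ds['categories']])  # ['person', 'hat']
```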
reroot(new_root=None, old_root=None, absolute=False, check=True, safe=True, smart=False)
Rebase image/data paths onto a new image/data root.
Parameters:
- new_root (str, default=None) – New image root. If unspecified, the current self.img_root is used.
- old_root (str, default=None) – If specified, removes the root from file names. If unspecified, then the existing paths MUST be relative to new_root.
- absolute (bool, default=False) – if True, file names are stored as absolute paths, otherwise they are relative to the new image root.
- check (bool, default=True) – if True, checks that the images all exist.
- safe (bool, default=True) – if True, does not overwrite values until all checks pass.
- smart (bool, default=False) – If True, we can try different reroot strategies and choose the one that works. Note: always be wary when algorithms try to be smart.
CommandLine:
xdoctest -m /home/joncrall/code/kwcoco/kwcoco/coco_dataset.py MixinCocoExtras.reroot
Todo
- [ ] Incorporate maximum ordered subtree embedding once completed?
Ignore:
>>> # There might not be a way to easily handle the cases that I
>>> # want to here. Might need to discuss this.
>>> import kwcoco
>>> import os
>>> gname = 'images/foo.png'
>>> remote = '/remote/path'
>>> host = ub.ensure_app_cache_dir('kwcoco/tests/reroot')
>>> fpath = join(host, gname)
>>> ub.ensuredir(dirname(fpath))
>>> # In this test the image exists on the host path
>>> import kwimage
>>> kwimage.imwrite(fpath, np.random.rand(8, 8))
>>> #
>>> cases = {}
>>> # * given absolute paths on current machine
>>> cases['abs_curr'] = kwcoco.CocoDataset.from_image_paths([join(host, gname)])
>>> # * given "remote" rooted relative paths on current machine
>>> cases['rel_remoterooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=remote)
>>> # * given "host" rooted relative paths on current machine
>>> cases['rel_hostrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname], img_root=host)
>>> # * given unrooted relative paths on current machine
>>> cases['rel_unrooted_curr'] = kwcoco.CocoDataset.from_image_paths([gname])
>>> # * given absolute paths on another machine
>>> cases['abs_remote'] = kwcoco.CocoDataset.from_image_paths([join(remote, gname)])
>>> def report(dset, name):
>>>     gid = 1
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print(' * strategy_name = {!r}'.format(name))
>>>     print(' * rel_fpath = {!r}'.format(rel_fpath))
>>>     print(' * ' + ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>> for key, dset in cases.items():
>>>     print('----')
>>>     print('case key = {!r}'.format(key))
>>>     print('ORIG = {!r}'.format(dset.imgs[1]['file_name']))
>>>     print('dset.img_root = {!r}'.format(dset.img_root))
>>>     print('missing_gids = {!r}'.format(dset.missing_images()))
>>>     print('cwd = {!r}'.format(os.getcwd()))
>>>     print('host = {!r}'.format(host))
>>>     print('remote = {!r}'.format(remote))
>>>     #
>>>     dset_None_rel = dset.copy().reroot(absolute=False, check=0)
>>>     report(dset_None_rel, 'dset_None_rel')
>>>     #
>>>     dset_None_abs = dset.copy().reroot(absolute=True, check=0)
>>>     report(dset_None_abs, 'dset_None_abs')
>>>     #
>>>     dset_host_rel = dset.copy().reroot(host, absolute=False, check=0)
>>>     report(dset_host_rel, 'dset_host_rel')
>>>     #
>>>     dset_host_abs = dset.copy().reroot(host, absolute=True, check=0)
>>>     report(dset_host_abs, 'dset_host_abs')
>>>     #
>>>     dset_remote_rel = dset.copy().reroot(host, old_root=remote, absolute=False, check=0)
>>>     report(dset_remote_rel, 'dset_remote_rel')
>>>     #
>>>     dset_remote_abs = dset.copy().reroot(host, old_root=remote, absolute=True, check=0)
>>>     report(dset_remote_abs, 'dset_remote_abs')
Example
>>> import kwcoco
>>> def report(dset, name):
>>>     gid = 1
>>>     abs_fpath = dset.get_image_fpath(gid)
>>>     rel_fpath = dset.imgs[gid]['file_name']
>>>     color = 'green' if exists(abs_fpath) else 'red'
>>>     print('strategy_name = {!r}'.format(name))
>>>     print(ub.color_text('abs_fpath = {!r}'.format(abs_fpath), color))
>>>     print('rel_fpath = {!r}'.format(rel_fpath))
>>> dset = self = kwcoco.CocoDataset.demo()
>>> # Change base relative directory
>>> img_root = ub.expandpath('~')
>>> print('ORIG self.imgs = {!r}'.format(self.imgs))
>>> print('ORIG dset.img_root = {!r}'.format(dset.img_root))
>>> print('NEW img_root = {!r}'.format(img_root))
>>> self.reroot(img_root)
>>> report(self, 'self')
>>> print('NEW self.imgs = {!r}'.format(self.imgs))
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> # Use absolute paths
>>> self.reroot(absolute=True)
>>> assert self.imgs[1]['file_name'].startswith(img_root)
>>> # Switch back to relative paths
>>> self.reroot()
>>> assert self.imgs[1]['file_name'].startswith('.cache')
Example
>>> # demo with auxiliary data
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8', aux=True)
>>> img_root = ub.expandpath('~')
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> self.reroot(img_root)
>>> print(self.imgs[1]['file_name'])
>>> print(self.imgs[1]['auxillary'][0]['file_name'])
>>> assert self.imgs[1]['file_name'].startswith('.cache')
>>> assert self.imgs[1]['auxillary'][0]['file_name'].startswith('.cache')
data_root
In the future we may deprecate img_root in favor of data_root.
find_representative_images(gids=None)
Find images that have a wide array of categories. Attempt to find the fewest images that cover all categories, using images that contain both a large and a small number of annotations.
Parameters: gids (None | List) – subset of image ids to consider when finding representative images. Uses all images if unspecified.
Returns: list of image ids determined to be representative
Return type: List
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> gids = self.find_representative_images([3])
>>> print('gids = {!r}'.format(gids))
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> gids = self.find_representative_images()
>>> print('gids = {!r}'.format(gids))
>>> valid = {7, 1}
>>> gids = self.find_representative_images(valid)
>>> assert valid.issuperset(gids)
>>> print('gids = {!r}'.format(gids))
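"Find the fewest images that cover all categories" is a set-cover problem. As a simpler stand-in, the sketch below uses the classic greedy set-cover heuristic: repeatedly take the image that adds the most not-yet-covered categories. This is a swapped-in technique for illustration only; the documented method also weighs how many annotations each image has, which this hypothetical `greedy_category_cover` helper ignores.

```python
def greedy_category_cover(gid_to_cids):
    """
    Pick images until every category in ``gid_to_cids`` (image id -> set of
    category ids) is covered, each time taking the image that adds the most
    uncovered categories. Ties go to the smaller image id.
    """
    uncovered = set().union(*gid_to_cids.values()) if gid_to_cids else set()
    chosen = []
    while uncovered:
        # max() returns the first maximal item, so iterating in sorted order
        # breaks ties toward the smallest image id
        gid = max(sorted(gid_to_cids),
                  key=lambda g: len(gid_to_cids[g] & uncovered))
        chosen.append(gid)
        uncovered -= gid_to_cids[gid]
    return chosen


gid_to_cids = {1: {1, 2, 3}, 2: {3, 4}, 3: {4}, 4: set()}
print(greedy_category_cover(gid_to_cids))  # [1, 2]
```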
class kwcoco.coco_dataset.MixinCocoAttrs
Bases: object
Expose methods to construct object lists / groups
annots(aids=None, gid=None)
Return vectorized annotation objects.
Parameters:
- aids (List[int]) – annotation ids to reference; if unspecified, all annotations are returned.
- gid (int) – return all annotations that belong to this image id. Mutually exclusive with the aids argument.
Returns: vectorized annotation object
Return type: Annots
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> annots = self.annots()
>>> print(annots)
<Annots(num=11)>
>>> sub_annots = annots.take([1, 2, 3])
>>> print(sub_annots)
<Annots(num=3)>
>>> print(ub.repr2(sub_annots.get('bbox', None)))
[
    [350, 5, 130, 290],
    None,
    None,
]
images(gids=None)
Return vectorized image objects.
Parameters: gids (List[int]) – image ids to reference; if unspecified, all images are returned.
Returns: vectorized images object
Return type: Images
Example
>>> self = CocoDataset.demo()
>>> images = self.images()
>>> print(images)
<Images(num=3)>
class kwcoco.coco_dataset.MixinCocoStats
Bases: object
Methods for getting stats about the dataset
n_annots
n_images
n_cats
n_videos
keypoint_annotation_frequency()
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo('shapes', rng=0)
>>> hist = self.keypoint_annotation_frequency()
>>> hist = ub.odict(sorted(hist.items()))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> print(ub.repr2(hist))  # xdoc: +IGNORE_WANT
{
    'bot_tip': 6,
    'left_eye': 14,
    'mid_tip': 6,
    'right_eye': 14,
    'top_tip': 6,
}
category_annotation_frequency()
Reports the number of annotations of each category.
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_frequency()
>>> print(ub.repr2(hist))
{
    'astroturf': 0,
    'human': 0,
    'astronaut': 1,
    'astronomer': 1,
    'helmet': 1,
    'rocket': 1,
    'mouth': 2,
    'star': 5,
}
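The histogram above includes categories with zero annotations and appears ordered by count, then by name. The sketch below reproduces that shape on a plain dataset dict with collections.Counter. The function `category_histogram` and the toy data are hypothetical; the ordering rule is inferred from the printed example rather than taken from kwcoco's code.

```python
from collections import Counter


def category_histogram(dataset):
    """
    Count annotations per category name. Categories with no annotations
    are reported with a count of 0; the result is ordered by count, then name.
    """
    cid_to_name = {cat['id']: cat['name'] for cat in dataset['categories']}
    counts = Counter(ann.get('category_id') for ann in dataset['annotations'])
    hist = {name: counts.get(cid, 0) for cid, name in cid_to_name.items()}
    return dict(sorted(hist.items(), key=lambda kv: (kv[1], kv[0])))


ds = {
    'categories': [{'id': 1, 'name': 'star'},
                   {'id': 2, 'name': 'mouth'},
                   {'id': 3, 'name': 'human'}],
    'annotations': [{'id': 1, 'category_id': 1},
                    {'id': 2, 'category_id': 1},
                    {'id': 3, 'category_id': 2}],
}
print(category_histogram(ds))  # {'human': 0, 'mouth': 1, 'star': 2}
```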
category_annotation_type_frequency()
Reports the number of annotations of each type for each category.
Example
>>> self = CocoDataset.demo()
>>> hist = self.category_annotation_type_frequency()
>>> print(ub.repr2(hist))
basic_stats()
Reports the number of images, annotations, and categories.
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> print(ub.repr2(self.basic_stats()))
{
    'n_anns': 11,
    'n_imgs': 3,
    'n_videos': 0,
    'n_cats': 8,
}
>>> from kwcoco.demo.toydata import *  # NOQA
>>> dset = random_video_dset(render=True, num_frames=2, num_tracks=10, rng=0)
>>> print(ub.repr2(dset.basic_stats()))
{
    'n_anns': 20,
    'n_imgs': 2,
    'n_videos': 1,
    'n_cats': 3,
}
extended_stats()
Reports the number of images, annotations, and categories.
Example
>>> self = CocoDataset.demo()
>>> print(ub.repr2(self.extended_stats()))
boxsize_stats(anchors=None, perclass=True, gids=None, aids=None, verbose=0, clusterkw={}, statskw={})
Compute statistics about bounding box sizes. Also computes anchor boxes using k-means if anchors is specified.
Parameters:
- anchors (int) – if specified, also computes box anchors
- perclass (bool) – if True, also computes stats for each category
- gids (List[int], default=None) – if specified, only compute stats for these image ids.
- aids (List[int], default=None) – if specified, only compute stats for these annotation ids.
- verbose (int) – verbosity level
- clusterkw (dict) – kwargs for sklearn.cluster.KMeans, used if computing anchors.
- statskw (dict) – kwargs for kwarray.stats_dict()
Returns: Dict[str, Dict[str, Dict | ndarray]]
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes32')
>>> infos = self.boxsize_stats(anchors=4, perclass=False)
>>> print(ub.repr2(infos, nl=-1, precision=2))
>>> infos = self.boxsize_stats(gids=[1], statskw=dict(median=True))
>>> print(ub.repr2(infos, nl=-1, precision=2))
class kwcoco.coco_dataset.MixinCocoDraw
Bases: object
Matplotlib / display functionality
draw_image(gid)
Use kwimage to draw all annotations on an image and return the pixels as a numpy array.
Returns: canvas
Return type: ndarray
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo('shapes8')
>>> canvas = self.draw_image(1)
>>> # Now you can dump the annotated image to disk / whatever
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.imshow(canvas)
show_image(gid=None, aids=None, aid=None, **kwargs)
Use matplotlib to show an image with annotations overlaid.
Parameters:
- gid (int) – image to show
- aids (list) – aids to highlight within the image
- aid (int) – a specific aid to focus on. If gid is not given, look up gid based on this aid.
- **kwargs – show_annots, show_aid, show_catname, show_kpname, show_segmentation, title, show_gid, show_filename, show_boxes,
Ignore:
# Programmatically collect the kwargs for docs generation
import xinspect
import kwcoco
kwargs = xinspect.get_kwargs(kwcoco.CocoDataset.show_image)
print(ub.repr2(list(kwargs.keys()), nl=1, si=1))
class kwcoco.coco_dataset.MixinCocoAddRemove
Bases: object
Mixin functions to dynamically add / remove annotations, images, and categories while maintaining lookup indexes.
add_video(name, id=None, **kw)
Add a video to the dataset (dynamically updates the index).
Parameters:
- name (str) – unique name for this video.
- id (None or int) – ADVANCED. Force using this video id.
- **kw – stores arbitrary key/value pairs in this new video
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset()
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> vidid1 = self.add_video('foo', id=3)
>>> vidid2 = self.add_video('bar')
>>> vidid3 = self.add_video('baz')
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> gid1 = self.add_image('foo1.jpg', video_id=vidid1)
>>> gid2 = self.add_image('foo2.jpg', video_id=vidid1)
>>> gid3 = self.add_image('foo3.jpg', video_id=vidid1)
>>> self.add_image('bar1.jpg', video_id=vidid2)
>>> print('self.index.videos = {}'.format(ub.repr2(self.index.videos, nl=1)))
>>> print('self.index.imgs = {}'.format(ub.repr2(self.index.imgs, nl=1)))
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
>>> self.remove_images([gid2])
>>> print('self.index.vidid_to_gids = {!r}'.format(self.index.vidid_to_gids))
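The example above exercises the index bookkeeping that the add/remove mixin is responsible for: every insertion or removal must also update reverse lookups such as vidid_to_gids. The class below is a hypothetical miniature of that idea, not kwcoco's index. Its "one past the largest id in use" allocation policy and its choice to raise ValueError on a taken id are assumptions of this sketch.

```python
from collections import defaultdict


class TinyVideoIndex:
    """
    Hypothetical minimal index that keeps a video-id -> image-ids lookup in
    sync as videos and images are added or removed.
    """

    def __init__(self):
        self.videos = {}
        self.imgs = {}
        self.vidid_to_gids = defaultdict(set)

    @staticmethod
    def _next_id(table):
        # one past the largest id in use (an assumed allocation policy)
        return max(table, default=0) + 1

    def add_video(self, name, id=None):
        vidid = self._next_id(self.videos) if id is None else id
        if vidid in self.videos:
            raise ValueError('video id {} is already taken'.format(vidid))
        self.videos[vidid] = {'id': vidid, 'name': name}
        return vidid

    def add_image(self, file_name, video_id=None, id=None):
        gid = self._next_id(self.imgs) if id is None else id
        if gid in self.imgs:
            raise ValueError('image id {} is already taken'.format(gid))
        self.imgs[gid] = {'id': gid, 'file_name': file_name,
                          'video_id': video_id}
        if video_id is not None:
            self.vidid_to_gids[video_id].add(gid)
        return gid

    def remove_image(self, gid):
        img = self.imgs.pop(gid)
        if img['video_id'] is not None:
            self.vidid_to_gids[img['video_id']].discard(gid)


index = TinyVideoIndex()
vidid1 = index.add_video('foo', id=3)
gid1 = index.add_image('foo1.jpg', video_id=vidid1)
gid2 = index.add_image('foo2.jpg', video_id=vidid1)
index.remove_image(gid2)
print(dict(index.vidid_to_gids))  # {3: {1}}
```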
add_image(file_name, id=None, **kw)
Add an image to the dataset (dynamically updates the index).
Parameters:
- file_name (str) – relative or absolute path to image
- id (None or int) – ADVANCED. Force using this image id.
- **kw – stores arbitrary key/value pairs in this new image
Example
>>> self = CocoDataset.demo()
>>> import kwimage
>>> gname = kwimage.grab_test_image_fpath('paraview')
>>> gid = self.add_image(gname)
>>> assert self.imgs[gid]['file_name'] == gname
add_annotation(image_id, category_id=None, bbox=None, id=None, **kw)
Add an annotation to the dataset (dynamically updates the index).
Parameters:
- image_id (int) – image_id to add to
- category_id (int) – category_id to add to
- bbox (list or kwimage.Boxes) – bounding box in xywh format
- id (None or int) – ADVANCED. Force using this annotation id.
- **kw – stores arbitrary key/value pairs in this new annotation
Example
>>> self = CocoDataset.demo()
>>> image_id = 1
>>> cid = 1
>>> bbox = [10, 10, 20, 20]
>>> aid = self.add_annotation(image_id, cid, bbox)
>>> assert self.anns[aid]['bbox'] == bbox
Example
>>> # Attempt to annot without a category or bbox
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> image_id = 1
>>> aid = self.add_annotation(image_id)
>>> assert None in self.index.cid_to_aids
add_category(name, supercategory=None, id=None, **kw)
Adds a category.
Parameters:
- name (str) – name of the new category
- supercategory (str, optional) – parent of this category
- id (int, optional) – use this category id, if it was not taken
- **kw – stores arbitrary key/value pairs in this new category
Example
>>> self = CocoDataset.demo()
>>> prev_n_cats = self.n_cats
>>> cid = self.add_category('dog', supercategory='object')
>>> assert self.cats[cid]['name'] == 'dog'
>>> assert self.n_cats == prev_n_cats + 1
>>> import pytest
>>> with pytest.raises(ValueError):
>>>     self.add_category('dog', supercategory='object')
-
ensure_image(file_name, id=None, **kw)
Like add_image, but returns the existing image id if it already exists instead of failing. In this case, all metadata is ignored.
Parameters: - file_name (str) – relative or absolute path to image
- id (None or int) – ADVANCED. Force using this image id.
- **kw – stores arbitrary key/value pairs in this new image
Returns: the existing or new image id
Return type: int
-
ensure_category(name, supercategory=None, id=None, **kw)
Like add_category, but returns the existing category id if it already exists instead of failing. In this case, all metadata is ignored.
Returns: the existing or new category id
Return type: int
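The difference between `add_category` (fails on a duplicate name) and `ensure_category` (returns the existing id and ignores new metadata) can be sketched on a plain dict. This is a hypothetical illustration, not kwcoco's code. The fallback to the next free id when a requested `id` is already taken is an assumption, since the parameter description only says the id is used "if it was not taken".

```python
# Hypothetical sketch of add-vs-ensure semantics; not kwcoco's code.


def add_category(dataset, name, supercategory=None, id=None, **kw):
    """Add a category; refuse a duplicate name like the ValueError example."""
    cats = dataset['categories']
    if any(cat['name'] == name for cat in cats):
        raise ValueError('category {!r} already exists'.format(name))
    taken = {cat['id'] for cat in cats}
    if id is None or id in taken:
        # Assumption: fall back to the next unused integer id.
        id = max(taken, default=0) + 1
    cats.append({'id': id, 'name': name, 'supercategory': supercategory, **kw})
    return id


def ensure_category(dataset, name, supercategory=None, id=None, **kw):
    """Return the id of a same-named category, adding it only if missing."""
    for cat in dataset['categories']:
        if cat['name'] == name:
            return cat['id']  # existing category: extra metadata is ignored
    return add_category(dataset, name, supercategory=supercategory, id=id, **kw)


dataset = {'categories': []}
cid = add_category(dataset, 'dog', supercategory='object')
same = ensure_category(dataset, 'dog', color='brown')
try:
    add_category(dataset, 'dog')
except ValueError:
    duplicate_rejected = True
else:
    duplicate_rejected = False
```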
-
add_annotations(anns)
Faster, less safe, multi-item alternative to add_annotation.
Parameters: anns (List[Dict]) – list of annotation dictionaries
Example
>>> self = CocoDataset.demo()
>>> anns = [self.anns[aid] for aid in [2, 3, 5, 7]]
>>> self.remove_annotations(anns)
>>> assert self.n_annots == 7 and self._check_index()
>>> self.add_annotations(anns)
>>> assert self.n_annots == 11 and self._check_index()
-
add_images(imgs)
Faster, less safe, multi-item alternative to add_image.
Note
This function was designed for speed. As such, it does not check whether the image ids or file names are duplicated, and it will blindly add data even if it is bad. The single-image version (add_image) is slower but safer.
Parameters: imgs (List[Dict]) – list of image dictionaries
Example
>>> imgs = CocoDataset.demo().dataset['images']
>>> self = CocoDataset()
>>> self.add_images(imgs)
>>> assert self.n_images == 3 and self._check_index()
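Because the bulk insert skips duplicate checks, one option is to run your own check before calling it. The helper below, `find_image_conflicts`, is hypothetical and not part of kwcoco. It operates on plain image dicts and reports which ids and file names would collide.

```python
# Hypothetical pre-flight check for an unchecked bulk image insert.
from collections import Counter


def find_image_conflicts(existing_imgs, new_imgs):
    """Return ids and file names that would be duplicated by a bulk add."""
    ids = Counter(img['id'] for img in existing_imgs)
    ids.update(img['id'] for img in new_imgs)
    names = Counter(img['file_name'] for img in existing_imgs)
    names.update(img['file_name'] for img in new_imgs)
    return {
        'ids': sorted(k for k, n in ids.items() if n > 1),
        'file_names': sorted(k for k, n in names.items() if n > 1),
    }


existing = [{'id': 1, 'file_name': 'a.png'}]
incoming = [{'id': 1, 'file_name': 'b.png'}, {'id': 2, 'file_name': 'a.png'}]
conflicts = find_image_conflicts(existing, incoming)
```

If both lists in the result are empty, the fast path is safe for that batch. Otherwise, the single-image methods will surface the problem instead.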
-
clear_images()
Removes all images and annotations (but not categories).
Example
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8
-
clear_annotations()
Removes all annotations (but not images and categories).
Example
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8
-
remove_all_images()
Removes all images and annotations (but not categories).
Example
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> self.clear_images()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 0, n_videos: 0, n_cats: 8
-
remove_all_annotations()
Removes all annotations (but not images and categories).
Example
>>> import ubelt as ub
>>> self = CocoDataset.demo()
>>> self.clear_annotations()
>>> print(ub.repr2(self.basic_stats(), nobr=1, nl=0, si=1))
n_anns: 0, n_imgs: 3, n_videos: 0, n_cats: 8
-
remove_annotation(aid_or_ann)
Remove a single annotation from the dataset.
If you have multiple annotations to remove, it is more efficient to remove them in batch with self.remove_annotations.
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)
>>> assert len(self.dataset['annotations']) == 7
>>> self._check_index()
-
remove_annotations(aids_or_anns, verbose=0, safe=True)
Remove multiple annotations from the dataset.
Parameters: - aids_or_anns (List) – list of annotation dicts or ids
- safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns: num_removed: information on the number of items removed
Return type: Dict
Example
>>> import kwcoco
>>> self = kwcoco.CocoDataset.demo()
>>> prev_n_annots = self.n_annots
>>> aids_or_anns = [self.anns[2], 3, 4, self.anns[1]]
>>> self.remove_annotations(aids_or_anns)  # xdoc: +IGNORE_WANT
{'annotations': 4}
>>> assert len(self.dataset['annotations']) == prev_n_annots - 4
>>> self._check_index()
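The `safe=True` behavior described above (dropping duplicates and non-existing identifiers before removal) can be sketched on plain dicts as follows. This is an illustration under assumptions, not kwcoco's implementation. It also shows why batch removal is cheaper: one filtering pass over the list replaces one `list.remove` call per annotation.

```python
# Hypothetical sketch of "safe" batch removal; not kwcoco's code.


def remove_annotations(dataset, anns_by_id, aids_or_anns, safe=True):
    """Remove annotations given as ids or dicts; return a removal count."""
    aids = [a['id'] if isinstance(a, dict) else a for a in aids_or_anns]
    if safe:
        # Drop repeated ids (keeping first-seen order) and ids that do not exist.
        aids = [aid for aid in dict.fromkeys(aids) if aid in anns_by_id]
    remove = set(aids)
    # A single pass over the list instead of one list.remove per annotation.
    dataset['annotations'] = [
        ann for ann in dataset['annotations'] if ann['id'] not in remove]
    for aid in remove:
        del anns_by_id[aid]  # with safe=False, an unknown id raises KeyError
    return {'annotations': len(remove)}


dataset = {'annotations': [{'id': i, 'image_id': 1} for i in range(1, 6)]}
anns = {ann['id']: ann for ann in dataset['annotations']}
info = remove_annotations(dataset, anns, [anns[2], 3, 3, 99])
```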
-
remove_categories(cat_identifiers, keep_annots=False, verbose=0, safe=True)
Remove categories and all annotations in those categories. Currently does not change any hierarchy information.
Parameters: - cat_identifiers (List) – list of category dicts, names, or ids
- keep_annots (bool, default=False) – if True, keeps annotations, but removes category labels.
- safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns: num_removed: information on the number of items removed
Return type: Dict
Example
>>> self = CocoDataset.demo()
>>> cat_identifiers = [self.cats[1], 'rocket', 3]
>>> self.remove_categories(cat_identifiers)
>>> assert len(self.dataset['categories']) == 5
>>> self._check_index()
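The sketch below is a hypothetical illustration of the two `keep_annots` modes on plain dicts. It is not kwcoco's code. Resolving identifiers given as dicts, names, or ids follows the parameter description. Interpreting "removes category labels" as setting `category_id` to `None` is an assumption, as are the keys of the returned count.

```python
# Hypothetical sketch of remove_categories(keep_annots=...); not kwcoco's code.


def remove_categories(dataset, cat_identifiers, keep_annots=False):
    """Remove categories given as dicts, names, or ids."""
    by_name = {cat['name']: cat['id'] for cat in dataset['categories']}
    cids = set()
    for ident in cat_identifiers:
        if isinstance(ident, dict):
            cids.add(ident['id'])
        elif isinstance(ident, str):
            cids.add(by_name[ident])
        else:
            cids.add(ident)
    n_before = len(dataset['categories'])
    dataset['categories'] = [
        cat for cat in dataset['categories'] if cat['id'] not in cids]
    removed_annots = 0
    kept = []
    for ann in dataset['annotations']:
        if ann.get('category_id') not in cids:
            kept.append(ann)
        elif keep_annots:
            # Assumption: dropping the label means unsetting the category id.
            ann['category_id'] = None
            kept.append(ann)
        else:
            removed_annots += 1
    dataset['annotations'] = kept
    return {'categories': n_before - len(dataset['categories']),
            'annotations': removed_annots}


dataset = {
    'categories': [{'id': 1, 'name': 'cat'}, {'id': 2, 'name': 'dog'},
                   {'id': 3, 'name': 'rocket'}],
    'annotations': [{'id': 1, 'category_id': 1}, {'id': 2, 'category_id': 3},
                    {'id': 3, 'category_id': 2}],
}
info = remove_categories(dataset, [{'id': 1, 'name': 'cat'}, 'rocket'])

dataset2 = {'categories': [{'id': 5, 'name': 'tree'}],
            'annotations': [{'id': 9, 'category_id': 5}]}
info2 = remove_categories(dataset2, [5], keep_annots=True)
```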
-
remove_images(gids_or_imgs, verbose=0, safe=True)
Remove multiple images and all annotations in those images.
Parameters: - gids_or_imgs (List) – list of image dicts, names, or ids
- safe (bool, default=True) – if True, we perform checks to remove duplicates and non-existing identifiers.
Returns: num_removed: information on the number of items removed
Return type: Dict
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> assert len(self.dataset['images']) == 3
>>> gids_or_imgs = [self.imgs[2], 'astro.png']
>>> self.remove_images(gids_or_imgs)  # xdoc: +IGNORE_WANT
{'annotations': 11, 'images': 2}
>>> assert len(self.dataset['images']) == 1
>>> self._check_index()
>>> gids_or_imgs = [3]
>>> self.remove_images(gids_or_imgs)
>>> assert len(self.dataset['images']) == 0
>>> self._check_index()
-
remove_annotation_keypoints(kp_identifiers)
Removes all keypoints with a particular category.
Parameters: kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns: num_removed: information on the number of items removed
Return type: Dict
-
remove_keypoint_categories(kp_identifiers)
Removes all keypoints of a particular category as well as all annotation keypoints with those ids.
Parameters: kp_identifiers (List) – list of keypoint category dicts, names, or ids
Returns: num_removed: information on the number of items removed
Return type: Dict
Example
>>> self = CocoDataset.demo('shapes', rng=0)
>>> kp_identifiers = ['left_eye', 'mid_tip']
>>> remove_info = self.remove_keypoint_categories(kp_identifiers)
>>> print('remove_info = {!r}'.format(remove_info))
>>> # FIXME: for whatever reason demodata generation is not deterministic when seeded
>>> # assert remove_info == {'keypoint_categories': 2, 'annotation_keypoints': 16, 'reflection_ids': 1}
>>> assert self._resolve_to_kpcat('right_eye')['reflection_id'] is None
-
set_annotation_category(aid_or_ann, cid_or_cat)
Sets the category of a single annotation.
Parameters: - aid_or_ann (dict | int) – annotation dict or id
- cid_or_cat (dict | int) – category dict or id
Example
>>> import kwcoco
>>> import ubelt as ub
>>> self = kwcoco.CocoDataset.demo()
>>> old_freq = self.category_annotation_frequency()
>>> aid_or_ann = aid = 2
>>> cid_or_cat = new_cid = self.ensure_category('kitten')
>>> self.set_annotation_category(aid, new_cid)
>>> new_freq = self.category_annotation_frequency()
>>> print('new_freq = {}'.format(ub.repr2(new_freq, nl=1)))
>>> print('old_freq = {}'.format(ub.repr2(old_freq, nl=1)))
>>> assert sum(new_freq.values()) == sum(old_freq.values())
>>> assert new_freq['kitten'] == 1
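The example checks that the total annotation count is unchanged while one annotation moves to `kitten`. A minimal sketch of the index bookkeeping that produces this invariant is shown below, using plain dicts. It is hypothetical, not kwcoco's implementation: the annotation id moves from one `cid_to_aids` bucket to another, so no count is gained or lost.

```python
# Hypothetical sketch: reassign a category and keep cid_to_aids in sync.
from collections import defaultdict


def set_annotation_category(anns, cid_to_aids, aid_or_ann, cid_or_cat):
    """Accept ids or dicts for both arguments, as in the signature above."""
    aid = aid_or_ann['id'] if isinstance(aid_or_ann, dict) else aid_or_ann
    new_cid = cid_or_cat['id'] if isinstance(cid_or_cat, dict) else cid_or_cat
    ann = anns[aid]
    old_cid = ann.get('category_id')
    # Move the annotation id from the old category bucket to the new one.
    cid_to_aids[old_cid].discard(aid)
    cid_to_aids[new_cid].add(aid)
    ann['category_id'] = new_cid


anns = {1: {'id': 1, 'category_id': 1}, 2: {'id': 2, 'category_id': 1}}
cid_to_aids = defaultdict(set, {1: {1, 2}})
set_annotation_category(anns, cid_to_aids, 2, {'id': 7, 'name': 'kitten'})
```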
-
-
class kwcoco.coco_dataset.CocoIndex
Bases: object
Fast lookup index for the COCO dataset with dynamic modification.
Variables:
- cid_to_gids
Example
>>> import kwcoco
>>> self = dset = kwcoco.CocoDataset()
>>> self.index.cid_to_gids
-
build(parent)
Build all id-to-obj reverse indexes from scratch.
Parameters: parent (CocoDataset) – the dataset to index
Notation:
- aid - Annotation ID
- gid - imaGe ID
- cid - Category ID
- vidid - Video ID
Example
>>> from kwcoco.demo.toydata import *  # NOQA
>>> parent = CocoDataset.demo('vidshapes1', num_frames=4, rng=1)
>>> index = parent.index
>>> index.build(parent)
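A "build from scratch" pass amounts to one sweep over each top-level list of the raw dataset. The sketch below builds lookups with the same names that MixinCocoIndex exposes, but it is a hypothetical illustration on plain dicts, not kwcoco's implementation. Storing video membership under an image's `video_id` key is an assumption.

```python
# Hypothetical sketch of building reverse indexes from a COCO-style dict.


def build_index(dataset):
    """Return id-to-object and id-to-ids lookups for a COCO-style dict."""
    imgs = {img['id']: img for img in dataset.get('images', [])}
    anns = {ann['id']: ann for ann in dataset.get('annotations', [])}
    cats = {cat['id']: cat for cat in dataset.get('categories', [])}
    videos = {vid['id']: vid for vid in dataset.get('videos', [])}
    # Seed every image, category, and video so empty groups still get a key.
    gid_to_aids = {gid: set() for gid in imgs}
    cid_to_aids = {cid: set() for cid in cats}
    vidid_to_gids = {vidid: set() for vidid in videos}
    for aid, ann in anns.items():
        gid_to_aids.setdefault(ann['image_id'], set()).add(aid)
        # Annotations without a category are grouped under None.
        cid_to_aids.setdefault(ann.get('category_id'), set()).add(aid)
    for gid, img in imgs.items():
        # Assumption: an image's video membership is stored under 'video_id'.
        if img.get('video_id') is not None:
            vidid_to_gids.setdefault(img['video_id'], set()).add(gid)
    return {
        'imgs': imgs, 'anns': anns, 'cats': cats, 'videos': videos,
        'name_to_cat': {cat['name']: cat for cat in cats.values()},
        'gid_to_aids': gid_to_aids, 'cid_to_aids': cid_to_aids,
        'vidid_to_gids': vidid_to_gids,
    }


dataset = {
    'images': [{'id': 1, 'file_name': 'a.png', 'video_id': 7},
               {'id': 2, 'file_name': 'b.png'}],
    'videos': [{'id': 7, 'name': 'clip'}],
    'categories': [{'id': 1, 'name': 'cat'}],
    'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1},
                    {'id': 2, 'image_id': 1}],
}
index = build_index(dataset)
```

The lookups hold references to the same dicts stored in `dataset`, which is why edits made through the index are visible in the raw JSON structure.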
-
-
class kwcoco.coco_dataset.MixinCocoIndex
Bases: object
Give the dataset top-level access to index attributes:
- anns
- imgs
- cats
- videos
- gid_to_aids
- cid_to_aids
- name_to_cat
-
-
class kwcoco.coco_dataset.CocoDataset(data=None, tag=None, img_root=None, autobuild=True)
Bases: ubelt.util_mixins.NiceRepr, kwcoco.coco_dataset.MixinCocoAddRemove, kwcoco.coco_dataset.MixinCocoStats, kwcoco.coco_dataset.MixinCocoAttrs, kwcoco.coco_dataset.MixinCocoDraw, kwcoco.coco_dataset.MixinCocoExtras, kwcoco.coco_dataset.MixinCocoIndex, kwcoco.coco_dataset.MixinCocoDepricate
Notes
- A keypoint annotation:
  {"image_id": int, "category_id": int, "keypoints": [x1,y1,v1,...,xk,yk,vk], "score": float}
  Note that v[i] is a visibility flag, where v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible.
- A bounding box annotation:
  {"image_id": int, "category_id": int, "bbox": [x,y,width,height], "score": float}
- We also define a non-standard "line" annotation (which our fixup scripts will interpret as the diameter of a circle to convert into a bounding box):
  {"image_id": int, "category_id": int, "line": [x1,y1,x2,y2], "score": float}
Lastly, note that our datasets will sometimes specify multiple bbox, line, and/or keypoints fields. In this case we may also specify a field roi_shape, which denotes which field is the "main" annotation type.
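The two non-obvious encodings in these notes are the flat `[x, y, v]` keypoint triples and the "line as circle diameter" interpretation. Both can be made concrete with the hypothetical helpers below, `line_to_bbox` and `visible_keypoints`. They are not part of kwcoco, and the actual fixup scripts may differ. In the sketch, the circle's center is the segment midpoint and its radius is half the segment length.

```python
# Hypothetical helpers for the annotation encodings described above.
import math


def line_to_bbox(line):
    """Treat [x1, y1, x2, y2] as a circle diameter; return the circle's xywh box."""
    x1, y1, x2, y2 = line
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    r = math.hypot(x2 - x1, y2 - y1) / 2
    return [cx - r, cy - r, 2 * r, 2 * r]


def visible_keypoints(flat):
    """Split [x1, y1, v1, ..., xk, yk, vk] and keep points with v == 2."""
    triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return [(x, y) for x, y, v in triples if v == 2]


bbox = line_to_bbox([0, 0, 6, 8])  # a diameter of length 10 centered at (3, 4)
points = visible_keypoints([1, 2, 2, 3, 4, 1, 5, 6, 0])
```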
Variables: - dataset (Dict) – raw json data structure. This is the base dictionary that contains {'annotations': List, 'images': List, 'categories': List}
- index (CocoIndex) – an efficient lookup index into the coco data structure. The index defines its own attributes like anns, cats, imgs, etc. See CocoIndex for more details on which attributes are available.
- fpath (PathLike | None) – if known, this stores the filepath the dataset was loaded from
- tag (str) – a tag indicating the name of the dataset.
- img_root (PathLike | None) – if known, this is the root path that all image file names are relative to. This can also be manually overwritten by the user.
- hashid (str | None) – if computed, this will be a hash uniquely identifying the dataset. To ensure this is computed, see _build_hashid().
References
- http://cocodataset.org/#format
- http://cocodataset.org/#download
CommandLine:
    python -m kwcoco.coco_dataset CocoDataset --show
Example
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> # xdoctest: +REQUIRES(--show)
>>> self.show_image(gid=2)
>>> from matplotlib import pyplot as plt
>>> plt.show()
-
classmethod from_image_paths(gpaths, img_root=None)
Constructor from a list of image paths.
Example
>>> coco_dset = CocoDataset.from_image_paths(['a.png', 'b.png'])
>>> assert coco_dset.n_images == 2
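Conceptually, building a dataset from image paths yields one image entry per path and no annotations. Below is a hypothetical plain-dict version, `dataset_from_image_paths`. Assigning ids starting at 1 is an assumption, and the sketch ignores `img_root`.

```python
# Hypothetical sketch: a minimal COCO-style dict built from image paths.


def dataset_from_image_paths(gpaths):
    """Create one image entry per path; there are no annotations yet."""
    # Assumption: image ids are assigned sequentially starting at 1.
    images = [{'id': gid, 'file_name': str(path)}
              for gid, path in enumerate(gpaths, start=1)]
    return {'images': images, 'annotations': [], 'categories': []}


dataset = dataset_from_image_paths(['a.png', 'b.png'])
```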
-
classmethod from_coco_paths(fpaths, max_workers=0, verbose=1, mode='thread', union='try')
Constructor from multiple coco file paths.
Loads multiple coco datasets and unions the result.
Notes
If union='try' and the union operation fails, the list of individually loaded datasets is returned instead.
Parameters: - fpaths (List[str]) – list of paths to multiple coco files to be loaded and unioned.
- max_workers (int, default=0) – number of worker threads / processes
- verbose (int) – verbosity level
- mode (str) – thread, process, or serial
- union (str | bool, default='try') – If True, unions the result datasets after loading. If False, just returns the result list. If 'try', then tries to perform the union, but returns the result list if it fails.
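The three `union` modes are a small piece of control flow. The sketch below shows one way to express them. `load_and_union`, `load_one`, and `merge` are hypothetical stand-ins for the real loader and `CocoDataset.union`, and they run serially here, ignoring `max_workers` and `mode`.

```python
# Hypothetical sketch of the union=True / False / 'try' behavior.


def load_and_union(fpaths, load_one, merge, union='try'):
    """Load each path, then merge according to the union mode."""
    results = [load_one(fpath) for fpath in fpaths]
    if not union:
        return results  # union=False: always return the list
    if union == 'try':
        try:
            return merge(*results)
        except Exception:
            # Fall back to the individually loaded datasets.
            return results
    return merge(*results)  # union=True: let failures propagate


def merge_lists(*parts):
    """Toy merge that fails on a missing part."""
    if any(part is None for part in parts):
        raise ValueError('cannot merge a missing part')
    return [x for part in parts for x in part]


merged = load_and_union(['a', 'b'], lambda p: [p], merge_lists)
fallback = load_and_union(['a', 'b'], lambda p: None, merge_lists)
unmerged = load_and_union(['a'], lambda p: [p], merge_lists, union=False)
```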
-
copy()
Deep copies this object.
Example
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> new = self.copy()
>>> assert new.imgs[1] is new.dataset['images'][0]
>>> assert new.imgs[1] == self.dataset['images'][0]
>>> assert new.imgs[1] is not self.dataset['images'][0]
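The three assertions above encode one invariant: after a deep copy, the new index must point into the *new* raw dicts. One way to guarantee this is to deep-copy only the raw data and then rebuild the lookup from it, as in the hypothetical `copy_dataset` below. This is not necessarily how kwcoco does it.

```python
# Hypothetical sketch: deep-copy the raw dict, then index the copied objects.
import copy


def copy_dataset(dataset):
    """Return a detached copy plus an id lookup into that copy."""
    new_dataset = copy.deepcopy(dataset)
    new_imgs = {img['id']: img for img in new_dataset['images']}
    return new_dataset, new_imgs


old = {'images': [{'id': 1, 'file_name': 'a.png'}]}
new, new_imgs = copy_dataset(old)
```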
-
dumps(indent=None, newlines=False)
Writes the dataset out to the json format and returns it as a string.
Parameters: newlines (bool) – if True, each annotation, image, and category gets its own line
Notes
Using newlines=True is similar to print(ub.repr2(dset.dataset, nl=2, trailsep=False)); however, the latter may not output valid json if the dataset contains ndarrays.
Example
>>> from kwcoco.coco_dataset import *
>>> import json
>>> self = CocoDataset.demo()
>>> text = self.dumps(newlines=True)
>>> print(text)
>>> self2 = CocoDataset(json.loads(text), tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
Ignore:
    for k in self2.dataset:
        if self.dataset[k] == self2.dataset[k]:
            print('YES: k = {!r}'.format(k))
        else:
            print('NO: k = {!r}'.format(k))
    self2.dataset['categories']
    self.dataset['categories']
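The `newlines=True` layout (one item per line inside each top-level list) can be reproduced with the standard `json` module. `dumps_one_item_per_line` is a hypothetical illustration, not kwcoco's serializer. Like any `json`-based writer, it would fail on ndarrays, which is the caveat raised in the notes above.

```python
# Hypothetical sketch: valid JSON with one list item per line.
import json


def dumps_one_item_per_line(dataset):
    """Serialize a dict whose list values are written one element per line."""
    parts = []
    for key, value in dataset.items():
        if isinstance(value, list):
            items = ',\n'.join('    ' + json.dumps(item) for item in value)
            body = ('[\n' + items + '\n]') if value else '[]'
        else:
            body = json.dumps(value)
        parts.append(json.dumps(key) + ': ' + body)
    return '{\n' + ',\n'.join(parts) + '\n}'


dataset = {
    'images': [{'id': 1, 'file_name': 'a.png'}, {'id': 2, 'file_name': 'b.png'}],
    'annotations': [],
    'info': {},
}
text = dumps_one_item_per_line(dataset)
```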
-
dump(file, indent=None, newlines=False)
Writes the dataset out to the json format.
Parameters: - file (PathLike | FileLike) – Where to write the data. Can either be a path to a file or an open file pointer / stream.
- newlines (bool) – if True, each annotation, image, category gets its own line.
Example
>>> import json
>>> import tempfile
>>> from kwcoco.coco_dataset import *
>>> self = CocoDataset.demo()
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
>>> file = tempfile.NamedTemporaryFile('w')
>>> self.dump(file, newlines=True)
>>> file.seek(0)
>>> text = open(file.name, 'r').read()
>>> print(text)
>>> file.seek(0)
>>> dataset = json.load(open(file.name, 'r'))
>>> self2 = CocoDataset(dataset, tag='demo2')
>>> assert self2.dataset == self.dataset
>>> assert self2.dataset is not self.dataset
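Accepting "either a path to a file or an open file pointer / stream" usually comes down to a type check before writing. The hypothetical `dump_json` below shows that pattern with the standard library. It is not kwcoco's implementation.

```python
# Hypothetical sketch: write JSON to either a path or a writable stream.
import io
import json
import os
import tempfile


def dump_json(dataset, file):
    """Open and close paths ourselves; write directly to file-like objects."""
    if isinstance(file, (str, os.PathLike)):
        with open(file, 'w') as fp:
            json.dump(dataset, fp)
    else:
        # Anything with a .write method, e.g. an open file or io.StringIO.
        json.dump(dataset, file)


stream = io.StringIO()
dump_json({'images': []}, stream)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.json')
    dump_json({'images': []}, path)
    with open(path) as fp:
        roundtrip = json.load(fp)
```

When given a stream, the caller keeps ownership of it, which is why the example above calls `file.seek(0)` before reading it back.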
-
union(*others, **kwargs)
Merges multiple CocoDataset items into one. Names and associations are retained, but ids may be different.
Parameters: - self – note that union() can be called as an instance method or a class method. If it is called as a class method, then self is the class type; otherwise, the instance will also be unioned with others.
- *others – a series of CocoDatasets that we will merge
- **kwargs – constructor options for the new merged CocoDataset
Returns: a new merged coco dataset
Return type: CocoDataset
Example
>>> # Test union works with different keypoint categories
>>> dset1 = CocoDataset.demo('shapes1')
>>> dset2 = CocoDataset.demo('shapes2')
>>> dset1.remove_keypoint_categories(['bot_tip', 'mid_tip', 'right_eye'])
>>> dset2.remove_keypoint_categories(['top_tip', 'left_eye'])
>>> dset_12a = CocoDataset.union(dset1, dset2)
>>> dset_12b = dset1.union(dset2)
>>> dset_21 = dset2.union(dset1)
>>> def add_hist(h1, h2):
>>>     return {k: h1.get(k, 0) + h2.get(k, 0) for k in set(h1) | set(h2)}
>>> kpfreq1 = dset1.keypoint_annotation_frequency()
>>> kpfreq2 = dset2.keypoint_annotation_frequency()
>>> kpfreq_want = add_hist(kpfreq1, kpfreq2)
>>> kpfreq_got1 = dset_12a.keypoint_annotation_frequency()
>>> kpfreq_got2 = dset_12b.keypoint_annotation_frequency()
>>> assert kpfreq_want == kpfreq_got1
>>> assert kpfreq_want == kpfreq_got2
>>> # Test disjoint gid datasets
>>> import kwcoco
>>> import ubelt as ub
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> for new_gid, img in enumerate(dset1.dataset['images'], start=10):
>>>     for aid in dset1.gid_to_aids[img['id']]:
>>>         dset1.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset1.index.clear()
>>> dset1._build_index()
>>> # ------
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> for new_gid, img in enumerate(dset2.dataset['images'], start=100):
>>>     for aid in dset2.gid_to_aids[img['id']]:
>>>         dset2.anns[aid]['image_id'] = new_gid
>>>     img['id'] = new_gid
>>> dset2.index.clear()
>>> dset2._build_index()
>>> others = [dset1, dset2]
>>> merged = kwcoco.CocoDataset.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([10, 11, 12, 100, 101]) == set(merged.imgs)
>>> # Test that overlapping image ids are not preserved
>>> import kwcoco
>>> import ubelt as ub
>>> dset2 = kwcoco.CocoDataset.demo('shapes2')
>>> dset1 = kwcoco.CocoDataset.demo('shapes3')
>>> others = (dset1, dset2)
>>> cls = self = kwcoco.CocoDataset
>>> merged = cls.union(*others)
>>> print('merged = {!r}'.format(merged))
>>> print('merged.imgs = {}'.format(ub.repr2(merged.imgs, nl=1)))
>>> assert set(merged.imgs) & set([1, 2, 3, 4, 5]) == set(merged.imgs)
Todo
- [ ] are supercategories broken?
- [ ] reuse image ids where possible
- [ ] reuse annotation / category ids where possible
- [ ] disambiguate track-ids
- [x] disambiguate video-ids
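"Names and associations are retained, but ids may be different" can be illustrated with a merge that matches categories by name and renumbers everything else. The `union_datasets` sketch below is hypothetical and works on plain dicts. Unlike the disjoint-id example above, it never tries to keep original ids. It always renumbers, which is the simplest policy that avoids collisions.

```python
# Hypothetical sketch: merge by category name, renumbering all ids.


def union_datasets(*datasets):
    """Merge COCO-style dicts; annotations follow their images and categories."""
    merged = {'categories': [], 'images': [], 'annotations': []}
    name_to_cid = {}
    for dset in datasets:
        # Map this dataset's category ids onto merged ids, matching by name.
        cid_map = {}
        for cat in dset['categories']:
            if cat['name'] not in name_to_cid:
                name_to_cid[cat['name']] = len(name_to_cid) + 1
                merged['categories'].append({**cat, 'id': name_to_cid[cat['name']]})
            cid_map[cat['id']] = name_to_cid[cat['name']]
        gid_map = {}
        for img in dset['images']:
            gid_map[img['id']] = len(merged['images']) + 1
            merged['images'].append({**img, 'id': gid_map[img['id']]})
        for ann in dset['annotations']:
            merged['annotations'].append({
                **ann,
                'id': len(merged['annotations']) + 1,
                'image_id': gid_map[ann['image_id']],
                'category_id': cid_map.get(ann.get('category_id')),
            })
    return merged


d1 = {'categories': [{'id': 1, 'name': 'cat'}],
      'images': [{'id': 1, 'file_name': 'a.png'}],
      'annotations': [{'id': 1, 'image_id': 1, 'category_id': 1}]}
d2 = {'categories': [{'id': 5, 'name': 'dog'}, {'id': 1, 'name': 'cat'}],
      'images': [{'id': 1, 'file_name': 'b.png'}],
      'annotations': [{'id': 1, 'image_id': 1, 'category_id': 5}]}
merged = union_datasets(d1, d2)
```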
-
subset(gids, copy=False, autobuild=True)
Return a subset of the larger coco dataset by specifying which images to port. All annotations in those images will be taken.
Parameters: - gids (List[int]) – image-ids to copy into a new dataset
- copy (bool, default=False) – if True, makes a deep copy of all nested attributes, otherwise makes a shallow copy.
- autobuild (bool, default=True) – if True will automatically build the fast lookup index.
Example
>>> self = CocoDataset.demo()
>>> gids = [1, 3]
>>> sub_dset = self.subset(gids)
>>> assert len(self.gid_to_aids) == 3
>>> assert len(sub_dset.gid_to_aids) == 2
Example
>>> self = CocoDataset.demo()
>>> sub1 = self.subset([1])
>>> sub2 = self.subset([2])
>>> sub3 = self.subset([3])
>>> others = [sub1, sub2, sub3]
>>> rejoined = CocoDataset.union(*others)
>>> assert len(sub1.anns) == 9
>>> assert len(sub2.anns) == 2
>>> assert len(sub3.anns) == 0
>>> assert rejoined.basic_stats() == self.basic_stats()
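The selection rule (keep the chosen images and every annotation that points at them) and the shallow-versus-deep `copy` flag can be sketched on plain dicts. The `subset` function below is hypothetical, not kwcoco's implementation. Keeping every category, even unused ones, is an assumption. It is consistent with the rejoin example, where `basic_stats()` still matches after a union.

```python
# Hypothetical sketch of subset(): chosen images plus all of their annotations.
from copy import deepcopy


def subset(dataset, gids, copy=False):
    """Return a new dataset dict restricted to the given image ids."""
    gids = set(gids)
    new = {
        # Assumption: every category is kept, even ones that become unused.
        'categories': list(dataset['categories']),
        'images': [img for img in dataset['images'] if img['id'] in gids],
        'annotations': [ann for ann in dataset['annotations']
                        if ann['image_id'] in gids],
    }
    # copy=True detaches nested dicts from the parent; otherwise they are shared.
    return deepcopy(new) if copy else new


dataset = {
    'categories': [{'id': 1, 'name': 'cat'}],
    'images': [{'id': g} for g in (1, 2, 3)],
    'annotations': [{'id': 1, 'image_id': 1}, {'id': 2, 'image_id': 1},
                    {'id': 3, 'image_id': 2}],
}
shallow = subset(dataset, [1, 3])
deep = subset(dataset, [1, 3], copy=True)
```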
-
kwcoco.coco_dataset.demo_coco_data()
Simple data for testing.
Ignore:
    # code for getting a segmentation polygon
    kwimage.grab_test_image_fpath('astro')
    labelme /home/joncrall/.cache/kwimage/demodata/astro.png
    cat /home/joncrall/.cache/kwimage/demodata/astro.json
Example
>>> # xdoctest: +REQUIRES(--show)
>>> from kwcoco.coco_dataset import demo_coco_data, CocoDataset
>>> dataset = demo_coco_data()
>>> self = CocoDataset(dataset, tag='demo')
>>> import kwplot
>>> kwplot.autompl()
>>> self.show_image(gid=1)
>>> kwplot.show_if_requested()