kwcoco.cli.coco_stats module

class kwcoco.cli.coco_stats.CocoStatsCLI(*args: Any, **kwargs: Any)[source]

Bases: DataConfig

Compute summary statistics about a COCO dataset.

Basic stats are the number of images, annotations, categories, videos, and tracks. Extended stats are also available.

SeeAlso:

kwcoco visual_stats --help

Valid options: []

Parameters:
  • *args – positional arguments for this data config

  • **kwargs – keyword arguments for this data config

classmethod main(cmdline=True, **kw)[source]

CommandLine

xdoctest -m kwcoco.cli.coco_stats CocoStatsCLI.main:0
xdoctest -m kwcoco.cli.coco_stats CocoStatsCLI.main:1

Example

>>> from kwcoco.cli.coco_stats import CocoStatsCLI
>>> kw = {'src': 'special:shapes8'}
>>> cmdline = False
>>> cls = CocoStatsCLI
>>> cls.main(cmdline, **kw)

Example

>>> # xdoctest: +REQUIRES(module:pyyaml)
>>> from kwcoco.cli.coco_stats import *  # NOQA
>>> kw = {
>>>     'src': ['special:shapes8', 'special:vidshapes8', 'special:vidshapes2'],
>>>     'basic': True,
>>>     'extended': True,
>>>     'catfreq': True,
>>>     'image_size': True,
>>>     'annot_attrs': True,
>>>     'image_attrs': True,
>>>     'video_attrs': True,
>>>     'disk_usage': True,
>>>     'boxes': True,
>>> }
>>> cmdline = False
>>> cls = CocoStatsCLI
>>> print('-- Test YAML format --')
>>> kw['format'] = 'yaml'
>>> cls.main(cmdline, **kw)
>>> print('-- Test Human format --')
>>> kw['format'] = 'human'
>>> cls.main(cmdline, **kw)

default = {'annot_attrs': <Value(False)>, 'basic': <Value(True)>, 'boxes': <Value(False)>, 'catfreq': <Value(True)>, 'channels': <Value(False)>, 'disk_usage': <Value(False)>, 'embed': <Value(False)>, 'extended': <Value(True)>, 'format': <Value('human')>, 'image_attrs': <Value(False)>, 'image_size': <Value(False)>, 'io_workers': <Value(0)>, 'src': <Value(['special:shapes8'])>, 'video_attrs': <Value(False)>}

kwcoco.cli.coco_stats._coco_channel_stats(coco_dset)[source]

Return information about which channels and sensors are available.

This is a streamlined version of the richer geowatch stats, focused on generic kwcoco datasets.

The exact return values of this function may change in the future.

Example

>>> # xdoctest: +REQUIRES(module:lark)
>>> import kwcoco
>>> from kwcoco.cli.coco_stats import _coco_channel_stats
>>> dset = kwcoco.CocoDataset()
>>> dset.add_category('a')
>>> gid1 = dset.add_image(file_name='img1.tif', sensor_coarse='S1', width=1, height=1)
>>> gid2 = dset.add_image(file_name='img2.tif', sensor_coarse='S2', width=1, height=1)
>>> dset.add_asset(gid=gid1, file_name='a1.tif', channels='red,green', width=1, height=1)
>>> dset.add_asset(gid=gid1, file_name='a2.tif', channels='blue', width=1, height=1)
>>> dset.add_asset(gid=gid2, file_name='b1.tif', channels='red,green', width=1, height=1)
>>> dset.add_asset(gid=gid2, file_name='b2.tif', channels='nir', width=1, height=1)
>>> info = _coco_channel_stats(dset)
>>> assert info['sensor_hist'] == {'S1': 1, 'S2': 1}
>>> assert info['chan_hist']['blue,red,green,unknown-chan'] == 1
>>> assert info['chan_hist']['nir,red,green,unknown-chan'] == 1

kwcoco.cli.coco_stats._dataset_disk_usage(dset)[source]

Compute disk usage of all image assets referenced by this dataset.

Returns:

{
    'num_files': int,
    'total_bytes': int,
    'total_gb': float,
    'missing_files': List[str],
}

Return type:

dict
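
As a rough illustration of the returned structure, the sketch below builds a dictionary with the same keys from a plain list of file paths. The helper name summarize_disk_usage is hypothetical, and two details are assumptions rather than documented behavior: missing files are excluded from num_files, and total_gb divides by 1024 ** 3. The actual function takes a dataset, not a list of paths.

```python
import os


def summarize_disk_usage(paths):
    """Hypothetical helper: sum file sizes and record paths that do not exist."""
    num_files = 0
    total_bytes = 0
    missing_files = []
    for path in paths:
        if os.path.exists(path):
            # Count only files that are present on disk (assumption).
            total_bytes += os.path.getsize(path)
            num_files += 1
        else:
            missing_files.append(path)
    return {
        'num_files': num_files,
        'total_bytes': total_bytes,
        # Assumes binary gigabytes (1024 ** 3 bytes).
        'total_gb': total_bytes / (1024 ** 3),
        'missing_files': missing_files,
    }
```

Recording missing paths instead of raising lets a summary complete even when some referenced assets are absent.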

kwcoco.cli.coco_stats.byte_str(num, unit='auto', precision=2)[source]

Automatically chooses a relevant unit (e.g. KB, MB, or GB) for displaying a number of bytes.

Parameters:
  • num (int) – number of bytes

  • unit (str) – which unit to use, can be auto, B, KB, MB, GB, TB, PB, EB, ZB, or YB.

  • precision (int) – number of decimals of precision

References

https://en.wikipedia.org/wiki/Orders_of_magnitude_(data)

Returns:

string representing the number of bytes with appropriate units

Return type:

str

Example

>>> import ubelt as ub
>>> from kwcoco.cli.coco_stats import byte_str
>>> num_list = [1, 100, 1024, 1048576, 1073741824, 1099511627776]
>>> result = ub.urepr(list(map(byte_str, num_list)), nl=0)
>>> print(result)
['0.00 KB', '0.10 KB', '1.00 KB', '1.00 MB', '1.00 GB', '1.00 TB']
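
The automatic unit selection can be sketched in a few lines. The function below is a minimal standalone approximation, not the library's implementation: the name format_bytes and the UNIT_EXPONENTS table are hypothetical, it covers only units up to TB, and it assumes powers of 1024 with KB as the smallest automatic unit, which is what the example output above suggests.

```python
# Hypothetical table mapping each unit to its power of 1024.
UNIT_EXPONENTS = {'B': 0, 'KB': 1, 'MB': 2, 'GB': 3, 'TB': 4}


def format_bytes(num, unit='auto', precision=2):
    """Format a byte count, picking the largest unit that num reaches."""
    if unit == 'auto':
        # Assumption: KB is the floor, so small values print as fractions of a KB.
        unit = 'KB'
        for candidate in ('MB', 'GB', 'TB'):
            if abs(num) >= 1024 ** UNIT_EXPONENTS[candidate]:
                unit = candidate
    value = num / 1024 ** UNIT_EXPONENTS[unit]
    return f'{value:.{precision}f} {unit}'


num_list = [1, 100, 1024, 1048576, 1073741824, 1099511627776]
print([format_bytes(num) for num in num_list])
```

Passing an explicit unit skips the selection step, so format_bytes(1024, unit='B') formats the raw byte count.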