kwcoco.coco_sql_dataset module

Finally got a baseline implementation of an SQLite backend for COCO datasets. This mostly plugs into my existing tools (as long as only read operations are used; writing hasn't been implemented yet) by duck-typing the dict API.

This solves the issue of forking and then accessing nested dictionaries in the JSON-style COCO objects. (When you access a dictionary, Python increments a reference count, which triggers copy-on-write for whatever memory page that data happens to live in. Non-contiguous access therefore causes excessive memory copies.)
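The copy-on-write mechanism can be seen directly with a small sketch: even a pure read of a nested object bumps its reference count, dirtying the page it lives on (a hypothetical illustration, not kwcoco code):

```python
import sys

# CPython stores a reference count in every object header, so even a
# "read-only" access writes to the object's memory. After a fork(), that
# write is enough to trigger copy-on-write for the whole page.
d = {'images': [{'id': 1, 'file_name': 'a.png'}]}
before = sys.getrefcount(d['images'][0])
ref = d['images'][0]   # merely taking a reference...
after = sys.getrefcount(d['images'][0])
assert after == before + 1   # ...mutated the object header
```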

For “medium-sized” datasets it's quite a bit slower. Running through a torch DataLoader with 4 workers for 10,000 images executes at a rate of 100Hz but takes 850MB of RAM. Using the duck-typed SQL backend only uses 500MB (which includes the cost of caching), but runs at 45Hz (which includes the benefit of caching).

However, once I scale up to 100,000 images I start seeing benefits. The in-memory dictionary interface chugs at 1.05Hz and was taking more than 4GB of memory by the time I killed the process (the ETA was over an hour). The SQL backend ran at 45Hz, took about 3 minutes, and used about 2.45GB of memory.

Without a cache, SQL runs at 30Hz and takes 400MB for 10,000 images; for 100,000 images it gets 30Hz with 1.1GB. There is also a much larger startup time. I'm not exactly sure what causes it yet, but it's probably some preprocessing I'm doing.

Using an LRU cache we get 45Hz and 1.05GB of memory, so that's a clear win. We do need to be sure to disable the cache if we ever implement write mode.
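A minimal sketch of the caching idea, using functools.lru_cache over a plain sqlite3 lookup (the table and helper here are hypothetical, not the kwcoco implementation):

```python
from functools import lru_cache
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE annotations (id INTEGER PRIMARY KEY, bbox TEXT)')
conn.executemany('INSERT INTO annotations VALUES (?, ?)',
                 [(i, f'[{i}, {i}, 10, 10]') for i in range(1000)])

@lru_cache(maxsize=10000)
def lookup_ann(rowid):
    # B-tree lookup: O(log N) on a cache miss, O(1) dict lookup on a hit
    return conn.execute(
        'SELECT id, bbox FROM annotations WHERE id = ?', (rowid,)).fetchone()

row = lookup_ann(42)
assert row == lookup_ann(42)  # second call is served from the cache
```

Because the cache never invalidates, this is only safe while the database is read-only, which is exactly the constraint noted above.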

I'd like to be a bit faster on the medium-sized datasets (I'd really like to avoid caching rows, even though caching is why the speed is currently semi-reasonable), but I don't think I can do any better than this because single-row lookup time is O(log(N)) for sqlite, whereas it's O(1) for dictionaries. (I wish sqlite had an option to create a hash-table index for a table, but I don't think it does.) I optimized as many of the dictionary operations as possible (for instance, iterating through keys, values, and items should be O(N) instead of O(N log(N))), but the majority of the runtime cost is in the single-row lookup time.

There are a few questions I still have if anyone has insight:

  • Say I want to select a subset of K rows from a table with N entries, and I have a list of all of the rowids that I want. Is there any way to do this better than O(K log(N))? I tried using SELECT col FROM table WHERE id IN (?, ?, ?, ?, ...), filling in one ? for each row in my subset. I'm not sure what the complexity of a query like this is, or what the IN implementation looks like. Can this be done more efficiently with a temporary table and a JOIN?

  • There really is no way to do O(1) row lookup in sqlite, right? Is there a way in PostgreSQL or some other backend sqlalchemy supports?
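For the first question, the two variants can be sketched with the stdlib sqlite3 module over a toy schema; both return the same rows, and EXPLAIN QUERY PLAN can be used on either to inspect what the planner does:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE annotations (id INTEGER PRIMARY KEY, bbox TEXT)')
conn.executemany('INSERT INTO annotations VALUES (?, ?)',
                 [(i, str(i)) for i in range(10000)])
wanted = [3, 999, 4242]

# Variant 1: dynamically-sized IN clause
qmarks = ','.join('?' * len(wanted))
rows1 = conn.execute(
    f'SELECT id, bbox FROM annotations WHERE id IN ({qmarks})',
    wanted).fetchall()

# Variant 2: temporary table + JOIN
conn.execute('CREATE TEMP TABLE subset (id INTEGER PRIMARY KEY)')
conn.executemany('INSERT INTO subset VALUES (?)', [(i,) for i in wanted])
rows2 = conn.execute(
    'SELECT a.id, a.bbox FROM annotations a '
    'JOIN subset s ON a.id = s.id').fetchall()

assert sorted(rows1) == sorted(rows2)
```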

I found that PostgreSQL does support hash indexes: https://www.postgresql.org/docs/13/indexes-types.html I'm really not interested in setting up a global service though 😞. I also found a 10-year-old thread with a hash-index feature request for SQLite, which I unabashedly resurrected: http://sqlite.1065341.n5.nabble.com/Feature-request-hash-index-td23367.html https://web.archive.org/web/20210326010915/http://sqlite.1065341.n5.nabble.com/Feature-request-hash-index-td23367.html

Todo

  • [ ] We get better speeds with raw SQL than with SQLAlchemy. Can we mitigate the speed difference so we can take advantage of SQLAlchemy's expressiveness?

class kwcoco.coco_sql_dataset.FallbackCocoBase[source]

Bases: object

_decl_class_registry = {}
class kwcoco.coco_sql_dataset.Category(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

unique internal id

name

unique external name or identifier

alias

list of alter egos

supercategory

coarser category name

_unstructured
_sa_class_manager = {'_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'alias': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'supercategory': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
class kwcoco.coco_sql_dataset.KeypointCategory(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

unique internal id

name

unique external name or identifier

alias

list of alter egos

supercategory

coarser category name

reflection_id

if augmentation reflects the image, change keypoint id to this

_unstructured
_sa_class_manager = {'_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'alias': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'reflection_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'supercategory': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
class kwcoco.coco_sql_dataset.Video(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

unique internal id

name
caption
width
height
_unstructured
_sa_class_manager = {'_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'caption': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'height': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'width': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
class kwcoco.coco_sql_dataset.Image(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

unique internal id

name
file_name
width
height
video_id
timestamp
frame_index
channels

See ChannelSpec

warp_img_to_vid

See TransformSpec

auxiliary
_unstructured
_sa_class_manager = {'_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'auxiliary': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'channels': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'file_name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'frame_index': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'height': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'timestamp': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'video_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'warp_img_to_vid': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'width': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
class kwcoco.coco_sql_dataset.Track(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

unique internal id

name
_unstructured
_sa_class_manager = {'_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'name': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
class kwcoco.coco_sql_dataset.Annotation(**kwargs)[source]

Bases: Base

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id
image_id
category_id
track_id
segmentation
keypoints
bbox
_bbox_x
_bbox_y
_bbox_w
_bbox_h
score
weight
prob
iscrowd
caption
_unstructured
_sa_class_manager = {'_bbox_h': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, '_bbox_w': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, '_bbox_x': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, '_bbox_y': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, '_unstructured': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'bbox': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'caption': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'category_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'image_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'iscrowd': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'keypoints': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'prob': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'score': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'segmentation': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'track_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'weight': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}
kwcoco.coco_sql_dataset.cls

alias of Annotation

kwcoco.coco_sql_dataset.orm_to_dict(obj)[source]
kwcoco.coco_sql_dataset.dict_restructure(item)[source]

Removes the unstructured field so the API is transparent to the user.
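A hypothetical sketch of what that restructuring step might look like (the helper name and the merge order are assumptions, not the actual kwcoco code): columns that were not modeled explicitly live in an `_unstructured` blob, and merging it back while dropping the key makes the row look like the original COCO dict.

```python
def toy_dict_restructure(item):
    # Merge the unstructured fields into the row and drop the internal key,
    # so the caller sees a flat COCO-style dictionary.
    item = dict(item)
    item.update(item.pop('_unstructured', None) or {})
    return item

row = {'id': 1, 'name': 'cat', '_unstructured': {'custom_field': 42}}
out = toy_dict_restructure(row)
```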

kwcoco.coco_sql_dataset._orm_yielder(query, size=300)[source]

TODO: figure out the best way to yield, in batches or otherwise

kwcoco.coco_sql_dataset._raw_yielder(result, size=300)[source]

TODO: figure out the best way to yield, in batches or otherwise

kwcoco.coco_sql_dataset._new_proxy_cache()[source]

By returning None, we won't use item caching

class kwcoco.coco_sql_dataset.SqlListProxy(session, cls)[source]

Bases: NiceRepr

A view of an SQL table that behaves like a Python list

class kwcoco.coco_sql_dataset.SqlDictProxy(session, cls, keyattr=None, ignore_null=False)[source]

Bases: DictLike

Duck-types an SQL table as a dictionary of dictionaries.

The key is specified by an indexed column (by default it is the id column). The values are dictionaries containing all data for that row.

Note

With SQLite, indexes are B-Trees, so lookup is O(log(N)), not O(1) as with regular dictionaries. Iteration should still be O(N), but databases have much more overhead than Python dictionaries.

Parameters:
  • session (sqlalchemy.orm.session.Session) – the sqlalchemy session

  • cls (Type) – the declarative sqlalchemy table class

  • keyattr (Column) – the indexed column to use as the keys

  • ignore_null (bool) – if True, ignores any keys set to NULL, otherwise NULL keys are allowed.
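The duck-typing idea can be sketched as a toy read-only Mapping over a sqlite3 table (an illustration of the concept, not the actual SqlDictProxy implementation):

```python
import sqlite3
from collections.abc import Mapping

class ToyDictProxy(Mapping):
    """Read-only dict view of a table, keyed by an indexed column."""
    def __init__(self, conn, table, keycol='id'):
        self.conn, self.table, self.keycol = conn, table, keycol

    def __getitem__(self, key):
        cur = self.conn.execute(
            f'SELECT * FROM {self.table} WHERE {self.keycol} = ?', (key,))
        row = cur.fetchone()
        if row is None:
            raise KeyError(key)
        return dict(zip([c[0] for c in cur.description], row))

    def __iter__(self):
        # One O(N) scan over the key column, not N single-row lookups
        for (key,) in self.conn.execute(
                f'SELECT {self.keycol} FROM {self.table}'):
            yield key

    def __len__(self):
        return self.conn.execute(
            f'SELECT count(*) FROM {self.table}').fetchone()[0]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO categories VALUES (1, 'cat'), (2, 'dog')")
proxy = ToyDictProxy(conn, 'categories')
assert proxy[1]['name'] == 'cat'
```

Inheriting from collections.abc.Mapping provides keys(), values(), items(), and containment checks for free, which is essentially how the proxy passes for a dict in read-only code paths.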

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> import pytest
>>> sql_dset, dct_dset = demo(num=10)
>>> proxy = sql_dset.index.anns
>>> keys = list(proxy.keys())
>>> values = list(proxy.values())
>>> items = list(proxy.items())
>>> item_keys = [t[0] for t in items]
>>> item_vals = [t[1] for t in items]
>>> lut_vals = [proxy[key] for key in keys]
>>> assert item_vals == lut_vals == values
>>> assert item_keys == keys
>>> assert len(proxy) == len(keys)
>>> goodkey1 = keys[1]
>>> badkey1 = -100000000000
>>> badkey2 = 'foobarbazbiz'
>>> assert goodkey1 in proxy
>>> assert badkey1 not in proxy
>>> assert badkey2 not in proxy
>>> with pytest.raises(KeyError):
>>>     proxy[badkey1]
>>> with pytest.raises(KeyError):
>>>     proxy[badkey2]
>>> badkey3 = object()
>>> assert badkey3 not in proxy
>>> with pytest.raises(KeyError):
>>>     proxy[badkey3]
>>> # xdoctest: +SKIP
>>> from kwcoco.coco_sql_dataset import _benchmark_dict_proxy_ops
>>> ti = _benchmark_dict_proxy_ops(proxy)
>>> print('ti.measures = {}'.format(ub.urepr(ti.measures, nl=2, align=':', precision=6)))

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> import kwcoco
>>> # Test the variant of the SqlDictProxy where we ignore None keys
>>> # This is the case for name_to_img and file_name_to_img
>>> dct_dset = kwcoco.CocoDataset.demo('shapes1')
>>> dct_dset.add_image(name='no_file_image1')
>>> dct_dset.add_image(name='no_file_image2')
>>> dct_dset.add_image(name='no_file_image3')
>>> sql_dset = dct_dset.view_sql(memory=True)
>>> assert len(dct_dset.index.imgs) == 4
>>> assert len(dct_dset.index.file_name_to_img) == 1
>>> assert len(dct_dset.index.name_to_img) == 3
>>> assert len(sql_dset.index.imgs) == 4
>>> assert len(sql_dset.index.file_name_to_img) == 1
>>> assert len(sql_dset.index.name_to_img) == 3
>>> proxy = sql_dset.index.file_name_to_img
>>> assert len(list(proxy.keys())) == 1
>>> assert len(list(proxy.values())) == 1
>>> proxy = sql_dset.index.name_to_img
>>> assert len(list(proxy.keys())) == 3
>>> assert len(list(proxy.values())) == 3
>>> proxy = sql_dset.index.imgs
>>> assert len(list(proxy.keys())) == 4
>>> assert len(list(proxy.values())) == 4
_uncached_getitem(key)[source]

The uncached getitem call

keys()[source]
values()[source]
items()[source]
class kwcoco.coco_sql_dataset.SqlIdGroupDictProxy(session, valattr, keyattr, parent_keyattr=None, order_attr=None, order_id=None)[source]

Bases: DictLike

Similar to SqlDictProxy, but maps ids to groups of other ids.

Simulates a dictionary that maps ids of a parent table to all ids of another table corresponding to rows where a specific column has that parent id.

The items in each group can be sorted by order_attr if specified. The order_attr can belong to another table if order_id is also specified.

For example, imagine two tables: images with one column (id) and annotations with two columns (id, image_id). This class can help provide a mapping from each image.id to a Set[annotation.id] where those annotation rows have annotation.image_id = image.id.
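The grouped lookup can be sketched with a single GROUP BY query over a toy schema (an illustration, not the actual implementation):

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE images (id INTEGER PRIMARY KEY);
    CREATE TABLE annotations (id INTEGER PRIMARY KEY, image_id INTEGER);
    INSERT INTO images VALUES (1), (2), (3);
    INSERT INTO annotations VALUES (10, 1), (11, 1), (12, 3);
''')

# One grouped query builds the whole image_id -> [annotation ids] mapping,
# instead of issuing a separate lookup per image.
gid_to_aids = {
    gid: sorted(json.loads('[%s]' % aids))
    for gid, aids in conn.execute(
        'SELECT image_id, group_concat(id) FROM annotations '
        'GROUP BY image_id')
}
assert gid_to_aids == {1: [10, 11], 3: [12]}
```

Note that image 2, which has no annotations, does not appear in the result; producing empty groups would require a LEFT JOIN against the parent table.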

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> sql_dset, dct_dset = demo(num=10)
>>> proxy = sql_dset.index.gid_to_aids
>>> keys = list(proxy.keys())
>>> values = list(proxy.values())
>>> items = list(proxy.items())
>>> item_keys = [t[0] for t in items]
>>> item_vals = [t[1] for t in items]
>>> lut_vals = [proxy[key] for key in keys]
>>> assert item_vals == lut_vals == values
>>> assert item_keys == keys
>>> assert len(proxy) == len(keys)
>>> # xdoctest: +SKIP
>>> from kwcoco.coco_sql_dataset import _benchmark_dict_proxy_ops
>>> ti = _benchmark_dict_proxy_ops(proxy)
>>> print('ti.measures = {}'.format(ub.urepr(ti.measures, nl=2, align=':', precision=6)))

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> import kwcoco
>>> # Test the group sorted variant of this by using vidid_to_gids
>>> # where the "gids" must be sorted by the image frame indexes
>>> dct_dset = kwcoco.CocoDataset.demo('vidshapes1')
>>> dct_dset.add_image(name='frame-index-order-demo1', frame_index=-30, video_id=1)
>>> dct_dset.add_image(name='frame-index-order-demo2', frame_index=10, video_id=1)
>>> dct_dset.add_image(name='frame-index-order-demo3', frame_index=3, video_id=1)
>>> dct_dset.add_video(name='empty-video1')
>>> dct_dset.add_video(name='empty-video2')
>>> dct_dset.add_video(name='empty-video3')
>>> sql_dset = dct_dset.view_sql(memory=True)
>>> orig = dct_dset.index.vidid_to_gids
>>> proxy = sql_dset.index.vidid_to_gids
>>> from kwcoco.util.util_json import indexable_allclose
>>> assert indexable_allclose(orig, dict(proxy))
>>> items = list(proxy.items())
>>> vals = list(proxy.values())
>>> keys = list(proxy.keys())
>>> assert len(keys) == len(vals)
>>> assert dict(zip(keys, vals)) == dict(items)
Parameters:
  • session (sqlalchemy.orm.session.Session) – the sqlalchemy session

  • valattr (InstrumentedAttribute) – The column to lookup as a value

  • keyattr (InstrumentedAttribute) – The column to use as a key

  • parent_keyattr (InstrumentedAttribute | None) – The column of the table corresponding to the key. If unspecified the column in the indexed table is used which may be less efficient.

  • order_attr (InstrumentedAttribute | None) – This is the attribute that the returned results will be ordered by

  • order_id (InstrumentedAttribute | None) – if order_attr belongs to another table, then this must be a column of the value table that corresponds to the primary key of the table used for ordering (e.g. when ordering annotations by image frame index, this must be the annotation image id)

_uncached_getitem(key)[source]

getitem without the cache

keys()[source]
items()[source]
values()[source]
class kwcoco.coco_sql_dataset.CocoSqlIndex[source]

Bases: object

Simulates the dictionary provided by kwcoco.coco_dataset.CocoIndex

build(parent)[source]
_set_alchemy_mode(mode)[source]
kwcoco.coco_sql_dataset._handle_sql_uri(uri)[source]

Temporary function to deal with URIs. Modern tools seem to use RFC 3986 URIs, but sqlalchemy uses RFC 1738. Attempt to gracefully handle special cases. With a better understanding of the above specs, this function could be written more eloquently.

class kwcoco.coco_sql_dataset.CocoSqlDatabase(uri=None, tag=None, img_root=None)[source]

Bases: AbstractCocoDataset, MixinCocoAccessors, MixinCocoObjects, MixinCocoStats, MixinCocoDraw, NiceRepr

Provides an API nearly identical to kwcoco.CocoDataset, but uses an SQL backend data store. This makes it robust to copy-on-write memory issues that arise when forking, as discussed in [1].

Note

By default, constructing an instance of the CocoSqlDatabase does not create a connection to the database. Use the connect() method to open a connection.

References

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> sql_dset, dct_dset = demo()
>>> dset1, dset2 = sql_dset, dct_dset
>>> tag1, tag2 = 'dset1', 'dset2'
>>> assert_dsets_allclose(sql_dset, dct_dset)
MEMORY_URI = 'sqlite:///:memory:'
classmethod coerce(data, backend=None)[source]

Create an SQL CocoDataset from the input pointer.

Example

import kwcoco
dset = kwcoco.CocoDataset.demo('shapes8')
data = dset.fpath
self = CocoSqlDatabase.coerce(data)

from kwcoco.coco_sql_dataset import CocoSqlDatabase
import kwcoco
dset = kwcoco.CocoDataset.coerce('spacenet7.kwcoco.json')
self = CocoSqlDatabase.coerce(dset)

from kwcoco.coco_sql_dataset import CocoSqlDatabase
sql_dset = CocoSqlDatabase.coerce('spacenet7.kwcoco.json')

# from kwcoco.coco_sql_dataset import CocoSqlDatabase
import kwcoco
sql_dset = kwcoco.CocoDataset.coerce('_spacenet7.kwcoco.view.v006.sqlite')

disconnect()[source]

Drop references to any SQL or cache objects

connect(readonly=False, verbose=0, must_exist=False)[source]

Connects this instance to the underlying database.

References

# details on read-only mode; some of these didn't seem to work
https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/dialects/sqlite/pysqlite.py#L71
https://github.com/pudo/dataset/issues/136
https://writeonly.wordpress.com/2009/07/16/simple-read-only-sqlalchemy-sessions/

CommandLine

KWCOCO_WITH_POSTGRESQL=1 xdoctest -m /home/joncrall/code/kwcoco/kwcoco/coco_sql_dataset.py CocoSqlDatabase.connect

Example

>>> # xdoctest: +REQUIRES(env:KWCOCO_WITH_POSTGRESQL)
>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> # xdoctest: +REQUIRES(module:psycopg2)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> dset = CocoSqlDatabase('postgresql+psycopg2://kwcoco:kwcoco_pw@localhost:5432/mydb')
>>> self = dset
>>> dset.connect(verbose=1)
property fpath
delete(verbose=0)[source]
table_names()[source]
populate_from(dset, verbose=1)[source]

Copy the information in a CocoDataset into this SQL database.

CommandLine

xdoctest -m kwcoco.coco_sql_dataset CocoSqlDatabase.populate_from:1

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import _benchmark_dset_readtime  # NOQA
>>> import kwcoco
>>> from kwcoco.coco_sql_dataset import *
>>> dset2 = dset = kwcoco.CocoDataset.demo()
>>> dset2.clear_annotations()
>>> dset1 = self = CocoSqlDatabase('sqlite:///:memory:')
>>> self.connect()
>>> self.populate_from(dset)
>>> dset1_images = list(dset1.dataset['images'])
>>> print('dset1_images = {}'.format(ub.urepr(dset1_images, nl=1)))
>>> print(dset2.dumps(newlines=True))
>>> assert_dsets_allclose(dset1, dset2, tag1='sql', tag2='dct')
>>> ti_sql = _benchmark_dset_readtime(dset1, 'sql')
>>> ti_dct = _benchmark_dset_readtime(dset2, 'dct')
>>> print('ti_sql.rankings = {}'.format(ub.urepr(ti_sql.rankings, nl=2, precision=6, align=':')))
>>> print('ti_dct.rankings = {}'.format(ub.urepr(ti_dct.rankings, nl=2, precision=6, align=':')))

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import _benchmark_dset_readtime  # NOQA
>>> import kwcoco
>>> from kwcoco.coco_sql_dataset import *
>>> dset2 = dset = kwcoco.CocoDataset.demo('vidshapes1')
>>> dset1 = self = CocoSqlDatabase('sqlite:///:memory:')
>>> self.connect()
>>> self.populate_from(dset)
>>> for tablename in dset1.dataset.keys():
>>>     print(tablename)
>>>     table = dset1.pandas_table(tablename)
>>>     print(table)

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import _benchmark_dset_readtime  # NOQA
>>> import kwcoco
>>> from kwcoco.coco_sql_dataset import *
>>> dset2 = dset = kwcoco.CocoDataset.demo()
>>> dset1 = self = CocoSqlDatabase('sqlite:///:memory:')
>>> self.connect()
>>> self.populate_from(dset)
>>> assert_dsets_allclose(dset1, dset2, tag1='sql', tag2='dct')
>>> ti_sql = _benchmark_dset_readtime(dset1, 'sql')
>>> ti_dct = _benchmark_dset_readtime(dset2, 'dct')
>>> print('ti_sql.rankings = {}'.format(ub.urepr(ti_sql.rankings, nl=2, precision=6, align=':')))
>>> print('ti_dct.rankings = {}'.format(ub.urepr(ti_dct.rankings, nl=2, precision=6, align=':')))

CommandLine

KWCOCO_WITH_POSTGRESQL=1 xdoctest -m /home/joncrall/code/kwcoco/kwcoco/coco_sql_dataset.py CocoSqlDatabase.populate_from:1

Example

>>> # xdoctest: +REQUIRES(env:KWCOCO_WITH_POSTGRESQL)
>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> # xdoctest: +REQUIRES(module:psycopg2)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> import kwcoco
>>> dset = dset2 = kwcoco.CocoDataset.demo()
>>> self = dset1 = CocoSqlDatabase('postgresql+psycopg2://kwcoco:kwcoco_pw@localhost:5432/test_populate')
>>> self.delete(verbose=1)
>>> self.connect(verbose=1)
>>> #self.populate_from(dset)
property dataset
property anns
property cats
property imgs
property name_to_cat
pandas_table(table_name, strict=False)[source]

Loads an entire SQL table as a pandas DataFrame

Parameters:

table_name (str) – name of the table

Returns:

pandas.DataFrame

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> # xdoctest: +REQUIRES(module:pandas)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> self, dset = demo()
>>> table_df = self.pandas_table('annotations')
>>> print(table_df)
>>> table_df = self.pandas_table('categories')
>>> print(table_df)
>>> table_df = self.pandas_table('videos')
>>> print(table_df)
>>> table_df = self.pandas_table('images')
>>> print(table_df)
>>> table_df = self.pandas_table('tracks')
>>> print(table_df)
raw_table(table_name)[source]
_raw_tables()[source]

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> import pandas as pd
>>> self, dset = demo()
>>> targets = self._raw_tables()
>>> for tblname, table in targets.items():
...     print(f'tblname={tblname}')
...     print(pd.DataFrame(table))
_column_lookup(tablename, key, rowids, default=NoParam, keepid=False)[source]

Convenience method to look up only a single column of information

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> self, dset = demo(10)
>>> tablename = 'annotations'
>>> key = 'category_id'
>>> rowids = list(self.anns.keys())[::3]
>>> cids1 = self._column_lookup(tablename, key, rowids)
>>> cids2 = self.annots(rowids).get(key)
>>> cids3 = dset.annots(rowids).get(key)
>>> assert cids3 == cids2 == cids1
>>> # Test json columns work
>>> vals1 = self._column_lookup(tablename, 'bbox', rowids)
>>> vals2 = self.annots(rowids).lookup('bbox')
>>> vals3 = dset.annots(rowids).lookup('bbox')
>>> assert vals1 == vals2 == vals3
>>> vals1 = self._column_lookup(tablename, 'segmentation', rowids)
>>> vals2 = self.annots(rowids).lookup('segmentation')
>>> vals3 = dset.annots(rowids).lookup('segmentation')
>>> assert vals1 == vals2 == vals3
_all_rows_column_lookup(tablename, keys)[source]

Convenience method to look up all rows of a table, restricted to only a few columns.

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> self, dset = demo(10)
>>> tablename = 'annotations'
>>> keys = ['id', 'category_id']
>>> rows = self._all_rows_column_lookup(tablename, keys)
tabular_targets()[source]

Convenience method to create an in-memory summary of basic annotation properties with minimal SQL overhead.

Example

>>> # xdoctest: +REQUIRES(module:sqlalchemy)
>>> from kwcoco.coco_sql_dataset import *  # NOQA
>>> self, dset = demo()
>>> targets = self.tabular_targets()
>>> print(targets.pandas())
_table_names()[source]
property bundle_dpath
property data_fpath

data_fpath is an alias of fpath

_orig_coco_fpath()[source]

Hack to reconstruct the original name. Makes assumptions about how naming is handled elsewhere. There should be centralized logic about how to construct side-car names that can be queried for inversed like this.

Returns:

ub.Path | None

_abc_impl = <_abc._abc_data object>
_cached_hashid()[source]

Compatibility with the way the existing cached hashid in the coco dataset is used. Both of these functions are private and subject to change (and need optimization).

kwcoco.coco_sql_dataset.cached_sql_coco_view(dct_db_fpath=None, sql_db_fpath=None, dset=None, force_rewrite=False, backend=None)[source]

Attempts to load a cached SQL-View dataset, only loading and converting the json dataset if necessary.

kwcoco.coco_sql_dataset.ensure_sql_coco_view(dset, db_fpath=None, force_rewrite=False, backend=None)[source]

Create a cached on-disk SQL view of an on-disk COCO dataset.

# DEPRECATE, use the cache function instead

Note

This function is fragile. It depends on looking at file modified timestamps to determine if it needs to write the dataset.

kwcoco.coco_sql_dataset.demo(num=10, backend=None)[source]
kwcoco.coco_sql_dataset.assert_dsets_allclose(dset1, dset2, tag1='dset1', tag2='dset2')[source]
kwcoco.coco_sql_dataset._benchmark_dset_readtime(dset, tag='?', n=4, post_iterate=False)[source]

Helper for understanding the time differences between backends

Note

post_iterate ensures that all of the returned data is looked at by the Python interpreter. This makes the comparison fairer, because Python can otherwise just return pointers to the data; it is only meaningful when most of the data will be touched. For single-attribute lookups it is not a good test.

kwcoco.coco_sql_dataset._benchmark_dict_proxy_ops(proxy)[source]

Get insight on the efficiency of operations

kwcoco.coco_sql_dataset.devcheck()[source]

Scratch work for things that should eventually become unit or doc tests

from kwcoco.coco_sql_dataset import *  # NOQA
self, dset = demo()