kwcoco.metrics.sklearn_alts module

Faster pure-python versions of sklearn functions that avoid expensive checks and label rectifications. It is assumed that all labels are consecutive non-negative integers.
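If your labels are arbitrary values such as strings or non-consecutive integers, one way to meet this assumption is to remap them first. The sketch below does this with numpy.unique and return_inverse=True. The helper name encode_labels is hypothetical and is not part of kwcoco.

```python
import numpy as np

def encode_labels(y_true, y_pred):
    """Hypothetical helper: map arbitrary labels to codes 0..n_labels-1."""
    # Build one shared vocabulary so both arrays use the same codes
    both = np.concatenate([np.asarray(y_true), np.asarray(y_pred)])
    classes, codes = np.unique(both, return_inverse=True)
    n = len(y_true)
    # Split the codes back into the true and predicted halves
    return codes[:n], codes[n:], classes

true_codes, pred_codes, classes = encode_labels(['cat', 'dog', 'cat'],
                                                ['dog', 'dog', 'cat'])
```

Here true_codes is [0, 1, 0], pred_codes is [1, 1, 0], and classes maps each code back to its original label.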

kwcoco.metrics.sklearn_alts.confusion_matrix(y_true, y_pred, n_labels=None, labels=None, sample_weight=None)

Faster version of sklearn's confusion matrix that skips the expensive input checks and label rectification.

Runs in about 0.7 ms (the exact time depends on input size and hardware).

Returns:

Matrix in which rows represent true labels and columns represent predicted labels.

Return type:

ndarray
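Part of the speedup comes from the consecutive-integer assumption: each (true, pred) pair can be mapped directly to a matrix cell. The sketch below illustrates that idea and is not necessarily how kwcoco implements it. It encodes each pair as the single index true * n_labels + pred and counts the pairs with numpy.bincount. The function name confusion_matrix_sketch is hypothetical.

```python
import numpy as np

def confusion_matrix_sketch(y_true, y_pred, n_labels=None, sample_weight=None):
    """Hypothetical sketch: rows are true labels, columns are predicted labels."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    if n_labels is None:
        # Assumes labels are consecutive non-negative integers
        n_labels = int(max(y_true.max(), y_pred.max())) + 1
    # Flatten each (true, pred) pair into a single bin index
    flat_index = y_true * n_labels + y_pred
    # Count (or sum the weights of) every pair in one pass
    counts = np.bincount(flat_index, weights=sample_weight,
                         minlength=n_labels * n_labels)
    return counts.reshape(n_labels, n_labels)
```

Without weights the counts are integers. With sample_weight, numpy.bincount returns floating-point sums.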

Example

>>> import numpy as np
>>> from kwcoco.metrics.sklearn_alts import confusion_matrix
>>> y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0,  0, 1])
>>> y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1,  1, 1])
>>> confusion_matrix(y_true, y_pred, 2)
array([[4, 2],
       [3, 1]]...)
>>> confusion_matrix(y_true, y_pred, 2).ravel()
array([4, 2, 3, 1]...)
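With rows as true labels and columns as predicted labels, the raveled 2x2 matrix is ordered (TN, FP, FN, TP), with label 1 treated as the positive class. This short sketch reuses the matrix values from the example above.

```python
import numpy as np

# Confusion matrix from the example: rows = true label, columns = predicted label
cfsn = np.array([[4, 2],
                 [3, 1]])
# Row-major order: (true 0, pred 0), (true 0, pred 1), (true 1, pred 0), (true 1, pred 1)
tn, fp, fn, tp = cfsn.ravel()
precision = tp / (tp + fp)  # 1 / 3
recall = tp / (tp + fn)     # 1 / 4
```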

Benchmark

>>> # xdoctest: +SKIP
>>> import numpy as np
>>> import ubelt as ub
>>> y_true = np.random.randint(0, 2, 10000)
>>> y_pred = np.random.randint(0, 2, 10000)
>>> n = 1000
>>> # Time the same work with a Python-list weight vs. a NumPy-array weight
>>> for timer in ub.Timerit(n, bestof=10, label='py-time'):
>>>     with timer:
>>>         sample_weight = [1] * len(y_true)
>>>         confusion_matrix(y_true, y_pred, 2, sample_weight=sample_weight)
>>> for timer in ub.Timerit(n, bestof=10, label='np-time'):
>>>     with timer:
>>>         sample_weight = np.ones(len(y_true), dtype=int)
>>>         confusion_matrix(y_true, y_pred, 2, sample_weight=sample_weight)

kwcoco.metrics.sklearn_alts.global_accuracy_from_confusion(cfsn)

kwcoco.metrics.sklearn_alts.class_accuracy_from_confusion(cfsn)
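Neither function is documented here. The sketch below assumes the usual definitions. Global accuracy is taken as the fraction of all samples on the diagonal. Class accuracy is taken as the mean of the per-class accuracies, where each per-class accuracy is a row's diagonal entry divided by its row sum. The *_sketch names are hypothetical, and the kwcoco implementations may differ, particularly for classes with no true samples.

```python
import numpy as np

def global_accuracy_sketch(cfsn):
    """Assumed definition: correct predictions / all samples."""
    cfsn = np.asarray(cfsn, dtype=float)
    return np.trace(cfsn) / cfsn.sum()

def class_accuracy_sketch(cfsn):
    """Assumed definition: mean over classes of (correct / true count)."""
    cfsn = np.asarray(cfsn, dtype=float)
    support = cfsn.sum(axis=1)  # number of true samples per class (row sums)
    with np.errstate(divide='ignore', invalid='ignore'):
        per_class = np.diag(cfsn) / support
    # Classes with no true samples give NaN and are skipped (an assumption)
    return np.nanmean(per_class)
```

For the matrix [[4, 2], [3, 1]], this gives a global accuracy of 5 / 10 = 0.5 and a class accuracy of (4/6 + 1/4) / 2.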

kwcoco.metrics.sklearn_alts._binary_clf_curve2(y_true, y_score, pos_label=None, sample_weight=None)

Modified version of scikit-learn's private function sklearn.metrics._ranking._binary_clf_curve.

Calculate true and false positives per binary classification threshold.

Parameters:
  • y_true (array, shape = [n_samples]) – True targets of binary classification

  • y_score (array, shape = [n_samples]) – Estimated probabilities or decision function

  • pos_label (int or str, default=None) – The label of the positive class

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

  • fps (array, shape = [n_thresholds]) – A count of false positives, at index i being the number of negative samples assigned a score >= thresholds[i]. The total number of negative samples is equal to fps[-1] (thus true negatives are given by fps[-1] - fps).

  • tps (array, shape = [n_thresholds <= len(np.unique(y_score))]) – An increasing count of true positives, at index i being the number of positive samples assigned a score >= thresholds[i]. The total number of positive samples is equal to tps[-1] (thus false negatives are given by tps[-1] - tps).

  • thresholds (array, shape = [n_thresholds]) – Decreasing score values.
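These return values come from cumulative sums over the samples sorted by decreasing score. The sketch below shows that technique in the style of scikit-learn's version. It is not kwcoco's modified implementation. It assumes pos_label=1 by default and does not special-case NaN scores. The name binary_clf_curve_sketch is hypothetical.

```python
import numpy as np

def binary_clf_curve_sketch(y_true, y_score, pos_label=1, sample_weight=None):
    """Return (fps, tps, thresholds) using cumulative sums over sorted scores."""
    # 1.0 for positive samples, 0.0 for negative samples
    is_pos = (np.asarray(y_true) == pos_label).astype(float)
    y_score = np.asarray(y_score, dtype=float)
    if sample_weight is None:
        weight = np.ones(len(y_score))
    else:
        weight = np.asarray(sample_weight, dtype=float)
    # Sort samples by decreasing score; mergesort is stable
    order = np.argsort(-y_score, kind='mergesort')
    y_score, is_pos, weight = y_score[order], is_pos[order], weight[order]
    # Index of the last sample in each run of tied scores, plus the final sample
    distinct_idxs = np.where(np.diff(y_score))[0]
    threshold_idxs = np.r_[distinct_idxs, len(y_score) - 1]
    # Weighted running counts evaluated at each distinct threshold
    tps = np.cumsum(is_pos * weight)[threshold_idxs]
    fps = np.cumsum((1.0 - is_pos) * weight)[threshold_idxs]
    thresholds = y_score[threshold_idxs]
    return fps, tps, thresholds
```

As the Returns section describes, fps[-1] - fps gives the true negatives at each threshold, and tps[-1] - tps gives the false negatives.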

Example

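>>> import numpy as np
>>> from kwcoco.metrics.sklearn_alts import _binary_clf_curve2
>>> y_true  = [      1,   1,   1,   1,   1,   1,   0]
>>> y_score = [ np.nan, 0.2, 0.3, 0.4, 0.5, 0.6, 0.3]
>>> sample_weight = None
>>> pos_label = None
>>> fps, tps, thresholds = _binary_clf_curve2(
>>>     y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)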