kwcoco.metrics.sklearn_alts

Faster pure-python versions of sklearn functions that avoid expensive checks and label rectifications. It is assumed that all labels are consecutive non-negative integers.

Module Contents

Functions

confusion_matrix(y_true, y_pred, n_labels=None, labels=None, sample_weight=None)

Faster version of the sklearn confusion matrix that avoids the expensive checks and label rectification

global_accuracy_from_confusion(cfsn)

class_accuracy_from_confusion(cfsn)

_binary_clf_curve2(y_true, y_score, pos_label=None, sample_weight=None)

MODIFIED VERSION OF SCIKIT-LEARN API

Attributes

profile

kwcoco.metrics.sklearn_alts.profile[source]
kwcoco.metrics.sklearn_alts.confusion_matrix(y_true, y_pred, n_labels=None, labels=None, sample_weight=None)[source]

Faster version of the sklearn confusion matrix that avoids the expensive checks and label rectification.

Runs in about 0.7ms

Returns

Matrix whose rows index the real (true) labels and whose columns index the predicted labels.

Return type

ndarray

Example

>>> y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0,  0, 1])
>>> y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1,  1, 1])
>>> confusion_matrix(y_true, y_pred, 2)
array([[4, 2],
       [3, 1]])
>>> confusion_matrix(y_true, y_pred, 2).ravel()
array([4, 2, 3, 1])

Benchmarks:

    import numpy as np
    import ubelt as ub
    from kwcoco.metrics.sklearn_alts import confusion_matrix

    y_true = np.random.randint(0, 2, 10000)
    y_pred = np.random.randint(0, 2, 10000)

    n = 1000
    # Weights passed as a plain Python list
    for timer in ub.Timerit(n, bestof=10, label='py-time'):
        sample_weight = [1] * len(y_true)
        confusion_matrix(y_true, y_pred, 2, sample_weight=sample_weight)

    # Weights passed as a NumPy integer array
    for timer in ub.Timerit(n, bestof=10, label='np-time'):
        sample_weight = np.ones(len(y_true), dtype=int)
        confusion_matrix(y_true, y_pred, 2, sample_weight=sample_weight)
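
To illustrate why the assumption of consecutive non-negative integer labels allows the checks to be skipped, here is a minimal sketch of one common way to count a confusion matrix with a single np.bincount call. The helper name bincount_confusion is hypothetical, and this is not necessarily how kwcoco implements confusion_matrix; it only shows the general technique.

```python
import numpy as np


def bincount_confusion(y_true, y_pred, n_labels, sample_weight=None):
    """Hypothetical helper (not part of kwcoco).

    Assumes every label is an integer in the range 0..n_labels-1.
    """
    y_true = np.asarray(y_true, dtype=np.intp)
    y_pred = np.asarray(y_pred, dtype=np.intp)
    # Encode each (true, pred) pair as its row-major position in the matrix
    flat_index = y_true * n_labels + y_pred
    # Count (or sum the weights of) the samples landing in each cell
    counts = np.bincount(flat_index, weights=sample_weight,
                         minlength=n_labels * n_labels)
    # Rows are real labels, columns are predicted labels
    return counts.reshape(n_labels, n_labels)


y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
cfsn = bincount_confusion(y_true, y_pred, 2)
print(cfsn)
```

With the same inputs as the example above, this sketch produces the same 2x2 matrix. Because no label lookup or validation is performed, a label outside 0..n_labels-1 would be miscounted or raise an error.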

kwcoco.metrics.sklearn_alts.global_accuracy_from_confusion(cfsn)[source]
kwcoco.metrics.sklearn_alts.class_accuracy_from_confusion(cfsn)[source]
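
Neither accuracy function is documented on this page. As a rough sketch only, assuming that "global accuracy" means the fraction of all samples on the diagonal and that "class accuracy" means the mean of the per-class recalls (both assumptions, not confirmed here), the two quantities could be computed from a confusion matrix with real labels on the rows as follows. The function names are hypothetical.

```python
import numpy as np


def global_accuracy_sketch(cfsn):
    """Assumed definition: correct predictions / all predictions."""
    cfsn = np.asarray(cfsn, dtype=float)
    return np.trace(cfsn) / cfsn.sum()


def class_accuracy_sketch(cfsn):
    """Assumed definition: mean over classes of (correct / real count)."""
    cfsn = np.asarray(cfsn, dtype=float)
    n_correct = np.diag(cfsn)
    n_real = cfsn.sum(axis=1)  # rows are real labels
    # Ignore classes with no real samples to avoid dividing by zero
    present = n_real > 0
    return (n_correct[present] / n_real[present]).mean()


cfsn = np.array([[4, 2],
                 [3, 1]])
global_acc = global_accuracy_sketch(cfsn)  # (4 + 1) / 10
class_acc = class_accuracy_sketch(cfsn)    # (4/6 + 1/4) / 2
print(global_acc, class_acc)
```

Under these assumed definitions the two numbers differ whenever the classes are imbalanced, since the global figure weights every sample equally while the class figure weights every class equally.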
kwcoco.metrics.sklearn_alts._binary_clf_curve2(y_true, y_score, pos_label=None, sample_weight=None)[source]

MODIFIED VERSION OF SCIKIT-LEARN API

Calculate true and false positives per binary classification threshold.

Parameters
  • y_true (array, shape = [n_samples]) – True targets of binary classification

  • y_score (array, shape = [n_samples]) – Estimated probabilities or decision function

  • pos_label (int or str, default=None) – The label of the positive class

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

  • fps (array, shape = [n_thresholds]) – A count of false positives, at index i being the number of negative samples assigned a score >= thresholds[i]. The total number of negative samples is equal to fps[-1] (thus true negatives are given by fps[-1] - fps).

  • tps (array, shape = [n_thresholds <= len(np.unique(y_score))]) – An increasing count of true positives, at index i being the number of positive samples assigned a score >= thresholds[i]. The total number of positive samples is equal to tps[-1] (thus false negatives are given by tps[-1] - tps).

  • thresholds (array, shape = [n_thresholds]) – Decreasing score values.

Example

>>> y_true  = [      1,   1,   1,   1,   1,   1,   0]
>>> y_score = [ np.nan, 0.2, 0.3, 0.4, 0.5, 0.6, 0.3]
>>> sample_weight = None
>>> pos_label = None
>>> fps, tps, thresholds = _binary_clf_curve2(y_true, y_score)
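
The relationships stated under Returns (true negatives as fps[-1] - fps, false negatives as tps[-1] - tps) can be checked with a small NumPy sketch of the cumulative-count approach used by this kind of curve. The function clf_curve_sketch is hypothetical, takes the positive label to be 1, ignores sample weights, and is not the module's implementation. The NaN score from the example above is omitted, since how the module treats it is not described here.

```python
import numpy as np


def clf_curve_sketch(y_true, y_score):
    """Hypothetical sketch: positives are label 1, no sample weights."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    # Sort samples by decreasing score (stable sort, then reversed)
    order = np.argsort(y_score, kind="mergesort")[::-1]
    y_true = y_true[order]
    y_score = y_score[order]
    # Keep the last index of each run of equal scores as a threshold
    distinct = np.where(np.diff(y_score))[0]
    threshold_idxs = np.r_[distinct, y_true.size - 1]
    # Cumulative positives with score >= each threshold
    tps = np.cumsum(y_true == 1)[threshold_idxs]
    # Everything else at or above the threshold is a false positive
    fps = 1 + threshold_idxs - tps
    return fps, tps, y_score[threshold_idxs]


y_true = [1, 1, 1, 1, 1, 0]
y_score = [0.2, 0.3, 0.4, 0.5, 0.6, 0.3]
fps, tps, thresholds = clf_curve_sketch(y_true, y_score)

# Derived counts, as described under Returns
tns = fps[-1] - fps
fns = tps[-1] - tps
print(fps, tps, thresholds, tns, fns)
```

Here the two samples scored 0.3 share a single threshold, so the output has five thresholds for six samples.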