kwcoco.metrics.clf_report module

kwcoco.metrics.clf_report.classification_report(y_true, y_pred, target_names=None, sample_weight=None, verbose=False, remove_unsupported=False, log=None, ascii_only=False)

Computes a classification report, which is a collection of various metrics commonly used to evaluate classification quality. This can handle binary and multiclass settings.

Note that this function does not accept probabilities or scores and must instead act on final decisions. See ovr_classification_report for a probability-based report that uses a one-vs-rest strategy.
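
If you start from class probabilities or scores, recover hard decisions first, for example with an argmax over the class axis. A minimal sketch, assuming the columns of probs are ordered like label_values (both names are illustrative and not part of this API):

>>> # Hypothetical sketch: convert an [N x C] probability array into
>>> # hard label decisions before calling classification_report.
>>> import numpy as np
>>> label_values = np.array([1, 2, 3])
>>> probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
>>> y_pred = label_values[probs.argmax(axis=1)]
>>> print(y_pred)
[1 3]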

This emulates the bm(cm) Matlab script [MatlabBM] written by David Powers, which computes bookmaker, markedness, and various other scores and is based on the paper [PowersMetrics].
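
These per-class scores follow from the one-vs-rest confusion counts: bookmaker (informedness) is recall plus inverse recall minus one, markedness is precision plus inverse precision minus one, and MCC is their geometric mean (with the sign of the covariance). As an illustrative sketch, the counts for class 1 of the first example below (TP=3, FP=1, FN=2, TN=12) reproduce the corresponding row of the metrics table:

>>> # Illustrative sketch (not part of this API): per-class scores from
>>> # one-vs-rest counts TP=3, FP=1, FN=2, TN=12 (class 1 in the first
>>> # example below).
>>> TP, FP, FN, TN = 3, 1, 2, 12
>>> precision = TP / (TP + FP)            # PPV
>>> recall = TP / (TP + FN)               # TPR
>>> inv_precision = TN / (TN + FN)        # NPV
>>> inv_recall = TN / (TN + FP)           # TNR
>>> bookmaker = recall + inv_recall - 1   # informedness, equal to TPR - FPR
>>> markedness = precision + inv_precision - 1
>>> mcc = (bookmaker * markedness) ** 0.5  # both factors are positive here
>>> print(round(bookmaker, 4), round(markedness, 4), round(mcc, 4))
0.5231 0.6071 0.5635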

References

[MulticlassMCC] Jurman, Riccadonna, Furlanello (2012). A Comparison of MCC and CEN Error Measures in Multi-Class Prediction.

Parameters:
  • y_true (ndarray) – true labels for each item

  • y_pred (ndarray) – predicted labels for each item

  • target_names (List | None) – mapping from label to category name

  • sample_weight (ndarray | None) – weight for each item

  • verbose (int) – if True, print the report. Defaults to False.

  • log (callable | None) – print or logging function

  • remove_unsupported (bool) – if True, remove categories that have no support. Defaults to False.

  • ascii_only (bool) – if True, do not use unicode characters. If the ASCII_ONLY environment variable is set, this is forced to True and cannot be undone. Defaults to False.

Example

>>> # xdoctest: +IGNORE_WANT
>>> # xdoctest: +REQUIRES(module:sklearn)
>>> # xdoctest: +REQUIRES(module:pandas)
>>> from kwcoco.metrics.clf_report import *  # NOQA
>>> y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
>>> y_pred = [1, 2, 1, 3, 1, 2, 2, 3, 2, 2, 3, 3, 2, 3, 3, 3, 1, 3]
>>> target_names = None
>>> sample_weight = None
>>> report = classification_report(y_true, y_pred, verbose=0, ascii_only=1)
>>> print(report['confusion'])
pred  1  2  3  Σr
real
1     3  1  1   5
2     0  4  1   5
3     1  1  6   8
Σp    4  6  8  18
>>> print(report['metrics'])
metric    precision  recall    fpr  markedness  bookmaker    mcc  support
class
1            0.7500  0.6000 0.0769      0.6071     0.5231 0.5635        5
2            0.6667  0.8000 0.1538      0.5833     0.6462 0.6139        5
3            0.7500  0.7500 0.2000      0.5500     0.5500 0.5500        8
combined     0.7269  0.7222 0.1530      0.5751     0.5761 0.5758       18

Example

>>> # xdoctest: +IGNORE_WANT
>>> # xdoctest: +REQUIRES(module:sklearn)
>>> # xdoctest: +REQUIRES(module:pandas)
>>> from kwcoco.metrics.clf_report import *  # NOQA
>>> y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
>>> y_pred = [1, 2, 1, 3, 1, 2, 2, 3, 2, 2, 3, 3, 2, 3, 3, 3, 1, 3]
>>> target_names = None
>>> sample_weight = None
>>> logs = []
>>> report = classification_report(y_true, y_pred, verbose=1, ascii_only=True, log=logs.append)
>>> print('\n'.join(logs))

kwcoco.metrics.clf_report.ovr_classification_report(mc_y_true, mc_probs, target_names=None, sample_weight=None, metrics=None, verbose=0, remove_unsupported=False, log=None)

One-vs-rest classification report

Parameters:
  • mc_y_true (ndarray) – multiclass truth labels (integer label format). Shape [N].

  • mc_probs (ndarray) – multiclass probabilities for each class. Shape [N x C].

  • target_names (Dict[int, str] | None) – mapping from int label to string name

  • sample_weight (ndarray | None) – weight for each item. Shape [N].

  • metrics (List[str] | None) – names of metrics to compute

Example

>>> # xdoctest: +IGNORE_WANT
>>> # xdoctest: +REQUIRES(module:sklearn)
>>> # xdoctest: +REQUIRES(module:pandas)
>>> from kwcoco.metrics.clf_report import *  # NOQA
>>> import numpy as np
>>> y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0]
>>> y_probs = np.random.rand(len(y_true), max(y_true) + 1)
>>> target_names = None
>>> sample_weight = None
>>> verbose = True
>>> report = ovr_classification_report(y_true, y_probs)
>>> print(report['ave'])
auc     0.6541
ap      0.6824
kappa   0.0963
mcc     0.1002
brier   0.2214
dtype: float64
>>> print(report['ovr'])
     auc     ap  kappa    mcc  brier  support  weight
0 0.6062 0.6161 0.0526 0.0598 0.2608        8  0.4444
1 0.5846 0.6014 0.0000 0.0000 0.2195        5  0.2778
2 0.8000 0.8693 0.2623 0.2652 0.1602        5  0.2778
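
The metrics argument takes a list of metric names to compute. The accepted names are not spelled out here, but assuming they match the column names shown above ('auc', 'ap', 'kappa', 'mcc', 'brier'), a restricted report could be requested as follows (a sketch continuing the example above, not a verified invocation):

>>> # Assumption: metric names match the columns shown above.
>>> report = ovr_classification_report(y_true, y_probs, metrics=['auc', 'ap'])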