12. How to compute evaluation metrics
Use this recipe to list the available metrics and compute them from predictions or score matrices. It shows both the CLI and the Python API, and links to Catalogs and registries for the full metric list.
12.1 Problem statement
You need consistent accuracy and F1 metrics for model outputs. [1][2] Use the metric IDs from this page in configs or scripts so results are comparable.
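For example, you can keep the metric IDs in one place and pass the same list wherever predictions are scored; a minimal sketch (the METRICS name is illustrative, and evaluate is the helper shown in the copy-paste example below):
# Metric IDs from this page, defined once so every run reports the same metrics.
METRICS = ["accuracy", "macro_f1"]
# Reuse the same list wherever predictions are scored, e.g.:
# results = evaluate(y_true, scores, METRICS)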
12.2 When to use
Use these helpers when scoring predictions produced by inductive or transductive methods. [3]
12.3 Steps
1) Inspect available metrics. [1][2]
2) Compute metrics from label arrays or score matrices. [1]
3) (Optional) Use the CLI for quick checks on .npy files. [2]
For the full list of metric IDs, see Catalogs and registries.
12.4 Copy-paste example
Use the CLI when you want file-based evaluation from .npy arrays, and use the Python API when you already have arrays in memory. [2][1]
CLI:
modssc evaluation list
modssc evaluation compute --y-true y_true.npy --y-pred y_pred.npy --metric accuracy --metric macro_f1
Python:
import numpy as np
from modssc.evaluation import evaluate

y_true = np.array([0, 1, 2, 1, 0])  # ground-truth class labels
scores = np.array([                 # class scores, one row per sample
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.2, 0.6, 0.2],
    [0.6, 0.3, 0.1],
])
print(evaluate(y_true, scores, ["accuracy", "macro_f1"]))
12.5 Pitfalls
Warning
modssc evaluation compute only accepts .npy files and will reject other formats. [2]
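If your arrays only exist in memory, NumPy's np.save writes the .npy files the command expects; a minimal sketch (the label values are illustrative, and the file names match the CLI example above):
import numpy as np

# Save ground-truth and predicted labels in the .npy format the CLI requires.
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 2, 0])
np.save("y_true.npy", y_true)
np.save("y_pred.npy", y_pred)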
Tip
evaluate accepts one-hot labels or class scores; it converts them internally. [1]
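For example, if a model only returns hard label predictions, you can one-hot encode them before calling evaluate; a minimal sketch, assuming the same evaluate signature as in the copy-paste example (the predicted labels are illustrative):
import numpy as np
from modssc.evaluation import evaluate

y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 2, 0])  # hard label predictions
one_hot = np.eye(3)[y_pred]         # one-hot rows for 3 classes
print(evaluate(y_true, one_hot, ["accuracy", "macro_f1"]))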