
17. Benchmarks

This page explains how to run the benchmark runner and interpret its outputs. For config structure, see the Configuration reference.

Use the bench runner when you want end-to-end, reproducible experiments. If you only need a single brick, the CLI reference and the how-to guides may be faster starting points.

17.1 How to run bench

Use the benchmark runner module with an experiment config:

python -m bench.main --config bench/configs/experiments/toy_inductive.yaml
python -m bench.main --config bench/configs/experiments/toy_transductive.yaml

If you are using environment variable placeholders in configs (like ${MODSSC_OUTPUT_DIR}), set them once before running:

export MODSSC_OUTPUT_DIR=/tmp/modssc_runs
export MODSSC_DATASET_CACHE_DIR=/tmp/modssc_cache/datasets
export MODSSC_PREPROCESS_CACHE_DIR=/tmp/modssc_cache/preprocess

If a placeholder's environment variable is unset at runtime, config loading fails fast with an explicit error.
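As a minimal sketch of how a placeholder appears inside a config (the output_dir key below is hypothetical; only the ${...} placeholder syntax is taken from above):

run:
  name: toy_pseudo_label
  output_dir: ${MODSSC_OUTPUT_DIR}/toy_pseudo_label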

You can also set one global cache root for runtime caches (dataset, preprocess, split, graph):

export MODSSC_CACHE_ROOT=/tmp/modssc_cache

Optional graph-specific overrides:

export MODSSC_GRAPH_CACHE_DIR=/tmp/modssc_cache/graphs
export MODSSC_GRAPH_VIEWS_CACHE_DIR=/tmp/modssc_cache/graph_views
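With only MODSSC_CACHE_ROOT set, the runtime caches live under that single root. The layout below is a sketch; the subdirectory names mirror the cache folders described in section 17.2 and may differ from the exact on-disk names:

/tmp/modssc_cache/
  datasets/      # dataset artifacts, shared across seeds
  splits/        # sampling splits (currently computed in-memory; see 17.2)
  preprocess/    # preprocessed features, one entry per fingerprint
  graphs/        # built graphs, one entry per fingerprint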

Enable verbose logging for a run:

python -m bench.main --config bench/configs/experiments/toy_inductive.yaml --log-level detailed

Run the same config on multiple seeds:

run:
  name: toy_pseudo_label
  seed: 0
  seeds: [1, 2, 3, 4, 5]

In sweep mode, the runner executes one run per seed and auto-suffixes run.name with -seed<N>. For each run, run.seed and section seeds (sampling, preprocess, views, graph, augmentation, search) are aligned to that seed.
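For the config above, the sweep yields one run per seed listed in run.seeds, with names following the auto-suffix rule:

toy_pseudo_label-seed1
toy_pseudo_label-seed2
toy_pseudo_label-seed3
toy_pseudo_label-seed4
toy_pseudo_label-seed5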

Or sweep by run count from the CLI:

# Uses run.seed as base and runs seeds [seed, seed+1, ..., seed+N-1]
python -m bench.main --config bench/configs/experiments/toy_inductive.yaml --num-runs 5

--num-runs follows the same seed-sweep logic and overrides run.seeds when both are present.
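A worked example of that rule, assuming the config sets run.seed: 42:

# Assuming run.seed: 42 in the config, this sweeps seeds [42, 43, 44]
python -m bench.main --config bench/configs/experiments/toy_inductive.yaml --num-runs 3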

The --log-level flag is defined on the bench CLI entry point. [1]

The bench entry point and example configs are in bench/main.py and bench/configs/experiments/. [1][2]

17.2 Cache behavior in multi-seed runs

For multi-seed sweeps, keep one shared cache root and let fingerprints isolate seed-dependent artifacts:

export MODSSC_CACHE_ROOT=/tmp/modssc_cache

Use separate MODSSC_CACHE_ROOT values only when you explicitly need hard isolation (for example strict clean-room comparisons across branches/commits).

When you run 5 seeds, you should expect 5 distinct split/preprocess/graph fingerprints (one per seed), while dataset artifacts are shared.

  • Dataset cache (datasets/): Reused across seeds when dataset identity is unchanged (provider, dataset id, version, resolved options), so it is usually not recomputed.

  • Sampling split cache (splits/): Conceptually seed-dependent (a different split fingerprint per seed), so each seed maps to a different split entry. The current bench orchestration computes sampling in-memory (save=False) and does not persist split cache entries.

  • Preprocess cache (preprocess/): Seed-dependent. Fingerprint includes dataset fingerprint, resolved plan fingerprint, fit fingerprint, and seed. In a 5-seed sweep, expect 5 preprocess entries. Reuse only happens when rerunning the same seed + same plan + same fit inputs.

  • Graph cache (graphs/): Seed-dependent. Fingerprint includes dataset fingerprint, preprocess fingerprint, graph spec, and seed. In a 5-seed sweep, expect 5 graph entries (unless graph is dataset-provided and not rebuilt).

  • Method training/inference: Not cached by the benchmark runner. Model fit/eval is recomputed every run.

Practical rules:

  • Keep the same cache root for speed when rerunning identical experiments.

  • Do not reuse the cache for "strict from-scratch" comparisons; use a new cache root.

  • If you change dataset options, preprocessing steps/params, fit_on, graph spec, or seeds, expect recomputation for the impacted stages.

  • If the preprocessing or graph implementation changes in code while the config stays the same, clear the impacted cache folders to avoid stale artifacts (see the sketch after this list).
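A minimal sketch of both options, assuming the cache root contains the preprocess/ and graphs/ folders named above (the fresh root path is arbitrary):

# Hard isolation: point the cache root at a fresh directory before the run.
export MODSSC_CACHE_ROOT=/tmp/modssc_cache_clean
python -m bench.main --config bench/configs/experiments/toy_inductive.yaml

# Or clear only the impacted stages after a code change to preprocessing or graph building.
rm -rf "$MODSSC_CACHE_ROOT/preprocess" "$MODSSC_CACHE_ROOT/graphs"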

17.3 How outputs are stored

Each run writes a timestamped directory under runs/ with:

  • config.yaml (copied config)

  • run.json (metrics + metadata)

  • error.txt (only on failure)

These outputs are created by the run context and reporting orchestrator. [3][4][5]
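A sketch of the resulting layout (the run directory name format is illustrative; only the file names come from the list above):

runs/
  <timestamped run directory>/
    config.yaml   # copied config
    run.json      # metrics + metadata
    error.txt     # only written on failure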

17.4 How to interpret results

run.json includes:

  • run metadata (name, seed, status)

  • resolved config blocks

  • artifacts and metrics

  • an HPO summary when search is enabled

This structure is written in bench/orchestrators/reporting.py. [4]

17.5 Reproducibility tips

  • Fix run.seed to make sampling, preprocessing, and method seeds deterministic. [6][3]

  • Keep the copied config.yaml alongside results for auditability. [3]

  • Caches for datasets, graphs, and views reduce re-downloads and make runs faster. [7][8]

Sources
  1. bench/main.py
  2. bench/configs/experiments/
  3. bench/context.py
  4. bench/orchestrators/reporting.py
  5. bench/README.md
  6. bench/schema.py
  7. src/modssc/data_loader/cache.py
  8. src/modssc/graph/cache.py