17. Benchmarks
This page explains how to run the benchmark runner and interpret its outputs. For config structure, see the Configuration reference.
Use the bench runner when you want end-to-end, reproducible experiments. If you only need one brick, the CLI reference and the how-to guides may be a faster starting point.
17.1 How to run bench
Use the benchmark runner module with an experiment config:
python -m bench.main --config bench/configs/experiments/toy_inductive.yaml
python -m bench.main --config bench/configs/experiments/toy_transductive.yaml
Enable verbose logging for a run:
python -m bench.main --config bench/configs/experiments/toy_inductive.yaml --log-level detailed
The --log-level flag is defined on the bench CLI entry point. [1]
The bench entry point and example configs are in bench/main.py and bench/configs/experiments/. [1][2]
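If you want to drive several runs from a script (for example, to run both toy experiments back to back), a minimal sketch using only the documented CLI and the Python standard library might look like this; it assumes nothing beyond the commands shown above:

import subprocess
import sys

configs = [
    "bench/configs/experiments/toy_inductive.yaml",
    "bench/configs/experiments/toy_transductive.yaml",
]

for cfg in configs:
    # Invoke the documented bench entry point as a subprocess, with verbose logging.
    result = subprocess.run(
        [sys.executable, "-m", "bench.main", "--config", cfg, "--log-level", "detailed"],
        check=False,
    )
    print(f"{cfg}: exit code {result.returncode}")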
17.2 How outputs are stored
Each run writes a timestamped directory under runs/ with:
- config.yaml (copied config)
- run.json (metrics + metadata)
- error.txt (only on failure)
These outputs are created by the run context and reporting orchestrator. [3][4][5]
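As a quick sanity check after a run, a small sketch like the following locates the most recent run directory and reports which of these files were produced; it assumes only the runs/ layout described above and picks the latest directory by modification time rather than parsing timestamped names:

from pathlib import Path

runs_root = Path("runs")

# Choose the most recently modified run directory instead of parsing its timestamped name.
latest = max(
    (p for p in runs_root.iterdir() if p.is_dir()),
    key=lambda p: p.stat().st_mtime,
    default=None,
)

if latest is None:
    print("No run directories found under runs/")
else:
    print(f"Latest run: {latest.name}")
    for name in ("config.yaml", "run.json", "error.txt"):
        status = "present" if (latest / name).exists() else "absent"
        print(f"  {name}: {status}")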
17.3 How to interpret results
run.json includes:
- run metadata (name, seed, status)
- resolved config blocks
- artifacts and metrics
- HPO summary when search is enabled
This structure is defined and written by the reporting orchestrator in bench/orchestrators/reporting.py. [4]
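To inspect a result programmatically, a minimal sketch is to load run.json and look at its top-level blocks. The placeholder path and the specific field names below ("status", "metrics") are illustrative assumptions; check bench/orchestrators/reporting.py for the real schema:

import json
from pathlib import Path

# Replace <run-directory> with an actual timestamped run directory under runs/.
run_file = Path("runs") / "<run-directory>" / "run.json"

with run_file.open() as f:
    report = json.load(f)

# List the top-level blocks (metadata, resolved config, artifacts/metrics, HPO summary, ...).
print("Top-level keys:", sorted(report))

# Field names here are assumptions for illustration; the actual keys come from reporting.py.
print("Status:", report.get("status"))
print("Metrics:", report.get("metrics"))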