6. Glossary

Use this page when a term appears in the docs and you want the shortest stable definition before going deeper.

6.1 Core workflow terms

benchmark runner: the repository-level execution path driven by python -m bench.main --config .... It orchestrates dataset loading, sampling, preprocessing, the optional graph and views stages, method execution, and reporting.
catalog: a curated list of IDs and metadata, such as datasets or preprocess steps, exposed through CLI and Python helpers.
provider: the backend that knows how to fetch or materialize a dataset, such as OpenML, Hugging Face, TFDS, torchvision, torchaudio, or PyG.
modality: the data family a dataset or step primarily targets, such as tabular, text, vision, audio, or graph.
method ID: the string used to select a learning method in a registry or config, for example pseudo_label or label_propagation.
classifier ID: the string used to select a supervised baseline or model backend inside an inductive method configuration.

inductive: a workflow where the method learns from labeled and unlabeled examples and must generalize to unseen samples without requiring an explicit graph.
transductive: a workflow where the method operates on a fixed graph over all nodes and predicts labels inside that graph.
split: the partition of data into subsets such as train, validation, and test.
labeling: the rule that decides how many train examples remain labeled versus unlabeled inside a semi-supervised split.
view: an alternative feature representation of the same examples, often used for multi-view methods such as co-training.
graph spec: the structured description of how to build a graph, including scheme, metric, backend, and weighting choices.
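
As an illustration of the labeling term above, here is a minimal sketch of a rule that keeps a fixed fraction of train labels and hides the rest. The function name labeled_mask, the labeled_fraction parameter, and the use of None for an unlabeled example are all assumptions made for this example, not ModSSC's actual API or conventions.

```python
import random


def labeled_mask(n_train, labeled_fraction, seed=0):
    """Return one boolean per train example: True means the label is kept."""
    rng = random.Random(seed)  # seeded so the same split is reproducible
    n_labeled = max(1, round(labeled_fraction * n_train))
    labeled = set(rng.sample(range(n_train), n_labeled))
    return [i in labeled for i in range(n_train)]


y_train = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
mask = labeled_mask(len(y_train), labeled_fraction=0.3, seed=42)

# Hypothetical convention: None marks an example whose label is hidden.
y_semi = [y if keep else None for y, keep in zip(y_train, mask)]
```

The split itself (train, validation, test) is decided first; a labeling rule like this one only acts inside the train subset.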

fingerprint: the stable hash-like identity ModSSC derives from inputs such as dataset content, config blocks, seeds, and selected fields to name cache artifacts deterministically.
cache: the on-disk storage for reusable artifacts such as downloaded datasets, preprocess outputs, graphs, and graph views.
fit_on: the subset used to fit preprocess steps that learn statistics, such as scaling or PCA.
official split: a provider-defined train or test partition that comes from the dataset source rather than from a user-defined split plan.
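
To make the fingerprint idea concrete, the sketch below hashes a canonical JSON form of a few inputs so that equal inputs always yield the same cache key. This is one common way to build such an identity, shown under stated assumptions: the function name, the payload keys, and the choice of SHA-256 are illustrative and may differ from how ModSSC computes its fingerprints.

```python
import hashlib
import json


def fingerprint(payload, length=16):
    """Hash a JSON-serializable payload into a short, deterministic hex ID."""
    # sort_keys and fixed separators make the text independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical inputs: a dataset name, a seed, and a list of preprocess steps.
a = fingerprint({"dataset": "toy", "seed": 0, "steps": ["scale", "pca"]})
b = fingerprint({"steps": ["scale", "pca"], "seed": 0, "dataset": "toy"})
c = fingerprint({"dataset": "toy", "seed": 1, "steps": ["scale", "pca"]})

print(a == b)  # same content, different key order
print(a == c)  # different seed
```

Because the identity depends only on the inputs, an artifact can be looked up in the cache by name, and changing any fingerprinted field produces a new name rather than overwriting the old artifact.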

optional extra: an install group exposed by Python packaging, such as graph, preprocess-text, or transductive-torch.
required_extra: the metadata field that tells you which optional extra must be installed for a dataset, step, model, or method to be usable.
available-only: a registry filter that hides entries whose optional dependencies are currently missing.
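
The available-only filter can be pictured as a check for whether an entry's dependency is importable. The sketch below is a simplification under loud assumptions: the REGISTRY dictionary, the required_module key, and the list_ids helper are hypothetical, and each entry names an importable module directly, whereas required_extra names an install group.

```python
from importlib.util import find_spec

# Hypothetical registry: each entry names the module it needs, or None.
REGISTRY = {
    "plain_step": {"required_module": None},
    "json_step": {"required_module": "json"},  # stdlib, always importable
    "missing_step": {"required_module": "surely_not_installed_pkg_xyz"},
}


def is_available(entry):
    """True when the entry needs no module or the module can be found."""
    module = entry["required_module"]
    return module is None or find_spec(module) is not None


def list_ids(registry, available_only=False):
    """Return sorted IDs, optionally hiding entries with missing dependencies."""
    return sorted(
        key
        for key, entry in registry.items()
        if not available_only or is_available(entry)
    )


all_ids = list_ids(REGISTRY)
usable_ids = list_ids(REGISTRY, available_only=True)
```

Here all_ids contains every entry, while usable_ids omits missing_step because its module cannot be found in the current environment.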

respect_official_test: a sampling policy flag that keeps the provider test split when the dataset ships one.
use_official_graph_masks: a sampling policy flag that preserves graph-native train, validation, and test masks when the dataset already defines them.
allow_override_official: a sampling policy flag that lets an inductive custom split override the provider test partition and re-split dataset.train instead.

Sources