5. Transductive tutorial: toy label propagation¶
This is an end-to-end transductive walkthrough with graph construction and label propagation. If you want to assemble it brick-by-brick, start with datasets, sampling, preprocess, and graph.
5.1 Goal¶
Run a full transductive SSL experiment on the toy dataset using a graph construction spec and label propagation. [1][2][3]
5.2 Why this tutorial¶
Use this tutorial when your method expects a graph and node masks (NodeDatasetLike) and you plan to run a graph construction step. If you only need feature matrices without a graph, use the inductive tutorial instead. [14]
This walkthrough uses the bench runner because it validates a single YAML config and orchestrates dataset, sampling, preprocess, graph build, and method execution. For individual bricks, start with the dataset, sampling, preprocess, and graph how-to guides instead. [1][10]
5.3 Prerequisites¶
- Python 3.11+ with ModSSC installed from source (bench runner is in the repo). [4][5]
- No extra dependencies are required for the toy dataset and numpy graph backend. [2][6]
5.4 Files used¶
- Benchmark entry point: bench/main.py
- Experiment config: bench/configs/experiments/toy_transductive.yaml
- Graph spec schema: src/modssc/graph/specs.py
5.5 Step by step commands¶
1) Install the repo in editable mode:

```shell
python -m pip install -e "."
```

2) Run the transductive toy experiment:

```shell
python -m bench.main --config bench/configs/experiments/toy_transductive.yaml
```
The benchmark runner and config paths are in bench/main.py and bench/configs/experiments/toy_transductive.yaml. [1][2]
5.6 Full YAML config used¶
This is the full config file from bench/configs/experiments/toy_transductive.yaml:
```yaml
run:
  name: "toy_label_propagation_knn"
  seed: 7
  output_dir: "runs"
  fail_fast: true

dataset:
  id: "toy"

sampling:
  seed: 7
  plan:
    split:
      kind: "holdout"
      test_fraction: 0.0
      val_fraction: 0.2
      stratify: true
      shuffle: true
    labeling:
      mode: "fraction"
      value: 0.1
      strategy: "balanced"
      min_per_class: 1
    imbalance:
      kind: "none"
  policy:
    respect_official_test: true
    allow_override_official: false

preprocess:
  seed: 7
  fit_on: "train_labeled"
  cache: true
  plan:
    output_key: "features.X"
    steps:
      - id: "core.ensure_2d"
      - id: "core.to_numpy"

graph:
  enabled: true
  seed: 7
  cache: true
  spec:
    scheme: "knn"
    metric: "euclidean"
    k: 8
    symmetrize: "mutual"
    weights:
      kind: "heat"
      sigma: 1.0
    normalize: "rw"
    self_loops: true
    backend: "numpy"
    chunk_size: 128
    feature_field: "features.X"

method:
  kind: "transductive"
  id: "label_propagation"
  device:
    device: "auto"
    dtype: "float32"
  params:
    max_iter: 50
    tol: 1.0e-4
    normalize_rows: true

evaluation:
  report_splits: ["val", "test"]
  metrics: ["accuracy", "macro_f1"]
```
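The graph section of this config describes a standard kNN construction. As an illustration only (not ModSSC's actual backend), a minimal numpy sketch of the same steps (euclidean kNN with k neighbors, "mutual" symmetrization, heat-kernel weights, self loops, and random-walk normalization) could look like:

```python
import numpy as np

def build_knn_graph(X, k=8, sigma=1.0):
    """Toy kNN graph sketch: euclidean metric, mutual symmetrization,
    heat weights, self loops, random-walk (rw) normalization."""
    n = X.shape[0]
    # Pairwise squared euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k nearest neighbors per node, excluding self.
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.arange(n)[:, None], nn] = True
    # "mutual" symmetrization: keep edge i-j only if both directions exist.
    A = A & A.T
    # Heat-kernel weights on the kept edges.
    W = np.where(A, np.exp(-d2 / (2.0 * sigma**2)), 0.0)
    # Self loops.
    np.fill_diagonal(W, 1.0)
    # Random-walk ("rw") normalization: each row sums to 1.
    return W / W.sum(axis=1, keepdims=True)
```

The real backend is chunked (see chunk_size in the spec) and lives in src/modssc/graph/construction/backends/numpy_backend.py; the sketch above only shows the math each spec field controls.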
5.7 Expected outputs and where they appear¶
A run directory is created under runs/ with the config snapshot and the run.json summary. [7][8]
Graph artifacts are cached when graph.cache: true is set. The cache layout is managed by modssc.graph.cache.GraphCache. [9][2]
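The exact fields in run.json are defined by the reporting orchestrator; assuming it is plain JSON, a quick way to inspect a finished run (the runs/toy_label_propagation_knn path here is illustrative, taken from the run name in the config) is:

```python
import json
from pathlib import Path

# Illustrative path; adjust to the run directory the bench runner created.
summary_path = Path("runs") / "toy_label_propagation_knn" / "run.json"
if summary_path.exists():
    summary = json.loads(summary_path.read_text())
    print(json.dumps(summary, indent=2))
else:
    print(f"no summary at {summary_path} (run the experiment first)")
```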
5.8 How it works¶
- The bench runner validates the config and orchestrates dataset, sampling, preprocess, graph build, and method execution. [1][10]
- The graph is constructed from the GraphBuilderSpec fields in the config. [11][12]
- Label propagation runs with hard clamping over the graph. [3]
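Hard-clamped label propagation can be sketched as follows. This is a generic illustration matching the config's params (max_iter, tol), not the library's implementation; W is a row-normalized affinity matrix and y holds class indices with -1 marking unlabeled nodes:

```python
import numpy as np

def label_propagation(W, y, max_iter=50, tol=1e-4):
    """Iterative label propagation with hard clamping:
    labeled rows are reset to their one-hot labels every step."""
    n = W.shape[0]
    classes = np.unique(y[y >= 0])
    labeled = y >= 0
    # One-hot initialization for labeled nodes; zeros elsewhere.
    F = np.zeros((n, len(classes)))
    F[labeled, np.searchsorted(classes, y[labeled])] = 1.0
    clamp = F[labeled].copy()
    for _ in range(max_iter):
        F_new = W @ F                 # diffuse scores along edges
        F_new[labeled] = clamp        # hard clamping of known labels
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return classes[F.argmax(axis=1)]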
5.9 Common pitfalls and troubleshooting¶
Warning
Transductive methods require a graph; if graph.enabled is false and the dataset is not a graph dataset, the bench runner raises a config error. [1]
Tip
Use modssc graph build --help to see graph options and validate the spec. [13]
5.10 Related links¶
Sources

- [1] bench/main.py
- [2] bench/configs/experiments/toy_transductive.yaml
- [3] src/modssc/transductive/methods/classic/label_propagation.py
- [4] pyproject.toml
- [5] bench/README.md
- [6] src/modssc/graph/construction/backends/numpy_backend.py
- [7] bench/context.py
- [8] bench/orchestrators/reporting.py
- [9] src/modssc/graph/cache.py
- [10] bench/schema.py
- [11] src/modssc/graph/specs.py
- [12] src/modssc/graph/construction/api.py
- [13] src/modssc/cli/graph.py
- [14] src/modssc/transductive/base.py