5. Transductive tutorial: toy label propagation¶
This is an end-to-end transductive walkthrough with graph construction and label propagation. If you want to assemble it brick-by-brick, start with datasets, sampling, preprocess, and graph.
5.1 Goal¶
Run a full transductive SSL experiment on the toy dataset using a graph construction spec and label propagation. [1][2][3]
5.2 Why this tutorial¶
Use this tutorial when your method expects a graph and node masks (NodeDatasetLike) and you plan to run a graph construction step. If you only need feature matrices without a graph, use the inductive tutorial instead. [14]
This walkthrough uses the bench runner because it validates a single YAML config and orchestrates dataset, sampling, preprocess, graph build, and method execution. For individual bricks, start with the dataset, sampling, preprocess, and graph how-to guides instead. [1][10]
5.3 Prerequisites¶
- Python 3.11+ with ModSSC installed from source (bench runner is in the repo). [4][5]
- No extra dependencies are required for the toy dataset and numpy graph backend. [2][6]
5.4 Files used¶
- Benchmark entry point: bench/main.py
- Experiment config: bench/configs/experiments/toy_transductive.yaml
- Graph spec schema: src/modssc/graph/specs.py
5.5 Step by step commands¶
1) Install the repo in editable mode:

```shell
python -m pip install -e "."
```

2) Run the transductive toy experiment:

```shell
python -m bench.main --config bench/configs/experiments/toy_transductive.yaml
```
The benchmark runner and config paths are in bench/main.py and bench/configs/experiments/toy_transductive.yaml. [1][2]
5.6 Full YAML config used¶
This is the full config file from bench/configs/experiments/toy_transductive.yaml:
```yaml
run:
  name: "toy_label_propagation_knn"
  seed: 7
  output_dir: "runs"
  fail_fast: true

dataset:
  id: "toy"

sampling:
  seed: 7
  plan:
    split:
      kind: "holdout"
      test_fraction: 0.0
      val_fraction: 0.2
      stratify: true
      shuffle: true
    labeling:
      mode: "fraction"
      value: 0.1
      strategy: "balanced"
      min_per_class: 1
    imbalance:
      kind: "none"
  policy:
    respect_official_test: true
    allow_override_official: false

preprocess:
  seed: 7
  fit_on: "train_labeled"
  cache: true
  plan:
    output_key: "features.X"
    steps:
      - id: "core.ensure_2d"
      - id: "core.to_numpy"

graph:
  enabled: true
  seed: 7
  cache: true
  spec:
    scheme: "knn"
    metric: "euclidean"
    k: 8
    symmetrize: "mutual"
    weights:
      kind: "heat"
      sigma: 1.0
    normalize: "rw"
    self_loops: true
    backend: "numpy"
    chunk_size: 128
    feature_field: "features.X"

method:
  kind: "transductive"
  id: "label_propagation"
  device:
    device: "auto"
    dtype: "float32"
  params:
    max_iter: 50
    tol: 1.0e-4
    normalize_rows: true

evaluation:
  report_splits: ["val", "test"]
  metrics: ["accuracy", "macro_f1"]
```
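The graph section of this config describes a standard kNN construction. As an illustration only (not ModSSC's actual backend), a minimal numpy sketch of the same steps (euclidean kNN with k neighbors, "mutual" symmetrization, heat-kernel weights, self loops, and random-walk normalization) could look like:

```python
import numpy as np

def build_knn_graph(X, k=8, sigma=1.0):
    """Toy kNN graph sketch: euclidean metric, mutual symmetrization,
    heat weights, self loops, random-walk (rw) normalization."""
    n = X.shape[0]
    # Pairwise squared euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k nearest neighbors per node, excluding self.
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.arange(n)[:, None], nn] = True
    # "mutual" symmetrization: keep edge i-j only if both directions exist.
    A = A & A.T
    # Heat-kernel weights on the kept edges.
    W = np.where(A, np.exp(-d2 / (2.0 * sigma**2)), 0.0)
    # Self loops.
    np.fill_diagonal(W, 1.0)
    # Random-walk ("rw") normalization: each row sums to 1.
    return W / W.sum(axis=1, keepdims=True)
```

The real backend is chunked (see chunk_size in the spec) and lives in src/modssc/graph/construction/backends/numpy_backend.py; the sketch above only shows the math each spec field controls.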
5.7 Expected outputs and where they appear¶
A run directory is created under runs/ with the config snapshot and the run.json summary. [7][8]
Graph artifacts are cached when graph.cache: true is set. The cache layout is managed by modssc.graph.cache.GraphCache. [9][2]
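The exact fields in run.json are defined by the reporting orchestrator; assuming it is plain JSON, a quick way to inspect a finished run (the runs/toy_label_propagation_knn path here is illustrative, taken from the run name in the config) is:

```python
import json
from pathlib import Path

# Illustrative path; adjust to the run directory the bench runner created.
summary_path = Path("runs") / "toy_label_propagation_knn" / "run.json"
if summary_path.exists():
    summary = json.loads(summary_path.read_text())
    print(json.dumps(summary, indent=2))
else:
    print(f"no summary at {summary_path} (run the experiment first)")
```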
5.8 How it works¶
- The bench runner validates the config and orchestrates dataset, sampling, preprocess, graph build, and method execution. [1][10]
- The graph is constructed from the GraphBuilderSpec fields in the config. [11][12]
- Label propagation runs with hard clamping over the graph. [3]
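Hard-clamped label propagation can be sketched as follows. This is a generic illustration matching the config's params (max_iter, tol), not the library's implementation; W is a row-normalized affinity matrix and y holds class indices with -1 marking unlabeled nodes:

```python
import numpy as np

def label_propagation(W, y, max_iter=50, tol=1e-4):
    """Iterative label propagation with hard clamping:
    labeled rows are reset to their one-hot labels every step."""
    n = W.shape[0]
    classes = np.unique(y[y >= 0])
    labeled = y >= 0
    # One-hot initialization for labeled nodes; zeros elsewhere.
    F = np.zeros((n, len(classes)))
    F[labeled, np.searchsorted(classes, y[labeled])] = 1.0
    clamp = F[labeled].copy()
    for _ in range(max_iter):
        F_new = W @ F                 # diffuse scores along edges
        F_new[labeled] = clamp        # hard clamping of known labels
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return classes[F.argmax(axis=1)]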
5.9 Common pitfalls and troubleshooting¶
Warning
Transductive methods require a graph; if graph.enabled is false and the dataset is not a graph dataset, the bench runner raises a config error. [1]
Tip
Use modssc graph build --help to see graph options and validate the spec. [13]
5.10 Related links¶
Sources

- [1] bench/main.py
- [2] bench/configs/experiments/toy_transductive.yaml
- [3] src/modssc/transductive/methods/classic/label_propagation.py
- [4] pyproject.toml
- [5] bench/README.md
- [6] src/modssc/graph/construction/backends/numpy_backend.py
- [7] bench/context.py
- [8] bench/orchestrators/reporting.py
- [9] src/modssc/graph/cache.py
- [10] bench/schema.py
- [11] src/modssc/graph/specs.py
- [12] src/modssc/graph/construction/api.py
- [13] src/modssc/cli/graph.py
- [14] src/modssc/transductive/base.py