4. Inductive tutorial: toy pseudo-label run

This is an end-to-end inductive walkthrough using the toy benchmark config. If you prefer step-by-step brick workflows, use the datasets, sampling, preprocess, and evaluation guides.

4.1 Goal

Run a full inductive SSL experiment on the built-in toy dataset using the benchmark runner and a YAML config. [1][2][3]

4.2 Why this tutorial

Use this tutorial when your method consumes feature matrices and labeled/unlabeled splits (InductiveDataset) and you do not need an explicit graph. If your method expects a graph and node masks, use the transductive tutorial instead. [19][20]
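The real InductiveDataset type is defined in src/modssc/inductive/types.py. Purely to illustrate the shape of data an inductive method consumes (the field names below are hypothetical, not the library's actual API), think of it as a feature matrix split into labeled and unlabeled parts:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class InductiveDatasetSketch:
    """Hypothetical stand-in for InductiveDataset; see
    src/modssc/inductive/types.py for the real definition."""

    X_labeled: np.ndarray    # (n_labeled, d) feature matrix
    y_labeled: np.ndarray    # (n_labeled,) integer class labels
    X_unlabeled: np.ndarray  # (n_unlabeled, d) features without labels


# A toy instance: 16 labeled and 64 unlabeled samples with 4 features each.
rng = np.random.default_rng(42)
ds = InductiveDatasetSketch(
    X_labeled=rng.normal(size=(16, 4)),
    y_labeled=rng.integers(0, 2, size=16),
    X_unlabeled=rng.normal(size=(64, 4)),
)
```

No graph structure appears anywhere in this sketch, which is the key difference from the transductive setting.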

This walkthrough uses the bench runner because it validates a single YAML config and orchestrates the full pipeline (dataset, sampling, preprocess, method, evaluation). For individual bricks, start with the dataset, sampling, preprocess, and evaluation how-to guides instead. [1][9]

4.3 Prerequisites

  • Python 3.11+ with ModSSC installed from source (bench runner is in the repo). [4][5]

  • No extra dependencies are required for the toy dataset and numpy backends used here. [2][6]

4.4 Files used

  • bench/main.py (the benchmark runner entry point) [1]

  • bench/configs/experiments/toy_inductive.yaml (the experiment config) [2]

4.5 Step by step commands

1) Install the repo in editable mode:

python -m pip install -e "."

2) Run the inductive toy experiment:

python -m bench.main --config bench/configs/experiments/toy_inductive.yaml

The bench runner and the example config are in bench/main.py and bench/configs/experiments/toy_inductive.yaml. [1][2]

4.6 Full YAML config used

This is the full config file from bench/configs/experiments/toy_inductive.yaml:

run:
  name: "toy_pseudo_label_numpy"
  seed: 42
  output_dir: "runs"
  fail_fast: true

dataset:
  id: "toy"

sampling:
  seed: 42
  plan:
    split:
      kind: "holdout"
      test_fraction: 0.0
      val_fraction: 0.2
      stratify: true
      shuffle: true
    labeling:
      mode: "fraction"
      value: 0.2
      strategy: "balanced"
      min_per_class: 1
    imbalance:
      kind: "none"
    policy:
      respect_official_test: true
      allow_override_official: false

preprocess:
  seed: 42
  fit_on: "train_labeled"
  cache: true
  plan:
    output_key: "features.X"
    steps:
      - id: "core.ensure_2d"
      - id: "core.to_numpy"

method:
  kind: "inductive"
  id: "pseudo_label"
  device:
    device: "auto"
    dtype: "float32"
  params:
    classifier_id: "knn"
    classifier_backend: "numpy"
    max_iter: 5
    confidence_threshold: 0.8

evaluation:
  split_for_model_selection: "val"
  report_splits: ["val", "test"]
  metrics: ["accuracy", "macro_f1"]

4.7 Expected outputs and where they appear

A new run directory is created under runs/ with:

  • config.yaml (a snapshot of the config)

  • run.json (metrics and metadata)

  • error.txt (only written if the run fails)

These outputs are written by the bench context and reporting orchestrator. [7][8]
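The exact schema of run.json is defined by the reporting orchestrator; the keys used below are hypothetical placeholders, so treat this only as a sketch of how you might inspect a finished run with the standard library:

```python
import json
import tempfile
from pathlib import Path

# Simulate a finished run directory (file names come from the docs;
# the run.json keys here are invented for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "toy_pseudo_label_numpy"
    run_dir.mkdir(parents=True)
    (run_dir / "run.json").write_text(
        json.dumps({"metrics": {"val": {"accuracy": 0.9, "macro_f1": 0.88}}})
    )

    # Inspect whatever the run recorded.
    payload = json.loads((run_dir / "run.json").read_text())
    print(json.dumps(payload, indent=2))
```

In practice you would point Path at the newest folder under runs/ rather than a temporary directory.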

4.8 How it works

  • bench/main.py loads the YAML, validates it against the schema, and orchestrates each stage. [1][9]

  • The toy dataset is loaded via the data loader and cached. [10][3]

  • Sampling produces labeled/unlabeled splits using the sampling plan. [11][12]

  • Preprocess steps convert raw features into 2D numpy arrays. [13][14][15]

  • The pseudo-label method runs with a numpy kNN classifier. [16][6]
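The actual implementation lives in src/modssc/inductive/methods/pseudo_label.py. As an illustration only, a generic pseudo-label loop paired with a small numpy k-NN classifier (function names and signatures here are invented, not ModSSC's API) might look like this:

```python
import numpy as np


def knn_predict_proba(X_train, y_train, X, k=3):
    """Class probabilities from the vote of the k nearest neighbours."""
    # Pairwise distances: (len(X), len(X_train)).
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    n_classes = int(y_train.max()) + 1
    proba = np.zeros((len(X), n_classes))
    for i, neigh in enumerate(idx):
        counts = np.bincount(y_train[neigh], minlength=n_classes)
        proba[i] = counts / k
    return proba


def pseudo_label(X_lab, y_lab, X_unlab, max_iter=5, threshold=0.8):
    """Iteratively promote confident predictions to the labeled set."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    for _ in range(max_iter):
        if len(X_unlab) == 0:
            break
        proba = knn_predict_proba(X_lab, y_lab, X_unlab)
        conf = proba.max(axis=1)
        keep = conf >= threshold  # mirrors confidence_threshold: 0.8
        if not keep.any():
            break  # nothing confident enough: stop early
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]
    return X_lab, y_lab


# Tiny usage: two well-separated 1-D clusters, one unlabeled point in each.
X_lab = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unlab = np.array([[0.05], [5.05]])
X_out, y_out = pseudo_label(X_lab, y_lab, X_unlab)
```

The max_iter and confidence_threshold knobs in the config's method.params play the same roles as the max_iter and threshold arguments above.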

4.9 Common pitfalls and troubleshooting

Warning

If the run fails because runs/ already contains a folder with the same name (a timestamp collision), delete the old folder and rerun. The run directory is created with exist_ok=False. [7]
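The failure mode is standard Python behaviour: creating a directory with exist_ok=False raises FileExistsError if the path is already there. A minimal reproduction (the directory name matches the run name from the config, but the path here is a throwaway temp dir):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "toy_pseudo_label_numpy"
    run_dir.mkdir(parents=True)  # first run: directory is created
    collided = False
    try:
        # Second run with the same name: exist_ok defaults to False.
        run_dir.mkdir(parents=True, exist_ok=False)
    except FileExistsError:
        collided = True
        print("runs/ already contains this folder; delete it and rerun")
```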

Tip

Use modssc --log-level detailed to increase logging detail if a stage fails. [17][18]

Sources
  1. bench/main.py
  2. bench/configs/experiments/toy_inductive.yaml
  3. src/modssc/data_loader/catalog/toy.py
  4. pyproject.toml
  5. bench/README.md
  6. src/modssc/supervised/backends/numpy/knn.py
  7. bench/context.py
  8. bench/orchestrators/reporting.py
  9. bench/schema.py
  10. src/modssc/data_loader/api.py
  11. src/modssc/sampling/api.py
  12. src/modssc/sampling/plan.py
  13. src/modssc/preprocess/plan.py
  14. src/modssc/preprocess/steps/core/ensure_2d.py
  15. src/modssc/preprocess/steps/core/to_numpy.py
  16. src/modssc/inductive/methods/pseudo_label.py
  17. src/modssc/logging.py
  18. src/modssc/cli/app.py
  19. src/modssc/inductive/types.py
  20. src/modssc/transductive/base.py