3. Concepts

This page introduces the key ideas and vocabulary used throughout the docs. For runnable examples, go to the inductive tutorial or transductive tutorial.

3.1 Problem framing

ModSSC targets semi-supervised classification, in which a small labeled set and a larger unlabeled set are used together to train a classifier. Both the inductive and transductive bricks, and their dataset types, are built around this framing. [1][2][3]

3.2 Inductive vs transductive in this project

Inductive methods operate on feature matrices and labeled/unlabeled splits, without requiring a graph. The inductive brick lives in src/modssc/inductive/ and validates InductiveDataset inputs. [4][5][1]
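
As a rough illustration of the kind of consistency checks such validation might perform, the sketch below verifies that a labeled/unlabeled payload has matching shapes. The function name `check_inductive_payload` and the specific checks are assumptions made for this example; the actual rules live in src/modssc/inductive/validation.py and may differ. Plain Python lists are used so the snippet runs without ModSSC installed.

```python
from typing import Sequence


def check_inductive_payload(
    X_l: Sequence[Sequence[float]],
    y_l: Sequence[int],
    X_u: Sequence[Sequence[float]],
) -> None:
    """Hypothetical shape checks for a labeled/unlabeled payload (not ModSSC's validator)."""
    if len(X_l) == 0:
        raise ValueError("X_l must contain at least one labeled sample")
    # Every labeled row needs exactly one label.
    if len(X_l) != len(y_l):
        raise ValueError(f"X_l has {len(X_l)} rows but y_l has {len(y_l)} labels")
    # Labeled and unlabeled rows must share one feature dimension.
    n_features = len(X_l[0])
    for name, rows in (("X_l", X_l), ("X_u", X_u)):
        for i, row in enumerate(rows):
            if len(row) != n_features:
                raise ValueError(
                    f"{name}[{i}] has {len(row)} features, expected {n_features}"
                )


# Two labeled rows and three unlabeled rows, all with two features: passes silently.
check_inductive_payload(
    X_l=[[0.0, 1.0], [1.0, 0.0]],
    y_l=[0, 1],
    X_u=[[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
)
```

Note that no graph appears anywhere in the payload: the inductive setting only needs feature rows and the labels that go with the labeled rows.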

Transductive methods operate on a fixed graph over all nodes and accept NodeDataset-like objects that carry a graph and optional masks. Sampling outputs for graph datasets are expressed as masks such as train, val, test, labeled, and unlabeled. The transductive brick lives in src/modssc/transductive/, and graph utilities are in src/modssc/graph/. [2][6][7][8]
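
To make the mask vocabulary concrete, here is a minimal sketch of boolean masks over a fixed node set. The mask names follow the ones mentioned above, but the dictionary keys, the sizes, and the choice to draw the labeled nodes from the training nodes are assumptions for illustration; the actual layout is defined by the sampling outputs in src/modssc/sampling/result.py.

```python
n_nodes = 10


def make_mask(indices, n):
    """Return a boolean list of length n that is True at the given indices."""
    chosen = set(indices)
    return [i in chosen for i in range(n)]


# Assumed toy split: every node belongs to exactly one of train/val/test.
masks = {
    "train": make_mask(range(0, 6), n_nodes),
    "val": make_mask(range(6, 8), n_nodes),
    "test": make_mask(range(8, 10), n_nodes),
}
# Assumption: only a few training nodes keep their labels;
# the remaining training nodes are treated as unlabeled.
masks["labeled"] = make_mask([0, 1], n_nodes)
masks["unlabeled"] = [
    in_train and not is_labeled
    for in_train, is_labeled in zip(masks["train"], masks["labeled"])
]

# Every mask spans all nodes, because the graph is fixed and shared.
assert all(len(mask) == n_nodes for mask in masks.values())
print({name: sum(mask) for name, mask in masks.items()})
```

The point of the sketch is that masks select subsets of one shared node set rather than slicing the data into separate arrays, which is what lets a transductive method see every node while training on only a few labels.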

3.3 Key abstractions in this codebase

  • Dataset catalog and providers: curated dataset keys and provider URIs for downloading and caching. [9][10][11]

  • Sampling plans: deterministic split + labeling specs that produce reproducible indices/masks. [12][13]

  • Preprocess plans: ordered steps that transform raw datasets into feature representations. [14][15]

  • Graph specs and views: graph construction specs and view generation (attr/diffusion/struct). [16][17]

  • View plans: multi-view feature generation for methods like co-training. [18][19]

  • Registries: method registries for inductive and transductive algorithms. [20][21]

  • Benchmark configs: end-to-end experiment configuration for reproducible runs. [22][23]
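
The "deterministic" property of sampling plans can be sketched as follows: an immutable spec that includes a seed fully determines which indices end up labeled. `ToySamplingPlan` and `sample_indices` are hypothetical names invented for this sketch and are not the ModSSC sampling API, which lives under src/modssc/sampling/.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySamplingPlan:
    """Hypothetical plan: how many samples to label and which seed to use."""

    n_labeled: int
    seed: int


def sample_indices(n_samples: int, plan: ToySamplingPlan) -> dict:
    """Split range(n_samples) into labeled and unlabeled index lists."""
    if not 0 <= plan.n_labeled <= n_samples:
        raise ValueError("n_labeled must be between 0 and n_samples")
    rng = random.Random(plan.seed)  # local RNG, so global random state is untouched
    order = list(range(n_samples))
    rng.shuffle(order)
    return {
        "labeled": sorted(order[: plan.n_labeled]),
        "unlabeled": sorted(order[plan.n_labeled :]),
    }


plan = ToySamplingPlan(n_labeled=5, seed=0)
first = sample_indices(60, plan)
second = sample_indices(60, plan)

assert first == second  # same plan, same indices
assert len(first["labeled"]) == 5 and len(first["unlabeled"]) == 55
```

Keeping the seed inside the frozen plan, rather than relying on global random state, is one simple way to make a split reproducible from its specification alone.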

3.4 Small illustrative examples

Inductive dataset payload (labeled + unlabeled):

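import numpy as np
from modssc.inductive import InductiveDataset

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible

X_l = rng.standard_normal((10, 4))  # 10 labeled samples, 4 features
y_l = rng.integers(0, 3, size=10)   # one label per labeled sample, 3 classes
X_u = rng.standard_normal((50, 4))  # 50 unlabeled samples, same 4 features

payload = InductiveDataset(X_l=X_l, y_l=y_l, X_u=X_u)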

Transductive dataset payload (graph + masks):

import numpy as np
from modssc.graph import GraphBuilderSpec, build_graph
from modssc.graph.artifacts import NodeDataset

# 20 nodes with 8 features each, plus one placeholder label per node
X = np.random.randn(20, 8).astype(np.float32)
y = np.zeros((20,), dtype=np.int64)

# kNN graph spec: k=3 neighbors under the cosine metric
edge_spec = GraphBuilderSpec(scheme="knn", metric="cosine", k=3)
graph = build_graph(X, spec=edge_spec, seed=0, cache=False)

# Mark the first 3 nodes as training nodes
train_mask = np.zeros((20,), dtype=bool)
train_mask[:3] = True

node_data = NodeDataset(X=X, y=y, graph=graph, masks={"train_mask": train_mask})

The inductive and transductive dataset types are defined in src/modssc/inductive/types.py and src/modssc/graph/artifacts.py, respectively, and graph construction is implemented in src/modssc/graph/construction/api.py. [1][6][24]
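
For intuition about what a k-nearest-neighbor graph with a cosine metric looks like, the following is a minimal standard-library sketch. It is not the ModSSC implementation in src/modssc/graph/construction/api.py; the function names `cosine_similarity` and `knn_edges`, the directed edge-list output, and the tie-breaking rule are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_edges(X, k):
    """Return directed edges (i, j) where j is among the k rows most similar to row i."""
    edges = []
    for i, row in enumerate(X):
        # Score every other node; skipping i itself avoids self-loops.
        scored = [
            (cosine_similarity(row, other), j)
            for j, other in enumerate(X)
            if j != i
        ]
        # Highest similarity first; ties are broken by the lower node index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        edges.extend((i, j) for _, j in scored[:k])
    return edges


# Four toy nodes: 0 and 1 point roughly along the x-axis, 2 and 3 along the y-axis.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = knn_edges(X, k=1)
print(edges)  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

With k=1 each node links to the node whose direction is closest to its own, so the two x-aligned nodes pair up and the two y-aligned nodes pair up. A real builder may symmetrize edges, attach weights, or use an approximate neighbor search, none of which is shown here.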

Sources
  1. src/modssc/inductive/types.py
  2. src/modssc/transductive/base.py
  3. README.md
  4. src/modssc/inductive/
  5. src/modssc/inductive/validation.py
  6. src/modssc/graph/artifacts.py
  7. src/modssc/sampling/result.py
  8. src/modssc/graph/
  9. src/modssc/data_loader/catalog/
  10. src/modssc/data_loader/providers/
  11. src/modssc/data_loader/api.py
  12. src/modssc/sampling/plan.py
  13. src/modssc/sampling/api.py
  14. src/modssc/preprocess/plan.py
  15. src/modssc/preprocess/catalog.py
  16. src/modssc/graph/specs.py
  17. src/modssc/graph/featurization/api.py
  18. src/modssc/views/plan.py
  19. src/modssc/views/api.py
  20. src/modssc/inductive/registry.py
  21. src/modssc/transductive/registry.py
  22. bench/schema.py
  23. bench/configs/experiments/
  24. src/modssc/graph/construction/api.py