3. Concepts¶
This page introduces the key ideas and vocabulary used throughout the docs. For runnable examples, see the inductive tutorial or the transductive tutorial.
3.1 Problem framing¶
ModSSC targets semi-supervised classification, where a small labeled set and a larger unlabeled set are used together. This framing is reflected in the inductive and transductive bricks and their datasets. [1][2][3]
3.2 Inductive vs transductive in this project¶
Inductive methods operate on feature matrices and labeled/unlabeled splits, without requiring a graph. The inductive brick lives in src/modssc/inductive/ and validates InductiveDataset inputs. [4][5][1]
Transductive methods operate on a fixed graph over all nodes and accept NodeDataset-like objects with a graph and optional masks. Sampling outputs for graph datasets use masks like train/val/test/labeled/unlabeled. The transductive brick lives in src/modssc/transductive/, and graph utilities are in src/modssc/graph/. [2][6][7][8]
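As a rough illustration of the mask convention (plain NumPy only, not a specific ModSSC API), each mask is a boolean vector over the full node set of the fixed graph:
import numpy as np
# Sketch of the mask convention: one boolean entry per node in the graph.
n_nodes = 20
train_mask = np.zeros(n_nodes, dtype=bool)
val_mask = np.zeros(n_nodes, dtype=bool)
test_mask = np.zeros(n_nodes, dtype=bool)
train_mask[:4] = True   # labels visible during training
val_mask[4:8] = True    # held out for model selection
test_mask[8:] = True    # held out for final evaluation
labeled_mask = train_mask.copy()   # nodes whose labels the method may use
unlabeled_mask = ~labeled_mask     # all remaining nodes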
3.3 Key abstractions in this codebase¶
- Dataset catalog and providers: curated dataset keys and provider URIs for downloading and caching. [9][10][11]
- Sampling plans: deterministic split + labeling specs that produce reproducible indices/masks (a minimal sketch of the idea follows after this list). [12][13]
- Preprocess plans: ordered steps that transform raw datasets into feature representations. [14][15]
- Graph specs and views: graph construction specs and view generation (attr/diffusion/struct). [16][17]
- View plans: multi-view feature generation for methods like co-training. [18][19]
- Registries: method registries for inductive and transductive algorithms. [20][21]
- Benchmark configs: end-to-end experiment configuration for reproducible runs. [22][23]
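To make the sampling-plan idea concrete, here is a minimal sketch of a deterministic, seed-driven split in plain NumPy. It does not use the ModSSC sampling API; it only illustrates why the same specification and seed always reproduce the same indices and masks:
import numpy as np
def split_indices(n, n_labeled, seed):
    # The same seed and sizes always give the same permutation, hence the same split.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    return perm[:n_labeled], perm[n_labeled:]
labeled_idx, unlabeled_idx = split_indices(n=100, n_labeled=10, seed=0)
labeled_mask = np.zeros(100, dtype=bool)
labeled_mask[labeled_idx] = True
# Re-running with the same arguments reproduces identical indices and masks.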
3.4 Small illustrative examples¶
Inductive dataset payload (labeled + unlabeled):
import numpy as np
from modssc.inductive import InductiveDataset
X_l = np.random.randn(10, 4)
y_l = np.random.randint(0, 3, size=(10,))
X_u = np.random.randn(50, 4)
payload = InductiveDataset(X_l=X_l, y_l=y_l, X_u=X_u)
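In this sketch the payload carries 10 labeled rows with labels drawn from {0, 1, 2} and 50 unlabeled rows sharing the same 4 features; no graph is involved.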
Transductive dataset payload (graph + masks):
import numpy as np
from modssc.graph import GraphBuilderSpec, build_graph
from modssc.graph.artifacts import NodeDataset
X = np.random.randn(20, 8).astype(np.float32)
edge_spec = GraphBuilderSpec(scheme="knn", metric="cosine", k=3)
graph = build_graph(X, spec=edge_spec, seed=0, cache=False)
train_mask = np.zeros((20,), dtype=bool)
train_mask[:3] = True
node_data = NodeDataset(X=X, y=np.zeros((20,), dtype=np.int64), graph=graph, masks={"train_mask": train_mask})
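Here the spec requests a 3-nearest-neighbour graph under the cosine metric over the 20 nodes, and train_mask marks the first 3 nodes for training; additional masks (val/test/labeled/unlabeled) are optional and follow the same boolean-vector-over-all-nodes convention.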
The inductive and transductive dataset types are defined in src/modssc/inductive/types.py and src/modssc/graph/artifacts.py, and graph construction is implemented in src/modssc/graph/construction/api.py. [1][6][24]
Sources
[1] src/modssc/inductive/types.py
[2] src/modssc/transductive/base.py
[3] README.md
[4] src/modssc/inductive/
[5] src/modssc/inductive/validation.py
[6] src/modssc/graph/artifacts.py
[7] src/modssc/sampling/result.py
[8] src/modssc/graph/
[9] src/modssc/data_loader/catalog/
[10] src/modssc/data_loader/providers/
[11] src/modssc/data_loader/api.py
[12] src/modssc/sampling/plan.py
[13] src/modssc/sampling/api.py
[14] src/modssc/preprocess/plan.py
[15] src/modssc/preprocess/catalog.py
[16] src/modssc/graph/specs.py
[17] src/modssc/graph/featurization/api.py
[18] src/modssc/views/plan.py
[19] src/modssc/views/api.py
[20] src/modssc/inductive/registry.py
[21] src/modssc/transductive/registry.py
[22] bench/schema.py
[23] bench/configs/experiments/
[24] src/modssc/graph/construction/api.py