3. Concepts¶
This page introduces the key ideas and vocabulary used throughout the docs. For runnable examples, see the inductive tutorial or the transductive tutorial.
3.1 Problem framing¶
ModSSC targets semi-supervised classification, where a small labeled set and a larger unlabeled set are used together. This framing is reflected in the inductive and transductive bricks and their datasets. [1][2][3]
3.2 Inductive vs transductive in this project¶
Inductive methods operate on feature matrices and labeled/unlabeled splits, without requiring a graph. The inductive brick lives in src/modssc/inductive/ and validates InductiveDataset inputs. [4][5][1]
Transductive methods operate on a fixed graph over all nodes and accept NodeDataset-like objects with a graph and optional masks. Sampling outputs for graph datasets use masks like train/val/test/labeled/unlabeled. The transductive brick lives in src/modssc/transductive/, and graph utilities are in src/modssc/graph/. [2][6][7][8]
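As a rough illustration of the mask convention (plain NumPy only, not a specific ModSSC API), each mask is a boolean vector over the full node set of the fixed graph:
import numpy as np
# Sketch of the mask convention: one boolean entry per node in the graph.
n_nodes = 20
train_mask = np.zeros(n_nodes, dtype=bool)
val_mask = np.zeros(n_nodes, dtype=bool)
test_mask = np.zeros(n_nodes, dtype=bool)
train_mask[:4] = True   # labels visible during training
val_mask[4:8] = True    # held out for model selection
test_mask[8:] = True    # held out for final evaluation
labeled_mask = train_mask.copy()   # nodes whose labels the method may use
unlabeled_mask = ~labeled_mask     # all remaining nodes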
3.3 Key abstractions in this codebase¶
- Dataset catalog and providers: curated dataset keys and provider URIs for downloading and caching. [9][10][11]
- Sampling plans: deterministic split + labeling specs that produce reproducible indices/masks (a minimal sketch of the idea follows after this list). [12][13]
- Preprocess plans: ordered steps that transform raw datasets into feature representations. [14][15]
- Graph specs and views: graph construction specs and view generation (attr/diffusion/struct). [16][17]
- View plans: multi-view feature generation for methods like co-training. [18][19]
- Registries: method registries for inductive and transductive algorithms. [20][21]
- Benchmark configs: end-to-end experiment configuration for reproducible runs. [22][23]
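To make the sampling-plan idea concrete, here is a minimal sketch of a deterministic, seed-driven split in plain NumPy. It does not use the ModSSC sampling API; it only illustrates why the same specification and seed always reproduce the same indices and masks:
import numpy as np
def split_indices(n, n_labeled, seed):
    # The same seed and sizes always give the same permutation, hence the same split.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    return perm[:n_labeled], perm[n_labeled:]
labeled_idx, unlabeled_idx = split_indices(n=100, n_labeled=10, seed=0)
labeled_mask = np.zeros(100, dtype=bool)
labeled_mask[labeled_idx] = True
# Re-running with the same arguments reproduces identical indices and masks.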
3.4 Small illustrative examples¶
Inductive dataset payload (labeled + unlabeled):
import numpy as np
from modssc.inductive import InductiveDataset
X_l = np.random.randn(10, 4)
y_l = np.random.randint(0, 3, size=(10,))
X_u = np.random.randn(50, 4)
payload = InductiveDataset(X_l=X_l, y_l=y_l, X_u=X_u)
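In this sketch the payload carries 10 labeled rows with labels drawn from {0, 1, 2} and 50 unlabeled rows sharing the same 4 features; no graph is involved.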
Transductive dataset payload (graph + masks):
import numpy as np
from modssc.graph import GraphBuilderSpec, build_graph
from modssc.graph.artifacts import NodeDataset
X = np.random.randn(20, 8).astype(np.float32)
edge_spec = GraphBuilderSpec(scheme="knn", metric="cosine", k=3)
graph = build_graph(X, spec=edge_spec, seed=0, cache=False)
train_mask = np.zeros((20,), dtype=bool)
train_mask[:3] = True
node_data = NodeDataset(X=X, y=np.zeros((20,), dtype=np.int64), graph=graph, masks={"train_mask": train_mask})
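Here the spec requests a 3-nearest-neighbour graph under the cosine metric over the 20 nodes, and train_mask marks the first 3 nodes for training; additional masks (val/test/labeled/unlabeled) are optional and follow the same boolean-vector-over-all-nodes convention.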
The inductive and transductive dataset types are defined in src/modssc/inductive/types.py and src/modssc/graph/artifacts.py, and graph construction is implemented in src/modssc/graph/construction/api.py. [1][6][24]
Sources
[1] src/modssc/inductive/types.py
[2] src/modssc/transductive/base.py
[3] README.md
[4] src/modssc/inductive/
[5] src/modssc/inductive/validation.py
[6] src/modssc/graph/artifacts.py
[7] src/modssc/sampling/result.py
[8] src/modssc/graph/
[9] src/modssc/data_loader/catalog/
[10] src/modssc/data_loader/providers/
[11] src/modssc/data_loader/api.py
[12] src/modssc/sampling/plan.py
[13] src/modssc/sampling/api.py
[14] src/modssc/preprocess/plan.py
[15] src/modssc/preprocess/catalog.py
[16] src/modssc/graph/specs.py
[17] src/modssc/graph/featurization/api.py
[18] src/modssc/views/plan.py
[19] src/modssc/views/api.py
[20] src/modssc/inductive/registry.py
[21] src/modssc/transductive/registry.py
[22] bench/schema.py
[23] bench/configs/experiments/
[24] src/modssc/graph/construction/api.py