ModSSC documentation
Start here for a quick tour of ModSSC, its core capabilities, and the recommended next steps. If you're new, begin with Installation or Quickstart.
What is ModSSC
ModSSC is a modular framework for semi-supervised classification across heterogeneous modalities (audio, text, vision, tabular, graph) with research-focused abstractions and reproducible pipelines. [1]
Quick examples
The CLI examples below use the modssc entry points declared in pyproject.toml, and the Python examples call the data loader, sampling, and preprocess APIs from src/modssc/. [2][3][4][5][6]
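```bash
modssc datasets list
modssc sampling --help
```

```python
from modssc.data_loader import load_dataset
from modssc.sampling import SamplingPlan, HoldoutSplitSpec, LabelingSpec, sample
from modssc.preprocess import PreprocessPlan, StepConfig, preprocess

# Load the "toy" dataset; download=True allows fetching it if needed.
ds = load_dataset("toy", download=True)

# Holdout split with no test portion and a 0.2 validation fraction.
plan = SamplingPlan(split=HoldoutSplitSpec(test_fraction=0.0, val_fraction=0.2))
# The seed and the dataset fingerprint tie the split to this dataset and make it reproducible.
res, _ = sample(ds, plan=plan, seed=0, dataset_fingerprint=str(ds.meta["dataset_fingerprint"]))

# Preprocess: ensure 2-D features, convert to NumPy, and fit only on the training indices.
pre_plan = PreprocessPlan(steps=(StepConfig(step_id="core.ensure_2d"), StepConfig(step_id="core.to_numpy")))
_ = preprocess(ds, pre_plan, seed=0, fit_indices=res.train_idx)
```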
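A deterministic sampling plan is one where the same dataset, plan, and seed always produce the same indices. The stdlib-only sketch below illustrates that property with a seeded holdout split followed by a per-class labeling step. It is not ModSSC's implementation (which lives in `src/modssc/sampling/`), and the helpers `holdout_split` and `select_labeled` are hypothetical.

```python
import random
from collections import defaultdict


def holdout_split(n, val_fraction, seed):
    """Hypothetical helper: shuffle indices with a seeded RNG, then cut off a validation slice."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = round(n * val_fraction)
    return sorted(idx[n_val:]), sorted(idx[:n_val])  # (train, val)


def select_labeled(train_idx, labels, per_class, seed):
    """Hypothetical labeling strategy: keep `per_class` labeled samples per class, the rest unlabeled."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    labeled = []
    for cls in sorted(by_class):  # iterate classes in a fixed order for reproducibility
        pool = by_class[cls]
        labeled.extend(rng.sample(pool, min(per_class, len(pool))))
    labeled_set = set(labeled)
    unlabeled = [i for i in train_idx if i not in labeled_set]
    return sorted(labeled), unlabeled


labels = [i % 3 for i in range(30)]  # 30 samples, 3 classes
train, val = holdout_split(len(labels), val_fraction=0.2, seed=0)
labeled, unlabeled = select_labeled(train, labels, per_class=2, seed=0)

# Re-running with the same seed reproduces exactly the same split.
assert (train, val) == holdout_split(len(labels), val_fraction=0.2, seed=0)
print(len(train), len(val), len(labeled), len(unlabeled))  # → 24 6 6 18
```

Using a local `random.Random(seed)` rather than the module-level functions keeps each split independent of any other code that consumes random numbers.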
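Passing `fit_indices=res.train_idx` suggests that fitted preprocessing steps learn their statistics from the training rows only and then transform every row. That reading is an assumption based on the parameter name. The sketch below illustrates the pattern with a hypothetical `standardize` step that is not part of ModSSC.

```python
from statistics import fmean, pstdev


def standardize(values, fit_indices):
    """Hypothetical fitted step: learn mean/std on fit_indices, then apply them to every row."""
    fit_values = [values[i] for i in fit_indices]
    mean = fmean(fit_values)
    std = pstdev(fit_values) or 1.0  # fall back to 1.0 if the fitted rows have zero spread
    return [(v - mean) / std for v in values]


values = [1.0, 2.0, 3.0, 100.0]  # the last row plays the role of a held-out sample
out = standardize(values, fit_indices=[0, 1, 2])
print([round(v, 2) for v in out])
```

The held-out value does not influence the learned mean or spread, which is what prevents information from validation rows from leaking into the fitted statistics.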
Key features
- Dataset catalog with curated keys for tabular, text, vision, audio, and graph datasets. [7]
- Optional provider backends for OpenML, Hugging Face datasets, TFDS, torchvision, torchaudio, and PyG. [8]
- Deterministic sampling plans (holdout/k-fold + labeling strategies) with cached split artifacts. [5][9][10]
- Deterministic preprocessing plans with step registry and optional pretrained encoders. [6][11][12]
- Graph construction (kNN/epsilon/anchor) and graph-derived views (attribute, diffusion, structural). [13][14]
- Inductive and transductive method registries with method IDs. [15][16]
- CLI tools for datasets, sampling, preprocessing, graphs, augmentation, and evaluation. [2][3]
- Benchmark runner with YAML experiment configs (GitHub-only, not shipped to PyPI). [17][2]
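Graph construction turns feature vectors into an adjacency structure. Below is a minimal, brute-force sketch of the kNN and epsilon rules using Euclidean distance. The function names, the symmetrization of kNN edges, and the edge-set representation are assumptions for illustration; they are not the API in `src/modssc/graph/`.

```python
import math
from itertools import combinations


def knn_graph(points, k):
    """Connect each point to its k nearest neighbours, then symmetrize into undirected edges."""
    edges = set()
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index for determinism.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


def epsilon_graph(points, eps):
    """Connect every pair of points whose distance is at most eps."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= eps
    }


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
print(sorted(knn_graph(points, k=1)))        # → [(0, 1), (0, 2), (3, 4)]
print(sorted(epsilon_graph(points, eps=1.5)))  # → [(0, 1), (0, 2), (1, 2), (3, 4)]
```

Brute force costs O(n²) distance evaluations. That is fine for a sketch but is why larger datasets usually rely on approximate neighbour search or anchor-based constructions.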
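The method and preprocessing-step registries map string IDs, such as the `core.ensure_2d` step in the quick example, to implementations. The sketch below shows the general decorator-based registry pattern. The names `register`, `get`, `REGISTRY`, `demo.identity`, and `demo.scale` are hypothetical and are not taken from `src/modssc/inductive/registry.py` or `src/modssc/preprocess/catalog.py`.

```python
from typing import Callable, Dict

# Hypothetical registry: string ID -> callable.
REGISTRY: Dict[str, Callable] = {}


def register(step_id: str):
    """Decorator that records a callable under a unique string ID."""
    def decorator(fn: Callable) -> Callable:
        if step_id in REGISTRY:
            raise ValueError(f"duplicate id: {step_id}")
        REGISTRY[step_id] = fn
        return fn
    return decorator


def get(step_id: str) -> Callable:
    """Look up a registered callable, listing the known IDs on failure."""
    try:
        return REGISTRY[step_id]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise KeyError(f"unknown id {step_id!r}; known: {known}") from None


@register("demo.identity")
def identity(values):
    return list(values)


@register("demo.scale")
def scale(values, factor=2.0):
    return [v * factor for v in values]


# A plan is then an ordered tuple of IDs, resolved to callables at run time.
plan = ("demo.identity", "demo.scale")
data = [1.0, 2.0, 3.0]
for step_id in plan:
    data = get(step_id)(data)
print(data)  # → [2.0, 4.0, 6.0]
```

Because plans store only IDs, they can be serialized (for example to YAML) and validated against the registry before any work runs.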
Quickstart links
Version
The current version is 0.0.3. It is defined in src/modssc/__about__.py, which the Hatch version config in pyproject.toml points to. [18][2]
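With a regex-based version source, Hatch reads the version string out of the file named in its version config. The sketch below mimics that extraction for a hypothetical `__about__.py` body. The pattern is modeled on Hatch's default rather than copied from it. At run time, `importlib.metadata.version("modssc")` is the usual way to read the installed version, assuming the distribution is named `modssc`.

```python
import re

# Hypothetical contents of src/modssc/__about__.py (the real file may contain more).
ABOUT_PY = '__version__ = "0.0.3"\n'

# Pattern modeled on Hatch's default regex version source:
# match `__version__` or `VERSION`, then capture the quoted string.
VERSION_PATTERN = r"""(?i)^(__version__|VERSION) *= *(['"])v?(?P<version>.+?)\2"""


def read_version(source: str) -> str:
    """Return the first version string assigned in `source`."""
    match = re.search(VERSION_PATTERN, source, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group("version")


print(read_version(ABOUT_PY))  # → 0.0.3
```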
Project status and support
- The project metadata marks the status as alpha (Development Status 3). [2]
- Report issues via the GitHub tracker. [2]
- Contributor guidance lives in the docs. [19]
- Citation metadata is provided in CITATION.cff. [20]
Sources
- [1] README.md
- [2] pyproject.toml
- [3] src/modssc/cli/
- [4] src/modssc/data_loader/
- [5] src/modssc/sampling/
- [6] src/modssc/preprocess/
- [7] src/modssc/data_loader/catalog/
- [8] src/modssc/data_loader/providers/
- [9] src/modssc/sampling/plan.py
- [10] src/modssc/sampling/storage.py
- [11] src/modssc/preprocess/catalog.py
- [12] src/modssc/preprocess/models.py
- [13] src/modssc/graph/specs.py
- [14] src/modssc/graph/featurization/api.py
- [15] src/modssc/inductive/registry.py
- [16] src/modssc/transductive/registry.py
- [17] bench/README.md
- [18] src/modssc/__about__.py
- [19] docs/development/contributing.md
- [20] CITATION.cff