
ModSSC documentation

Start here for a quick tour of ModSSC, its core capabilities, and the recommended next steps. If you're new, begin with Installation or Quickstart.

What is ModSSC

ModSSC is a modular framework for semi-supervised classification across heterogeneous modalities (audio, text, vision, tabular, graph) with research-focused abstractions and reproducible pipelines. [1]

Quick examples

The CLI examples below use the modssc entry points declared in pyproject.toml. The Python example calls the data loader, sampling, and preprocess APIs in src/modssc/. [2][3][4][5][6]

CLI: [2][3]

modssc datasets list
modssc sampling --help

Python: [4][5][6]

from modssc.data_loader import load_dataset
from modssc.sampling import SamplingPlan, HoldoutSplitSpec, LabelingSpec, sample
from modssc.preprocess import PreprocessPlan, StepConfig, preprocess

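# Load the "toy" dataset from the catalog; download=True allows fetching it if needed
ds = load_dataset("toy", download=True)
# Holdout plan: no test split, 20% of samples held out for validation
plan = SamplingPlan(split=HoldoutSplitSpec(test_fraction=0.0, val_fraction=0.2))
# Sample with a fixed seed; the fingerprint identifies the dataset the split belongs to
res, _ = sample(ds, plan=plan, seed=0, dataset_fingerprint=str(ds.meta["dataset_fingerprint"]))
# Two preprocessing steps, referenced by their registry step IDs
pre_plan = PreprocessPlan(steps=(StepConfig(step_id="core.ensure_2d"), StepConfig(step_id="core.to_numpy")))
# Fit the plan on the training indices only, so validation data does not leak into fitting
_ = preprocess(ds, pre_plan, seed=0, fit_indices=res.train_idx)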
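
The sampling call above pairs a seed with a dataset fingerprint. The standalone sketch below illustrates, in general terms, how that pairing can make splits both reproducible and cacheable. It implements a seeded holdout split and derives a cache key from the fingerprint, plan, and seed. It is not ModSSC's implementation: `holdout_split`, `split_cache_key`, and the fingerprint string `"toy-abc123"` are hypothetical. The real logic lives in src/modssc/sampling/plan.py and src/modssc/sampling/storage.py.

```python
import hashlib
import json
import random


def holdout_split(n, val_fraction, test_fraction, seed):
    """Hypothetical seeded holdout split (not ModSSC's implementation).

    Shuffles indices 0..n-1 with a private RNG, so the same seed always
    yields the same partition, then carves off test and val slices.
    """
    if not 0.0 <= val_fraction + test_fraction < 1.0:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # isolated RNG: global random state untouched
    n_test = int(round(n * test_fraction))
    n_val = int(round(n * val_fraction))
    test_idx = sorted(idx[:n_test])
    val_idx = sorted(idx[n_test:n_test + n_val])
    train_idx = sorted(idx[n_test + n_val:])
    return train_idx, val_idx, test_idx


def split_cache_key(dataset_fingerprint, plan, seed):
    """Hypothetical cache key: same fingerprint + plan + seed -> same key."""
    payload = json.dumps(
        {"dataset": dataset_fingerprint, "plan": plan, "seed": seed},
        sort_keys=True,  # canonical key order so dict ordering does not matter
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Mirrors the plan above: no test split, 20% validation, seed 0.
train, val, test = holdout_split(100, val_fraction=0.2, test_fraction=0.0, seed=0)
key = split_cache_key("toy-abc123", {"split": "holdout", "val_fraction": 0.2}, seed=0)
print(len(train), len(val), len(test))  # 80 20 0
```

Using a private `random.Random(seed)` instead of the module-level RNG keeps the split independent of any other random calls made in the same process.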

Key features

  • Dataset catalog with curated keys for tabular, text, vision, audio, and graph datasets. [7]

  • Optional provider backends for OpenML, Hugging Face datasets, TFDS, torchvision, torchaudio, and PyG. [8]

  • Deterministic sampling plans (holdout/k-fold + labeling strategies) with cached split artifacts. [5][9][10]

  • Deterministic preprocessing plans with step registry and optional pretrained encoders. [6][11][12]

  • Graph construction (kNN/epsilon/anchor) and graph-derived views (attribute, diffusion, structural). [13][14]

  • Inductive and transductive method registries, with each method looked up by its method ID. [15][16]

  • CLI tools for datasets, sampling, preprocessing, graphs, augmentation, and evaluation. [2][3]

  • Benchmark runner with YAML experiment configs (GitHub-only, not shipped to PyPI). [17][2]
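
Of the graph builders listed above, kNN and epsilon graphs are the simplest to picture. The sketch below is a generic brute-force version of both, working over plain Python lists and assuming Euclidean distance. The names `knn_edges` and `epsilon_edges` are hypothetical, and anchor graphs are not covered. The options ModSSC actually exposes are defined in src/modssc/graph/specs.py.

```python
import math


def knn_edges(X, k):
    """Hypothetical brute-force kNN graph builder.

    Each point selects its k nearest neighbors by Euclidean distance; an
    undirected edge is kept if either endpoint selected the other.
    """
    edges = set()
    for i, xi in enumerate(X):
        # Rank all other points by distance to xi, breaking ties by index.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: (math.dist(xi, X[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return sorted(edges)


def epsilon_edges(X, eps):
    """Hypothetical epsilon graph builder: connect every pair closer than eps."""
    return [
        (i, j)
        for i in range(len(X))
        for j in range(i + 1, len(X))
        if math.dist(X[i], X[j]) < eps
    ]


# Two well-separated pairs of 2-D points.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(knn_edges(X, k=1))          # [(0, 1), (2, 3)]
print(epsilon_edges(X, eps=1.0))  # [(0, 1), (2, 3)]
```

The kNN sketch uses "union" symmetrization: an edge survives if either endpoint picks the other. A mutual kNN graph would require both endpoints to pick each other. Brute force costs O(n²) distance computations, so it only suits small examples.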

Version

The current version is 0.0.3. It is defined in src/modssc/__about__.py, and the Hatch version configuration in pyproject.toml points to that file. [18][2]
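
To check which version is installed at runtime, the standard-library `importlib.metadata.version` is enough. The sketch assumes the installed distribution is named `modssc`, which is not confirmed here. The helper name `installed_modssc_version` is hypothetical, and it returns None when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_modssc_version(dist_name="modssc"):
    """Hypothetical helper: return the installed version string, or None.

    The default distribution name "modssc" is an assumption.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_modssc_version())  # e.g. "0.0.3" when ModSSC is installed, else None
```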

Project status and support

  • The project metadata marks the status as alpha (classifier "Development Status :: 3 - Alpha"). [2]

  • Report issues via the GitHub tracker. [2]

  • Contributor guidance is in the development section of the docs (docs/development/contributing.md). [19]

  • Citation metadata is provided in CITATION.cff. [20]

Sources
  1. README.md
  2. pyproject.toml
  3. src/modssc/cli/
  4. src/modssc/data_loader/
  5. src/modssc/sampling/
  6. src/modssc/preprocess/
  7. src/modssc/data_loader/catalog/
  8. src/modssc/data_loader/providers/
  9. src/modssc/sampling/plan.py
  10. src/modssc/sampling/storage.py
  11. src/modssc/preprocess/catalog.py
  12. src/modssc/preprocess/models.py
  13. src/modssc/graph/specs.py
  14. src/modssc/graph/featurization/api.py
  15. src/modssc/inductive/registry.py
  16. src/modssc/transductive/registry.py
  17. bench/README.md
  18. src/modssc/__about__.py
  19. docs/development/contributing.md
  20. CITATION.cff