24. Catalogs and registries

Use this reference when you need stable IDs for configs or when you want to inspect the registry-backed surfaces exposed by ModSSC. For execution workflows, continue with the matching how-to page.

24.1 What it is for

ModSSC exposes registry-backed catalogs for datasets, providers, preprocess steps and models, augmentation ops, methods, and evaluation metrics through both CLI commands and Python APIs. These registries are the source of truth for IDs and metadata such as modality, availability, and required_extra. [1][2][3][4][5][6]

24.2 When to use

  • Use this page when writing configs and you need exact IDs.
  • Use it when you want to check whether a dataset, step, model, or method requires an optional extra.
  • Use the how-to guides instead when you want end-to-end workflow instructions rather than registry inspection.

24.3 Minimal examples

CLI inspection:

modssc datasets providers
modssc preprocess steps list
modssc inductive methods list --available-only
modssc evaluation list

Python inspection:

from modssc.data_loader import available_datasets, available_providers, dataset_info

print(available_providers())
print(available_datasets())
print(dataset_info("toy").as_dict())

24.4 Registry map

Registry               | Typical question                            | Primary entry point
datasets and providers | which dataset IDs or providers exist        | modssc datasets ...
preprocess steps       | which step IDs can I put in a plan          | modssc preprocess steps ...
preprocess models      | which pretrained encoders are available     | modssc preprocess models ...
augmentation ops       | which augmentation ops exist for a modality | modssc augmentation ...
inductive methods      | which inductive SSL methods are available   | modssc inductive methods ...
transductive methods   | which graph-based methods are available     | modssc transductive methods ...
evaluation metrics     | which metric names are valid                | modssc evaluation ...
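
If you want to keep a snapshot of several registries at once, the listing commands from the minimal examples can be driven from a script. The following is a minimal sketch, not part of ModSSC: the names LISTING_COMMANDS and snapshot_listings are hypothetical, and the sketch assumes nothing about the output format, so it only captures raw stdout. It returns an empty dict when the modssc executable is not on PATH.

```python
import shutil
import subprocess

# Listing commands taken from the minimal examples; the dict keys are illustrative labels.
LISTING_COMMANDS = {
    "providers": ["modssc", "datasets", "providers"],
    "preprocess_steps": ["modssc", "preprocess", "steps", "list"],
    "inductive_methods": ["modssc", "inductive", "methods", "list", "--available-only"],
    "evaluation_metrics": ["modssc", "evaluation", "list"],
}


def snapshot_listings(commands=LISTING_COMMANDS):
    """Run each listing command and return {label: raw stdout}.

    Returns an empty dict if the modssc CLI is not installed.
    """
    if shutil.which("modssc") is None:
        return {}
    snapshot = {}
    for label, argv in commands.items():
        # check=False: a failing command still yields whatever it printed.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        snapshot[label] = result.stdout
    return snapshot


if __name__ == "__main__":
    for label, text in snapshot_listings().items():
        print(f"== {label} ==")
        print(text)
```

Storing the raw text next to a config makes it easier to see later which IDs were registered when the config was written.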

24.5 Datasets and providers

Use providers to understand which backends are available, and use dataset keys when wiring configs or CLI commands. Dataset info includes modality and required_extra. [1][7][11]

CLI: modssc datasets in src/modssc/cli/datasets.py. Python: data loader helpers in src/modssc/data_loader/api.py. [1][11]

CLI:

modssc datasets providers
modssc datasets list --modalities text
modssc datasets info --dataset toy

Python:

from modssc.data_loader import available_datasets, available_providers, dataset_info

print(available_providers())
print(available_datasets())
print(dataset_info("toy").as_dict())
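
Because dataset info carries modality and required_extra, a small filter can show which datasets of one modality need an optional extra before you reference them in a config. This is a sketch under stated assumptions: the dict keys "key", "modality", and "required_extra" are assumed field names, the sample entries are made-up stand-ins rather than real ModSSC datasets, and extras_needed is a hypothetical helper. In practice each dict would come from dataset_info(...).as_dict().

```python
from typing import Iterable, Mapping


def extras_needed(infos: Iterable[Mapping[str, object]], modality: str) -> dict:
    """Map dataset key -> required extra, for datasets of one modality that need an extra."""
    needed = {}
    for info in infos:
        if info.get("modality") != modality:
            continue
        extra = info.get("required_extra")
        if extra:  # None or an empty string means no optional extra is required
            needed[str(info["key"])] = str(extra)
    return needed


# Stand-in metadata with assumed field names; not real registry entries.
sample_infos = [
    {"key": "example_tabular", "modality": "tabular", "required_extra": None},
    {"key": "example_text", "modality": "text", "required_extra": "example_extra"},
    {"key": "example_text_local", "modality": "text", "required_extra": None},
]

print(extras_needed(sample_infos, "text"))  # {'example_text': 'example_extra'}
```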

24.6 Preprocess steps

Steps are registered in the preprocess catalog and surfaced through the CLI and registry helpers. Use step_info to check required_extra before you add a step to a plan. [2][8][12]

CLI: modssc preprocess in src/modssc/cli/preprocess.py. Python: preprocess registry in src/modssc/preprocess/registry.py. [2][12]

CLI:

modssc preprocess steps list
modssc preprocess steps info core.ensure_2d

Python:

from modssc.preprocess import available_steps, step_info

print(available_steps())
print(step_info("core.ensure_2d"))
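
A common failure when writing a plan is a mistyped step ID. The sketch below checks plan entries against a list of known IDs and suggests close matches using the standard library. It assumes that the known IDs are available as plain strings (for example, collected from available_steps(), whose exact return type is not confirmed here). check_step_ids is a hypothetical helper, and "example.scale" is a made-up ID.

```python
import difflib


def check_step_ids(plan_ids, known_ids):
    """Return {unknown_id: [close known IDs]} for plan entries missing from known_ids."""
    known = sorted(set(known_ids))
    problems = {}
    for step_id in plan_ids:
        if step_id not in known:
            # Up to three suggestions, best match first; may be an empty list.
            problems[step_id] = difflib.get_close_matches(step_id, known, n=3, cutoff=0.6)
    return problems


# Stand-in registry contents; "example.scale" is illustrative only.
known = ["core.ensure_2d", "example.scale"]
plan = ["core.ensure_2d", "core.ensure2d"]

for bad_id, suggestions in check_step_ids(plan, known).items():
    print(f"unknown step {bad_id!r}; did you mean {suggestions}?")
```

The same check applies to method IDs and metric names, since those are also plain identifiers looked up in a registry.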

24.7 Pretrained models

Pretrained encoder models are listed by the preprocess model registry. Use the CLI for quick inspection or the Python helpers when you need the metadata in code. [2][9]

CLI: modssc preprocess in src/modssc/cli/preprocess.py. Python: model registry in src/modssc/preprocess/models.py. [2][9]

CLI:

modssc preprocess models list --modality text
modssc preprocess models info stub:text

Python:

from modssc.preprocess import available_models, model_info

print(available_models(modality="text"))
print(model_info("stub:text"))

24.8 Augmentation ops

Augmentation operations are registered in the augmentation registry and can be listed or inspected from the CLI or through the Python registry helpers. [3][13]

CLI: modssc augmentation in src/modssc/cli/augmentation.py. Python: augmentation registry in src/modssc/data_augmentation/registry.py. [3][13]

CLI:

modssc augmentation list --modality text
modssc augmentation info text.word_dropout --as-json

Python:

from modssc.data_augmentation.registry import available_ops, op_info

print(available_ops(modality="text"))
print(op_info("text.word_dropout"))

24.9 Methods

Inductive and transductive registries expose method IDs. Use --available-only if you want to exclude planned or unresolvable methods. [4][5][14][15]

CLI: inductive and transductive CLIs in src/modssc/cli/inductive.py and src/modssc/cli/transductive.py. Python: registries in src/modssc/inductive/registry.py and src/modssc/transductive/registry.py. [4][5][14][15]

CLI:

modssc inductive methods list --available-only
modssc transductive methods list --available-only

Python:

from modssc.inductive import registry as inductive_registry
from modssc.transductive import registry as transductive_registry

print(inductive_registry.available_methods())
print(transductive_registry.available_methods())

24.10 Evaluation metrics

Metric names are listed by the evaluation module and exposed in the CLI. [6][16]

CLI: modssc evaluation in src/modssc/cli/evaluation.py. Python: metric helpers in src/modssc/evaluation/metrics.py. [6][16]

CLI:

modssc evaluation list

Python:

from modssc.evaluation import list_metrics

print(list_metrics())

24.11 Common mistakes

  • Looking for method, step, or dataset IDs in tutorial pages instead of querying the registries directly.
  • Ignoring required_extra and then discovering a dependency problem only at run time.
  • Assuming --available-only means “recommended”. It only means the dependency checks pass.
  • Treating registry availability as equivalent to successful execution on your machine. Some backends still have platform-specific constraints.
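
To catch a missing dependency before run time, one option is a preflight check that looks for the modules an extra is expected to provide. This is a minimal sketch using only the standard library. The mapping from extra name to module names is hypothetical and would have to be filled in from the project's packaging metadata. Finding a module only shows that it can be located, not that the backend runs on your platform.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Hypothetical mapping: extra name -> top-level modules it should make importable.
EXTRA_TO_MODULES = {
    "example_extra": ["json", "module_that_is_not_installed_xyz"],
}

print(missing_modules(EXTRA_TO_MODULES["example_extra"]))
# ['module_that_is_not_installed_xyz']
```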

Sources

  1. src/modssc/cli/datasets.py
  2. src/modssc/cli/preprocess.py
  3. src/modssc/cli/augmentation.py
  4. src/modssc/cli/inductive.py
  5. src/modssc/cli/transductive.py
  6. src/modssc/cli/evaluation.py
  7. src/modssc/data_loader/types.py
  8. src/modssc/preprocess/catalog.py
  9. src/modssc/preprocess/models.py
  10. src/modssc/cli/app.py
  11. src/modssc/data_loader/api.py
  12. src/modssc/preprocess/registry.py
  13. src/modssc/data_augmentation/registry.py
  14. src/modssc/inductive/registry.py
  15. src/modssc/transductive/registry.py
  16. src/modssc/evaluation/metrics.py