24. Catalogs and registries
Use this reference when you need stable IDs for configs or when you want to inspect the registry-backed surfaces exposed by ModSSC. For execution workflows, continue with the matching how-to page.
24.1 What it is for
ModSSC exposes registry-backed catalogs for datasets, providers, preprocess steps and models, augmentation ops, methods, and evaluation metrics through both CLI commands and Python APIs. These registries are the source of truth for IDs and metadata such as modality, availability, and required_extra. [1][2][3][4][5][6]
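To make "registry-backed catalog" concrete, here is a minimal sketch of the general pattern: a mapping from stable IDs to metadata records that can be listed, filtered, and inspected. This is not ModSSC's implementation. `Entry`, `REGISTRY`, `register`, `available`, `info`, and the `example.*` IDs are hypothetical; only the field names `modality` and `required_extra` come from the text above.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical metadata record; not ModSSC's real type."""

    key: str
    modality: str
    required_extra: Optional[str] = None  # None: no optional extra needed


# Hypothetical in-memory registry keyed by stable ID.
REGISTRY: dict[str, Entry] = {}


def register(entry: Entry) -> None:
    """Add an entry, refusing duplicate IDs so that IDs stay stable."""
    if entry.key in REGISTRY:
        raise ValueError(f"duplicate ID: {entry.key}")
    REGISTRY[entry.key] = entry


def available(modality: Optional[str] = None) -> list[str]:
    """List registered IDs, optionally restricted to one modality."""
    return sorted(
        key
        for key, entry in REGISTRY.items()
        if modality is None or entry.modality == modality
    )


def info(key: str) -> dict:
    """Return the metadata for one ID as a plain dict."""
    return asdict(REGISTRY[key])


register(Entry("example.tabular_a", "tabular"))
register(Entry("example.text_b", "text", required_extra="example-extra"))

print(available())
print(available(modality="text"))
print(info("example.text_b"))
```

The list and info helpers mirror the shape of the `list` and `info` commands shown on this page, which is why the same ID works in both a config and an inspection call.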
24.2 When to use
- Use this page when writing configs and you need exact IDs.
- Use it when you want to check whether a dataset, step, model, or method requires an optional extra.
- Use the how-to guides instead when you want end-to-end workflow instructions rather than registry inspection.
24.3 Minimal examples
CLI inspection:
modssc datasets providers
modssc preprocess steps list
modssc inductive methods list --available-only
modssc evaluation list
Python inspection:
from modssc.data_loader import available_datasets, available_providers, dataset_info
print(available_providers())
print(available_datasets())
print(dataset_info("toy").as_dict())
24.4 Registry map
| Registry | Typical question | Primary entry point |
|---|---|---|
| datasets and providers | which dataset IDs or providers exist | modssc datasets ... |
| preprocess steps | which step IDs can I put in a plan | modssc preprocess steps ... |
| preprocess models | which pretrained encoders are available | modssc preprocess models ... |
| augmentation ops | which augmentation ops exist for a modality | modssc augmentation ... |
| inductive methods | which inductive SSL methods are available | modssc inductive methods ... |
| transductive methods | which graph-based methods are available | modssc transductive methods ... |
| evaluation metrics | which metric names are valid | modssc evaluation ... |
24.5 Datasets and providers
Use providers to understand which backends are available, and use dataset keys when wiring configs or CLI commands. Dataset info includes modality and required_extra. [1][7][11]
CLI: modssc datasets in src/modssc/cli/datasets.py. Python: data loader helpers in src/modssc/data_loader/api.py. [1][11]
CLI:
modssc datasets providers
modssc datasets list --modalities text
modssc datasets info --dataset toy
Python:
from modssc.data_loader import available_datasets, available_providers, dataset_info
print(available_providers())
print(available_datasets())
print(dataset_info("toy").as_dict())
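Because dataset info carries a modality, a quick way to survey a catalog is to group dataset keys by that field. The sketch below assumes the per-dataset metadata is available as a dict with a `"modality"` key; the exact keys returned by `as_dict()` are an assumption here. To stay runnable without ModSSC installed, it takes the lookup as a parameter and uses stand-in data with hypothetical keys (`demo_*`).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_modality(
    keys: Iterable[str], lookup: Callable[[str], dict]
) -> Dict[str, List[str]]:
    """Group dataset keys by the 'modality' field of their metadata dict."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        # Assumption: metadata is a dict exposing a "modality" entry.
        groups[lookup(key).get("modality", "unknown")].append(key)
    return {modality: sorted(ks) for modality, ks in sorted(groups.items())}


# Stand-in metadata; these keys and values are hypothetical.
FAKE_INFO = {
    "demo_text_b": {"modality": "text", "required_extra": "demo-extra"},
    "demo_tab": {"modality": "tabular", "required_extra": None},
    "demo_text_a": {"modality": "text", "required_extra": None},
}

grouped = group_by_modality(FAKE_INFO, FAKE_INFO.__getitem__)
print(grouped)
```

With ModSSC installed, the same function could presumably be fed `available_datasets()` and `lambda key: dataset_info(key).as_dict()`, provided the first returns an iterable of keys and the second a dict shaped as assumed above.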
24.6 Preprocess steps
Steps are registered in the preprocess catalog and surfaced through the CLI and registry helpers. Use step_info to check required_extra before you add a step to a plan. [2][8][12]
CLI: modssc preprocess in src/modssc/cli/preprocess.py. Python: preprocess registry in src/modssc/preprocess/registry.py. [2][12]
CLI:
modssc preprocess steps list
modssc preprocess steps info core.ensure_2d
Python:
from modssc.preprocess import available_steps, step_info
print(available_steps())
print(step_info("core.ensure_2d"))
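Checking `required_extra` step by step gets tedious for longer plans, so it can help to collect the extras for a whole list of step IDs at once. The following is a minimal sketch under stated assumptions: the return type of `step_info` is not shown on this page, so the function takes a caller-supplied `required_extra_of` callable instead of reading the field itself. The `demo.*` step IDs and the `demo-extra` name are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


def extras_needed(
    step_ids: Iterable[str],
    required_extra_of: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Map each required extra to the step IDs in the plan that need it."""
    needed: Dict[str, List[str]] = {}
    for step_id in step_ids:
        extra = required_extra_of(step_id)
        if extra:  # None or "" means the step has no optional dependency
            needed.setdefault(extra, []).append(step_id)
    return needed


# Stand-in for registry metadata; hypothetical IDs and extras.
FAKE_REQUIRED_EXTRA = {
    "demo.step_plain": None,
    "demo.step_heavy": "demo-extra",
    "demo.step_heavy_2": "demo-extra",
}

plan = ["demo.step_plain", "demo.step_heavy", "demo.step_heavy_2"]
needed = extras_needed(plan, FAKE_REQUIRED_EXTRA.get)
print(needed)
```

In real use, `required_extra_of` would be a small adapter around `step_info` that extracts the `required_extra` value from whatever object that helper returns.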
24.7 Pretrained models
Pretrained encoder models are listed by the preprocess model registry. Use the CLI for quick inspection or the Python helpers when you need the metadata in code. [2][9]
CLI: modssc preprocess in src/modssc/cli/preprocess.py. Python: model registry in src/modssc/preprocess/models.py. [2][9]
CLI:
modssc preprocess models list --modality text
modssc preprocess models info stub:text
Python:
from modssc.preprocess import available_models, model_info
print(available_models(modality="text"))
print(model_info("stub:text"))
24.8 Augmentation ops
Augmentation operations are registered in the augmentation registry and can be listed or inspected from the CLI. [3][13]
CLI: modssc augmentation in src/modssc/cli/augmentation.py. Python: augmentation registry in src/modssc/data_augmentation/registry.py. [3][13]
CLI:
modssc augmentation list --modality text
modssc augmentation info text.word_dropout --as-json
Python:
from modssc.data_augmentation.registry import available_ops, op_info
print(available_ops(modality="text"))
print(op_info("text.word_dropout"))
24.9 Methods
Inductive and transductive registries expose method IDs. Use --available-only if you want to exclude planned or unresolvable methods. [4][5][14][15]
CLI: inductive and transductive CLIs in src/modssc/cli/inductive.py and src/modssc/cli/transductive.py. Python: registries in src/modssc/inductive/registry.py and src/modssc/transductive/registry.py. [4][5][14][15]
CLI:
modssc inductive methods list --available-only
modssc transductive methods list --available-only
Python:
from modssc.inductive import registry as inductive_registry
from modssc.transductive import registry as transductive_registry
print(inductive_registry.available_methods())
print(transductive_registry.available_methods())
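To illustrate what "unresolvable" can mean in practice, here is one common way a library can decide whether an entry is usable: ask the import system whether the backing module can be located, without importing it. This is a sketch of the general technique using the standard library's `importlib.util.find_spec`, not a description of ModSSC's actual check. The method IDs and the mapping from ID to module name are hypothetical.

```python
import importlib.util
from typing import Dict, List, Optional


def is_resolvable(module_name: Optional[str]) -> bool:
    """Return True if the module can be located by the import system."""
    if module_name is None:
        # Models a "planned" entry that has no implementation yet.
        return False
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package or a broken spec counts as unresolvable.
        return False


def available_only(methods: Dict[str, Optional[str]]) -> List[str]:
    """Keep only the method IDs whose backing module resolves."""
    return sorted(mid for mid, mod in methods.items() if is_resolvable(mod))


# Hypothetical method IDs mapped to the module each would need.
METHODS = {
    "demo.always_there": "json",  # standard library, always present
    "demo.missing_dep": "surely_not_installed_pkg_12345",
    "demo.planned": None,
}

print(available_only(METHODS))
```

A check of this kind only shows that a dependency can be found. It does not show that the method will run successfully on a given platform.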
24.10 Evaluation metrics
Metric names are listed by the evaluation module and exposed in the CLI. [6][16]
CLI: modssc evaluation in src/modssc/cli/evaluation.py. Python: metric helpers in src/modssc/evaluation/metrics.py. [6][16]
CLI:
modssc evaluation list
Python:
from modssc.evaluation import list_metrics
print(list_metrics())
24.11 Common mistakes
- Looking for method, step, or dataset IDs in tutorial pages instead of querying the registries directly.
- Ignoring required_extra and then discovering a dependency problem only at run time.
- Assuming available-only means “recommended”. It only means the dependency checks pass.
- Treating registry availability as equivalent to successful execution on your machine. Some backends still have platform-specific constraints.
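Mistyped IDs can be caught before a run by comparing the IDs in a config against a registry listing. The sketch below is one possible approach, not a ModSSC feature: it uses the standard library's `difflib.get_close_matches` to suggest near misses. The function name `check_ids` and the `demo.*` IDs are hypothetical; the `known` list stands in for whatever a registry listing helper returns.

```python
import difflib
from typing import Dict, Iterable, List


def check_ids(requested: Iterable[str], known: Iterable[str]) -> Dict[str, List[str]]:
    """Return each unknown ID mapped to up to three close known IDs."""
    known_list = sorted(set(known))
    known_set = set(known_list)
    problems: Dict[str, List[str]] = {}
    for rid in requested:
        if rid not in known_set:
            # cutoff=0.8 keeps only near-identical spellings as suggestions.
            problems[rid] = difflib.get_close_matches(
                rid, known_list, n=3, cutoff=0.8
            )
    return problems


# Stand-in for a registry listing; hypothetical IDs.
known = ["demo.ensure_2d", "demo.scale", "demo.tokenize"]

problems = check_ids(["demo.ensure_2d", "demo.ensure2d", "zzz"], known)
for bad_id, suggestions in problems.items():
    hint = f" (did you mean {', '.join(suggestions)}?)" if suggestions else ""
    print(f"unknown ID: {bad_id}{hint}")
```

An empty result means every requested ID exists in the listing. It says nothing about whether the corresponding extras are installed.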
24.12 Related links
- CLI reference
- Glossary
- Common errors and where to go
- Manage datasets
- Run preprocessing plans
- Use data augmentation
- Compute evaluation metrics
Sources
[1] src/modssc/cli/datasets.py
[2] src/modssc/cli/preprocess.py
[3] src/modssc/cli/augmentation.py
[4] src/modssc/cli/inductive.py
[5] src/modssc/cli/transductive.py
[6] src/modssc/cli/evaluation.py
[7] src/modssc/data_loader/types.py
[8] src/modssc/preprocess/catalog.py
[9] src/modssc/preprocess/models.py
[10] src/modssc/cli/app.py
[11] src/modssc/data_loader/api.py
[12] src/modssc/preprocess/registry.py
[13] src/modssc/data_augmentation/registry.py
[14] src/modssc/inductive/registry.py
[15] src/modssc/transductive/registry.py
[16] src/modssc/evaluation/metrics.py