9. How to use data augmentation
This page focuses on training-time augmentation plans and how to inspect available ops. If you are looking for feature engineering and caching that happens before training, see Preprocess how-to.
9.1 Problem statement
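You want to apply reproducible training-time augmentations to inputs for SSL methods (for example, weak/strong views). The ops themselves may be random, but the context passed to the pipeline (seed, sample ID, epoch) makes the result deterministic. [1][2] Keeping augmentations separate from preprocessing makes it easier to swap them without touching cached preprocessing. [7][8]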
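To make "random but deterministic" concrete, here is a minimal stdlib-only sketch of how a context made of seed, sample ID, and epoch could drive a repeatable random decision. The helpers `derive_rng` and `flip_decision` are hypothetical names for illustration; this is not how modssc derives its random state, only one common way to do it.

```python
import hashlib
import random


def derive_rng(seed: int, sample_id: int, epoch: int) -> random.Random:
    """Return an RNG whose stream depends only on (seed, sample_id, epoch).

    Hypothetical helper for illustration; not part of modssc.
    """
    key = f"{seed}:{epoch}:{sample_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


def flip_decision(seed: int, sample_id: int, epoch: int, p: float = 0.5) -> bool:
    """Decide whether to flip: random-looking, but repeatable for the same inputs."""
    return derive_rng(seed, sample_id, epoch).random() < p


# Same context, same decision, regardless of call order or worker process.
first = flip_decision(seed=0, sample_id=7, epoch=0)
second = flip_decision(seed=0, sample_id=7, epoch=0)
print(first == second)  # True
```

Deriving the stream from the full context, rather than sharing one global RNG, means a sample's augmentation does not depend on how many other samples were processed before it.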
9.2 When to use
Use this when your method expects stochastic augmentations, such as FixMatch-style weak/strong pipelines, and you want them to be reproducible from a seed. [1]
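As a rough illustration of a weak/strong pair, the sketch below builds two views of the same toy 1-D input from separate seeded streams. The functions `weak_view`, `strong_view`, and `two_views` are hypothetical and do not correspond to modssc ops; the "strong" recipe (flip plus masking) is an assumption chosen for brevity.

```python
import random
from typing import List, Tuple

Signal = List[float]  # toy 1-D input standing in for an image


def weak_view(x: Signal, rng: random.Random) -> Signal:
    """Mild perturbation: reverse the input with probability 0.5."""
    return x[::-1] if rng.random() < 0.5 else list(x)


def strong_view(x: Signal, rng: random.Random) -> Signal:
    """Heavier perturbation: weak flip, then zero out a contiguous quarter."""
    out = weak_view(x, rng)  # always a fresh list, so x is never mutated
    n = max(1, len(out) // 4)
    start = rng.randrange(0, len(out) - n + 1)
    out[start:start + n] = [0.0] * n
    return out


def two_views(x: Signal, seed: int) -> Tuple[Signal, Signal]:
    """Produce a (weak, strong) pair from separate seeded streams."""
    weak = weak_view(x, random.Random(f"{seed}:weak"))
    strong = strong_view(x, random.Random(f"{seed}:strong"))
    return weak, strong


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weak, strong = two_views(x, seed=0)
print(sorted(weak) == x)  # True: the weak view only reorders values
print(strong.count(0.0))  # 2: a quarter of the 8 values is masked
```

Giving each view its own stream keeps the weak view stable when the strong recipe changes, which is convenient when comparing augmentation strengths.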
9.3 Steps
1) Inspect available ops for your modality. [3][4]
2) Define an AugmentationPlan (list of ops + params). [2]
3) Build and apply a pipeline with a deterministic context. [1][5]
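The three steps above can be sketched with a toy, stdlib-only registry. Everything here (`REGISTRY`, `register`, `list_ops`, `ToyStep`, `build_toy_pipeline`) is a hypothetical stand-in meant to show the general pattern of inspecting ops, declaring a plan, and building a seeded pipeline; it is not the modssc registry or API.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[float]]  # toy stand-in: rows of pixel values
Op = Callable[[Image, random.Random, Dict[str, Any]], Image]

# Hypothetical registry keyed by "<modality>.<name>"; not modssc's registry.
REGISTRY: Dict[str, Op] = {}


def register(op_id: str) -> Callable[[Op], Op]:
    def deco(fn: Op) -> Op:
        REGISTRY[op_id] = fn
        return fn
    return deco


@register("vision.random_horizontal_flip")
def random_horizontal_flip(x: Image, rng: random.Random, params: Dict[str, Any]) -> Image:
    if rng.random() < params.get("p", 0.5):
        return [row[::-1] for row in x]
    return [row[:] for row in x]


@register("vision.identity")
def identity(x: Image, rng: random.Random, params: Dict[str, Any]) -> Image:
    return [row[:] for row in x]


def list_ops(modality: str) -> List[str]:
    """Step 1: inspect which ops exist for a modality."""
    prefix = modality + "."
    return sorted(k for k in REGISTRY if k.startswith(prefix))


@dataclass(frozen=True)
class ToyStep:
    """Step 2: one plan entry, an op ID plus its parameters."""
    op_id: str
    params: Dict[str, Any] = field(default_factory=dict)


def build_toy_pipeline(steps: Tuple[ToyStep, ...]) -> Callable[[Image, int], Image]:
    """Step 3: resolve op IDs once, then apply them in order with a seeded RNG."""
    ops = [(REGISTRY[s.op_id], s.params) for s in steps]  # KeyError on unknown op

    def run(x: Image, seed: int) -> Image:
        rng = random.Random(seed)
        for fn, params in ops:
            x = fn(x, rng, params)
        return x

    return run


print(list_ops("vision"))  # ['vision.identity', 'vision.random_horizontal_flip']
pipeline = build_toy_pipeline((ToyStep("vision.random_horizontal_flip", {"p": 1.0}),))
out = pipeline([[1.0, 2.0, 3.0]], seed=0)
print(out)  # [[3.0, 2.0, 1.0]]
```

Resolving op IDs at build time means a typo in a plan fails before training starts rather than on the first batch.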
9.4 Copy-paste example
Use the CLI when you want to inspect ops quickly (modssc augmentation in src/modssc/cli/augmentation.py), and use Python when you want to build pipelines in code (API in src/modssc/data_augmentation/api.py). [4][1]
CLI:
modssc augmentation list --modality vision
modssc augmentation info vision.random_horizontal_flip --as-json
Python:
import numpy as np

from modssc.data_augmentation import AugmentationContext, AugmentationPlan, StepConfig, build_pipeline

# Declare the ops to apply and their parameters.
plan = AugmentationPlan(
    steps=(
        StepConfig(op_id="vision.random_horizontal_flip", params={"p": 0.5}),
        StepConfig(op_id="vision.cutout", params={"frac": 0.25, "fill": 0.0}),
    ),
    modality="vision",
)

# Build a callable pipeline from the plan.
pipeline = build_pipeline(plan)

# Dummy 32x32 input with 3 channels.
x = np.zeros((32, 32, 3), dtype=np.float32)

# The context (seed, sample, epoch) makes the result reproducible.
ctx = AugmentationContext(seed=0, sample_id=0, epoch=0)
aug_x = pipeline(x, ctx=ctx)
print(aug_x.shape)
Ops and plan mechanics are defined in src/modssc/data_augmentation/ops/ and src/modssc/data_augmentation/api.py. [6][1]
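For intuition about what a cutout-style op with `frac` and `fill` parameters might do, here is a minimal stdlib-only sketch. It assumes `frac` is the fraction of each side covered by the masked patch; that reading is an assumption, and `toy_cutout` is a hypothetical function, so check src/modssc/data_augmentation/ops/ for the actual semantics of vision.cutout.

```python
import random
from typing import List

Image = List[List[float]]  # toy stand-in: rows of pixel values


def toy_cutout(x: Image, rng: random.Random, frac: float = 0.25, fill: float = 0.0) -> Image:
    """Overwrite a random rectangular patch with `fill`.

    Assumption: `frac` is the patch size as a fraction of each side.
    """
    h, w = len(x), len(x[0])
    ph = max(1, int(round(h * frac)))
    pw = max(1, int(round(w * frac)))
    top = rng.randrange(0, h - ph + 1)
    left = rng.randrange(0, w - pw + 1)
    out = [row[:] for row in x]  # copy so the input is left untouched
    for r in range(top, top + ph):
        for c in range(left, left + pw):
            out[r][c] = fill
    return out


x = [[1.0] * 8 for _ in range(8)]
y = toy_cutout(x, random.Random(0), frac=0.25, fill=0.0)
masked = sum(v == 0.0 for row in y for v in row)
print(masked)  # 4: a 2x2 patch on an 8x8 input
```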
9.5 Pitfalls
Warning
Augmentations are applied at training time; preprocessing is a separate, cacheable step. Do not mix the two in the same pipeline. [7][8]
9.6 Related links
Sources
[1] src/modssc/data_augmentation/api.py
[2] src/modssc/data_augmentation/plan.py
[3] src/modssc/data_augmentation/registry.py
[4] src/modssc/cli/augmentation.py
[5] src/modssc/data_augmentation/types.py
[6] src/modssc/data_augmentation/ops/vision.py
[7] src/modssc/data_augmentation/__init__.py
[8] src/modssc/preprocess/__init__.py
[9] src/modssc/data_augmentation/utils.py