12. How to use data augmentation
This guide focuses on training-time augmentation plans and how to inspect available operations. If you are looking for feature engineering and caching that happens before training, see Run preprocessing plans.
12.1 Problem statement
You want to apply reproducible (seeded) training-time augmentations to the inputs of semi-supervised learning (SSL) methods, for example to produce weak and strong views of the same sample. [1][2] Keeping augmentation separate from preprocessing lets you swap augmentations without invalidating cached preprocessing outputs. [7][8]
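One way to make stochastic augmentation reproducible is to seed a fresh random generator from the (seed, sample_id, epoch) triple that AugmentationContext carries. The sketch below assumes exactly that; make_rng is a hypothetical helper, not part of modssc, and the library may derive its seeds differently.

```python
import numpy as np


def make_rng(seed: int, sample_id: int, epoch: int) -> np.random.Generator:
    """Hypothetical helper: derive a generator from the whole context triple.

    The same sample in the same epoch always gets the same random draws,
    while a new epoch (or another sample) gets an independent stream.
    """
    # default_rng accepts a sequence of non-negative ints as seed entropy.
    return np.random.default_rng([seed, sample_id, epoch])


a = make_rng(seed=0, sample_id=7, epoch=0).random(4)
b = make_rng(seed=0, sample_id=7, epoch=0).random(4)
c = make_rng(seed=0, sample_id=7, epoch=1).random(4)

print(np.array_equal(a, b))  # True: same triple, same draws
print(np.array_equal(a, c))  # False: next epoch, new draws
```

Seeding per (sample, epoch) rather than once per run keeps results independent of data-loader ordering and worker count.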
12.2 When to use
Use this when your method expects stochastic augmentations, for example FixMatch-style weak/strong pipelines. [1]
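FixMatch-style methods feed two views of each unlabeled sample to the model: a weakly augmented one and a strongly augmented one. The sketch below is a simplified stand-in, not modssc's implementation: FixMatch's strong view normally uses RandAugment-style ops plus Cutout, whereas here weak = horizontal flip and strong = flip + cutout, mirroring the two ops in the plan example later in this guide. The semantics of p (flip probability), frac (patch side as a fraction of the shorter image side), and fill are assumptions, and random_hflip, cutout, and weak_strong_views are hypothetical names.

```python
import numpy as np


def random_hflip(x: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    # Mirror the width axis of an (H, W, C) image with probability p.
    return x[:, ::-1, :] if rng.random() < p else x


def cutout(
    x: np.ndarray, rng: np.random.Generator, frac: float = 0.25, fill: float = 0.0
) -> np.ndarray:
    # Overwrite one square patch (side = frac * min(H, W)) with `fill` in every channel.
    h, w = x.shape[:2]
    side = max(1, int(round(frac * min(h, w))))
    top = int(rng.integers(0, h - side + 1))
    left = int(rng.integers(0, w - side + 1))
    out = x.copy()
    out[top : top + side, left : left + side, :] = fill
    return out


def weak_strong_views(x: np.ndarray, seed: int, sample_id: int, epoch: int):
    # Two child streams from one seed: reproducible, but the views share no draws.
    weak_ss, strong_ss = np.random.SeedSequence([seed, sample_id, epoch]).spawn(2)
    weak_rng = np.random.default_rng(weak_ss)
    strong_rng = np.random.default_rng(strong_ss)
    weak = random_hflip(x, weak_rng, p=0.5)
    strong = cutout(random_hflip(x, strong_rng, p=0.5), strong_rng, frac=0.25, fill=0.0)
    return weak, strong


x = np.ones((32, 32, 3), dtype=np.float32)
weak, strong = weak_strong_views(x, seed=0, sample_id=0, epoch=0)
print(weak.shape, strong.shape)
# All-ones input, so flips are invisible and only the 8x8 cutout patch is zero.
print(int((strong == 0).all(axis=-1).sum()))  # 64
```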
12.3 Steps
1) Inspect available operations for your modality. [3][4]
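2) Define an AugmentationPlan: an ordered sequence of steps, each an op_id plus its params. [2]
3) Build a pipeline from the plan and apply it with an AugmentationContext (seed, sample_id, epoch) so the random draws are reproducible. [1][5]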
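The three steps can be mimicked in plain Python to show how the pieces fit together. This is a sketch under assumptions, not modssc's code: register, list_ops, Step, and build are hypothetical stand-ins for the operation registry, StepConfig, and build_pipeline; the op signature fn(x, rng, **params) is assumed; tabular.gaussian_noise is an invented op id used only to show filtering by modality; and the context is flattened into keyword arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping, Tuple

import numpy as np

# Hypothetical registry mapping op_id -> fn(x, rng, **params).
_REGISTRY: Dict[str, Callable[..., np.ndarray]] = {}


def register(op_id: str):
    def decorator(fn):
        _REGISTRY[op_id] = fn
        return fn
    return decorator


@register("vision.random_horizontal_flip")
def _hflip(x, rng, p=0.5):
    return x[:, ::-1] if rng.random() < p else x


@register("tabular.gaussian_noise")  # hypothetical op, present only to show filtering
def _gaussian_noise(x, rng, std=0.1):
    return x + rng.normal(0.0, std, size=x.shape)


def list_ops(modality: str) -> List[str]:
    # Step 1: op ids are namespaced by modality, so filter on the prefix.
    return sorted(op_id for op_id in _REGISTRY if op_id.startswith(modality + "."))


@dataclass(frozen=True)
class Step:
    # Step 2: one plan entry, an op id plus its parameters.
    op_id: str
    params: Mapping[str, Any] = field(default_factory=dict)


def build(steps: Tuple[Step, ...]) -> Callable[..., np.ndarray]:
    # Step 3: resolve every op up front, so an unknown op_id raises KeyError
    # at build time rather than in the middle of an epoch.
    resolved = [(_REGISTRY[s.op_id], dict(s.params)) for s in steps]

    def pipeline(x: np.ndarray, *, seed: int, sample_id: int, epoch: int) -> np.ndarray:
        # One generator per (seed, sample_id, epoch), consumed by the ops in order.
        rng = np.random.default_rng([seed, sample_id, epoch])
        for fn, params in resolved:
            x = fn(x, rng, **params)
        return x

    return pipeline


print(list_ops("vision"))  # ['vision.random_horizontal_flip']
pipe = build((Step("vision.random_horizontal_flip", {"p": 0.5}),))
x = np.arange(6, dtype=np.float32).reshape(1, 2, 3)
out1 = pipe(x, seed=0, sample_id=3, epoch=0)
out2 = pipe(x, seed=0, sample_id=3, epoch=0)
print(np.array_equal(out1, out2))  # True: same context, same result
```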
12.4 Copy-paste example
Use the CLI (modssc augmentation, implemented in src/modssc/cli/augmentation.py) to inspect operations quickly, and use Python to build pipelines in code through the public modssc.data_augmentation package API. Operation discovery and lazy registration happen behind that package facade. [4][1]
CLI:
modssc augmentation list --modality vision
modssc augmentation info vision.random_horizontal_flip --as-json
Python:
import numpy as np
from modssc.data_augmentation import AugmentationContext, AugmentationPlan, StepConfig, build_pipeline

# Two vision ops, applied in order: a random horizontal flip, then cutout.
plan = AugmentationPlan(
    steps=(
        StepConfig(op_id="vision.random_horizontal_flip", params={"p": 0.5}),
        StepConfig(op_id="vision.cutout", params={"frac": 0.25, "fill": 0.0}),
    ),
    modality="vision",
)
pipeline = build_pipeline(plan)

# Dummy 32x32 image with 3 channels.
x = np.zeros((32, 32, 3), dtype=np.float32)
# The context seeds the random draws, so the same (seed, sample_id, epoch) reproduces the same output.
ctx = AugmentationContext(seed=0, sample_id=0, epoch=0)
aug_x = pipeline(x, ctx=ctx)
print(aug_x.shape)
Augmentation operations and plan mechanics are defined in src/modssc/data_augmentation/ops/ and src/modssc/data_augmentation/api.py. [6][1]
12.5 Pitfalls
Warning
Augmentations are applied at training time; preprocessing is a separate, cacheable step. Do not mix the two in the same pipeline. [7][8]
12.6 Related links
Sources
[1] src/modssc/data_augmentation/api.py
[2] src/modssc/data_augmentation/plan.py
[3] src/modssc/data_augmentation/registry.py
[4] src/modssc/cli/augmentation.py
[5] src/modssc/data_augmentation/types.py
[6] src/modssc/data_augmentation/ops/vision.py
[7] src/modssc/data_augmentation/__init__.py
[8] src/modssc/preprocess/__init__.py
[9] src/modssc/data_augmentation/utils.py