9. How to use data augmentation

This page covers training-time augmentation plans and how to inspect the available ops. For feature engineering and caching that happen before training, see the Preprocess how-to.

9.1 Problem statement

You want to apply training-time augmentations to inputs for SSL methods (for example, weak/strong views), with randomness that is seeded and therefore reproducible. [1][2] Keeping augmentations separate from preprocessing makes it easier to swap them without touching cached preprocessing outputs. [7][8]

9.2 When to use

Use this when your method expects stochastic augmentations, for example FixMatch-style weak/strong pipelines. [1]
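To make the weak/strong idea concrete, here is a minimal NumPy-only sketch of producing two views of one input. It does not use the modssc API: hflip, cutout, weak_view, and strong_view are hypothetical helpers written for illustration, and the HWC image layout is an assumption.

```python
import numpy as np

def hflip(x, rng, p=0.5):
    # Flip the width axis of an HWC image with probability p.
    return x[:, ::-1, :] if rng.random() < p else x

def cutout(x, rng, frac=0.25, fill=0.0):
    # Overwrite one random rectangle covering `frac` of each spatial side.
    h, w = x.shape[:2]
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = x.copy()
    out[top:top + ch, left:left + cw, :] = fill
    return out

def weak_view(x, rng):
    # Weak view: a light, label-preserving transform only.
    return hflip(x, rng)

def strong_view(x, rng):
    # Strong view: the weak transform plus a heavier distortion.
    return cutout(hflip(x, rng), rng)

x = np.ones((32, 32, 3), dtype=np.float32)
rng = np.random.default_rng(0)
weak, strong = weak_view(x, rng), strong_view(x, rng)
print(weak.shape, strong.shape, int((strong == 0).sum()))
```

In a FixMatch-style method the weak view typically drives pseudo-labeling while the strong view is what the model is trained to match; in this library both would be expressed as separate plans rather than hand-written functions.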

9.3 Steps

1) Inspect available ops for your modality. [3][4]

2) Define an AugmentationPlan (list of ops + params). [2]

3) Build and apply a pipeline with a deterministic context. [1][5]
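The three steps above can be sketched in miniature as a registry lookup, a plan expressed as data, and a build step that resolves op ids into callables. This is only an illustration of the pattern, not the library's implementation: OPS, register, Step, build, and the toy.* op ids are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

import numpy as np

# Hypothetical registry: op id -> function(x, rng, **params).
OPS: Dict[str, Callable[..., np.ndarray]] = {}

def register(op_id: str):
    def deco(fn):
        OPS[op_id] = fn
        return fn
    return deco

@register("toy.add_noise")
def add_noise(x, rng, sigma=0.1):
    # Add Gaussian noise, keeping the input dtype.
    return (x + rng.normal(0.0, sigma, size=x.shape)).astype(x.dtype)

@register("toy.scale")
def scale(x, rng, factor=1.0):
    # Deterministic op; rng is accepted only to keep one call signature.
    return x * factor

@dataclass(frozen=True)
class Step:
    op_id: str
    params: Dict[str, Any] = field(default_factory=dict)

def build(steps: Tuple[Step, ...]):
    # Resolve ids up front so an unknown op fails at build time.
    resolved = [(OPS[s.op_id], s.params) for s in steps]

    def pipeline(x, rng):
        for fn, params in resolved:
            x = fn(x, rng, **params)
        return x

    return pipeline

print(sorted(OPS))                                  # 1) inspect available ops
plan = (Step("toy.add_noise", {"sigma": 0.5}),      # 2) define the plan as data
        Step("toy.scale", {"factor": 2.0}))
pipeline = build(plan)                              # 3) build and apply
x = np.zeros((4, 4, 3), dtype=np.float32)
out_a = pipeline(x, np.random.default_rng(0))
out_b = pipeline(x, np.random.default_rng(0))
print(out_a.shape, np.array_equal(out_a, out_b))
```

Keeping the plan as plain data (ids plus parameter dicts) is what makes it easy to serialize, compare, and swap without touching the code that applies it.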

9.4 Copy-paste example

Use the CLI (modssc augmentation, implemented in src/modssc/cli/augmentation.py) when you want to inspect ops quickly. Use Python (API in src/modssc/data_augmentation/api.py) when you want to build pipelines in code. [4][1]

CLI:

modssc augmentation list --modality vision
modssc augmentation info vision.random_horizontal_flip --as-json

Python:

import numpy as np
from modssc.data_augmentation import AugmentationContext, AugmentationPlan, StepConfig, build_pipeline

# Declare the plan: an ordered tuple of ops, each with its parameters.
plan = AugmentationPlan(
    steps=(
        StepConfig(op_id="vision.random_horizontal_flip", params={"p": 0.5}),
        StepConfig(op_id="vision.cutout", params={"frac": 0.25, "fill": 0.0}),
    ),
    modality="vision",
)
# Turn the plan into a callable pipeline.
pipeline = build_pipeline(plan)

# Dummy 32x32, 3-channel input.
x = np.zeros((32, 32, 3), dtype=np.float32)
# The context carries the seed, sample id, and epoch for this call.
ctx = AugmentationContext(seed=0, sample_id=0, epoch=0)
aug_x = pipeline(x, ctx=ctx)
print(aug_x.shape)

Ops and plan mechanics are defined in src/modssc/data_augmentation/ops/ and src/modssc/data_augmentation/api.py. [6][1]

9.5 Pitfalls

Warning

Augmentations are applied at training time; preprocessing is a separate, cacheable step. Do not mix the two in the same pipeline. [7][8]

Tip

Use AugmentationContext to make randomness deterministic across epochs and samples. [5][9]
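One common way to get this kind of reproducibility is to derive a fresh random generator from the context fields on every call. The sketch below shows that idea with NumPy's SeedSequence; it is an assumption about the general technique, not a description of how AugmentationContext is implemented, and rng_for is a hypothetical helper.

```python
import numpy as np

def rng_for(seed: int, epoch: int, sample_id: int) -> np.random.Generator:
    # Hypothetical helper: mix the three integers into one seed sequence
    # so each (seed, epoch, sample_id) triple gets its own random stream.
    return np.random.default_rng(np.random.SeedSequence([seed, epoch, sample_id]))

a = rng_for(seed=0, epoch=0, sample_id=7).random(4)
b = rng_for(seed=0, epoch=0, sample_id=7).random(4)  # same context
c = rng_for(seed=0, epoch=1, sample_id=7).random(4)  # next epoch

print(np.array_equal(a, b))  # same context gives the same draws
print(np.array_equal(a, c))  # a different epoch gives different draws
```

Because the generator depends only on the context values and not on call order, results stay reproducible even when samples are visited in a different order or by different workers.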

Sources
  1. src/modssc/data_augmentation/api.py
  2. src/modssc/data_augmentation/plan.py
  3. src/modssc/data_augmentation/registry.py
  4. src/modssc/cli/augmentation.py
  5. src/modssc/data_augmentation/types.py
  6. src/modssc/data_augmentation/ops/vision.py
  7. src/modssc/data_augmentation/__init__.py
  8. src/modssc/preprocess/__init__.py
  9. src/modssc/data_augmentation/utils.py