12. How to use data augmentation

This guide focuses on training-time augmentation plans and how to inspect available operations. If you are looking for feature engineering and caching that happens before training, see Run preprocessing plans.

12.1 Problem statement

You want to apply reproducible training-time augmentations to inputs for SSL methods (for example, weak/strong views): the operations are random, but a seeded context makes each draw repeatable. [1][2] Keeping augmentations separate from preprocessing makes it easier to swap them without changing cached preprocessing. [7][8]

12.2 When to use

Use this when your method expects stochastic augmentations, such as FixMatch-style methods that pair a weak and a strong view of each input. [1]

12.3 Steps

1) Inspect available operations for your modality. [3][4]

2) Define an AugmentationPlan (list of ops + params). [2]

3) Build and apply a pipeline with a deterministic context. [1][5]
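The three steps can be illustrated with a toy, standard-library-only sketch. Everything in it is hypothetical and written for illustration: the registry, the `toy.*` op ids, and the `build` helper are not part of modssc, and the real package may organize these pieces differently. It only shows the general shape of "look up ops, declare a plan as (op id, params) pairs, build a callable pipeline", applied to a weak and a strong plan.

```python
import random

# Hypothetical toy ops working on a flat list of floats (not modssc code).
def hflip(x, rng, p=0.5):
    """Reverse the sequence with probability p."""
    return x[::-1] if rng.random() < p else list(x)

def cutout(x, rng, frac=0.25, fill=0.0):
    """Overwrite one contiguous window covering `frac` of the input with `fill`."""
    n = max(1, int(len(x) * frac))
    start = rng.randrange(0, len(x) - n + 1)
    return x[:start] + [fill] * n + x[start + n:]

# Step 1: a registry that can be inspected before writing a plan.
REGISTRY = {"toy.hflip": hflip, "toy.cutout": cutout}

def list_ops():
    return sorted(REGISTRY)

# Step 3: resolve op ids once, then return a callable pipeline.
def build(plan):
    steps = [(REGISTRY[op_id], params) for op_id, params in plan]  # KeyError on unknown ids

    def pipeline(x, seed):
        rng = random.Random(seed)  # one seeded RNG per call keeps the result repeatable
        out = list(x)              # copy so the caller's input is never modified
        for fn, params in steps:
            out = fn(out, rng, **params)
        return out

    return pipeline

# Step 2: plans are plain data, an ordered list of (op id, params) pairs.
weak = build([("toy.hflip", {"p": 0.5})])
strong = build([("toy.hflip", {"p": 0.5}), ("toy.cutout", {"frac": 0.5, "fill": 0.0})])

x = [1.0, 2.0, 3.0, 4.0]
weak_view = weak(x, seed=0)
strong_view = strong(x, seed=0)
```

Because the plans are plain data, swapping the strong recipe means editing a list rather than touching the training loop, which is the property the steps above rely on.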

12.4 Copy-paste example

Use the CLI (the modssc augmentation command, implemented in src/modssc/cli/augmentation.py) when you want to inspect operations quickly. Use Python when you want to build pipelines in code through the public modssc.data_augmentation package API; operation discovery and lazy registration are handled behind that facade. [4][1]

CLI:

modssc augmentation list --modality vision
modssc augmentation info vision.random_horizontal_flip --as-json

Python:

import numpy as np
from modssc.data_augmentation import AugmentationContext, AugmentationPlan, StepConfig, build_pipeline

# Declare the plan: an ordered tuple of steps, each an op id plus its params.
plan = AugmentationPlan(
    steps=(
        StepConfig(op_id="vision.random_horizontal_flip", params={"p": 0.5}),
        StepConfig(op_id="vision.cutout", params={"frac": 0.25, "fill": 0.0}),
    ),
    modality="vision",
)
# Build a callable pipeline from the plan.
pipeline = build_pipeline(plan)

# Dummy 32x32x3 float32 input.
x = np.zeros((32, 32, 3), dtype=np.float32)
# The context pins the randomness for this seed, sample, and epoch.
ctx = AugmentationContext(seed=0, sample_id=0, epoch=0)
aug_x = pipeline(x, ctx=ctx)
print(aug_x.shape)

Augmentation operations and plan mechanics are defined in src/modssc/data_augmentation/ops/ and src/modssc/data_augmentation/api.py. [6][1]

12.5 Pitfalls

Warning

Augmentations are applied at training time; preprocessing is a separate, cacheable step. Do not mix the two in the same pipeline. [7][8]

Tip

Pass an AugmentationContext (seed, sample_id, epoch) to every pipeline call so that random draws are reproducible for each sample and epoch. [5][9]
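To make the idea of context-driven determinism concrete, here is a minimal sketch of one way a (seed, sample_id, epoch) triple could be turned into a reproducible random stream. It is an assumption for illustration only: `ToyContext`, `rng_for`, and `draws` are hypothetical names, and this is not how modssc derives its randomness internally.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyContext:
    """Hypothetical stand-in carrying the three fields used in the example in 12.4."""
    seed: int
    sample_id: int
    epoch: int


def rng_for(ctx, step_index=0):
    """Derive an RNG from the context fields and the step position."""
    key = f"{ctx.seed}:{ctx.epoch}:{ctx.sample_id}:{step_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as a stable integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


def draws(ctx, n=3):
    """Return the first n uniform draws for a given context."""
    rng = rng_for(ctx)
    return [rng.random() for _ in range(n)]


first = draws(ToyContext(seed=0, sample_id=7, epoch=0))
again = draws(ToyContext(seed=0, sample_id=7, epoch=0))          # same context
next_epoch = draws(ToyContext(seed=0, sample_id=7, epoch=1))     # epoch changed
other_sample = draws(ToyContext(seed=0, sample_id=8, epoch=0))   # sample changed
```

The same context always reproduces the same draws, while changing the epoch or the sample id gives a different stream, so a sample sees fresh augmentations each epoch yet a rerun with the same seed repeats them exactly. The sketch hashes the fields with SHA-256 rather than the built-in hash(), because Python salts string hashes per process and the result would not be stable across runs.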

Sources
  1. src/modssc/data_augmentation/api.py
  2. src/modssc/data_augmentation/plan.py
  3. src/modssc/data_augmentation/registry.py
  4. src/modssc/cli/augmentation.py
  5. src/modssc/data_augmentation/types.py
  6. src/modssc/data_augmentation/ops/vision.py
  7. src/modssc/data_augmentation/__init__.py
  8. src/modssc/preprocess/__init__.py
  9. src/modssc/data_augmentation/utils.py