2. Quickstart

This quickstart gives the smallest runnable CLI and Python examples and tells you what output to expect. For background and terminology, see Concepts.

2.1 One minimal run

Use the benchmark runner when you want a full pipeline driven by a YAML config, and use the Python snippet when you want to exercise the sampling API directly in code. ^[1][2][3][4]

Run the benchmark runner with the toy inductive config:

python -m bench.main --config bench/configs/experiments/toy_inductive.yaml

Run a quick Python API pass with the dataset loader and sampling plan:

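from modssc.data_loader import load_dataset
from modssc.sampling import HoldoutSplitSpec, LabelingSpec, SamplingPlan, sample

# Load the built-in "toy" dataset (download=True lets the loader fetch it).
ds = load_dataset("toy", download=True)
# Holdout split with a 0.2 validation fraction and no test split;
# labeling uses a 0.2 fraction with the "balanced" strategy.
plan = SamplingPlan(
    split=HoldoutSplitSpec(test_fraction=0.0, val_fraction=0.2, stratify=True),
    labeling=LabelingSpec(mode="fraction", value=0.2, strategy="balanced"),
)
# sample() returns a pair; only the first element (the result) is used here.
res, _ = sample(ds, plan=plan, seed=0, dataset_fingerprint=str(ds.meta["dataset_fingerprint"]))
print(res.stats)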

The benchmark command and the toy config live in bench/main.py and bench/configs/experiments/toy_inductive.yaml. The sampling API is defined in src/modssc/sampling/api.py and src/modssc/sampling/plan.py. ^[1][2][3][4]

2.2 What you should see

For the benchmark run, a timestamped folder is created under runs/ containing:

- config.yaml (copied config)
- run.json (metrics + metadata)
- error.txt (only if failed)

These outputs are written by the bench runner and context utilities. ^[5][6][7][8]
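To check a finished run from Python, something like the following sketch may help. It relies only on the file names listed above (config.yaml, run.json, error.txt) and on the runs/ folder. The helper names `latest_run` and `summarize_run` are hypothetical and not part of the project, picking the newest folder by modification time is an assumption, and because the schema of run.json is not described here, the sketch lists only its top-level keys.

```python
import json
from pathlib import Path


def latest_run(runs_dir="runs"):
    """Return the most recently modified run folder, or None if there is none."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.iterdir() if p.is_dir()]
    # Assumption: the newest folder by modification time is the latest run.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def summarize_run(run_dir):
    """Report which expected files exist and the top-level keys of run.json."""
    run_dir = Path(run_dir)
    summary = {
        "has_config": (run_dir / "config.yaml").is_file(),
        "failed": (run_dir / "error.txt").is_file(),  # written only on failure
        "run_keys": [],
    }
    run_json = run_dir / "run.json"
    if run_json.is_file():
        data = json.loads(run_json.read_text(encoding="utf-8"))
        # The run.json schema is not assumed; only top-level keys are listed.
        if isinstance(data, dict):
            summary["run_keys"] = sorted(data)
    return summary


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("No run folders found under runs/")
    else:
        print(run, summarize_run(run))
```

If `failed` is true, reading error.txt in that folder is the natural next step.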

For the Python snippet, you should see a stats dictionary printed to stdout. The stats structure is produced by modssc.sampling.stats.build_inductive_stats. ^[9]
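A plain print of a nested dictionary comes out on one long line. As a small convenience, the standard-library pprint module can lay it out more readably. The sketch below is independent of modssc: `format_stats` is a hypothetical helper, and the keys in `example_stats` are made up for illustration and are not the actual fields produced by build_inductive_stats.

```python
from pprint import pformat


def format_stats(stats, width=80):
    """Return a readable, key-sorted rendering of a (possibly nested) dict."""
    return pformat(stats, width=width, sort_dicts=True)


# Hypothetical stand-in for res.stats; the real keys may differ.
example_stats = {"n_unlabeled": 80, "n_labeled": 20, "per_class": {0: 10, 1: 10}}
print(format_stats(example_stats, width=40))
```

In the quickstart snippet, the same helper could be applied as `print(format_stats(res.stats))`.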

2.3 Next steps

2.4 Troubleshooting

Warning

If a dataset provider is missing, the loader raises an optional dependency error with a suggested pip install "modssc[extra]" command. ^[10]

Tip

Use modssc doctor to see which CLI bricks are available and which extras are missing. ^[11]
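Inside a script, a rough pre-flight check for optional packages can be sketched with the standard library. This is not how modssc doctor works, and it does not replace the loader's own error message. The helper name `missing_modules` is hypothetical, and which importable modules correspond to which extra is not stated here, so the names passed in are up to you.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be found for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or for invalid names.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "json" is in the standard library; the second name is deliberately bogus.
    print(missing_modules(["json", "definitely_not_a_real_module_xyz"]))
```

An empty list means every named module is importable; anything else is a hint to install the corresponding extra before calling the loader.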

Sources