23. Views API

This page documents the views API. For workflows, see Views how-to.

23.1 What it is for

The views brick creates multiple feature views of the same dataset for multi-view semi-supervised learning (SSL) methods such as Co-Training. ^[1][2]

23.2 Examples

Generate two random feature views:

from modssc.data_loader import load_dataset
from modssc.views import generate_views, two_view_random_feature_split

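# Load the toy dataset.
ds = load_dataset("toy", download=True)
# view_a takes a random subset of the feature columns (fraction=0.5); view_b takes the rest.
plan = two_view_random_feature_split(fraction=0.5)
views = generate_views(ds, plan=plan, seed=0)
# View names follow the order of the plan.
print(list(views.views.keys()))  # ['view_a', 'view_b']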

Create a custom plan with explicit indices:

from modssc.views import ColumnSelectSpec, ViewSpec, ViewsPlan

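plan = ViewsPlan(
    views=(
        # view_a keeps feature columns 0, 1 and 2.
        ViewSpec(name="view_a", columns=ColumnSelectSpec(mode="indices", indices=(0, 1, 2))),
        # view_b takes every remaining column; it must come after view_a in the tuple.
        ViewSpec(name="view_b", columns=ColumnSelectSpec(mode="complement", complement_of="view_a")),
    )
)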

The view plan schema is defined in src/modssc/views/plan.py. ^[2]

23.3 API reference

Multi-view feature generation.

This brick focuses on feature views (classic multi-view SSL methods such as Co-Training), not on augmentation-based multi-view training (handled by `modssc.data_augmentation`).

The core entry point is `modssc.views.generate_views`.

23.4 `ColumnSelectSpec` `dataclass`

How to select columns from a 2D feature matrix.

This is used to generate feature views (e.g. classic Co-Training), where each view sees a different subset of the features.

23.4.0.1 Notes

mode="complement" assumes the referenced view has already been resolved.
fraction is only used for mode="random".

Source code in src/modssc/views/plan.py

@dataclass(frozen=True)
class ColumnSelectSpec:
    """How to select columns from a 2D feature matrix.

    This is used to generate *feature views* (e.g. classic Co-Training),
    where each view sees a different subset of the features.

    Notes
    -----
    - `mode="complement"` assumes the referenced view has already been resolved.
    - `fraction` is only used for `mode="random"`.
    """

    mode: Literal["all", "indices", "random", "complement"] = "all"
    indices: tuple[int, ...] = ()
    fraction: float = 0.5
    complement_of: str | None = None
    seed_offset: int = 0

    def validate(self) -> None:
        if self.mode not in ("all", "indices", "random", "complement"):
            raise ViewsValidationError(f"Unknown ColumnSelectSpec.mode={self.mode!r}")

        if self.mode == "indices":
            if not self.indices:
                raise ViewsValidationError("ColumnSelectSpec(mode='indices') requires `indices`")
            if any(int(i) < 0 for i in self.indices):
                raise ViewsValidationError(
                    "ColumnSelectSpec.indices cannot contain negative values"
                )

        if self.mode == "random":
            f = float(self.fraction)
            if not (0.0 < f <= 1.0):
                raise ViewsValidationError("ColumnSelectSpec.fraction must be in (0, 1] for random")

        if self.mode == "complement" and not self.complement_of:
            raise ViewsValidationError(
                "ColumnSelectSpec(mode='complement') requires `complement_of`"
            )
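The four `mode` values can be illustrated with a small standalone sketch. The library's own column resolver (`_resolve_columns`, called from `generate_views`) is not shown on this page, so `resolve_columns_sketch` below is a hypothetical stand-in: the rounding rule for `fraction`, the way `seed` and `seed_offset` are combined, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def resolve_columns_sketch(
    mode: str,
    n_features: int,
    *,
    indices: Sequence[int] = (),
    fraction: float = 0.5,
    seed: int = 0,
    seed_offset: int = 0,
    complement_of_columns: Optional[Sequence[int]] = None,
) -> list:
    """Hypothetical illustration of the four modes (not the library code)."""
    if mode == "all":
        return list(range(n_features))
    if mode == "indices":
        # Sorted and de-duplicated, matching the "sorted, unique" contract
        # documented for ViewsResult.columns.
        return sorted({int(i) for i in indices})
    if mode == "random":
        # Assumption: round to the nearest count and keep at least one column.
        k = min(n_features, max(1, round(fraction * n_features)))
        # Assumption: the per-view offset is simply added to the global seed.
        rng = random.Random(seed + seed_offset)
        return sorted(rng.sample(range(n_features), k))
    if mode == "complement":
        if complement_of_columns is None:
            raise ValueError("complement needs the already resolved columns of the other view")
        taken = set(complement_of_columns)
        return [c for c in range(n_features) if c not in taken]
    raise ValueError(f"Unknown mode: {mode!r}")


# A random view and its complement together cover every column exactly once.
cols_a = resolve_columns_sketch("random", 10, fraction=0.5, seed=0)
cols_b = resolve_columns_sketch("complement", 10, complement_of_columns=cols_a)
```

In this sketch `mode="complement"` needs the other view's columns as an input, which mirrors the note that the referenced view must already be resolved.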

23.5 `ViewSpec` `dataclass`

A single view definition.

Source code in src/modssc/views/plan.py

@dataclass(frozen=True)
class ViewSpec:
    """A single view definition."""

    name: str
    preprocess: PreprocessPlan | None = None
    columns: ColumnSelectSpec | None = None
    meta: dict[str, Any] | None = None

    def validate(self) -> None:
        if not str(self.name).strip():
            raise ViewsValidationError("ViewSpec.name cannot be empty")
        if self.columns is not None:
            self.columns.validate()
        if self.meta is not None and not isinstance(self.meta, dict):
            raise ViewsValidationError("ViewSpec.meta must be a dict when provided")

23.6 `ViewsError`

Bases: Exception

Base exception for the modssc.views brick.

Source code in src/modssc/views/errors.py

class ViewsError(Exception):
    """Base exception for the `modssc.views` brick."""

23.7 `ViewsPlan` `dataclass`

A plan that generates multiple views from the same dataset.

Source code in src/modssc/views/plan.py

@dataclass(frozen=True)
class ViewsPlan:
    """A plan that generates multiple views from the same dataset."""

    views: tuple[ViewSpec, ...]

    def validate(self) -> None:
        if len(self.views) < 2:
            raise ViewsValidationError("ViewsPlan must contain at least 2 views")
        names = [v.name for v in self.views]
        if len(set(names)) != len(names):
            raise ViewsValidationError("View names must be unique")
        for v in self.views:
            v.validate()

        # Complement dependency must point to a previous view in the tuple
        seen: set[str] = set()
        for v in self.views:
            if v.columns is not None and v.columns.mode == "complement":
                target = str(v.columns.complement_of)
                if target not in seen:
                    raise ViewsValidationError(
                        f"View {v.name!r} uses complement_of={target!r} but that view wasn't resolved yet. "
                        "Put the referenced view earlier in ViewsPlan.views."
                    )
            seen.add(v.name)
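The ordering rule enforced at the end of `validate` can be reduced to a few lines. The sketch below is a simplification, not the library code: views are plain `(name, complement_of)` tuples, and the hypothetical helper `first_unresolved_complement` returns the offending view name instead of raising.

```python
from typing import List, Optional, Tuple


def first_unresolved_complement(
    views: List[Tuple[str, Optional[str]]],
) -> Optional[str]:
    """Return the first view whose complement target does not appear earlier.

    Each entry is (view_name, complement_of); complement_of is None for
    views that do not use mode="complement".
    """
    seen = set()
    for name, target in views:
        if target is not None and target not in seen:
            return name
        # A view only becomes a valid target after it has been visited,
        # so a view can never be the complement of itself.
        seen.add(name)
    return None


ok_order = [("view_a", None), ("view_b", "view_a")]
bad_order = [("view_b", "view_a"), ("view_a", None)]
self_reference = [("view_a", "view_a"), ("view_b", None)]
```

With these inputs, `ok_order` passes, `bad_order` fails on `view_b`, and `self_reference` fails on `view_a`.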

23.8 `ViewsResult` `dataclass`

Result of generate_views.

23.8.0.1 Attributes

views: Mapping of view name -> dataset where each split's .X is the view-specific feature matrix.
columns: Mapping of view name -> selected column indices (sorted, unique).
seed: Global seed used for any stochastic view operations (e.g. random column selection).
plan: The input plan (validated).
meta: Arbitrary metadata.

Source code in src/modssc/views/types.py

@dataclass(frozen=True)
class ViewsResult:
    """Result of `generate_views`.

    Attributes
    ----------
    views:
        Mapping of view name -> dataset where each split's `.X` is the view-specific feature matrix.
    columns:
        Mapping of view name -> selected column indices (sorted, unique).
    seed:
        Global seed used for any stochastic view operations (e.g. random column selection).
    plan:
        The input plan (validated).
    meta:
        Arbitrary metadata.
    """

    views: dict[str, LoadedDataset]
    columns: dict[str, np.ndarray]
    seed: int
    plan: ViewsPlan
    meta: dict[str, Any]

23.9 `ViewsValidationError`

Bases: ViewsError, ValueError

Raised when a ViewsPlan / ViewSpec is invalid.
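Because the class inherits from both `ViewsError` and `ValueError`, a handler for either type catches it. The sketch below re-declares the two exception classes locally so that it runs without `modssc` installed; the `caught_by` helper is hypothetical and exists only for the demonstration.

```python
class ViewsError(Exception):
    """Local copy of the base exception, for illustration only."""


class ViewsValidationError(ViewsError, ValueError):
    """Local copy of the validation error, for illustration only."""


def caught_by(handler_type: type) -> bool:
    """Report whether an `except handler_type` clause catches the validation error."""
    try:
        raise ViewsValidationError("ViewsPlan must contain at least 2 views")
    except handler_type:
        return True
    except Exception:
        return False


brick_handler = caught_by(ViewsError)    # brick-specific handler
generic_handler = caught_by(ValueError)  # generic "bad argument" handler
unrelated_handler = caught_by(KeyError)  # unrelated handler does not match
```

Catching `ViewsError` is the narrower choice when only failures from this brick should be handled; catching `ValueError` also picks up unrelated argument errors.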

Source code in src/modssc/views/errors.py

class ViewsValidationError(ViewsError, ValueError):
    """Raised when a ViewsPlan / ViewSpec is invalid."""

23.10 `generate_views(dataset, *, plan, seed=0, cache=True, fit_indices=None)`

Generate multiple feature views from a dataset.

23.10.0.1 Parameters

dataset: Input dataset from `modssc.data_loader` (train/test splits).
plan: ViewsPlan describing how to create each view.
seed: Global seed controlling stochastic view operations (e.g. random feature split).
cache: Passed through to `modssc.preprocess.preprocess` when preprocessing is used.
fit_indices: Indices (relative to the train split) to use when fitting preprocessing steps (e.g. PCA). Defaults to np.arange(len(train)).

23.10.0.2 Returns

ViewsResult: Each view is returned as a LoadedDataset where .train.X and .test.X are view-specific feature matrices, while labels / edges / masks are preserved. If the input dataset has no test split, the view's .test is None.

Source code in src/modssc/views/api.py

def generate_views(
    dataset: LoadedDataset,
    *,
    plan: ViewsPlan,
    seed: int = 0,
    cache: bool = True,
    fit_indices: np.ndarray | None = None,
) -> ViewsResult:
    """Generate multiple feature views from a dataset.

    Parameters
    ----------
    dataset:
        Input dataset from :mod:`modssc.data_loader` (train/test splits).
    plan:
        ViewsPlan describing how to create each view.
    seed:
        Global seed controlling stochastic view operations (e.g. random feature split).
    cache:
        Passed through to :func:`modssc.preprocess.preprocess` when preprocessing is used.
    fit_indices:
        Indices (relative to the *train* split) to use when fitting preprocessing steps
        (e.g. PCA). Defaults to ``np.arange(len(train))``.

    Returns
    -------
    ViewsResult
        Each view is returned as a `LoadedDataset` where `.train.X` and `.test.X` are view-specific
        feature matrices, while labels / edges / masks are preserved.
    """

    start = perf_counter()
    plan.validate()

    dataset_fp = None
    if hasattr(dataset, "meta") and isinstance(dataset.meta, dict):
        dataset_fp = dataset.meta.get("dataset_fingerprint")
    logger.info(
        "Views start: views=%s seed=%s cache=%s dataset_fp=%s",
        [v.name for v in plan.views],
        seed,
        bool(cache),
        dataset_fp,
    )

    train_y = _as_numpy(dataset.train.y)
    n_train = int(train_y.shape[0])
    if fit_indices is None:
        fit_indices = np.arange(n_train, dtype=np.int64)

    views: dict[str, LoadedDataset] = {}
    columns: dict[str, np.ndarray] = {}
    n_features_map: dict[str, int] = {}

    for view in plan.views:
        view_start = perf_counter()
        # 1) Optional preprocessing (cached, deterministic)
        ds = dataset
        if view.preprocess is not None:
            res = run_preprocess(
                ds, plan=view.preprocess, seed=int(seed), fit_indices=fit_indices, cache=bool(cache)
            )
            ds = res.dataset

        def _get_feats(x):
            if isinstance(x, dict) and "x" in x:
                return _as_numpy(x["x"])
            return _as_numpy(x)

        X_train = _get_feats(ds.train.X)
        X_test = _get_feats(ds.test.X) if ds.test is not None else None

        if X_train.ndim < 2:
            raise ViewsValidationError(
                f"View {view.name!r}: expected train.X to be at least 2D, got shape={X_train.shape}"
            )
        if X_test is not None and X_test.ndim < 2:
            raise ViewsValidationError(
                f"View {view.name!r}: expected test.X to be at least 2D, got shape={X_test.shape}"
            )

        n_features = int(X_train.shape[1])
        cols = _resolve_columns(
            spec=view.columns,
            n_features=n_features,
            seed=int(seed),
            view_name=str(view.name),
            resolved=columns,
            n_features_map=n_features_map,
        )
        n_features_map[str(view.name)] = n_features
        columns[str(view.name)] = cols

        X_train_v_sub = X_train[:, cols]
        X_test_v_sub = X_test[:, cols] if X_test is not None else None

        def _reconstruct(orig, feats):
            if isinstance(orig, dict) and "x" in orig:
                new_d = dict(orig)
                new_d["x"] = feats
                return new_d
            return feats

        X_train_v = _reconstruct(ds.train.X, X_train_v_sub)
        X_test_v = _reconstruct(ds.test.X, X_test_v_sub) if ds.test is not None else None

        # 2) Preserve y/edges/masks (do NOT copy large arrays)
        train_split = Split(
            X=X_train_v,
            y=ds.train.y,
            edges=ds.train.edges,
            masks=ds.train.masks,
        )
        test_split = (
            Split(X=X_test_v, y=ds.test.y, edges=ds.test.edges, masks=ds.test.masks)
            if ds.test is not None
            else None
        )

        # 3) Meta
        meta: dict[str, Any] = dict(ds.meta) if isinstance(ds.meta, dict) else {}
        meta.setdefault("views", {})
        meta["views"][str(view.name)] = {
            "columns": cols.tolist(),
            "columns_mode": (view.columns.mode if view.columns is not None else "all"),
            "preprocess": (asdict(view.preprocess) if view.preprocess is not None else None),
        }
        if view.meta:
            # view-level metadata override/additions
            meta.setdefault("view_meta", {})
            meta["view_meta"][str(view.name)] = dict(view.meta)

        views[str(view.name)] = LoadedDataset(train=train_split, test=test_split, meta=meta)
        logger.debug(
            "View built: name=%s train_shape=%s test_shape=%s duration_s=%.3f",
            view.name,
            _shape_of(X_train_v),
            _shape_of(X_test_v),
            perf_counter() - view_start,
        )

    result = ViewsResult(
        views=views,
        columns=columns,
        seed=int(seed),
        plan=plan,
        meta={"n_views": len(views)},
    )
    logger.info(
        "Views done: count=%s duration_s=%.3f",
        len(views),
        perf_counter() - start,
    )
    return result
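The dict handling inside `generate_views` (the nested `_get_feats` and `_reconstruct` helpers) can be tried in isolation. In this sketch nested lists stand in for NumPy arrays, and the `edge_index` key is a hypothetical example of extra payload stored next to `x`; neither is taken from the library.

```python
def get_feats(x):
    """Return the feature matrix, whether X is a bare matrix or a dict with an "x" entry."""
    if isinstance(x, dict) and "x" in x:
        return x["x"]
    return x


def slice_columns(rows, cols):
    """List-based stand-in for the NumPy slice X[:, cols]."""
    return [[row[c] for c in cols] for row in rows]


def reconstruct(orig, feats):
    """Put the sliced features back into the same container shape as the input."""
    if isinstance(orig, dict) and "x" in orig:
        new_d = dict(orig)  # shallow copy: the other entries are shared, not duplicated
        new_d["x"] = feats
        return new_d
    return feats


# Dict-shaped X: features under "x" plus a hypothetical extra entry.
graph_x = {"x": [[1, 2, 3], [4, 5, 6]], "edge_index": [[0, 1], [1, 0]]}
view_x = reconstruct(graph_x, slice_columns(get_feats(graph_x), [0, 2]))

# Plain matrix X: the sliced matrix is returned directly.
plain_view = reconstruct([[1, 2, 3]], slice_columns(get_feats([[1, 2, 3]]), [1]))
```

Only the `x` entry is replaced in the view; the original container is left unmodified and the remaining entries are passed through by reference, in line with the "do NOT copy large arrays" comment in the source.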

23.11 `two_view_random_feature_split(*, preprocess=None, fraction=0.5, seed_offset=0, name_a='view_a', name_b='view_b')`

Convenience helper for classic 2-view feature split.

The first view picks a random subset of columns, the second view is its complement.

Source code in src/modssc/views/plan.py

def two_view_random_feature_split(
    *,
    preprocess: PreprocessPlan | None = None,
    fraction: float = 0.5,
    seed_offset: int = 0,
    name_a: str = "view_a",
    name_b: str = "view_b",
) -> ViewsPlan:
    """Convenience helper for classic 2-view feature split.

    The first view picks a random subset of columns, the second view is its complement.
    """

    a = ViewSpec(
        name=name_a,
        preprocess=preprocess,
        columns=ColumnSelectSpec(
            mode="random", fraction=float(fraction), seed_offset=int(seed_offset)
        ),
        meta={"role": "primary"},
    )
    b = ViewSpec(
        name=name_b,
        preprocess=preprocess,
        columns=ColumnSelectSpec(mode="complement", complement_of=name_a),
        meta={"role": "complement"},
    )
    plan = ViewsPlan(views=(a, b))
    plan.validate()
    return plan
