23. Views API

This page documents the views API. For workflows, see Views how-to.

23.1 What it is for

The views brick generates multiple feature views of the same dataset for multi-view semi-supervised learning (SSL) methods such as co-training. [1][2]

23.2 Examples

Split the features of a dataset into two complementary random views:

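from modssc.data_loader import load_dataset
from modssc.views import generate_views, two_view_random_feature_split

ds = load_dataset("toy", download=True)
# view_a gets a random subset (fraction=0.5) of the columns; view_b gets the rest.
plan = two_view_random_feature_split(fraction=0.5)
result = generate_views(ds, plan=plan, seed=0)  # ViewsResult
print(list(result.views.keys()))  # ['view_a', 'view_b']
# Column indices each view received (sorted, and disjoint for this plan)
print({name: cols.tolist() for name, cols in result.columns.items()})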

Create a custom plan with explicit indices:

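from modssc.views import ColumnSelectSpec, ViewSpec, ViewsPlan

# view_a uses columns 0, 1 and 2; view_b receives every remaining column.
# view_a must come first because view_b is defined as its complement.
plan = ViewsPlan(
    views=(
        ViewSpec(name="view_a", columns=ColumnSelectSpec(mode="indices", indices=(0, 1, 2))),
        ViewSpec(name="view_b", columns=ColumnSelectSpec(mode="complement", complement_of="view_a")),
    )
)
plan.validate()  # raises ViewsValidationError if the plan is malformed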

The view plan schema is defined in src/modssc/views/plan.py. [2]

23.3 API reference

Multi-view feature generation.

This brick focuses on feature views (classic multi-view SSL methods such as Co-Training), not on augmentation-based multi-view training, which is handled by modssc.data_augmentation.

The core entry point is modssc.views.generate_views.

23.4 ColumnSelectSpec dataclass

How to select columns from a 2D feature matrix.

This is used to generate feature views (e.g. classic Co-Training), where each view sees a different subset of the features.

23.4.0.1 Notes

  • mode="complement" assumes the referenced view has already been resolved.
  • fraction is only used for mode="random".
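The four selection modes can be illustrated with a small standalone sketch that does not import modssc. `resolve_columns` is a hypothetical helper, not the library's internal resolver; in particular, how it turns `fraction` into a column count (rounding, clamped to at least one column) and how it combines the global seed with `seed_offset` (simple addition) are assumptions. It does follow the documented contract that resolved indices are sorted and unique, and that a complement view takes every column its referenced view did not.

```python
import random


def resolve_columns(mode, n_features, *, indices=(), fraction=0.5,
                    seed=0, seed_offset=0, complement_cols=None):
    """Hypothetical sketch of ColumnSelectSpec resolution (not modssc code)."""
    if mode == "all":
        return list(range(n_features))
    if mode == "indices":
        # Deduplicate and sort, matching the "sorted, unique" contract.
        return sorted(set(indices))
    if mode == "random":
        # Assumption: k = round(fraction * n_features), clamped to [1, n_features].
        k = min(n_features, max(1, round(fraction * n_features)))
        # Assumption: the effective seed is seed + seed_offset.
        rng = random.Random(seed + seed_offset)
        return sorted(rng.sample(range(n_features), k))
    if mode == "complement":
        # Every column NOT taken by the referenced (already resolved) view.
        taken = set(complement_cols or ())
        return [i for i in range(n_features) if i not in taken]
    raise ValueError(f"unknown mode {mode!r}")


view_a = resolve_columns("random", 10, fraction=0.3, seed=0)
view_b = resolve_columns("complement", 10, complement_cols=view_a)
print(view_a, view_b)
```

Because the complement is computed from an already resolved view, the referenced view has to be processed first, which is exactly what the first note above requires.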
Source code in src/modssc/views/plan.py
@dataclass(frozen=True)
class ColumnSelectSpec:
    """How to select columns from a 2D feature matrix.

    This is used to generate *feature views* (e.g. classic Co-Training),
    where each view sees a different subset of the features.

    Notes
    -----
    - `mode="complement"` assumes the referenced view has already been resolved.
    - `fraction` is only used for `mode="random"`.
    """

    mode: Literal["all", "indices", "random", "complement"] = "all"
    indices: tuple[int, ...] = ()
    fraction: float = 0.5
    complement_of: str | None = None
    seed_offset: int = 0

    def validate(self) -> None:
        if self.mode not in ("all", "indices", "random", "complement"):
            raise ViewsValidationError(f"Unknown ColumnSelectSpec.mode={self.mode!r}")

        if self.mode == "indices":
            if not self.indices:
                raise ViewsValidationError("ColumnSelectSpec(mode='indices') requires `indices`")
            if any(int(i) < 0 for i in self.indices):
                raise ViewsValidationError(
                    "ColumnSelectSpec.indices cannot contain negative values"
                )

        if self.mode == "random":
            f = float(self.fraction)
            if not (0.0 < f <= 1.0):
                raise ViewsValidationError("ColumnSelectSpec.fraction must be in (0, 1] for random")

        if self.mode == "complement" and not self.complement_of:
            raise ViewsValidationError(
                "ColumnSelectSpec(mode='complement') requires `complement_of`"
            )

23.5 ViewSpec dataclass

A single view definition.

Source code in src/modssc/views/plan.py
@dataclass(frozen=True)
class ViewSpec:
    """A single view definition."""

    name: str
    preprocess: PreprocessPlan | None = None
    columns: ColumnSelectSpec | None = None
    meta: dict[str, Any] | None = None

    def validate(self) -> None:
        if not str(self.name).strip():
            raise ViewsValidationError("ViewSpec.name cannot be empty")
        if self.columns is not None:
            self.columns.validate()
        if self.meta is not None and not isinstance(self.meta, dict):
            raise ViewsValidationError("ViewSpec.meta must be a dict when provided")

23.6 ViewsError

Bases: Exception

Base exception for the modssc.views brick.

Source code in src/modssc/views/errors.py
class ViewsError(Exception):
    """Base exception for the `modssc.views` brick."""

23.7 ViewsPlan dataclass

A plan that generates multiple views from the same dataset.

Source code in src/modssc/views/plan.py
@dataclass(frozen=True)
class ViewsPlan:
    """A plan that generates multiple views from the same dataset."""

    views: tuple[ViewSpec, ...]

    def validate(self) -> None:
        if len(self.views) < 2:
            raise ViewsValidationError("ViewsPlan must contain at least 2 views")
        names = [v.name for v in self.views]
        if len(set(names)) != len(names):
            raise ViewsValidationError("View names must be unique")
        for v in self.views:
            v.validate()

        # Complement dependency must point to a previous view in the tuple
        seen: set[str] = set()
        for v in self.views:
            if v.columns is not None and v.columns.mode == "complement":
                target = str(v.columns.complement_of)
                if target not in seen:
                    raise ViewsValidationError(
                        f"View {v.name!r} uses complement_of={target!r} but that view wasn't resolved yet. "
                        "Put the referenced view earlier in ViewsPlan.views."
                    )
            seen.add(v.name)
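The ordering rule enforced by the last loop of validate() can be seen in isolation with plain tuples. `check_complement_order` is a hypothetical stand-in that models only view names and `complement_of` targets; it shows that the same two views are accepted in one order and rejected in the other.

```python
def check_complement_order(views):
    """views: sequence of (name, complement_of or None) pairs (hypothetical model)."""
    seen = set()
    for name, complement_of in views:
        # A complement may only reference a view listed earlier in the tuple.
        if complement_of is not None and complement_of not in seen:
            raise ValueError(
                f"View {name!r} uses complement_of={complement_of!r} "
                "but that view wasn't resolved yet."
            )
        seen.add(name)


check_complement_order([("view_a", None), ("view_b", "view_a")])  # accepted

try:
    check_complement_order([("view_b", "view_a"), ("view_a", None)])
except ValueError as exc:
    print("rejected:", exc)
```

A view that names itself in complement_of is rejected by the same check, since its own name is added to the seen set only after the test.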

23.8 ViewsResult dataclass

Result of generate_views.

23.8.0.1 Attributes

  • views: mapping of view name -> dataset in which each split's .X is the view-specific feature matrix.
  • columns: mapping of view name -> selected column indices (sorted, unique).
  • seed: global seed used for any stochastic view operations (e.g. random column selection).
  • plan: the input plan (validated).
  • meta: arbitrary metadata; generate_views records the number of views under "n_views".

Source code in src/modssc/views/types.py
@dataclass(frozen=True)
class ViewsResult:
    """Result of `generate_views`.

    Attributes
    ----------
    views:
        Mapping of view name -> dataset where each split's `.X` is the view-specific feature matrix.
    columns:
        Mapping of view name -> selected column indices (sorted, unique).
    seed:
        Global seed used for any stochastic view operations (e.g. random column selection).
    plan:
        The input plan (validated).
    meta:
        Arbitrary metadata.
    """

    views: dict[str, LoadedDataset]
    columns: dict[str, np.ndarray]
    seed: int
    plan: ViewsPlan
    meta: dict[str, Any]

23.9 ViewsValidationError

Bases: ViewsError, ValueError

Raised when a ViewsPlan / ViewSpec is invalid.

Source code in src/modssc/views/errors.py
class ViewsValidationError(ViewsError, ValueError):
    """Raised when a ViewsPlan / ViewSpec is invalid."""

23.10 generate_views(dataset, *, plan, seed=0, cache=True, fit_indices=None)

Generate multiple feature views from a dataset.

23.10.0.1 Parameters

  • dataset: input dataset from modssc.data_loader (train/test splits; the test split may be absent).
  • plan: ViewsPlan describing how to create each view.
  • seed: global seed controlling stochastic view operations (e.g. random feature split); it is also passed to preprocessing.
  • cache: passed through to modssc.preprocess.preprocess when preprocessing is used.
  • fit_indices: indices (relative to the train split) of the rows used to fit preprocessing steps (e.g. PCA). Defaults to np.arange(len(train)).
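To make the role of fit_indices concrete, the sketch below fits a per-column mean and standard deviation on the selected train rows only, then applies those statistics to other data. The standardizer is a hypothetical stand-in for a step run by modssc.preprocess.preprocess, whose actual steps and fitting logic are defined there. When fit_indices is omitted, every train row is used, mirroring the documented default.

```python
import statistics


def fit_standardizer(X_train, fit_indices=None):
    """Fit per-column mean/std on a subset of train rows (hypothetical step)."""
    if fit_indices is None:
        fit_indices = range(len(X_train))  # default: every train row
    rows = [X_train[i] for i in fit_indices]
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Replace a zero std by 1.0 so transform never divides by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(X, means, stds):
    """Apply the fitted statistics to any split (train or test)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X_train = [[0.0, 10.0], [2.0, 10.0], [100.0, 50.0]]
X_test = [[1.0, 10.0]]

# Fit on train rows 0 and 1 only; row 2 does not influence the statistics.
means, stds = fit_standardizer(X_train, fit_indices=[0, 1])
print(transform(X_test, means, stds))
```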

23.10.0.2 Returns

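ViewsResult: each view is returned as a LoadedDataset whose .train.X and .test.X are the view-specific feature matrices, while labels, edges and masks are preserved. A ViewsValidationError is raised if the plan is invalid or if a view's train.X or test.X is not 2D after preprocessing.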

Source code in src/modssc/views/api.py
def generate_views(
    dataset: LoadedDataset,
    *,
    plan: ViewsPlan,
    seed: int = 0,
    cache: bool = True,
    fit_indices: np.ndarray | None = None,
) -> ViewsResult:
    """Generate multiple feature views from a dataset.

    Parameters
    ----------
    dataset:
        Input dataset from :mod:`modssc.data_loader` (train/test splits).
    plan:
        ViewsPlan describing how to create each view.
    seed:
        Global seed controlling stochastic view operations (e.g. random feature split).
    cache:
        Passed through to :func:`modssc.preprocess.preprocess` when preprocessing is used.
    fit_indices:
        Indices (relative to the *train* split) to use when fitting preprocessing steps
        (e.g. PCA). Defaults to ``np.arange(len(train))``.

    Returns
    -------
    ViewsResult
        Each view is returned as a `LoadedDataset` where `.train.X` and `.test.X` are view-specific
        feature matrices, while labels / edges / masks are preserved.
    """

    start = perf_counter()
    plan.validate()

    dataset_fp = None
    if hasattr(dataset, "meta") and isinstance(dataset.meta, dict):
        dataset_fp = dataset.meta.get("dataset_fingerprint")
    logger.info(
        "Views start: views=%s seed=%s cache=%s dataset_fp=%s",
        [v.name for v in plan.views],
        seed,
        bool(cache),
        dataset_fp,
    )

    train_y = _as_numpy(dataset.train.y)
    n_train = int(train_y.shape[0])
    if fit_indices is None:
        fit_indices = np.arange(n_train, dtype=np.int64)

    views: dict[str, LoadedDataset] = {}
    columns: dict[str, np.ndarray] = {}
    n_features_map: dict[str, int] = {}

    for view in plan.views:
        view_start = perf_counter()
        # 1) Optional preprocessing (cached, deterministic)
        ds = dataset
        if view.preprocess is not None:
            res = run_preprocess(
                ds, plan=view.preprocess, seed=int(seed), fit_indices=fit_indices, cache=bool(cache)
            )
            ds = res.dataset

        X_train = _as_numpy(ds.train.X)
        X_test = _as_numpy(ds.test.X) if ds.test is not None else None

        if X_train.ndim != 2:
            raise ViewsValidationError(
                f"View {view.name!r}: expected train.X to be 2D, got shape={X_train.shape}"
            )
        if X_test is not None and X_test.ndim != 2:
            raise ViewsValidationError(
                f"View {view.name!r}: expected test.X to be 2D, got shape={X_test.shape}"
            )

        n_features = int(X_train.shape[1])
        cols = _resolve_columns(
            spec=view.columns,
            n_features=n_features,
            seed=int(seed),
            view_name=str(view.name),
            resolved=columns,
            n_features_map=n_features_map,
        )
        n_features_map[str(view.name)] = n_features
        columns[str(view.name)] = cols

        X_train_v = X_train[:, cols]
        X_test_v = X_test[:, cols] if X_test is not None else None

        # 2) Preserve y/edges/masks (do NOT copy large arrays)
        train_split = Split(
            X=X_train_v,
            y=ds.train.y,
            edges=ds.train.edges,
            masks=ds.train.masks,
        )
        test_split = (
            Split(X=X_test_v, y=ds.test.y, edges=ds.test.edges, masks=ds.test.masks)
            if ds.test is not None
            else None
        )

        # 3) Meta
        meta: dict[str, Any] = dict(ds.meta) if isinstance(ds.meta, dict) else {}
        meta.setdefault("views", {})
        meta["views"][str(view.name)] = {
            "columns": cols.tolist(),
            "columns_mode": (view.columns.mode if view.columns is not None else "all"),
            "preprocess": (asdict(view.preprocess) if view.preprocess is not None else None),
        }
        if view.meta:
            # view-level metadata override/additions
            meta.setdefault("view_meta", {})
            meta["view_meta"][str(view.name)] = dict(view.meta)

        views[str(view.name)] = LoadedDataset(train=train_split, test=test_split, meta=meta)
        logger.debug(
            "View built: name=%s train_shape=%s test_shape=%s duration_s=%.3f",
            view.name,
            _shape_of(X_train_v),
            _shape_of(X_test_v),
            perf_counter() - view_start,
        )

    result = ViewsResult(
        views=views,
        columns=columns,
        seed=int(seed),
        plan=plan,
        meta={"n_views": len(views)},
    )
    logger.info(
        "Views done: count=%s duration_s=%.3f",
        len(views),
        perf_counter() - start,
    )
    return result
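Stripped of preprocessing, validation, logging and metadata, each iteration of the loop above slices the view's columns out of train.X and test.X and reuses the label arrays as they are. The sketch below reproduces only that step with plain lists; Split is a hypothetical namedtuple standing in for the library's split type (which also carries edges and masks), and make_view_split is illustrative, not part of modssc.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's Split container (edges/masks omitted).
Split = namedtuple("Split", ["X", "y"])


def make_view_split(split, cols):
    """Keep only the view's columns of X; reuse y unchanged (no copy)."""
    X_view = [[row[c] for c in cols] for row in split.X]
    return Split(X=X_view, y=split.y)


train = Split(X=[[1, 2, 3, 4], [5, 6, 7, 8]], y=[0, 1])
test = Split(X=[[9, 10, 11, 12]], y=[1])
columns = {"view_a": [0, 2], "view_b": [1, 3]}

views = {
    name: (make_view_split(train, cols), make_view_split(test, cols))
    for name, cols in columns.items()
}
print(views["view_a"][0].X)  # [[1, 3], [5, 7]]
print(views["view_b"][1].X)  # [[10, 12]]
```

Sharing y by reference mirrors the "do NOT copy large arrays" comment in the loop: only the feature matrix differs between views.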

23.11 two_view_random_feature_split(*, preprocess=None, fraction=0.5, seed_offset=0, name_a='view_a', name_b='view_b')

Convenience helper for classic 2-view feature split.

The first view selects a random subset of the columns; the second view is its complement. Both views use the same optional preprocess plan.

Source code in src/modssc/views/plan.py
def two_view_random_feature_split(
    *,
    preprocess: PreprocessPlan | None = None,
    fraction: float = 0.5,
    seed_offset: int = 0,
    name_a: str = "view_a",
    name_b: str = "view_b",
) -> ViewsPlan:
    """Convenience helper for classic 2-view feature split.

    The first view picks a random subset of columns, the second view is its complement.
    """

    a = ViewSpec(
        name=name_a,
        preprocess=preprocess,
        columns=ColumnSelectSpec(
            mode="random", fraction=float(fraction), seed_offset=int(seed_offset)
        ),
        meta={"role": "primary"},
    )
    b = ViewSpec(
        name=name_b,
        preprocess=preprocess,
        columns=ColumnSelectSpec(mode="complement", complement_of=name_a),
        meta={"role": "complement"},
    )
    plan = ViewsPlan(views=(a, b))
    plan.validate()
    return plan
Sources
  1. src/modssc/views/api.py
  2. src/modssc/views/plan.py