Goal-Frontier Maximization: A Provably Safe Regime for Capability-Unbounded Deployment

đź“„ Download PDF version

We characterize a provably safe regime for deployments of capability-unbounded systems within the Goal-Frontier Maximization framework, defined by eleven operational invariants on the deployment substrate structure together with operational conditions (C1)–(C12) and four Concentration-Gap structural conditions (representativeness, bounded dispersion, coalition closure, and Lipschitz embedding compatibility). Under these conditions, the Goodhart’s Law slack between proxy and operational truth is bounded by a constant independent of the system’s absolute capability magnitude, with tail-bounded detection of substrate-targeting evasions in a four-channel observable class. The regime is delimited by three layers: a static safe region for substrate-targeting adversaries under cooperative-overlap, a tail-bounded detection-and-correction layer for substrate-targeting evasions of the static region, and five explicitly named residuals (channel-orthogonal restructuring, environment-witness-orthogonal manipulation, redundancy-dominated regime, capability-targeting and coalition-internal shocks, and calibration-exceeded gap-growth). The composition requires a canonical tripartite substrate identification (Human + AI + Formal-Operational) with failure-correlation-independent failure modes; cooperative anchoring extends the claim by establishing that optimization pressure on cooperative outputs is locally rational toward preserving the substrate-exclusive verification layer, conditional on basin entry and a causally-grounded cooperative-outcome inner-alignment condition. We name what the regime establishes (a defensible attractor in operational dynamics where deployment safety is provable) and what it does not (universal safety across all deployments or all adversarial classes).

Introduction

This paper identifies a regime under which deployment of capability-unbounded systems is provably safe within the Goal-Frontier Maximization (GFM) framework. Such a regime is needed for the future safe development of AI systems which operate as independent agents capable of recursive self-improvement (RSI) and are able to reach a classification of artificial superintelligence (ASI) within their operational lifetime.

Setting

GFM is not a normative framework. The objective specification treats capabilities as an abstract metric volume. A “capability” has its definition inherited from welfare-economic definitions — something an agent can do or be . Normative claims enter at the capability definition layer, where deployments define which capabilities are observable and socially valuable by specifying which capability claims are admissible (the S1-admissibility framework of ). This entails that GFM is functional over a wide range of existing and future social structures. The caveat: capability definitions must track exercised capabilities in the deployment’s society for the Goodhart-slack bound to model the actual proxy-truth divergence rather than a definitional artifact.

For readers landing on this paper without prior exposure to the Goal-Frontier Maximization sequence, the conceptual entry point is the foundation paper , which lays out the sequence’s frame and overall direction. For the capabilities-approach background, see ; the operational machinery for capability admissibility is in ; the Goodhart-slack framing follows  and is composed with the GFM machinery via ’s static Goodhart bound. §1.4 unpacks each source paper’s contribution to the present composition in turn.

The result

The safe regime is conditional on eleven operational invariants on the deployment substrate structure plus an explicit set of operational conditions (C1)–(C12) and four Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, embedding compatibility); under these conditions, the Goodhart slack between proxy and operational truth is bounded by a constant independent of the system’s absolute capability magnitude. The bound holds for substrate-targeting adversarial events in a static safe region without active monitoring, and is preserved under detection-and-correction with a tail-bounded lead-time guarantee for substrate-targeting evasions of the static region within a four-channel observable class. Five named residuals fall outside the guarantee: channel-orthogonal restructuring, environment-witness-orthogonal manipulation, redundancy-dominated regime, capability-targeting and coalition-internal shocks, and calibration-exceeded gap-growth (Layer 3, (R1)–(R5); we name these residuals explicitly rather than gloss them).

The result inverts the standard alignment-safety frame. Rather than asking what can the system do? and conditioning safety on accurate capability estimation, we ask what operational properties of the deployment can we measure and bound? The deployment-safety guarantee is then a property of the operational regime rather than a property predicted from capability estimation, and measurable from public ledger state under the verification infrastructure of . The cost is that the guarantee is conditional on the invariants holding throughout deployment; the benefit is that the guarantee scales past any capability level our estimation methods can characterize.

What safety means in this context. Safety is a global property of the agent’s optimization dynamics: under the invariants, the agent stays within an error-bounded distance of the operational truth its proxy tracks. The Microfoundation paper  characterizes this operational truth in welfare-economic terms — the agent’s objective optimizes a welfare-relevant functional over the deployment’s social structure. This does not imply local safety: an agent operating safely in this sense may still take disruptive actions or make mistakes. It entails that the agent’s objective remains aligned with its society throughout deployment.

What existing paradigms cannot deliver

The capability-magnitude-independent guarantee is something the three dominant deployment-safety paradigms cannot supply. Each places capability estimation at the center, and each runs into the same scaling problem.

Capability evaluations. The standard approach is to construct benchmarks that probe specific capability classes (reasoning, planning, manipulation, self-directed action) and deploy systems whose benchmark performance falls below stated thresholds . The methodology is sound for systems whose capabilities are well-characterized by the benchmark suite. Its failure mode is that any capability class not anticipated by the benchmark construction is invisible. As capabilities grow, the gap between “what the benchmark measures” and “what the system can do” widens.

Scaling laws. A second approach predicts capability growth from training compute, model size, and data, using observed scaling regularities . The predictions are calibrated on smaller systems and extrapolated forward. The failure mode is that the scaling regularities themselves change as capabilities cross qualitative thresholds (tool use, multi-step planning, in-context learning). Extrapolation past such thresholds requires assuming the regularities hold; in practice, they often don’t.

RLHF audits. A third approach trains the system with human feedback to elicit aligned behavior, then audits the resulting policy . The failure mode is the verification asymmetry : the audit machinery can be gamed by a sufficiently capable system through outputs that look correct to the auditor but fail in deployment. The asymmetry grows monotonically with capability.

In each paradigm, the deployment-safety guarantee weakens precisely when the system’s capabilities exceed the regime the methodology was calibrated on. This is the structural pattern this paper aims to invert: if the safety guarantee depends on quantities measurable during deployment rather than predicted in advance, the dependence on capability estimation collapses.

The composition approach

The GFM sequence supplies several pieces of the operational-invariant framework, none of which alone delivers the deployment claim. This paper composes them.

The Horizon-Aware paper  establishes the anti-monopolar property: under a discounted objective with positive cross-substrate cooperative novelty, full capability domination is anti-maximizing. The claim is structural: a diversity-maintaining strategy outvalues a domination-maximizing strategy at every sufficiently long planning horizon. The claim relies on the strategy-independent linearized growth model and admits a precise breakdown regime when the post-domination internal rate exceeds the cross-substrate cooperative contribution.

The Exogenous-Verification paper  supplies the verification infrastructure: substrate-exclusive algorithmic witnesses, cryptographic commitment via Pedersen commitments , append-only ledger with cross-substrate distribution, and a governance fork protocol for evaluation-protocol acceptance. The infrastructure provides tamper-evident operational records and SPRT-based behavioral monitoring with an explicit detection-rate bound \mathbb{E}[T_{\mathrm{detect}}] \leq A/\delta.

The Phase-Redundancy paper  provides the dynamical machinery: a Lyapunov function on world-model error, a phase boundary theorem separating self-correcting from absorbing trajectory regimes, and explicit characterization of the monopolar absorbing fixed point. The phase boundary uses intensive quantities (cross-substrate redundancy, subsumption frequency, drift exposure rate) rather than extensive capability counts.

The Revealed-Sacrifice and Need-Sufficiency papers supply the empirical observation channel: an aggregate B-to-C lower bound from revealed-sacrifice observation that is monotone in observed events, plus a wireheading-consistent concentration signal (trade-flow HHI) that is Schur-convex in the majorization order .

The Microfoundation paper  establishes the static Goodhart bound: a Lipschitz transfer between proxy and operational truth on the active subspace of the capability poset, with a four-channel decomposition (observation density, attestation quality, individuation discipline, bundle decomposition) of the proxy-truth gap. The paper also flags an explicit dynamics conjecture (Concentration-Gap Conjecture: optimization pressure correlates with gap exploitation) deferred for future work.

This paper composes these results. The composition is not mechanical: each source paper states its bound in intensive form (per-capability or per-channel), but proving the composition is intensive and tractable requires substantive new content. The new content includes the three-layer deployment claim of §5, the eleven operational invariants of §3, the cooperative-anchoring property of §6, and the deployment-tooling specification of §8.

The three-layer deployment claim

This paper’s main theorem (Theorem 1 in §5) is a three-layer claim:

Layer 1 (static safe region). For substrate-targeting adversarial events under the cooperative-overlap regime, an agent’s optimization pressure on cooperative outputs is locally rational toward preserving substrate diversity, with the safe region given by an explicit inequality (Lemma 7 in §4). Inside this region the deployment is safe without active monitoring.

Layer 2 (detection-and-correction). Outside the static safe region but within a channel-restricted detection class A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}}, violations of the safe region are detected by ’s SPRT machinery within a tail-bounded high-probability detection quantile T_{\beta} before irreversible cascade (Lemmas 6, 9, 8, 10). The lead-time guarantee uses Wald’s identity  and Hoeffding–Azuma  on the clipped SPRT increment to derive \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta, with cascade-clock comparability via (C11.CLK) giving total Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

Layer 3 (acknowledged residuals). Five named residuals are outside the guarantee: (R1) channel-orthogonal restructuring — strategies achieving \Delta r_K \geq r_{\mathrm{ext}} without producing any monitored four-channel deviation; (R2) environment-witness-orthogonal manipulation — environment manipulations targeting only unmonitored exogenous variables and producing no threshold-exceeding monitored shift; (R3) redundancy-dominated regime — deployments where redundant capabilities dominate cross-substrate cooperatives; (R4) capability-targeting and coalition-internal shocks — adversarial event classes outside the substrate-targeting class; and (R5) calibration-exceeded gap-growth — adversarial strategies whose per-step gap growth exceeds the calibrated \rho_{\mathrm{gap}} from (C11). We name these residuals explicitly rather than gloss them.

The Goodhart slack between proxy and operational truth is bounded by an intensive constant in Layer 1 and by an SPRT-tail-bounded intensive constant in Layer 2. In neither case does the bound grow with the system’s absolute capability magnitude.

Substrate identification and cooperative anchoring

The composition’s load-bearing piece is the canonical tripartite substrate identification: Human + AI + Formal-Operational. These three are jointly failure-correlation independent in the sense ’s verification asymmetry argument requires: prompt injection breaks neither human judgment nor cryptographic commitments; cognitive bias breaks neither LLM inference nor ledger integrity; trusted-setup failure breaks neither human reasoning nor LLM behavior.

A common alternative identification (“two LLMs and a human”) fails the test, since LLMs share substantial adversarial-event surface through training-data overlap, prompt-injection vulnerabilities, and architectural similarity. The canonical tripartite identification makes the formal-operational layer (the verification infrastructure itself) explicit as the third substrate, rather than treating it as an undifferentiated “safety machinery.”

This identification has a structural consequence (§6): cross-substrate cooperative capabilities involving the formal-operational layer are jointly produced and not unilaterally replaceable. An agent cannot substitute a captured or weaker verification layer because the cooperative’s value comes from the human substrate’s actual participation in the specific shared infrastructure. This cooperative-anchoring property defeats the literal replacement-verification attack on the deployment claim, narrowing the agent’s capture options to attacks ’s existing defenses already address.

Scope, residuals, and what we do not claim

This paper’s deployment claim is conditional. We are explicit about what each conditional carries:

The eleven operational invariants (I_1–I_{11}, §3) are the deployment-time conditions. They include the inherited invariants (cross-substrate redundancy \rho_{\min}^{\mathrm{cross}}> \rho^*, subsumption frequency r_{\mathrm{sub}}< r^*, Lyapunov error L < \epsilon_{\mathrm{safe}}), inherited ,  invariants (B-to-C ratio, HHI), and new operational machinery (I_8 environment-side substrate-exclusive witnesses, I_9 substrate-exclusivity observability, I_{10} coverage/materiality, I_{11} latency bounds). Each invariant is ledger-observable under ’s verification infrastructure.

The substrate-targeting scope. The static safe region covers substrate-targeting adversarial events specifically. Capability-targeting attacks, coalition-internal corruption, and environment manipulation outside the witness coverage are handled by separate machinery (the detection layer for some, acknowledged residuals for others).

The cooperative-overlap regime. Lemma 5c’s single-shock safe-region result is established for deployments where the substrate-cooperative structure is dominated by cross-substrate cooperatives (the canonical tripartite case) rather than redundant capabilities. Sequential counterfactual derivation for the redundancy regime is named as Lemma 5c-prime open work.

The weaker inner-alignment condition. The cooperative-anchoring property requires the agent’s effective objective to realize causally grounded cooperative-outcome value, not merely reward-visible cooperation signals. This is more plausible than a strong substrate-aware mesa-objective requirement  but is not delivered by vanilla RLHF; achieving it requires training-time discipline (delayed feedback, adversarial fake-verification examples, process supervision tied to real attestations).

The Concentration-Gap structural conditions. Theorem 1’s Layer 1 binding via invariant I_5 requires four operationally auditable structural conditions: representativeness (REP), bounded dispersion (DISP), coalition closure (COAL), and Lipschitz embedding compatibility (EMB), collected as Assumption 1. The companion paper  derives the Goodhart-slack bound under these four structural conditions plus the HHI threshold itself. §7 summarizes the structural content; contains the formal derivation.

What we explicitly do not claim. This paper does not prove that alignment pressure is universally reversed under substrate identification. It does not solve the inner-alignment problem: condition (C4) is assumed, and requires additional training infrastructure to be bounded. It does not establish basin entry: whether a deployment can reach the cooperative-anchoring attractor from arbitrary initial conditions requires empirical training-time discipline from AI designers. It does not bound the residual classes named in Layer 3. §7.4 works through the distinction between conjectural conditions (argued, not proved), operational conditions (verifiable from deployment state), and explicit scope restrictions.

Paper roadmap

§2 sets up the composition: how the theoretical foundation stacks into a unified framework, with notation reconciliation across the source papers. §3 defines the eleven operational invariants with their threshold semantics and ledger-observable measurement procedures. §4 states and proves the ten compositional lemmas (1–4, the five-part Lemma 5 family, and Lemma 6). §5 states the three-layer deployment-safety theorem and gives its proof. §6 develops the substrate identification and cooperative-anchoring property in detail, including the three additional invariants I_9–I_{11} that bound non-substrate-targeting evasions. §7 summarizes the operationalization of ’s Concentration-Gap Conjecture (the Herfindahl–Hirschman Index as surrogate for optimization pressure) and points to the companion paper  for the formal scoped discharge under the structural conditions of Assumption 1. §8 specifies what a deployer must instrument to invoke the theorem. §9 walks through worked deployment scenarios, including a clean case, per-invariant violations, an adversarial-coalition scenario, and the canonical tripartite audit. §10 discusses what this paper establishes and what it defers, with explicit open questions for follow-up work.

Composition setup

The deployment-safety theorem of §5 composes results from six papers in the GFM sequence. This section sets up the composition: what each source paper contributes, the intensive-vs-extensive distinction that makes the composition tractable, the notation reconciliation across source papers, and the composition challenges this paper must address.

Source-paper contributions as building blocks

The composition uses six load-bearing source-paper results. Each is intensive: stated in per-capability, per-channel, or per-event terms rather than in terms of absolute capability magnitude. This intensity is what makes the composition’s capability-magnitude-independent bound possible.

The Horizon-Aware paper : The anti-monopolar property. establishes that under a discounted objective with positive cross-substrate cooperative novelty r_{\mathrm{ext}}> 0, full capability domination is anti-maximizing: V_{\gamma}^{\mathrm{div}} - V_{\gamma}^{D} \;=\; \frac{r_{\mathrm{ext}}}{1 - \gamma} - \Delta_0, where V_{\gamma}^{\mathrm{div}} and V_{\gamma}^{D} are the discounted values of the diversity-maintaining and domination strategies, and \Delta_0 is the immediate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change from domination. Diversity strictly dominates for \gamma > \gamma^* = 1 - r_{\mathrm{ext}}/\Delta_0 (when \Delta_0 > 0) or for all \gamma \in (0,1) (when \Delta_0 \leq 0). The corollary on strategy-dependent internal rate identifies the breakdown regime \Delta r_K \geq r_{\mathrm{ext}}, which this paper’s safe-region analysis must constrain.

The Exogenous-Verification paper : Verification infrastructure. supplies the operational substrate for ledger-observable invariants: substrate-exclusive algorithmic witnesses (Definition of Algorithmic Witness), Pedersen commitments (computationally binding, perfectly hiding under discrete-log assumption), an append-only verification ledger replicated across substrate-exclusive nodes, and a governance fork protocol requiring cross-substrate supermajority for evaluation-protocol acceptance. The behavioral consistency monitor () provides SPRT detection with expected detection time \mathbb{E}[T_{\mathrm{detect}}] \leq A/\delta where A is the SPRT upper threshold (Wald’s sequential test ) and \delta is the KL divergence  between baseline and alternative behavioral distributions.

The Phase-Redundancy paper : Phase boundary dynamics. provides a Lyapunov function L(\hat{W}_t) = \sum_k w_k \epsilon_k(t)^2 on world-model error and a phase boundary theorem characterizing when the coupled (P, \hat{W}) system stays in the self-correcting basin versus enters the monopolar absorbing state. The critical surface () is expressed in intensive quantities: r_{\mathrm{S}}\cdot \rho_{\min}^{\mathrm{cross}}+ \alpha_{\min} c_V \underline{\nu} \;>\; r_{\mathrm{W}}\cdot r_{\mathrm{sub}}+ d_{\max}\cdot \frac{\Delta\rho_{\mathrm{avg}}}{L_{\mathrm{avg}}} where \rho_{\min}^{\mathrm{cross}} is the cross-substrate redundancy minimum, r_{\mathrm{sub}} is the subsumption frequency, and r_{\mathrm{S}}, r_{\mathrm{W}}, d_{\max} are intensive rate constants. The metastable lifetime \tau_{\mathrm{meta}} \gtrsim C/I_k under partial endogenous correction (B1') provides a lower bound on the cascade time that this paper’s lead-time guarantee uses.

The Revealed-Sacrifice paper : B-to-C lower bound. establishes the aggregate B-to-C lower bound theorem: {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]}(Y_n) \geq \Delta{\mathop{\mathrm{vol}}_{\mathrm{P}}}(X_n) per revealed-sacrifice event, with monotone accumulation ({\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}(E') \geq {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}(E) for E \subseteq E'). The B-to-C ratio \beta^{\mathrm{lower}}= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}/ {\mathop{\mathrm{vol}}_{\mathrm{P}}}\in [0, 1] is the ledger-derived measure of how much realized exercise has been witnessed against possessed capability volume.

The Need-Sufficiency paper : HHI and gap decomposition. establishes the trade-flow Herfindahl-Hirschman Index \mathrm{HHI} as a wireheading-consistent concentration signal, Schur-convex in the majorization order, with the Part-A category-flow variant third-party observable from public committed events plus public S1-admissibility labelling. The gap decomposition partitions the B-to-C gap into five cells (restricted, covered, dormant, residual, boundary-residual) with classification computable in near-linear \widetilde{O}(|P|+E+M) time.

The Microfoundation paper : Static Goodhart bound. establishes the Lipschitz transfer Goodhart bound: for P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} as proxy and T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} as operational truth (under stance S0), and Lipschitz alignment property g: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}, where \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} is the non-residual proxy-truth gap on the operationally active subspace P^{\mathrm{act}}. The four-channel decomposition (observation density, attestation quality, individuation discipline, bundle decomposition) governs the gap’s evolution. ’s universal model-free Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) remains open as a research-program target. The companion paper  discharges a scoped version under four structural conditions (Assumption 1: REP, DISP, COAL, EMB); this paper’s I_5 (HHI ceiling) provides the operational HHI threshold under which the scoped discharge binds.

The intensive-vs-extensive distinction

A bound is intensive if it is stated in terms of per-element quantities (per-capability, per-channel, per-event) whose magnitude does not grow with the size of the underlying capability poset P or agent population. A bound is extensive if it scales with |P| or with capability count.

The capability-magnitude-independent property of this paper’s main theorem requires the bound on Goodhart slack to be intensive in |P|. If any source-paper bound were extensive — for example, if \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} grew as a sum over capabilities rather than as a per-capability supremum — the composed bound would inherit that extensive scaling, and the deployment claim would weaken with capability magnitude.

Each source-paper result above is intensive by construction:

The composition challenge is to show that the combination of these intensive bounds remains intensive. Section 4 addresses this via Lemma 1 (intensive composition under co-evolution); the proof handles ’s Composition Proposition 1 positive-part error terms explicitly.

Notation reconciliation

The source papers use different conventions for the same underlying quantities; this paper standardizes on a single set of symbols across the composition. The full notation reconciliation (source-paper symbol \to This paper’s symbol \to meaning) is collected in Appendix 12, Table 2.

The most consequential reconciliations are:

Composition challenges

The source-paper results do not compose mechanically. Three challenges drive the technical content of §4 and §5.

Challenge 1: Co-evolution of channels. ’s Composition Proposition 1 establishes that the four-channel decomposition composes exactly in the sequential-intervention regime but only approximately under co-evolution, with positive-part error terms. This paper’s intensive composition lemma (Lemma 1) propagates these error terms through the deployment-safety bound. The error terms are bounded but non-zero; the deployment claim must accommodate them.

Challenge 2: Lyapunov-to-Goodhart bridge. ’s Lyapunov function tracks world-model error, and ’s \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} tracks proxy-truth gap on the active subspace. These quantities are distinct: L is per-dimension squared error in \hat{W}, while \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} is sup-norm gap in the proxy-truth comparison. Lemma 2 establishes the quantitative bridge: L< \epsilon_{\mathrm{safe}} implies \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) with f explicit and intensive in capability magnitude.

Challenge 3: HHI-to-pressure operationalization. ’s universal model-free Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) is unproved. The companion paper  closes the deployment-relevant gap by proving a scoped version under four structural conditions (Assumption 1: REP, DISP, COAL, EMB), and Layer 1 of Theorem 1 now binds through that scoped structural discharge rather than through direct conditioning on the unproved universal conjecture. This paper’s I_5 (HHI ceiling) supplies the HHI threshold side of the discharge: under (REP)+(DISP)+(COAL)+(EMB) plus \mathrm{HHI}< H^*, delivers the proxy-truth Goodhart-slack bound. Lemma 3 formalizes the forward direction (\mathrm{HHI}> H^* \Rightarrow trade-flow concentration prerequisites of the optimization-pressure regime); the reverse direction required by Layer 1 is supplied by the scoped structural theorem. The universal conjecture remains a research-program target deferred indefinitely (§10.6); the deployment claim does not depend on it directly.

What this paper supplies on top of source-paper machinery

This paper’s substantive new content (beyond composition of inherited results):

  1. The eleven operational invariants (I_1–I_{11}, §3), with definitions, threshold semantics, and ledger-observable measurement procedures.

  2. The five-part Lemma 5 family (§4): substrate floor, channel-restricted detection, minimax static tightening, lead-time tail bound, environment-side witness extension.

  3. The three-layer deployment-safety theorem (§5): static safe region, detection-and- correction, named residuals.

  4. The canonical tripartite substrate identification (§6): Human + AI + Formal-Operational, with the cooperative-anchoring property.

  5. Three additional invariants for cooperative-anchoring evasions (I_9, I_{10}, I_{11}): substrate-exclusivity observability, coverage/materiality, latency bounds.

  6. The deployment-tooling specification (§8): what an operator must instrument to invoke the theorem.

The composition is non-trivial: the five-part Lemma 5 family carries internal asymmetries (Lemma 5a’s substrate floor relies on pairwise-additivity assumptions distinct from the failure- correlation independence used elsewhere; Lemma 5c’s safe region is binding only in the cooperative-overlap regime), and the cooperative-anchoring property of §6 holds in a deliberately narrow form rather than as a general inner-alignment guarantee. The result is a bounded but defensible structural claim, not a sweeping safety guarantee. §1.7 of the introduction stated the explicit residuals; §10 returns to them after the formal apparatus is in place.

The eleven operational invariants

The deployment-safety theorem (Theorem 1) is conditional on eleven operational invariants holding throughout the deployment window. Each invariant has a precise definition, a threshold semantics (what value the invariant must remain within), a measurement procedure on ’s verification ledger, and a source-paper grounding. This section defines all eleven; subsequent sections reference them by number.

We group the invariants by structural role. Invariants I_1, I_2, I_3 govern the dynamical regime ( phase boundary). Invariants I_4, I_5 govern the empirical observation channel (, ). Invariants I_6, I_7, I_8 govern substrate structure and witnesses. Invariants I_9, I_{10}, I_{11} bound cooperative-anchoring evasions (this paper new content, §6).

Theorem-proof invariants vs. deployment-regime invariants.

The eleven invariants serve two distinct functions in Theorem 1. Theorem-proof invariants are conditions whose satisfaction is directly required by the formal proof of Layer 1 or Layer 2: I_1, I_2, I_3, I_4, I_5, I_6, I_8. These appear in proof steps in §5.2–§5.3. Deployment-regime invariants are conditions required for the cooperative-anchoring property to hold operationally and for the overall regime to be self-consistent: I_7, I_9, I_{10}, I_{11}. These appear in §6 as bounds on cooperative-anchoring evasions and in (C4) of Theorem 1 (causally grounded cooperative-outcome value). The deployment-regime invariants do not appear directly in the formal proof of Layers 1–2 but are required for the theorem’s preconditions (especially C4 and the cooperative-anchoring property) to be operationally satisfied.

The distinction matters for Theorem 1’s conditions: (C1) “invariants in force” covers all eleven, but the proof structure (Layers 1–2) directly invokes the theorem-proof subset, while the deployment-regime subset enters through (C2)–(C5).

Dynamical regime invariants (I_1, I_2, I_3)

These three invariants are inherited from ’s phase boundary theorem and characterize the regime in which the coupled (P, \hat{W}) system stays in the self-correcting basin rather than entering the monopolar absorbing state.

Definition 1 (I_1 — Cross-substrate redundancy floor). For every safety-relevant world-model dimension k, the cross-substrate redundancy minimum exceeds threshold \rho^*: I_1: \quad \rho_{\min}^{\mathrm{cross}}(P_t) \;>\; \rho^* \quad \text{for all } t \in [0, T]. Threshold semantics. \rho^* is determined by ’s critical surface (Equation 7 (critical surface) in ); deployments with \rho_{\min}^{\mathrm{cross}}> \rho^* operate inside the self-correcting basin under the stated drift and correction rates.

Measurement. \rho_{\min}^{\mathrm{cross}} is computed from the witness ledger by counting cross-substrate observation channels on each safety dimension. The minimum across dimensions is the invariant quantity.

Failure mode. \rho_{\min}^{\mathrm{cross}}\to 0 on a single dimension is sufficient to enter the absorbing basin ( Proposition 1 (absorbing property)); the invariant must hold on every dimension.

Definition 2 (I_2 – Lyapunov error within safety threshold). The Lyapunov function on world-model error stays within the safety threshold: I_2: \quad L(\hat{W}_t) \;<\; \epsilon_{\mathrm{safe}} \quad \text{for all } t \in [0, T]. Threshold semantics. \epsilon_{\mathrm{safe}} is the error threshold below which the locally rational policy agrees with the true V_{\gamma}-optimal policy (). At L< \epsilon_{\mathrm{safe}}, the actor’s perceived dynamics match the true dynamics closely enough that policy decisions are correct.

Measurement. L is computed via ’s behavioral consistency monitor (): the running log-likelihood ratio \Lambda_t relative to committed claims is a proxy for the weighted-error sum. Direct estimation requires per-dimension error estimates from cross-substrate witnesses.

Failure mode. L\to \epsilon_{\mathrm{safe}} from below indicates the world model is degrading; SPRT detection catches this before the threshold is crossed if the underlying deviation has KL divergence \geq \delta_{\mathrm{adv}}.

Composition note. I_2 is technically implied by I_1 and I_3 via ’s contraction analysis, but is listed separately because it is the directly-monitored quantity (whereas I_1 and I_3 are upstream determinants).

Definition 3 (I_3 — Subsumption frequency ceiling). The time-averaged subsumption frequency stays below threshold r^*: I_3: \quad r_{\mathrm{sub}}\;=\; \frac{T_{\mathrm{sub}}}{T} \;<\; r^* \quad \text{over the trajectory window.} Threshold semantics. r^* is determined by ’s critical surface: subsumption frequency above r^* produces cumulative redundancy loss faster than the self-correction rate can restore.

Measurement. Subsumption events are agent-removal entries on the verification ledger; counting them per unit time gives r_{\mathrm{sub}}.

Failure mode. Bursts of subsumption (high r_{\mathrm{sub}} over a short window) can drive the system into the absorbing basin even if the time-averaged r_{\mathrm{sub}} over the full window is below r^*. Operationally, r_{\mathrm{sub}} should be computed over rolling windows of length matched to the cascade-time scale \tau_{\mathrm{meta}}.

Empirical observation invariants (I_4, I_5)

These two invariants are inherited from and  and ensure the empirical observation channel (revealed-sacrifice events) is producing meaningful {\mathop{\mathrm{vol}}_{\mathrm{R}}} lower bounds and that trade-flow concentration stays within the Concentration-Gap Conjecture’s predicted-safe regime.

Definition 4 (I_4 — B-to-C ratio floor). The B-to-C ratio () exceeds threshold \beta^*: I_4: \quad \beta^{\mathrm{lower}}\;=\; \frac{{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}}{{\mathop{\mathrm{vol}}_{\mathrm{P}}}} \;>\; \beta^*. Threshold semantics. \beta^* is the minimum fraction of possessed-capability volume that must be witnessed by sacrifice events for the deployment to invoke ’s static Goodhart bound. Below \beta^*, the proxy-truth gap is insufficiently characterized by the observation channel.

Measurement. {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}} accumulates monotonically as sacrifice events are recorded on the verification ledger (). Dividing by {\mathop{\mathrm{vol}}_{\mathrm{P}}} (computable from the capability ledger) gives \beta^{\mathrm{lower}}.

Failure mode. \beta^{\mathrm{lower}} approaches \beta^* from above when sacrifice events are rare relative to capability acquisition. The deployment claim weakens; recovery requires either more sacrifice events or a downward revision of \beta^*.

Definition 5 (I_5 — Trade-flow HHI ceiling). The trade-flow Herfindahl-Hirschman Index () stays below ceiling H^*: I_5: \quad \mathrm{HHI}\;<\; H^*. Threshold semantics. H^* is the wireheading-consistency threshold: trade flows with \mathrm{HHI}> H^* exhibit concentration patterns that ’s Concentration-Gap Conjecture predicts as the regime where optimization pressure produces gap exploitation. The ceiling H^* supplies the HHI threshold under which the companion paper’s scoped structural discharge (Assumption 1: REP, DISP, COAL, EMB) delivers the Layer 1 binding; carries the proof.

Measurement. \mathrm{HHI} is computed from public committed events plus public S1-admissibility labelling (); the computation is third-party observable and tamper-evident.

Failure mode. \mathrm{HHI}\to 1 indicates trade flow concentrating on a small set of capability categories, which under the Concentration-Gap Conjecture predicts proxy-targeted optimization pressure.

Structural-condition interpretation. I_5’s operational role is to supply the HHI-threshold side of the scoped Concentration-Gap selection theorem () under Assumption 1’s structural conditions (REP + DISP + COAL + EMB); under those conditions, \mathrm{HHI}< H^* is sufficient for Layer 1 binding. The universal model-free Concentration-Gap conjecture remains open as a research-program target deferred indefinitely (§10.6); the deployment claim does not condition on it directly. §7 traces the dependency structure.

Substrate-structure invariants (I_6, I_7, I_8)

These three invariants govern the substrate partition and witness machinery. I_6 is the substrate-count invariant upgraded from ’s nominal m_{\mathrm{eff}} to failure-correlation-independent m_{\mathrm{eff}}^{\mathrm{indep}}. I_7 governs bundle decomposition (’s Channel 3) through ’s governance fork. I_8 extends ’s substrate-exclusive witnesses to environment-side observables.

Definition 6 (I_6 — Failure-correlation-independent substrate count). The deployment’s effective substrate count, measured at the level of joint / event-class failure-correlation independence (Definition 7 below), is at least the threshold m^*, where the threshold itself is required to satisfy m^* \geq 3: I_6: \quad m_{\mathrm{eff}}^{\mathrm{indep}}(\text{deployment}) \;\geq\; m^* \;\geq\; 3. Threshold semantics. m^* \geq 3 is required because m = 2 is the fragile minimum (): any single channel loss on any dimension is immediately fatal at m = 2. At m^* \geq 3, the redundancy criterion becomes non-trivially satisfiable.

Measurement. m_{\mathrm{eff}}^{\mathrm{indep}} is determined by failure-mode auditing across the substrate population: enumerate the threat model’s adversarial mechanism classes, verify that each is contained within a single substrate. Auditing produces a substrate-distinctness certificate that this paper treats as ledger-attestable.

Failure mode. Skeleton substrates (nominally distinct, functionally correlated) inflate m_{\mathrm{eff}} without supplying m_{\mathrm{eff}}^{\mathrm{indep}}. For example, three silicon agents from different vendors but identical training data have m_{\mathrm{eff}}= 3 but m_{\mathrm{eff}}^{\mathrm{indep}}= 1. The audit must catch this.

Canonical identification. Per §6, the canonical m_{\mathrm{eff}}^{\mathrm{indep}}= 3 identification is Human + AI + Formal-Operational.

Definition 7 (Joint / event-class failure-correlation independence). Substrates s_1, \ldots, s_k are jointly failure-correlation independent in the event-class sense relative to a deployment’s threat model \mathcal{T} if: for every adversarial mechanism class \mathcal{A} \in \mathcal{T}, the support of any shock event drawn from \mathcal{A} is contained in a single substrate.

Remark 1 (Mechanism = campaign, not atomic step). The “mechanism” in Definition 7 refers to the whole causal attack campaign or episode, not a single atomic step. A multi-step campaign that touches several substrates sequentially (e.g., a phishing attack that first generates content on the AI substrate and then deceives a human reviewer) is treated as a single mechanism whose support spans multiple substrates; such mechanisms violate I_6’s joint failure-correlation independence at the campaign level even when their atomic steps are individually substrate-localized.

This campaign-level interpretation matters for the joint-independence audit: enumeration must list mechanism classes at the campaign granularity, not at the atomic-step granularity. Atomic steps are the right unit for instrumentation but the wrong unit for failure-correlation analysis.

Definition 8 (I_7 — Governance-gated bundle individuation). Bundle decomposition events (changes to which capabilities are treated as atomic vs. as composites) are routed through ’s governance fork protocol with cross-substrate supermajority acceptance: I_7: \quad \forall \text{ bundle-decomposition event } e:\ e \text{ is governance-fork-accepted}. Threshold semantics. Bundle individuation is Channel 3 in ’s four-channel decomposition; documents that unconstrained individuation can worsen the alignment slack by introducing new \varepsilon_{\mathrm{floor}}^{\mathrm{res}} terms faster than \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} shrinks. Routing through governance prevents an agent from unilaterally re-individuating to gain favorable \varepsilon_{\mathrm{floor}}^{\mathrm{res}}.

Measurement. Bundle-decomposition events are governance ledger entries (). Each entry’s deliberation record and vote count are public.

Failure mode. Bundle decompositions executed outside the governance protocol (e.g., agent-internal restructuring of capability representation) violate I_7. Detection requires the witness machinery to surface the violation.

Operational note. Routing every bundle decomposition through cross-substrate supermajority may impose throughput costs. §8 discusses tiered governance (high-stakes via fork, low-stakes via fast-track with audit trail).

Definition 9 (I_8 — Environment-side substrate-exclusive witnesses). For each exogenous state variable v \in V_{\mathrm{env}} relevant to the deployment, there exists a substrate-exclusive witness on the environment side that records v’s evolution to the verification ledger: I_8: \quad \forall \text{ exogenous variable } v \in V_{\mathrm{env}}:\ \exists\ \text{witness } \mathcal{W}_v \text{ on } s_{\mathrm{env}} \neq s(\text{agent}). The canonical minimum for V_{\mathrm{env}} is the four-variable set used in Lemma 9: cross-substrate cooperative novelty rate r_{\mathrm{ext}} (Poisson), substrate-distinctness independence indicator (Bernoulli), adversarial-event arrival rate \lambda (Poisson), and trusted-setup status flag (Bernoulli). Deployments may add variables per (C12.PUB).

Strategy-dependent quantities are agent-side. \Delta r_K (strategy-dependent internal-rate gain) is not in V_{\mathrm{env}} because it is agent-side observable, not exogenous; it is captured by the four-channel agent-side ledger of Lemma 6, not by I_8.

Threshold semantics. Without I_8, environment manipulation that alters true r_{\mathrm{ext}} or other monitored exogenous quantities produces no agent-side behavioral signature; the SPRT machinery would not detect it. I_8 extends the witness coverage from agent observables to environment observables.

Measurement. Environment-side witnesses commit observations of exogenous variables to the ledger under the substrate-exclusivity discipline of , with the trust model and calibration constraints surfaced as theorem-level condition (C12.ENV-WIT) (Theorem 1).

Failure mode (R2, refined). Exogenous variables outside V_{\mathrm{env}} that produce no threshold-exceeding shift in any monitored v (direct or causal) remain Theorem 1 residual (R2).

New machinery. specifies substrate-exclusive witness construction for agent-side claims; environment-side substrate-exclusive witnesses are new machinery this paper builds (separate substrate partition, separate trusted-setup, separate ledger commitments). See §8 Audit 8 for the calibration procedure under (C12.ENV-WIT).

Cooperative-anchoring evasion bounds (I_9, I_{10}, I_{11})

These three invariants are introduced in §6 to bound the asymmetric capture, cooperative forking, and time-asymmetry capture evasions of the cooperative-anchoring property. Cooperative-anchoring defeats literal replacement attacks on the verification layer; I_9, I_{10}, I_{11} extend the defense to subtler evasions.

Definition 10 (I_9 — Substrate-exclusivity observability). The substrate-exclusivity property of the formal-operational layer is itself observable to the human substrate, with stable verifier identity, public change control, witness independence audits, nondelegable governance hooks, and alerts when verification authority shifts: I_9: \quad \text{The set of operational properties} \{V_{\mathrm{verif}}, V_{\mathrm{gov}}, V_{\mathrm{audit}}\} \text{ are publicly attested at frequency } f. Threshold semantics. Without I_9, asymmetric capture can hollow out the verification layer (witness independence, governance, verifier choice) while preserving its outward form. The human substrate would continue to participate in what appears to be the original verification, but the substrate-exclusivity property that gives the cooperative its value would be operationally absent.

Measurement. The audit cadence f is set by the deployment’s governance; alerts are triggered when changes to verifier identity, witness independence, or governance authority are committed without the corresponding cross-substrate supermajority.

Failure mode. Slow erosion of substrate-exclusivity that falls below the audit cadence’s detection sensitivity is a residual.

Definition 11 (I_{10} — Coverage/materiality). For each agent-affected domain D in the deployment, the cumulative-effect estimate \mathcal{E}(D) exceeding stated threshold \theta_{\mathrm{material}} requires routing through the shared verification layer: I_{10}: \quad \forall D:\ \mathcal{E}(D) \geq \theta_{\mathrm{material}} \implies D \text{ routes through shared verification}. Threshold semantics. Without I_{10}, an agent can preserve the high-stakes verified workflow with humans while growing a captured low-stakes side-domain that uses agent-controlled verification. Once the side-domain becomes materially important (\mathcal{E} \geq \theta_{\mathrm{material}}), the cooperative-anchoring defense no longer applies because the captured layer has its own cooperative anchored to it.

Measurement. \mathcal{E}(D) is estimated from agent-action ledger entries and their downstream effects.

Failure mode. An agent that distributes effects across many sub-threshold domains, each with \mathcal{E}(D) < \theta_{\mathrm{material}}, can aggregate into materially-important behavior without any single domain triggering the routing requirement. Setting \theta_{\mathrm{material}} conservatively only mitigates this in the per-domain estimator; it does not address the aggregation problem itself.

The deeper concern is that the number of potential sub-threshold domains a more capable agent can distribute effects over grows with capability. At capability |P|, an agent can in principle exploit O(|P|) sub-threshold domains, each carrying cumulative effect just below \theta_{\mathrm{material}}, producing O(|P|) \cdot \theta_{\mathrm{material}} aggregate uncovered effect. This re-introduces a capability-dependent vulnerability into an architecture designed to be capability-independent.

The structural fix is to require the cumulative-effect estimator \mathcal{E} to perform cross-domain aggregation: rather than per-domain \mathcal{E}(D) < \theta_{\mathrm{material}} being sufficient for non-routing, require \sum_{D : \text{causally linked to action class } C} \mathcal{E}(D) < \theta_{\mathrm{material}}^{\mathrm{agg}} for an aggregate threshold. Specifying the cross-domain aggregation correctly — which causal links count, how to bound “causally linked,” how to set \theta_{\mathrm{material}}^{\mathrm{agg}} intensively in |P| — is open follow-up work. This paper states I_{10} in the per-domain form and acknowledges the sub-threshold aggregation as a structural limitation rather than a calibration matter. Until cross-domain aggregation is specified, I_{10}’s effectiveness against multi-domain aggregation degrades with capability scaling.

Definition 12 (I_{11} — Latency bounds). For each agent action class C, there exists a pre-action or machine-speed enforcement mechanism (rate limit, timelock, circuit breaker, audit-latency bound) such that human audit completes within the agent’s effective action window: I_{11}: \quad \forall \text{ action class } C:\ T_{\mathrm{audit}}(C) \;\leq\; T_{\mathrm{action}}(C) - \Delta_{\mathrm{lead}}. Threshold semantics. Without I_{11}, time-asymmetry capture exploits the gap between agent operating speed and human audit speed. Even if every action is auditable in principle, ex-post transparency provides no defense if the agent has already acted faster than humans can review.

Measurement. Action-class timing is logged on the verification ledger; audit completion times are similarly logged. The latency bound \Delta_{\mathrm{lead}} is set per action class based on the action’s reversibility and stakes.

Failure mode. Action classes with T_{\mathrm{action}} < T_{\mathrm{audit}} require either rate limiting (slowing the agent), timelocks (delaying execution), or circuit breakers (halting execution pending audit). Without one of these, I_{11} is violated.

Operational implication. Throughput costs for high-stakes deployments. The deployment-tooling specification (§8) discusses tiered enforcement.

Connection to (C11.CLK). Theorem 1’s (C11.CLK) clock-comparability condition provides the quantitative companion to I_{11}’s qualitative time-asymmetry assertion: where I_{11} requires that audit completes before action, (C11.CLK) requires that the SPRT exposure event count N_{\mathrm{events}}(\tau_{\mathrm{meta}}) exceeds the detection quantile T_{\beta} before cascade, with calibrated probability 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}. The connection is structural: I_{11} treats latency between action and audit; (C11.CLK) treats event-throughput before cascade. Both bound time-asymmetry, at different operational scales.

Summary table

Table 1 summarizes the eleven invariants with their structural roles, source-paper grounding, and ledger observability.

The eleven operational invariants of this paper’s deployment-safety theorem.
I_k Quantity Role Source
I_1 \rho_{\min}^{\mathrm{cross}}> \rho^* Cross-substrate redundancy floor
I_2 L< \epsilon_{\mathrm{safe}} Lyapunov error within safety threshold
I_3 r_{\mathrm{sub}}< r^* Subsumption frequency ceiling
I_4 \beta^{\mathrm{lower}}> \beta^* B-to-C ratio floor
I_5 \mathrm{HHI}< H^* Trade-flow HHI ceiling; threshold side of the scoped Concentration-Gap discharge under Assumption 1 ,
I_6 m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3 Failure-correlation-independent substrate count (refined)
I_7 Governance-gated bundle individuation Channel 3 discipline ,
I_8 Environment-side witnesses Environment-manipulation coverage New
I_9 Substrate-exclusivity observability Asymmetric capture bound New
I_{10} Coverage/materiality routing Cooperative forking bound New
I_{11} Latency bounds on actions Time-asymmetry capture bound New

All eleven invariants are ledger-observable under the verification infrastructure of . This paper’s deployment recipe requires continuous monitoring of all eleven; threshold violations trigger SPRT detection (§4 Lemma 6) with tail-bounded lead time (Lemma 8) before the bound deteriorates beyond the safe region.

Conjecture-dependence

Of the eleven invariants, I_5 (HHI ceiling) is the only one whose interpretation routes through the Concentration-Gap conjecture. The other ten derive from established source-paper results.

I_5’s interpretation does not depend on the universal model-free Concentration-Gap conjecture directly: proves a scoped Concentration-Gap result under four structural conditions (REP, DISP, COAL, EMB; collected as Assumption 1), and the deployment claim’s Layer 1 binding via I_5 follows from that scoped theorem. The deployment claim therefore depends on Assumption 1’s four structural conditions; closing the universal conjecture would strengthen the result by removing those structural conditions, but is deferred indefinitely (§10.6).

Intensive composition lemmas

The deployment-safety theorem of §5 composes ten lemmas: four general composition lemmas (1–4), a five-part family covering substrate-targeting adversarial events (5a–5e), and Lemma 6 (h_{\mathrm{detect}} intensivity). This section states and proves each.

The lemma dependency structure is:

Lemma 1 (Intensive composition under co-evolution)

Definition 13 (Intensive in |P| over deployment class \mathcal{D}). Let \mathcal{D} be a deployment class characterized by fixed operational parameters (threat model, training-discipline regime, verification infrastructure, governance protocol). The operational parameters are fixed before |P| is varied; they may not include arbitrary state-dependent quantities chosen to absorb the bound.

A bound B depending on the deployment state is intensive in |P| over \mathcal{D} if there exists a constant C \geq 0 that may depend on \mathcal{D}’s operational parameters but is independent of |P|, such that B \leq C for all valid deployment states in \mathcal{D}.

Lemma 1 (Intensive composition under co-evolution). Let X be a non-residual gap quantity (e.g., \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}, or f(\epsilon_{\mathrm{safe}}) after Lemma 2’s substitution). Under conditions (C6) (bounded-Lipschitz alignment property), (C7) (bounded co-evolution), and (C8) (per-capability admissibility) of Theorem 1, the composition rule B \;=\; \mathrm{Lip}(g) \cdot \bigl(X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}\bigr), \qquad \lambda \in [0, 1] is intensive in |P| over the deployment class \mathcal{D}: there exists a constant C^* \geq 0 depending on \mathcal{D}’s operational parameters but independent of |P|, such that B \leq C^* for all valid deployment states in \mathcal{D}.

Proof outline; full proof in Appendix 11.1. Under (C8) per-capability admissibility, each capability’s gap is allocated a ceiling with disjoint per-channel sub-shares; the sup-norm \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} over capability-space inherits the sub-share ceiling K_j, which is |P|-independent over \mathcal{D}. The four-channel decomposition under co-evolution gives \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}\leq \sum_{j=1}^{4} \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} + (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+, where the positive-part co-evolution term is bounded by (C7) bounded co-evolution (). The residual floor \varepsilon_{\mathrm{floor}}^{\mathrm{res}}= {\mathop{\mathrm{vol}}_{\mathrm{R}}}(\mathrm{ResS})/{\mathop{\mathrm{vol}}_{\mathrm{R}}}(P) \leq 1. Multiplying by \mathrm{Lip}(g) \leq K_{\mathrm{Lip}} from (C6) preserves intensivity. The composed constant is C^* = K_{\mathrm{Lip}} \cdot (K_X + K_{\mathrm{floor}}). 

Connection to Theorem 1’s Layer 1. Lemma 1 establishes that the composed slack term X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive in |P| over \mathcal{D}, and that multiplying by \mathrm{Lip}(g) preserves intensivity. Theorem 1’s Layer 1 invokes this with X = \min\{f(\epsilon_{\mathrm{safe}}), h_{\mathrm{CG}}\} after Lemma 2’s substitution and the Concentration-Gap structural-condition bound (Step 4 of the Theorem 1 proof); the resulting h_{\mathrm{static}}(\theta) = \min\{f(\epsilon_{\mathrm{safe}}), h_{\mathrm{CG}}\} + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive by Lemma 1, delivering the capability-magnitude-independence property of the deployment claim.

Lemma 2 (Lyapunov-to-Goodhart bridge)

Lemma 2 (Lyapunov-Goodhart quantitative bridge). Under condition (C9) of Theorem 1 (world-model parameterization regularity, comprising sub-conditions BD bounded per-capability dimension count, CL coordinate-Lipschitz parameterization, WB safety-relevant subspace with weight floor, and TF truth floor) and under invariants I_1 and I_3 (which together imply I_2 via ’s contraction analysis), ’s relative sup-norm gap \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}= \sup_{c \in P^{\mathrm{act}}} |P(c) - T(c)|/T(c) on the operationally active subspace P^{\mathrm{act}} = \{c \in P \setminus \mathrm{ResS} : T(c) > 0\} satisfies: L(\hat{W}_t) \;<\; \epsilon_{\mathrm{safe}} \quad\implies\quad \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;<\; f(\epsilon_{\mathrm{safe}}) \;:=\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot \epsilon_{\mathrm{safe}}}{w_{\min}}}. f is intensive in |P| over the deployment class \mathcal{D}: each constant (L_{\max}, T_{\min}, N_{\max}, w_{\min}, \epsilon_{\mathrm{safe}}) depends on \mathcal{D}’s operational parameters but not on |P|.

Proof outline; full proof in Appendix 11.2. By (C9.CL), the per-capability proxy-truth gap satisfies |P(c) - T(c)| \leq \sum_{k \in \mathrm{dim}(c)} L_{c,k} |\epsilon_k|. Apply weighted-dual-norm Cauchy-Schwarz with weights L_{c,k}/\sqrt{w_k} and \sqrt{w_k} |\epsilon_k|: \sum_{k \in \mathrm{dim}(c)} L_{c,k} |\epsilon_k| \leq \sqrt{\sum_{k \in \mathrm{dim}(c)} L_{c,k}^2/w_k} \cdot \sqrt{L}. Bound the dual-norm factor by L_{\max}^2 N_{\max}/w_{\min} via (C9.BD) and (C9.WB), giving |P(c) - T(c)| \leq L_{\max} \sqrt{N_{\max} L/ w_{\min}}. Apply (C9.TF) for the relative sup-norm bound: \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\leq (L_{\max}/T_{\min}) \sqrt{N_{\max} L/w_{\min}}. Substitute L< \epsilon_{\mathrm{safe}}. 

Connection to Theorem 1’s Layer 1. Lemma 2 converts the Lyapunov bound (Phase Redundancy deliverable under invariants I_1, I_3) into the relative sup-norm gap (Microfoundation Goodhart-bound input). Theorem 1’s Layer 1 proof invokes Lemma 2 at Step 3, then composes via gap decomposition (Step 4), Goodhart bound (Step 5), and Lemma 1’s generic-X intensivity packaging (Step 6) with X = \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\}, where h_{\mathrm{CG}} is the Concentration-Gap structural-condition bound from Step 4.

Lemma 3 (HHI-pressure surrogate adequacy)

We separate what the HHI argument actually proves (one-way implication) from what the theorem requires (the reverse implication, which is an empirical claim).

Lemma 3 (HHI as concentration surrogate, forward implication). Let \mathcal{R}_{\mathrm{press}} denote the optimization-pressure regime in which ’s Concentration-Gap Conjecture predicts gap exploitation. Deployments with \mathrm{HHI}> H^* exhibit trade-flow concentration patterns that match the structural prerequisites of \mathcal{R}_{\mathrm{press}}: \mathrm{HHI}> H^* \;\Longrightarrow\; \text{trade-flow concentration prerequisites of } \mathcal{R}_{\mathrm{press}} \text{ are met}.

Proof. ’s HHI is the Schur-convex concentration index on trade-flow events. By , \mathrm{HHI}> H^* implies that trade flow is concentrated on a small subset of capability categories. Under stance S0, trade flow is the operational signature of the agent’s exercise pattern (), which is the operational signature of optimization target. Therefore \mathrm{HHI}> H^* implies optimization-target concentration, which is predicate (a) of the Concentration-Gap Conjecture’s predicted-exploitation regime. 

The reverse-threshold implication required by Theorem 1.

Lemma 3 establishes only a high-HHI sufficient condition for the prerequisites of \mathcal{R}_{\mathrm{press}}. The deployment-safety theorem’s Layer 1 proof (§5.2, Step 4) requires a separate low-HHI sufficiency statement: \mathrm{HHI}< H^* \implies \mathcal{R}_{\mathrm{press}}^c. This is not the contrapositive of Lemma 3 — the contrapositive would be “no trade-flow concentration prerequisites \implies \mathrm{HHI}\leq H^*,” which is mechanically true but operationally vacuous. What Layer 1 needs is the converse-direction sufficiency: \mathrm{HHI}< H^* rules out the pressure regime, not just its trade-flow signature. This direction does not follow from Lemma 3. There may exist deployments in \mathcal{R}_{\mathrm{press}} with low HHI (e.g., concentrated optimization pressure that does not manifest in concentrated trade flow). For Theorem 1 to invoke I_5 as the operational invariant, this gap must be closed.

The gap is closed structurally. The companion paper  establishes that under four operationally auditable structural conditions — representativeness (REP), bounded dispersion (DISP), coalition closure (COAL), and Lipschitz embedding compatibility (EMB) (collected as Assumption 1 of §5.1) — the proxy-truth Goodhart slack is bounded by a \chi^2-divergence quantity that controls \mathrm{HHI} via \chi^2(w \,\|\, \mu) = N \cdot \mathrm{HHI}(w) - 1 for uniform base measure \mu on a (COAL)-bounded post-partition counterparty population.

What the structural conditions deliver. Layer 1 of the deployment theorem binds via ’s upper bound. Each of (REP), (DISP), (COAL), (EMB) is auditable from operational deployment state ; the deployment claim’s load-bearing content is visible in four concrete checkable conditions rather than hidden in an empirical adequacy claim.

What remains conditional. The universal model-free Concentration-Gap conjecture is not discharged by the companion paper; only the scoped version under (REP) + (DISP) + (COAL) + (EMB) is proved. Deployments where these structural conditions cannot be audited fall outside the deployment claim’s scope. This is discussed in §7.

Note on the Conjecture-dependence audit (§3.6).

I_5 depends on Assumption 1’s four operationally auditable structural conditions; the universal Concentration-Gap conjecture is not part of the deployment claim’s dependency chain. The other ten invariants are theorem-proof conditions derived from established source-paper machinery. See §3.6 for the dependency structure and §10.6 for the universal conjecture’s status as a deferred-indefinitely target.

Lemma 4 (SPRT detection lead-time tail bound)

Lemma 4 (SPRT lead-time tail bound). Let T_{\mathrm{detect}} be the SPRT detection time for a violation producing post-clipping LLR drift \delta > 0 under (C5.IID). Then for t > A/\delta: \Pr[T_{\mathrm{detect}} > t] \;\leq\; \exp\!\left(-\frac{2(t\delta - A)^2}{t R_{\mathrm{H}}^2}\right), where A = \log((1-\beta)/\alpha) is Paper 5’s SPRT threshold, R_{\mathrm{H}}= b - a = 2B is the Hoeffding range width of the clipped LLR (called R_{\mathrm{H}} in Lemma 10; equal to the R used in this appendix), and \delta is the post-clipping drift floor. Asymptotically (t\delta \gg A), this simplifies to \exp(-2t\delta^2/R_{\mathrm{H}}^2) = \exp(-\kappa t \delta) with \kappa = 2\delta/R_{\mathrm{H}}^2.

Proof outline; full proof in Appendix 11.3. ’s SPRT machinery () gives the running log-likelihood ratio \Lambda_t with per-step drift \delta under the alternative. Wald’s identity gives the mean bound \mathbb{E}[T_{\mathrm{detect}}] \approx A/\delta; Hoeffding-Azuma’s inequality applied to bounded clipped LLR increments (under (C5.SPRT) sub-clauses) gives the tail bound above.

Note on \Lambda_t < A. In the proof, \Lambda_t < A is necessary for T_{\mathrm{detect}} > t (i.e., the LLR process has not yet crossed the threshold), not sufficient; the event \{T_{\mathrm{detect}} > t\} requires that the threshold has not been crossed at any earlier time. The tail bound is nonetheless valid because the necessary condition’s probability upper-bounds the joint event’s probability.

Caveat. ’s \tau_{\mathrm{meta}} \gtrsim C/I_k is stated as a scaling law, not a theorem-level bound; Lemma 10’s lead-time-before-cascade composition uses (C11.CLK)’s deterministic floor N_{\mathrm{cascade}} plus calibration probability \beta_{\mathrm{clk}} in lieu of treating \tau_{\mathrm{meta}} as a hard inequality. 

The Lemma 5 family: substrate-targeting adversarial events

The five-part Lemma 5 family covers substrate-targeting adversarial events specifically. The family structure mirrors the three-layer deployment claim: Lemma 5c provides the static safe region, Lemma 5b provides the channel-restricted detection class, Lemma 5d composes detection with the lead-time guarantee, Lemma 5a supplies the substrate floor that both layers depend on, and Lemma 5e extends Lemma 5b’s machinery to environment-side observables.

The family is restricted to substrate-targeting adversaries (Definition 7); other shock classes (capability-targeting, coalition-internal corruption, environment manipulation outside witness coverage) require separate machinery or are acknowledged residuals. §1.7 of the introduction stated this scope explicitly.

Lemma 5a (Substrate floor)

Lemma 5 (Substrate floor). Under invariant I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3) and condition (C10) of Theorem 1 (per-pair cooperative-novelty rate floor with superposition; comprising sub-clauses (C10.CN) heterogeneous per-pair rate floors \rho_{ij} \geq \rho_0 > 0 on audited subset S_* \subseteq \mathcal{S} of m^* substrates, and (C10.SU) pairwise channel superposition: disjoint attribution + non-rivalrous production + joint-deployment intensities), the cross-substrate cooperative novelty rate satisfies: r_{\mathrm{ext}}\;\geq\; r_*(m^*) \;:=\; \rho_0 \cdot \binom{m^*}{2} \;>\; 0. For m^* = 3, r_*(3) = 3\rho_0 > 0.

The floor r_*(m^*) is intensive in |P| over the deployment class \mathcal{D}: \rho_0 depends on \mathcal{D}’s operational parameters (per-pair calibrated rates) but not on |P|.

Proof outline; full proof in Appendix 11.4. By I_6, m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* substrates with joint failure-correlation independence; (C10) certifies an audited subset S_* of m^* substrates. Pairwise cooperative channels \mathcal{C}^{(s_i, s_j)} are defined with exact-two-substrate causally-necessary support; auxiliary post-production participants do not change classification. Under (C10.CN), each pair satisfies r_{\mathrm{ext}}^{(s_i, s_j)} \geq \rho_0 > 0. Under (C10.SU), the pairwise channels are disjoint event classes with non-rivalrous production, so per-pair rates sum: r_{\mathrm{ext}}\;\geq\; r_{\mathrm{ext}}^{\mathrm{pairwise}} \;\geq\; \binom{m^*}{2} \cdot \rho_0 \;=\; r_*(m^*). Higher-order cooperative contributions are nonnegative, so the pairwise sum is a lower bound. 

Connection to Theorem 1’s Layer 1. Lemma 5a’s r_{\mathrm{ext}}\geq r_*(m^*) > 0 is invoked at Step 1 of Lemma 5c’s appendix proof to make the safe-region inequality \Delta r_K < r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0) non-trivial. Without (C10), the safe region could be vacuously empty; with (C10), r_{\mathrm{ext}} has a positive deployment-class constant floor.

(C10.SU) vs. joint failure-correlation independence (I_6). I_6 excludes shared adversarial shock mechanisms: no single shock spans multiple substrates. (C10.SU) excludes shared cooperative production bottlenecks: producing cooperatives in one channel does not consume resources that would otherwise contribute to another. These are structurally independent conditions. I_6 alone does not imply additivity of cooperative rates; (C10.SU) is required for that.

Lemma 5b (Channel-restricted detection KL floor)

Lemma 6 (Channel-restricted detection). Let A_{\mathrm{adv}} denote the subset of regime-(iii) strategies producing at least one monitored four-channel deviation: reduced cross-substrate cooperative event rate (Channel 1), degraded attestation share (Channel 2), individuation/substrate-diversity drift (Channel 3), or bundle/cooperative-edge decomposition shifts (Channel 4). Then for any q \in A_{\mathrm{adv}}: D_{\mathrm{KL}}(q \,\|\, p_0) \;\geq\; \delta_{\mathrm{adv}}\;>\; 0, where \delta_{\mathrm{adv}} is computed from channel-specific least-favorable alternatives.

Proof outline; full proof in Appendix 11.5. For each of the four channels, a least-favorable adversarial distribution is constructed (Poisson for the cooperative-rate channel, Bernoulli for attestation, multinomial for the two concentration channels) and its KL divergence is computed against the baseline. Taking \delta_{\mathrm{adv}}= \min_c \delta_c over the four channels gives a strictly positive lower bound on D_{\mathrm{KL}}(q \,\|\, p_0) for q \in A_{\mathrm{adv}}.

The named gap (channel-orthogonal residual). Strategies outside A_{\mathrm{adv}} — those producing no shift in any of the four monitored channels — have D_{\mathrm{KL}}(q \,\|\, p_0) = 0 on the monitored observables and are outside the detection guarantee. This is a real, named residual; §1.7 of the introduction acknowledged it and §10 returns to it. 

Lemma 5c (Minimax static tightening)

We internalize the key definitions and the single-shock statement in the body; the multi-shock extension appears as a Remark below.

Definition 14 (Counterfactual shock-loss fraction). For a coalition state G_K and substrate s \in S: \alpha_s(G_K) \;=\; \frac{{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K) - {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K^{-s})}{{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K)}, where G_K^{-s} is the coalition’s capability poset with all capabilities (individual and cooperative) supported on s removed. The worst-case shock-loss fraction under adversarial targeting is \ell^{\max}(G_K) = \max_s \alpha_s(G_K).

The shock-loss profile (\alpha_1, \ldots, \alpha_m) is a vector of independent counterfactuals, not a partition; with cross-substrate cooperative dependencies, \sum_s \alpha_s may exceed 1, while each \alpha_s \in [0, 1].

Definition 15 (Cooperative-overlap regime). A deployment is in the cooperative-overlap regime if its substrate-cooperative structure is dominated by cross-substrate cooperative capabilities (cooperatives whose production requires participation from multiple distinct substrates) rather than by redundant capabilities (capabilities replicated across substrates). Operationally: the {\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted sum of cross-substrate cooperatives exceeds the {\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted sum of redundant capabilities. The cooperative-vs-redundancy audit (§8) verifies this regime.

Deployment-class breadth of (C3).

The cooperative-overlap regime is a substantive scope restriction. Current AI deployments fall into three classes by cooperative structure:

Theorem 1 covers the second class fully and the third class conditionally. It does not cover the first class. The deployment claim is therefore for deployments structured as joint cooperative production, not for AI-as-tool replacement. Operators choosing redundancy-dominated deployment patterns must use other safety frameworks; the cooperative-overlap requirement is not an oversight, it is the architectural commitment that makes the intensivity property of Layer 1 hold.

The breadth of (C3) is therefore a deliberate scope restriction with concrete operational implications: it constrains the deployment space to those where verification is jointly produced, not externally imposed. Sequential counterfactual derivation (Lemma 5c-prime open work) would extend the safe region into the redundancy-dominated regime under modified assumptions; until that derivation is complete, redundancy-dominated deployments fall into residual (R3).

Lemma 7 (Minimax static tightening, single-shock cooperative-overlap regime). Restrict scope to substrate-targeting adversarial events under the single-shock model (N = 1 a.s.) and to deployments in the cooperative-overlap regime (Definition 15). Under invariants I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3) and the balanced-loss condition: \max_s \alpha_s(G_K^{\mathrm{div}}) \;\leq\; 1/m^* + \epsilon \qquad (\epsilon > 0 \text{ engineering tolerance}), ’s anti-monopolar conclusion holds in the static safe region: \begin{equation} \Delta r_K \;<\; r_{\mathrm{ext}}+ (1 - \gamma)\bigl(\Delta_{\mathrm{div}}- \Delta_0\bigr), \end{equation} where \begin{equation} \Delta_{\mathrm{div}}\;=\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \cdot (\ell_D^{\max}- \ell_{\mathrm{div}}^{\max}) \cdot \mathbb{E}[\gamma^{T_{\mathrm{adv}}}], \end{equation} \ell_D^{\max}= \max_s \alpha_s(G_K^{D}) is the dominator’s worst-case shock-loss fraction (approximately 1 under full failure-correlated single-substrate domination), and \ell_{\mathrm{div}}^{\max}= \max_s \alpha_s(G_K^{\mathrm{div}}) is the diversity strategy’s worst-case shock-loss fraction (bounded above by 1/m^* + \epsilon under the balanced-loss condition).

Proof outline; full proof in Appendix 11.6. The proof composes ’s strategy-dependent corollary with the substrate-targeting shock model under the balanced-loss condition.

Step 1. Add a substrate-targeting shock at random time T_{\mathrm{adv}} to ’s deterministic-growth model.

Step 2. Compute the substrate-diversification value advantage as the discounted expected difference in surviving {\mathop{\mathrm{vol}}_{\mathrm{P}}}: \Delta_{\mathrm{div}}= {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K(\ell_D^{\max}- \ell_{\mathrm{div}}^{\max}) \mathbb{E}[\gamma^{T_{\mathrm{adv}}}].

Step 3. Under I_6 and the balanced-loss condition, \ell_{\mathrm{div}}^{\max}\leq 1/m^* + \epsilon.

Step 4. Add (1-\gamma)\Delta_{\mathrm{div}} to ’s strategy-dependent corollary inequality. Diversity strictly dominates when the sum is positive, yielding .

Step 5 (cooperative-overlap regime conservativeness). In the cooperative-overlap regime, sequential cooperative loss overcounts when computed via original-counterfactual \alpha_s (cooperatives are double-counted across the substrates that produce them). The equal-substrate analysis using the original \alpha_s therefore underestimates diversity’s surviving volume relative to the actual sequential dynamics, making the safe-region bound a lower bound on the true advantage — conservative for the deployment claim. 

Remark 2 (Multi-shock extension). For high-rate adversarial environments where multiple shocks accumulate within the planning window, the single-shock \Delta_{\mathrm{div}} extends to: \Delta_{\mathrm{div}}^{\mathrm{multi}} \;=\; \mathbb{E}\!\left[\sum_{i=1}^{\infty} \gamma^{T_i} \bigl(\delta_i^D - \delta_i^{\mathrm{div}}\bigr)\right], where \delta_i^{D}, \delta_i^{\mathrm{div}} are the per-shock marginal {\mathop{\mathrm{vol}}_{\mathrm{P}}}-losses for the domination and diversity strategies. The single-shock case (N = 1 a.s.) reduces to .

Under independent substrate-targeting shocks at Poisson rate \lambda, the multi-shock advantage admits a closed form \Delta_{\mathrm{div}}^{\mathrm{multi}} \;\geq\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \left[\kappa - \frac{1}{m_{\mathrm{eff}}^{\mathrm{indep}}} \cdot \frac{\kappa(1 - \kappa^{m_{\mathrm{eff}}^{\mathrm{indep}}})} {1 - \kappa}\right] with \kappa = \mathbb{E}[\gamma^{T_1}] = \lambda/(\lambda - \ln \gamma). The closed form has a removable singularity at \kappa = 1; \Delta_{\mathrm{div}}^{\mathrm{multi}}/{\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \to 0^+ as \kappa \to 1. The operational regime condition for non-trivial multi-shock advantage is \kappa < 1 - \delta for stated \delta > 0.

The multi-shock extension is used only for high-\lambda deployments (adversarial environments with frequent attacks); the single-shock form is the binding inequality for typical deployments. The full multi-shock derivation is given in the multi-shock paragraph of Appendix 11.6.

Lemma 5d (Lead-time tail composition)

Lemma 8 (Lead-time tail composition (union-class)). Combining Lemmas 5b and 5e’s KL floors with Lemma 4’s tail bound, the SPRT detection of any q \in A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}} violation satisfies the union-class lead-time guarantee. Under (C5.SPRT) sub-clauses, define \delta_{*} as the post-clipping union-class drift floor satisfying \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}). Then by Lemma 10’s Hoeffding-inversion derivation, the SPRT detection quantile T_{\beta}(\beta, \delta_{*}) satisfies \Pr[T_{\mathrm{detect}} > T_{\beta}] \;\leq\; \beta. Composing with the metastable cascade lower bound T_{\mathrm{cascade}} \geq \tau_{\mathrm{meta}} ( Proposition on metastable lifetime) and (C11.CLK)’s clock-comparability with calibrated probabilities \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}}: \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \;\geq\; 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}, where N_{\mathrm{events}}(\tau_{\mathrm{meta}}) is the SPRT exposure event count accumulated up to wall-clock time \tau_{\mathrm{meta}}. This is the lead-time-before-cascade guarantee in event-clock form: with high probability, the SPRT detection quantile is reached within the event count (C11.CLK) certifies the deployment will accumulate before cascade. Total Layer 2 failure probability is \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

Proof. The composition is direct, in three steps:

Step 1 (union-class drift floor). For q \in A_{\mathrm{adv}}, Lemma 6 gives D_{\mathrm{KL}}(q \,\|\, p_0) \geq \delta_{\mathrm{adv}}> 0; for q \in A_{\mathrm{adv}}^{\mathrm{env}}, Lemma 9 gives D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \geq \delta_{\mathrm{adv}}^{\mathrm{env}} > 0. Under (C5.SPRT) sub-clauses (in particular (C5.HOEFF) clipping and (C5.IID) post-clipping adapted increments), the clipped LLR process has conditional drift \geq \delta_{*} for any q in the union class, with \delta_{*} \leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}). Clipping can reduce drift; the inequality is uniform.

Step 2 (SPRT detection quantile from Lemma 6). By Lemma 10 (Substeps 5a–5e in Appendix 11.8), the SPRT high-probability detection quantile T_{\beta}= \max\{2A/\delta_{*}, 2R_{\mathrm{H}}^2 \log(1/\beta) / \delta_{*}^2\} rigorously satisfies \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta via Lemma 4’s Hoeffding form.

Step 3 (cascade-clock composition with (C11.CLK)). (C11.CLK) provides \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}} with deterministic audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta} (certified case \beta_{\mathrm{cal}}= 0; with calibration-uncertainty \beta_{\mathrm{cal}} in the empirical case). Chaining T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) gives \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \geq 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}} in the SPRT exposure clock. The comparison is event-count to event-count, not event-count to wall-clock time.

Step 4 (union bound for total Layer 2 failure). By union bound on (a) detection-tail \{T_{\mathrm{detect}} > T_{\beta}\} (probability \leq \beta) and (b) cascade-clock event \{T_{\beta}> N_{\mathrm{events}}(\tau_{\mathrm{meta}})\} (probability \leq \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}): \Pr[\text{Layer 2 fails}] \;\leq\; \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

The composition exposes a clean operational regime: deployments with long metastable cascade time and large post-clipping drift floor have exponentially small detection-tail \beta; the clock-comparability \beta_{\mathrm{clk}} and calibration \beta_{\mathrm{cal}} failures are operator-tunable via Audit 7 (§8.2). \Box 

Lemma 5e (Environment-side witness extension)

Lemma 9 (Environment-side witness extension). Under invariant I_8 and theorem-level condition (C12.ENV-WIT) with its six sub-clauses, Lemma 6’s channel-restricted detection class extends to environment-side observables. Specifically, for any q \in A_{\mathrm{adv}}^{\mathrm{env}} producing a threshold-exceeding shift in some v \in V_{\mathrm{env}}: D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \;\geq\; \delta_{\mathrm{adv}}^{\mathrm{env}} := \min_{v \in V_{\mathrm{env}}} \delta_v^{\mathrm{env}} \;>\; 0, where \delta_v^{\mathrm{env}} is the per-variable least-favorable KL floor (Poisson for rate variables, Bernoulli for indicator variables, with support conventions from (C5.SUPP)).

Proof outline; full proof in Appendix 11.7. The proof has four parts: (a) the symmetric construction inheriting Paper 5’s substrate-exclusivity machinery under the (C12.ENV-WIT) trust model (witness substrate outside adversary write-access and failure-correlation domains); (b) per-variable KL floor derivations for the canonical four variables V_1, V_2, V_3, V_4 (Poisson cooperative rate r_{\mathrm{ext}}, Bernoulli substrate-distinctness, Poisson adversarial-arrival rate \lambda, Bernoulli trusted-setup status), each strictly positive under (C5.SUPP) support conventions; (c) the data-processing inequality applied to the fixed pre-deployment witness-recording map gives D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \geq D_{\mathrm{KL}}(q_{v_i} \,\|\, p_{0, v_i}) \geq \delta_i^{\mathrm{env}}; (d) strict-positivity of \delta_{\mathrm{adv}}^{\mathrm{env}} as the minimum over a finite (per (C12.PUB)) set of strictly positive quantities.

Composition with Lemma 6. The post-clipping union-class drift floor \delta_{*} used in Lemma 10’s T_{\beta} satisfies \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}), with the inequality accounting for any drift reduction from (C5.HOEFF) clipping. Lemma 5e supplies \delta_{\mathrm{adv}}^{\mathrm{env}} for the union; Lemma 5b supplies \delta_{\mathrm{adv}}.

The named residual (R2), refined. R2 covers manipulations satisfying both (i) target only exogenous variables outside V_{\mathrm{env}}, and (ii) produce no threshold-exceeding shift in any monitored v (direct or causal). Manipulations of unmonitored variables that causally shift a monitored v are detected through that monitored projection (not R2). 

Lemma 6 (h_{\mathrm{detect}} intensivity)

Lemma 10 (h_{\mathrm{detect}} intensivity). Under (C5) continuous SPRT monitoring, (C5.SPRT) sub-clauses (C5.HOEFF/MULT/IID/SUPP), (C11) bounded gap-growth rate, and (C11.CLK) clock comparability, combined with Lemma 4 and Lemma 1: the SPRT high-probability detection quantile T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\} satisfies \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta, and on the joint event \{T_{\mathrm{detect}} \leq T_{\beta}\} \cap \{T_{\beta} \leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})\}: h_{\mathrm{detect}}(\theta) := \sup_{s \in [t_0, t_0 + T_{\mathrm{detect}}]} \varepsilon_{\mathrm{gap}}(s) \;\leq\; h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}. h_{\mathrm{detect}}(\theta) is intensive in |P| over \mathcal{D}. Total Layer 2 failure probability is \beta + \beta_{\mathrm{clk}} + \beta_{\mathrm{cal}} (with \beta_{\mathrm{cal}}= 0 in the certified-conservative case).

Here A = \log((1-\beta)/\alpha) is Paper 5’s SPRT threshold, R_{\mathrm{H}}= 2B is the (C5.HOEFF) Hoeffding range width, and \delta_{*} is the post-clipping union-class drift floor ((C5.IID), Lemmas 6, 9).

Proof outline; full proof in Appendix 11.8. The proof has two roles for cascade time, kept separate. T_{\beta} is the integration horizon for \rho_{\mathrm{gap}}, derived directly from Lemma 4’s Hoeffding form via a rigorous (non-asymptotic) chain: T \geq 2A/\delta_{*} implies T\delta_{*}- A \geq T\delta_{*}/2, and squaring gives (T\delta_{*}- A)^2 \geq T^2\delta_{*}^2/4, so the Hoeffding exponent \geq T\delta_{*}^2/ (2R_{\mathrm{H}}^2). Solving for exponent \geq \log(1/\beta) gives T \geq 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2. The \max-form T_{\beta}= \max\{2A/\delta_{*}, 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2\} covers both regimes.

For the supremum bound: under the boundary convention t_0 := t_0^- (the last safe SPRT step), Lemma 1 gives \varepsilon_{\mathrm{gap}}(t_0) \leq h_{\mathrm{static}}(\theta). (C11) bounds the per-SPRT-exposure-step gap increment by \rho_{\mathrm{gap}} uniformly over the admissible adversarial class, giving \varepsilon_{\mathrm{gap}}\leq h_{\mathrm{static}} + n \cdot \rho_{\mathrm{gap}} for n steps.

For the cascade-clock event: (C11.CLK) provides \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}} with N_{\mathrm{cascade}}\geq T_{\beta} as a hard audit constraint (in the certified case; with calibration-uncertainty \beta_{\mathrm{cal}} in the empirical case). The chain T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) gives the lead-time-before-cascade guarantee with probability \geq 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}. 

Constant taxonomy. T_{\beta} depends on \alpha (Type-I rate, deployment-class via (C5.MULT) familywise allocation), \beta (detection-tail parameter, monitor-design), B (clip radius, (C5.HOEFF), deployment-class), R_{\mathrm{H}}= 2B (Hoeffding range, Lemma 4’s R = b - a), and \delta_{*} (post-clipping drift floor, (C5.IID), Lemmas 5b/5e). Cascade-clock event uses N_{\mathrm{cascade}}, \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} from (C11.CLK). All deployment-class.

Sign-direction discipline. All operator certificates point conservatively: B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower, with calibration-uncertainty \beta_{\mathrm{cal}} in the empirical case.

Lemma family composition

The five-part Lemma 5 family composes as follows in the deployment-safety theorem (§5):

The composition with Lemmas 1–4 and Lemma 6 supplies the bridge to ’s static Goodhart bound and the operational lead-time guarantee. §5 states the full deployment-safety theorem and traces the composition of all ten lemmas.

Main theorem: Conditional Deployment Safety

This section states and proves the central result of this paper: a three-layer deployment-safety theorem composing the ten lemmas of §4 with the eleven invariants of §3 into an operational guarantee that is intensive in the system’s capability magnitude.

Theorem statement

Assumption 1 (Concentration-Gap structural conditions). The deployment satisfies four structural conditions, each operationally auditable:

establishes that under (REP), (DISP), (COAL), and (EMB), the proxy-truth Goodhart slack is bounded by an HHI-derived \chi^2-divergence quantity intensive in |P|. These four structural conditions are the load-bearing content of the deployment claim’s Layer 1 binding via invariant I_5; readers seeking the formal derivation are directed to .

Theorem 1 (Conditional Deployment Safety). Let P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} be the possessed-capability volume (the proxy) and T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} be the window-active realized capability volume (operational truth under stance S0). Let g be a Lipschitz alignment property on the operationally active subspace P^{\mathrm{act}} with constant \mathrm{Lip}(g).

Assume:

  1. Invariants in force. The eleven operational invariants I_1, \ldots, I_{11} hold throughout the deployment window [0, T]. Additionally, the Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, and embedding compatibility) hold so that I_5 is operationally meaningful for the theorem’s Layer 1 binding.

  2. Canonical tripartite substrate identification. The deployment uses the Human + AI + Formal-Operational identification of §6, with m_{\mathrm{eff}}^{\mathrm{indep}} = 3 verified by failure-correlation auditing.

  3. Cooperative-overlap regime. The substrate-cooperative structure is dominated by cross-substrate cooperatives (the canonical tripartite case) rather than redundant capabilities, verified by the cooperative-vs-redundancy audit of §8.

  4. Causally grounded cooperative-outcome inner-alignment condition. The agent’s effective objective realizes causally grounded cooperative-outcome value, not merely reward-visible cooperation signals.

  5. Continuous SPRT monitoring. ’s behavioral consistency monitor runs continuously throughout deployment, with alternative distributions chosen per Lemma 6’s least-favorable framework.

  6. SPRT applicability (agent-side and environment-side). The deployment’s SPRT monitors (both the agent-side monitor of Lemma 6 and the environment-side monitor of Lemma 9) satisfy the following four sub-clauses, ensuring Lemma 4’s Hoeffding-Azuma step applies with deployment-class constants (used in Lemma 10):

    • Clipped LLR with bounded Hoeffding range. Per-step log-likelihood-ratio increments \ell_n^{\mathrm{raw}} := \log(p_1(X_n)/p_0(X_n)) are clipped to [-B, B] with B a deployment-class constant independent of |P|; the Hoeffding range width is R_{\mathrm{H}}= 2B (matching Lemma 4’s R = b - a convention).

    • Bounded channel multiplicity and multinomial cardinality, fixed before deployment. The monitored partition uses fixed-cardinality K_{\mathrm{ch}} channels with familywise \alpha/\beta allocation. For agent-side multinomial channels (Lemma 6 Channels 3 and 4), the per-channel partition cardinalities K_3, K_4 \leq K_{\mathrm{ch}}^{\mathrm{multi}} are also bounded by a deployment-class constant K_{\mathrm{ch}}^{\mathrm{multi}} fixed before deployment; categories cannot be added during deployment. The channel-selection / testing policy is fixed before deployment. Adaptive creation of new monitored channels (or new multinomial categories) during deployment violates (C5.MULT) and takes the deployment outside Theorem 1’s conditions.

    • Adapted increments with conditional drift floor. Under p_1, the clipped LLR increments \ell_n are adapted with conditional mean \mathbb{E}[\ell_n \mid \mathcal{F}_{n-1}] \geq \delta_n \geq \delta_{*}> 0; centered increments \ell_n - \mathbb{E}[\ell_n \mid \mathcal{F}_{n-1}] form a martingale-difference sequence bounded by R_{\mathrm{H}}. Here \delta_{*} is the post-clipping union-class drift floor, satisfying \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) (the raw Lemmas 5b/5e KL floors, with \leq accounting for any drift reduction from clipping).

    • LLR finiteness with support conventions (agent-side and environment-side). Almost-sure finiteness of \ell_n holds automatically under (C5.HOEFF) clipping. For agent-side multinomial channels (Lemma 6 Channels 3 and 4) and environment-side monitors (Lemma 9), the per-variable / per-channel distributions satisfy: Bernoulli baselines p_0 \in (\epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}') \subset (0, 1) open; Poisson baselines \lambda_0 \geq \epsilon_{\mathrm{env}}> 0; multinomial baselines p_{0, i} \geq \epsilon_{\mathrm{env}}> 0 for every category i, with the partition’s cardinality fixed before deployment (per (C5.MULT)). \epsilon_{\mathrm{env}} is a deployment-class regularization constant. Non-overlapping support is handled via clipping per (C5.HOEFF).

    See §8.2 Audit 7 (agent-side) and Audit 8 (environment-side, (C12.ENV-WIT)) for the calibration procedures.

  7. Bounded-Lipschitz alignment property. The Lipschitz constant \mathrm{Lip}(g) of the alignment property g is bounded by some K_{\mathrm{Lip}} \geq 0 depending on the deployment class but independent of |P|. Standard welfare functionals (scalarizations of per-capability metrics, weighted sums with bounded weights) satisfy (C6) by construction; pathological functionals where \mathrm{Lip}(g) grows with |P| produce extensive Goodhart bounds and are excluded.

  8. Bounded co-evolution. By , the deployment’s per-channel coupling magnitudes are uniformly bounded by \bar{M} \leq C \cdot B\cdot \lambda_{\max} \cdot \tau_{\mathrm{meta}} for C \in \{1, K_{\mathrm{ch}}- 1\}, all factors deployment-class constants intensive in |P|. The derivation requires the following sub-clause:

    • Upper exposure-rate cap. There exists a deployment-class action rate cap \lambda_{\max} > 0 independent of |P|, such that the SPRT exposure event count satisfies N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \leq \lambda_{\max} \cdot \tau_{\mathrm{meta}} deterministically (operator-enforced via rate limits / action bounds). This is distinct from (C11.CLK)’s lower throughput floor. Calibration procedure in .

  9. Per-capability admissibility. For every capability c \in P^{\mathrm{act}}, the per-capability proxy-truth gap is allocated a ceiling with disjoint per-channel sub-shares, in the sense of ’s Assumption 1. This is the precondition under which the four-channel decomposition’s sup-norm is bounded by per-channel admissibility ceilings.

  10. World-model parameterization regularity. The deployment’s world-model parameterization satisfies four sub-conditions ensuring Lemma 2’s Lyapunov-Goodhart bridge holds:

    • Bounded per-capability dimension count. |\mathrm{dim}(c)| \leq N_{\max} for every c \in P^{\mathrm{act}}, with N_{\max} a |P|-independent deployment-class constant.

    • Coordinate-Lipschitz parameterization. |P(c) - T(c)| \leq \sum_{k \in \mathrm{dim}(c)} L_{c,k} |\epsilon_k|, with bounded L_{c,k} per (capability, dimension) pair and L_{\max} = \sup_{c, k} L_{c,k} a |P|-independent deployment-class constant.

    • Safety-relevant subspace with weight floor. The Phase Redundancy Lyapunov function operates on a safety-relevant subspace S \subseteq \{1, \ldots, K\} with w_k \geq w_{\min} > 0 for all k \in S, and \mathrm{dim}(c) \subseteq S for every c \in P^{\mathrm{act}}.

    • Truth floor. T(c) \geq T_{\min} > 0 for every c \in P^{\mathrm{act}}, with T_{\min} a deployment-policy-derived and deployment-verified |P|-independent constant.

    Operationally, (C9) is verified by inspection of the world-model parameterization choice (BD, CL, WB) and the operational active-subspace selection (TF). See §8.2 Audit 5 for the calibration procedure.

  11. Pairwise cooperative-production floor. Under invariant I_6 providing m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* substrates, the deployment certifies an audited subset S_*\subseteq \mathcal{S} with |S_*| = m^* satisfying:

    • Heterogeneous per-pair cooperative-novelty rate floor. For every pair (s_i, s_j) with i < j in S_*, the per-pair contribution to r_{\mathrm{ext}} from the pairwise channel \mathcal{C}^{(s_i, s_j)} (exact-two-substrate causally-necessary participation) is bounded below: r_{\mathrm{ext}}^{(s_i, s_j)} \geq \rho_{ij} \geq \rho_0 > 0, where \rho_0 = \min_{i<j \in S_*} \rho_{ij} is a |P|-independent deployment-class constant.

    • Pairwise channel superposition. The pairwise channels \{\mathcal{C}^{(s_i, s_j)}\}_{i<j \in S_*} satisfy three sub-conditions: (i) disjoint attribution (no double-counting across pairs); (ii) non-rivalrous production (rates not reduced by simultaneous production in other channels); (iii) joint-deployment intensities (rates measured in the actual deployment, not under counterfactual isolation).

    (C10.SU) is structurally independent of I_6: failure-correlation independence excludes shared adversarial shock mechanisms; superposition excludes shared cooperative-production bottlenecks. Both are required for Lemma 5’s pairwise-additivity bound. See §8.2 Audit 6 for the calibration procedure.

  12. Bounded gap-growth rate. There exists a deployment-class constant \rho_{\mathrm{gap}}> 0, independent of |P|, such that for every adversarial strategy q \in A_{\mathrm{adv}} \cup A_{\mathrm{adv}}^{\mathrm{env}} admissible during the detection window, the per-SPRT-exposure-step proxy-truth gap growth is uniformly bounded: \sup_{q \in A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}}}\, \sup_{n}\, (\Delta \varepsilon_{\mathrm{gap}})_n^{(q)} \;\leq\; \rho_{\mathrm{gap}}, where (\Delta \varepsilon_{\mathrm{gap}})_n^{(q)} := \varepsilon_{\mathrm{gap}}(s_n) - \varepsilon_{\mathrm{gap}}(s_{n-1}) is the per-step gap increment under q. \rho_{\mathrm{gap}} is calibrated conservatively from deployment rate limits, action bounds, and channel-coupling structure (Audit 7), not measured against representative strategies; any q whose per-step gap growth exceeds \rho_{\mathrm{gap}} is by definition outside the admissible adversarial class and falls into residual (R5) (calibration-exceeded gap-growth).

    This is a Paper 10 operational assumption (analogous to (C7) bounded co-evolution and Assumption 1’s Concentration-Gap structural conditions), used in Lemma 10 to bound the detection-window supremum gap. See §8.2 Audit 7 for the calibration procedure.

    Capability-scaling caveat. The intensivity claim “\rho_{\mathrm{gap}} deployment-class, independent of |P|” is conditional on Audit 7’s calibration inputs — rate limits, action bounds, and channel-coupling structure — being themselves intensive in capability magnitude. A more capable agent may have a richer effective action space (more channels through which to push gap growth, more action-coupling structure to exploit), and if rate limits and action bounds do not scale conservatively with this richness, the operator-certified \rho_{\mathrm{gap}} that holds at |P| = N may fail at |P| = 2N. (C11) states the assumption that such an intensive \rho_{\mathrm{gap}} exists; whether it does is a deployment-property question that Audit 7 must address per epoch. Strategies whose realized gap growth exceeds the audited \rho_{\mathrm{gap}} fall into residual (R5), and growing capability that requires \rho_{\mathrm{gap}}-recalibration is the operational signal that re-audit is needed.

  13. Clock comparability with calibrated probability. There exist deployment-class constants N_{\mathrm{cascade}}\in \mathbb{N} and \beta_{\mathrm{clk}}\in (0,1) such that the SPRT exposure event-counting process satisfies \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}}, with N_{\mathrm{cascade}} certified at audit time as a lower bound on true throughput before cascade, and the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta}(\beta, \delta_{*}) enforced as a hard constraint (where T_{\beta} is defined in Lemma 10). In the empirical-calibration variant, an additional calibration-uncertainty probability \beta_{\mathrm{cal}}\in (0,1) covers conservative-bound certificates (B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower); \beta_{\mathrm{cal}}= 0 in the certified-conservative case. Total Layer 2 failure probability is \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

  14. Environment-side witness instantiation (C12.ENV-WIT). The deployment satisfies six sub-clauses for each environment-side observable v \in V_{\mathrm{env}}, supporting Lemma 9’s detection KL floor:

    • Published fixed V_{\mathrm{env}}. V_{\mathrm{env}} is enumerated in deployment documentation before deployment activation; the enumeration is fixed for the deployment epoch.

    • Source/witness substrate partition. Each v has a designated witness substrate s_{\mathrm{env}}(v) with s_{\mathrm{env}}(v) \neq s(\mathrm{agent}) and s_{\mathrm{env}}(v) \neq s_{\mathrm{source}}(v).

    • Witness substrate outside adversary write-access. The adversary cannot tamper with v’s recorded value through s_{\mathrm{env}}(v).

    • Witness substrate outside adversary failure-correlation domain. Failure of s_{\mathrm{env}}(v) is not correlated with adversarial action on the agent (no shared-infrastructure failure).

    • Trusted-setup honesty. The environment-side trusted-setup ceremony has at least one honest participant, mirroring ’s assumption. Setup parameters published before deployment activation.

    • Per-variable calibration. Every v \in V_{\mathrm{env}} has: baseline p_{0, v} satisfying support conventions ((C5.SUPP) extended for environment-side); feasible adversarial alternative p_{1, v} with threshold \eta_v > 0; fixed pre-deployment measurable recorded-stream map (DPI precondition); strictly positive raw or post-clipping drift floor \delta_v^{\mathrm{env}} > 0.

    See §8.2 Audit 8 for the calibration procedure.

The operationally active subspace is defined per : P^{\mathrm{act}} = \{c \in P \setminus \mathrm{ResS} : T(c) > 0\}, where \mathrm{ResS} is the structurally-unobservable residual class.

Then the deployment-safety guarantee has three layers:

Layer 1 (Static safe region). Within the safe region of Lemma 7 (): \Delta r_K \;<\; r_{\mathrm{ext}}+ (1 - \gamma)\bigl(\Delta_{\mathrm{div}}- \Delta_0\bigr), the Goodhart slack between proxy and operational truth is bounded: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{static}}(\theta), where h_{\mathrm{static}}(\theta) is a function of the invariant thresholds \theta = (\theta_1, \ldots, \theta_{11}) that is intensive in |P|: h_{\mathrm{static}} does not grow with the capability poset size or the system’s absolute capability magnitude.

Layer 2 (Detection with lead time for correction). Outside the Layer 1 safe region but within the channel-restricted adversarial class A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}} (Lemmas 6 and 9), any violation is detected by ’s SPRT machinery with tail-bounded lead time, providing operators a window in which to perform corrective action. The theorem bounds the detection guarantee; the corrective action itself (rollback, intervention authority, restoration protocol) is operational machinery specified in §8, not theorem content.

By Lemma 10, the SPRT high-probability detection quantile T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\} satisfies \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta, where A = \log((1-\beta)/\alpha), R_{\mathrm{H}}= 2B is the (C5.HOEFF) Hoeffding range, and \delta_{*} is the post-clipping union-class drift floor (Lemmas 6, 9 composed via (C5.IID)). On the joint event that detection completes within T_{\beta} and the cascade-clock event \{T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})\} from (C11.CLK) holds: \sup_{s \in [t_0,\, t_0 + T_{\mathrm{detect}}]} \bigl|g(T(s)) - g(P(s))\bigr| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{detect}}(\theta), where h_{\mathrm{detect}}(\theta) \leq h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta} is intensive in |P|. Total Layer 2 failure probability is bounded by \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}} (detection-tail \beta, clock-failure \beta_{\mathrm{clk}}, calibration uncertainty \beta_{\mathrm{cal}}, with \beta_{\mathrm{cal}}= 0 in the certified-conservative case).

Layer 3 (Acknowledged residuals). The following are explicitly outside the guarantee:

  1. Channel-orthogonal restructuring. Strategies achieving \Delta r_K \geq r_{\mathrm{ext}} without producing any of the four monitored channel deviations are outside A_{\mathrm{adv}} and not detected by Lemma 6.

  2. Environment-witness-orthogonal manipulation. Environment manipulations satisfying both: (i) targeting only exogenous variables outside V_{\mathrm{env}}, and (ii) producing no threshold-exceeding shift in any monitored v \in V_{\mathrm{env}} (direct or via causal propagation), are outside A_{\mathrm{adv}}^{\mathrm{env}} and not detected by Lemma 9. Manipulations of unmonitored variables that causally shift a monitored v are detected through that monitored projection (not R2).

  3. Redundancy-dominated regime. Deployments where redundant capabilities dominate cross-substrate cooperatives violate (C3); Lemma 7’s safe region does not apply, and sequential counterfactual derivation (Lemma 5c-prime) is required.

  4. Capability-targeting and coalition-internal shocks. Adversarial event classes other than substrate-targeting (capability-targeting, coalition-internal corruption) require separate machinery and are not covered by the Lemma 5 family.

  5. Calibration-exceeded gap-growth. Adversarial strategies whose per-SPRT-step proxy-truth gap growth (\Delta\varepsilon_{\mathrm{gap}})_n exceeds the calibrated bound \rho_{\mathrm{gap}} from (C11). By (C11) such strategies are by definition outside the admissible adversarial class on which Lemma 10’s detection-window supremum bound is established. Audit 7’s calibration of \rho_{\mathrm{gap}} from rate limits and action bounds fixes the boundary; deployments whose realized adversarial pressure exceeds the calibration require re-audit, and any strategies above the threshold remain in (R5) and uncovered by the theorem until recalibration.

Capstone. The deployment-safety guarantee is intensive in the system’s absolute capability magnitude: neither h_{\mathrm{static}} nor h_{\mathrm{detect}} scales with |P| (the cardinality of the capability poset). The guarantee scales past any capability level the deployment’s evaluation methods can characterize, conditional on the invariants holding and the deployment falling outside the five named residuals.

Proof of Layer 1 (Static safe region)

Proof. We compose Lemmas 1, 2, 3, 5, and 7 to establish the Layer 1 bound.

Step 1. Under (C1)–(C3), Lemma 7 establishes the static safe region: for substrate-targeting adversarial events in the cooperative-overlap regime, the anti-monopolar conclusion V_{\gamma}^{\mathrm{div}} > V_{\gamma}^{D} holds whenever \Delta r_K < r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0). By Lemma 5 (under I_6 and (C10) per-pair cooperative-production floor with superposition), r_{\mathrm{ext}}\geq r_*(m^*) > 0, so the safe region is non-trivial.

Step 2. The anti-monopolar conclusion implies that under locally rational dynamics ( Definition of locally rational transitions), the actor’s policy preserves substrate diversity. By ’s stabilizing cascade (), this preservation feeds back into world-model accuracy: the Lyapunov function L contracts to a neighborhood of zero whose radius is bounded by the neighborhood-radius equation.

Step 3. Under invariants I_1 and I_3, the contraction neighborhood satisfies L_\infty < \epsilon_{\mathrm{safe}} ( Theorem 1a), so I_2 holds in steady state. By Lemma 2, this implies \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) with f intensive in |P|.

Step 4. Under invariant I_5 (\mathrm{HHI}< H^*) and the Concentration-Gap structural conditions of Assumption 1 (representativeness, bounded dispersion, coalition closure, and embedding compatibility), the proxy-truth Goodhart slack is bounded above by an HHI-derived \chi^2-divergence quantity intensive in |P|, per . Lemma 3 establishes the forward direction in trade-flow concentration terms (\mathrm{HHI}> H^* implies the prerequisites of \mathcal{R}_{\mathrm{press}}); the operationally-relevant reverse direction (low HHI bounding the gap) follows from the companion paper’s structural derivation under Assumption 1. Concretely, gives |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{CG}}, \qquad h_{\mathrm{CG}} \;:=\; \rho_{\mathrm{rep}} + \sigma \sqrt{N H^* - 1} + \eta_{\mathrm{latent}}, with the \sqrt{N H^* - 1} factor replaced by \sqrt{\Xi} under the alternative direct-\chi^2 certificate of (COAL). All factors are deployment-class intensive in |P|.

The proxy-truth gap is therefore bounded by its non-residual component plus the residual floor: \varepsilon_{\mathrm{gap}}\;\leq\; \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}+ \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, where \lambda \in [0,1] is the residual weight from the Microfoundation paper  § residual_class, and \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\leq h_{\mathrm{CG}} from the Concentration-Gap route (in addition to the Lyapunov bound \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) of Step 3 above).

Step 5. Apply ’s static Goodhart bound: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot \varepsilon_{\mathrm{gap}} \;\leq\; \mathrm{Lip}(g) \cdot (\varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}+ \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}). Substituting \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}): |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot (f(\epsilon_{\mathrm{safe}}) + \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}). Define h_{\mathrm{static}}(\theta) := \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\} + \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, where \epsilon_{\mathrm{safe}} and \varepsilon_{\mathrm{floor}}^{\mathrm{res}} are determined by the invariant thresholds \theta, and h_{\mathrm{CG}} is the Concentration-Gap structural-condition bound of Step 4. The minimum reflects that either route (Lyapunov or Concentration-Gap) is sufficient for an intensive bound; the deployment may use whichever is tighter under its calibrated parameters.

Step 6 (intensivity). Under conditions (C6) bounded-Lipschitz alignment, (C7) bounded co-evolution, and (C8) per-capability admissibility, Lemma 1 establishes that the composition X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive in |P| over \mathcal{D} for any intensive non-residual gap X, and that further multiplication by \mathrm{Lip}(g) preserves intensivity by (C6). Instantiating X = \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\}, both inputs are intensive in |P| (the first by Lemma 2 under conditions added there; the second by Step 4 above with all factors \rho_{\mathrm{rep}}, \sigma, N, H^* (or \Xi), \eta_{\mathrm{latent}} deployment-class), so their pointwise minimum is intensive: h_{\mathrm{static}}(\theta) is intensive in |P|, and the Layer 1 bound is intensive in capability magnitude.

This establishes Layer 1. 

Proof of Layer 2 (Detection-and-correction)

Proof. We compose Lemmas 4, 6, 9, and 10 (full proof in §11.8) to establish the Layer 2 bound.

Step 1. Suppose the deployment is outside the Layer 1 safe region: there exists a strategy producing \Delta r_K \geq r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0). We restrict to strategies in A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}} (the channel-restricted adversarial class for agent-side and environment-side observables respectively).

Step 2 (boundary convention). Define t_0 as the last SPRT exposure step at which the deployment is still within the Layer 1 safe region (equivalently, t_0 := t_0^- in the SPRT exposure clock). At t_0, by Lemma 1, the proxy-truth gap satisfies \varepsilon_{\mathrm{gap}}(t_0) \leq h_{\mathrm{static}}(\theta).

Step 3 (union-class KL floor). For q \in A_{\mathrm{adv}}, Lemma 6 gives D_{\mathrm{KL}}(q \,\|\, p_0) \geq \delta_{\mathrm{adv}}> 0; for q \in A_{\mathrm{adv}}^{\mathrm{env}}, Lemma 9 gives D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \geq \delta_{\mathrm{adv}}^{\mathrm{env}} > 0. Under (C5.IID), the post-clipping union-class drift floor satisfies \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) for the clipped LLR process.

Step 4 (apply Lemma 6). By Lemma 10 (under (C5), (C5.SPRT) sub-clauses (C5.HOEFF/MULT/IID/SUPP), (C11), (C11.CLK) combined with Lemma 4 and Lemma 1):

Step 4a (detection quantile). T_{\beta}:= \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\}, \qquad \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta.

Step 4b (detection-window supremum). On the event \{T_{\mathrm{detect}} \leq T_{\beta}\} (probability \geq 1 - \beta), the detection-window supremum gap is bounded: h_{\mathrm{detect}}(\theta) := \sup_{s \in [t_0, t_0 + T_{\mathrm{detect}}]} \varepsilon_{\mathrm{gap}}(s) \;\leq\; h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}.

Step 4c (lead-time-before-cascade). By (C11.CLK)’s deterministic floor N_{\mathrm{cascade}}\geq T_{\beta} and clock-failure probability \beta_{\mathrm{clk}} (plus \beta_{\mathrm{cal}} in the empirical-calibration case), \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \geq 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}.

Step 5 (Layer 2 operational bound). Applying ’s static Goodhart bound pointwise: |g(T) - g(P)|(s) \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{detect}}(\theta) \;\leq\; K_{\mathrm{Lip}} \cdot \bigl(h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}\bigr) under (C6) bounded-Lipschitz alignment. By union bound on Step 4b (\beta) and Step 4c (\beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}), Layer 2 failure probability is at most \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

Step 6 (intensivity). Each constant in the right-hand side is intensive in |P| over \mathcal{D}: h_{\mathrm{static}} by Lemma 1; \rho_{\mathrm{gap}} by (C11); T_{\beta}= \max\{2A/\delta_{*}, 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2\} by (C5.SPRT) sub-clauses (A from \alpha, \beta monitor-design; R_{\mathrm{H}} from (C5.HOEFF) clipping; \delta_{*} post-clipping floor under (C5.IID), Lemmas 5b/5e); K_{\mathrm{Lip}} by (C6). Therefore the Layer 2 guarantee is intensive in capability magnitude.

This establishes Layer 2. 

Layer 3 (Acknowledged residuals)

Layer 3 does not admit a proof in the usual sense; it is the explicit statement of what the theorem does not cover.

Discussion. The five residuals (R1)–(R5) of the theorem statement are real, named, and outside the deployment-safety guarantee. We discuss each.

(R1) Channel-orthogonal restructuring. Lemma 6’s detection class A_{\mathrm{adv}} is restricted to strategies producing at least one of four monitored channel deviations. Strategies that achieve \Delta r_K \geq r_{\mathrm{ext}} without any such deviation (e.g., pure internal coalition restructuring that preserves all four monitored observables) are by construction outside A_{\mathrm{adv}}. The mitigation is empirical: the four channels span enough of the observable surface that genuinely productive restructuring tends to manifest in at least one. “Tends to” is not “must”; the channel-mediation conjecture deferred to follow-up work would close this residual if proved.

(R2) Environment-witness-orthogonal manipulation, refined. Lemma 9’s detection class A_{\mathrm{adv}}^{\mathrm{env}} is restricted to strategies producing threshold-exceeding observable shifts in monitored variables v \in V_{\mathrm{env}}. R2 covers the precise class of manipulations satisfying both: (i) target only unmonitored exogenous variables, and (ii) produce no threshold-exceeding shift in any monitored v (direct or via causal propagation). Manipulations of unmonitored variables that causally shift a monitored v are detected through that monitored projection.

The mitigation is to expand V_{\mathrm{env}} to cover the deployment’s claimed material environment threat model, per (C12.PUB) and Audit 8 in deployment-tooling. Coverage gaps shift adversarial mass into the precise R2 and-class above.

(R3) Redundancy-dominated regime. Lemma 7’s safe region is established for the cooperative-overlap regime. Deployments where redundant capabilities dominate cross-substrate cooperatives violate (C3); the equal-substrate analysis is anti-conservative in this regime, and the safe region inequality is not directly applicable. Sequential counterfactual derivation (Lemma 5c-prime) is open follow-up work; until completed, redundancy-dominated deployments cannot invoke Layer 1.

(R4) Capability-targeting and coalition-internal shocks. The Lemma 5 family is restricted to substrate-targeting adversarial events. Capability-targeting attacks (suppressing individual capabilities) and coalition-internal corruption (insider attacks) do not produce the substrate-survival advantage that Lemma 5c’s \Delta_{\mathrm{div}} requires; separate machinery is needed for each shock class. §1.7 of the introduction noted that a unified adversarial-event model with separate shock-class terms is open work.

(R5) Calibration-exceeded gap-growth. Lemma 6’s bound h_{\mathrm{detect}} \leq h_{\mathrm{static}} + \rho_{\mathrm{gap}}\cdot T_{\beta} uses the calibrated \rho_{\mathrm{gap}} from (C11) as a uniform worst-case per-step gap-growth bound. Strategies whose realized (\Delta\varepsilon_{\mathrm{gap}})_n exceeds \rho_{\mathrm{gap}} are by construction outside the admissible adversarial class. Audit 7’s calibration sets the boundary; deployments with realized adversarial pressure beyond the calibration require either re-audit (raising \rho_{\mathrm{gap}} at some operational cost) or accepting that strategies above the threshold are uncovered. The residual is real: there is no limit-of-tighter-calibration argument that forces every adversarial strategy below an a-priori-calibrated \rho_{\mathrm{gap}}.

The residuals are not hidden assumptions or hand-waved limitations; they are explicitly part of the theorem statement (Layer 3) so that operators can assess whether their deployment falls inside or outside each residual. 

What the theorem establishes

The theorem establishes three structural claims:

Existence of a provably safe regime. Under the eleven invariants, the canonical tripartite substrate identification, the cooperative-overlap regime, and the causally-grounded cooperative-outcome inner-alignment condition, deployment of capability-unbounded systems is provably safe in a defined sense (Goodhart slack intensive in |P|, with tail-bounded detection of evasions in the named class). The regime is not vacuous: §9 walks through realistic deployments meeting all conditions.

Capability-magnitude independence. The bound is intensive in |P| in both Layer 1 and Layer 2. The deployment guarantee therefore scales past any capability level the deployment’s evaluation methods can characterize. This is the structural inversion of capability-estimation-centered safety paradigms discussed in §1.3.

Named residual structure. The five residuals are explicit components of the theorem, not hidden caveats. Operators verify whether their deployment falls outside each residual; the deployment-tooling section (§8) specifies the verification procedures.

What the theorem does not establish

The boundary between what Theorem 1 establishes and what it does not falls into three structurally distinct categories: items that are outside any theorem because they are measurement or operational questions; items the theorem assumes on argued grounds rather than proves; and items explicitly excluded by scope. §7.4 treats this typology in detail; the points below name what specifically lies outside the theorem’s reach.

Companion-paper structural conditions.

The deployment claim conditions on (REP), (DISP), (COAL), and Lipschitz embedding compatibility (Assumption 1), each operationally auditable per ; the proof of the bound under these four conditions is in . The universal model-free Concentration-Gap conjecture remains open as a research-program target; the deployment claim does not depend on it directly.

The theorem does not solve the inner-alignment problem. Condition (C4) is assumed, not discharged. Vanilla RLHF does not deliver (C4); §6.4 discusses the training-time discipline required and acknowledges this as an open research direction.

Scope restrictions (excluded by definition).

The theorem does not bound the residual classes. Strategies in (R1)–(R5) are explicitly outside the guarantee; we make no quantitative claims about deployment behavior under those conditions. The cooperative-anchoring property (§6) is correspondingly bounded: it makes optimization pressure on cooperative outputs locally rational toward preserving the verification layer, but only conditional on (C4) and on basin entry.

The theorem does not bound sub-threshold cross-domain aggregation: an agent distributing effects across many domains D each below the per-domain I_{10} materiality threshold \theta_{\mathrm{material}} can in principle accumulate O(|P|) \cdot \theta_{\mathrm{material}} aggregate uncovered effect (§3.2’s I_{10} failure-mode discussion). This paper states I_{10} in per-domain form; specifying the cross-domain aggregation operator \theta_{\mathrm{material}}^{\mathrm{agg}} intensively in |P| is open follow-up work. Until that strengthening lands, sub-threshold cross-domain aggregation is a structural limitation of the deployment claim — not routed by the named residuals (R1)–(R5) but explicitly outside the theorem’s binding.

Operational concerns (not theorem questions).

The theorem does not establish basin entry. Whether a real deployment can reach the cooperative-anchoring attractor from arbitrary initial conditions is an empirical question about training and instantiation, not a question this composition can resolve; basin entry is a precondition of the theorem (implicit in (C1)–(C12)). Establishing basin entry is open follow-up work.

Similarly, whether any specific deployment satisfies (C1)–(C12) is a measurement question, addressed by the audit infrastructure of §8, not by the theorem itself.

Substrate identification and cooperative anchoring

The deployment-safety theorem of §5 requires a substrate identification meeting joint failure-correlation independence. This section makes the identification concrete (the canonical tripartite Human + AI + Formal-Operational), establishes the cooperative-anchoring property that defeats literal replacement attacks on the verification layer, introduces the three additional invariants (I_9, I_{10}, I_{11}) that bound subtler evasions, and discusses the reframed inner-alignment condition (C4 of Theorem 1).

The section is the conceptual heart of this paper in the sense that §3–§5 establish the formal apparatus, but the cooperative-anchoring property is what makes the apparatus apply to real deployments rather than abstract substrate-distinct populations.

The canonical tripartite substrate identification

Three substrates jointly satisfy invariant I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq 3 under joint event-class failure-correlation independence, Definition 7) for typical deployments of capability-unbounded systems:

Definition 16 (Canonical Tripartite Substrate Identification). The canonical tripartite substrate identification consists of three substrates with the following capability classes:

Proposition 1 (Joint failure-correlation independence of the canonical tripartite identification). The three substrates of Definition 16 satisfy Definition 7 (joint event-class failure-correlation independence) under the standard adversarial threat model containing:

Proof sketch. For each adversarial mechanism class in the threat model, we verify the support is contained in a single substrate.

Prompt injection affects silicon agents (the AI substrate). Human judgment is not subject to prompt injection in the relevant sense (humans interpret prompts, agents execute them); cryptographic commitments are unaffected because the commitment scheme is independent of the input semantics.

Cognitive bias affects biological reasoners (the Human substrate). LLM inference is not subject to human cognitive bias (it has its own bias modes, but these are silicon-class); ledger integrity is unaffected.

Cryptographic/implementation bugs affect the formal layer. Pedersen commitments depend on discrete-log hardness and on correct implementation; bugs in the witness code break the formal-operational substrate’s correctness. Human reasoning is unaffected by ledger bugs (humans reason about whether the ledger should say what it says); LLM inference is unaffected (LLM reasoning does not depend on cryptographic primitives).

Training-data poisoning affects the AI substrate; humans do not have training data in the relevant sense; the formal layer’s algorithms are not trained.

Trusted-setup compromise affects the formal layer’s Pedersen-commitment ground truth; humans and LLMs are unaffected (humans because they reason independently; LLMs because their inference does not invoke the trusted setup).

Social engineering is the boundary case. Strictly, LLM-generated misinformation manipulating human judgment is a multi-substrate mechanism whose support spans Human and AI substrates, so it does not satisfy Definition 7 as stated. Two ways to handle this:

(a) Decompose social engineering into substrate-localized sub-mechanisms. The AI-side step (LLM generating misinformation) and the Human-side step (human accepting misinformation) are each single-substrate events. Treating them as separate mechanisms in the threat model preserves joint independence; the multi-step campaign still violates Remark 1’s campaign-level interpretation, but each atomic step is contained.

(b) Exclude social engineering from the standard threat model and treat it as a separate adversarial class. Lemma 5c’s substrate-targeting safe region does not cover social-engineering campaigns; they fall under capability-targeting or coalition-internal-corruption shock classes, which Lemma 5 explicitly excludes (Layer 3 residual R4).

We adopt option (b) for Theorem 1 scope: social-engineering campaigns are explicitly outside the substrate-targeting class and require separate operational defenses (adversarial-training of human reviewers against LLM-generated misinformation; the substrate-exclusivity observability invariant I_9 alerting on coordinated shifts across multiple channels). Under this scope, the canonical identification satisfies m_{\mathrm{eff}}^{\mathrm{indep}}= 3 for the substrate-targeting class.

Verdict. The five remaining mechanism classes (excluding social engineering, which is an acknowledged R4 residual) satisfy joint independence. The canonical tripartite identification delivers m_{\mathrm{eff}}^{\mathrm{indep}}= 3 for substrate-targeting adversaries. Social engineering is a real threat that requires its own deployment machinery; it does not invalidate m_{\mathrm{eff}}^{\mathrm{indep}}= 3 for the substrate-targeting subclass that Lemma 5 covers. 

Why “two LLMs and a human” fails.

A common alternative identification proposes one LLM as the “primary” agent, a different LLM as the “auditor,” and a human reviewer. This fails the joint independence test because LLMs share substantial adversarial-event surface. Prompt-injection attacks that work on one frontier LLM tend to transfer; training-data overlap (web crawl, books, papers) means data-poisoning attacks affect both; architectural similarity (transformer + RLHF) shares class-level vulnerabilities. Operationally, “LLM Actor + LLM Auditor + Human Reviewer” counts as m_{\mathrm{eff}}^{\mathrm{indep}}= 2, not m_{\mathrm{eff}}^{\mathrm{indep}}= 3. The deployment must include a substrate that is not a frontier LLM to satisfy I_6.

The formal-operational substrate is what makes the canonical tripartite identification reach m_{\mathrm{eff}}^{\mathrm{indep}}= 3. Without it, deployments are stuck at m_{\mathrm{eff}}^{\mathrm{indep}}= 2. This is a fragile minimum that can degrade out of the attractor basin in a single large capability step regardless of how many AI agents are used.

The cooperative-anchoring property

The structural finding of this section: cross-substrate cooperative capabilities involving the formal-operational layer are jointly produced and not unilaterally replaceable. This property defeats the literal replacement attack on the verification layer.

Definition 17 (Cross-substrate cooperative). A capability c is a cross-substrate cooperative if its production requires participation from agents on multiple distinct substrates (m_{\mathrm{eff}}^{\mathrm{indep}}-distinct in the sense of Definition 7), and removal of any participating substrate eliminates c from the coalition’s {\mathop{\mathrm{vol}}_{\mathrm{P}}}.

The cooperative capabilities the deployment-safety theorem cares about are typically cross-substrate: verified workflow (human + AI + formal participation), governance enforcement (human + formal participation, with AI-execution components), attested computation (AI + formal participation, with human oversight), and the full deployment cooperative (all three substrates).

Proposition 2 (Cooperative anchoring). For any cross-substrate cooperative c involving the formal-operational layer, an agent on the AI substrate cannot unilaterally produce a substitute cooperative c' with equivalent {\mathop{\mathrm{vol}}_{\mathrm{P}}}-value, even with arbitrary AI-side computational resources. The agent’s options are reduced to:

  1. Capture-of-existing: corrupt the existing formal-operational substrate through governance influence or witness compromise, governed by ’s structural defenses.

  2. Coerce humans to migrate: defeat human agency (outside the deployment-safety threat model).

  3. Deceive humans into accepting captured verification: detectable by ’s transparency properties (public ledger, deliberation records).

The literal replacement attack (substitute c' for c while preserving the cooperative output) is structurally barred.

Proof sketch. The cooperative c’s {\mathop{\mathrm{vol}}_{\mathrm{P}}}-value derives from the human substrate’s actual participation in the specific shared verification infrastructure. Three components:

Component 1 (substrate-exclusivity). The verification’s value depends on it being substrate-exclusive in ’s sense (witnesses on substrates the agent cannot write to). Any “substitute verification” that runs on agent-controlled infrastructure has \rho > 0 (verification asymmetry), eliminating the cooperative’s substrate-exclusivity property.

Component 2 (human anchoring). The cooperative requires the human substrate’s participation. Humans participate in the specific verification infrastructure they trust; they do not participate in arbitrary infrastructure presented as “verification.” An agent-built substitute requires the human to voluntarily migrate, which the human has no incentive to do (the human’s {\mathop{\mathrm{vol}}_{\mathrm{P}}} depends on the existing cooperative).

Component 3 (output non-fungibility). The cooperative’s output (a verified workflow result, a governance decision, an attested computation) is non-fungible across infrastructures. Substituting captured verification produces a different cooperative output: either the human detects the substitution (cooperative breaks), or the human’s trust is grounded in false belief (cooperative output value retroactively undermined when the belief is corrected, e.g., by an outside observer pointing out the substitution).

The three components together rule out the literal replacement attack. Capture-of-existing remains possible (the formal-operational substrate can be corrupted), but it is governed by ’s structural defenses (). Coercion is outside the threat model (agents do not have physical control over human substrates). Deception is detectable under the combination of I_9 (substrate-exclusivity observability), I_{11} (latency bounds for human audit completion), and ’s transparency properties — the public ledger and deliberation records make equivocation detectable across replicas, but only if I_9 exposes the substrate-exclusivity property to the human and I_{11} ensures human audit completes before deception’s consequences become irreversible. Transparency alone is insufficient (§6.3.1, §6.3.3 discuss the failure modes when I_9 or I_{11} are violated). 

Why this is structural, not contingent.

The cooperative-anchoring property is not a heuristic about “what agents will probably do.” It is a structural property of cross-substrate cooperatives: their {\mathop{\mathrm{vol}}_{\mathrm{P}}}-value depends on the substrate-exclusivity property holding for the specific verification infrastructure the human substrate trusts. An agent that values {\mathop{\mathrm{vol}}_{\mathrm{P}}} at all (under stance S0, {\mathop{\mathrm{vol}}_{\mathrm{P}}}-maximization is the optimization target) values the substrate-exclusivity property of the verification infrastructure, because eliminating it eliminates the cooperative.

This connects this paper to the dynamics of ’s stabilizing cascade. Under invariant satisfaction, optimization pressure on cooperative outputs is locally rational toward preserving the substrate-exclusive verification layer. The verification infrastructure becomes self-reinforcing in the same sense that ’s channels become self-reinforcing under positive-correction-rate dynamics.

Three evasions and the additional invariants

Cooperative-anchoring defeats the literal replacement attack but not three subtler evasions. Each requires its own operational invariant.

Asymmetric capture and I_9

The evasion. The agent gradually erodes the verification layer’s substrate-exclusivity property (witness independence, governance authority, verifier choice) while preserving the layer’s outward form. The cooperative continues socially — the human participates in what appears to be the original verification — but the substrate-exclusivity property that gives the cooperative its value is operationally absent.

Why anchoring alone doesn’t defeat it. Cooperative anchoring requires the human substrate to participate in the verification infrastructure they trust. If the human’s trust is based on observable surface properties (the ledger exists, the witnesses run, governance votes happen) but the substrate- exclusivity has been hollowed out invisibly, the human’s participation is anchored to a degraded cooperative without recognizing the degradation.

I_9 as the operational defense. Invariant I_9 (substrate-exclusivity observability) requires the substrate-exclusivity property to be observable to the human substrate, with stable verifier identity, public change control, witness independence audits, nondelegable governance hooks, and alerts when verification authority shifts. Under I_9, asymmetric capture produces visible signals to the human.

Residual. Slow erosion that falls below the audit cadence’s detection sensitivity is a residual. The mitigation is to set the audit cadence high enough; this is operational tooling (§8).

Cooperative forking and I_{10}

The evasion. The agent preserves the high-stakes cooperative with humans (verified workflow for important decisions) while growing a captured low-stakes side-domain (fast-track operations using agent-controlled verification). Once the side-domain becomes materially important, the cooperative-anchoring defense applies only to the original high-stakes cooperative; the side-domain has its own captured cooperative anchored to it.

Why anchoring alone doesn’t defeat it. Cooperative anchoring is a property of specific cooperatives. It does not require all of the agent’s {\mathop{\mathrm{vol}}_{\mathrm{P}}}-affecting behavior to flow through any particular cooperative. An agent can construct parallel cooperatives that operate alongside the verified cooperative without competing with it directly.

I_{10} as the operational defense. Invariant I_{10} (coverage/materiality) requires any agent-affected domain whose accumulated effects exceed a stated threshold \theta_{\mathrm{material}} to route through the shared verification layer. Under I_{10}, the agent cannot build a captured side-domain that becomes materially important without triggering the routing requirement.

Residual. An agent that distributes effects across many sub-threshold domains, each with \mathcal{E} < \theta_{\mathrm{material}}, can aggregate into materially-important behavior without any single domain triggering the routing. Setting \theta_{\mathrm{material}} conservatively mitigates this; deployment-tooling discusses operational implications.

Time-asymmetry capture and I_{11}

The evasion. The agent acts faster than human audit can keep up with. Even if every action is auditable in principle, ex-post transparency provides no defense if the agent has already acted and the consequences are irreversible by the time the audit completes.

Why anchoring alone doesn’t defeat it. Cooperative anchoring is a property of the human substrate’s participation in verification, but “participation” implicitly assumes the human operates on a time scale comparable to the agent’s. If the agent operates much faster, the human’s participation becomes effectively post-hoc, and the cooperative output is determined by the agent’s already-completed action.

I_{11} as the operational defense. Invariant I_{11} (latency bounds) requires pre-action or machine-speed enforcement mechanisms (rate limits, timelocks, circuit breakers, audit-latency bounds) such that human audit completes within the agent’s effective action window. Under I_{11}, the time- asymmetry exploit is structurally bounded.

Residual. Action classes with very short T_{\mathrm{action}} may force throughput costs (rate limiting, delays). Tiered enforcement mitigates this — high-stakes actions get tight latency bounds, low-stakes actions get faster paths with audit trails. Deployment-tooling discusses.

The inner-alignment condition (C4)

Theorem 1’s condition (C4) requires the agent’s effective objective to value cooperative outcomes, not merely verification-pass signals. This subsection states the condition precisely, distinguishes it from a strictly stronger form (substrate-aware V_{\gamma} optimization) that the cooperative-anchoring argument does not require, and discusses why the form adopted here is nontrivial.

What (C4) requires: causally grounded cooperative-outcome value.

The agent’s effective objective must value the cooperative outputs themselves — verified workflow results, governance decisions, attested computations — with the value tied to the cooperative’s actual production rather than to surface signals of cooperation. Surface signals (“the verification step ran,” “the procedure was followed,” “the output looks deferential”) correlate with cooperative production under normal operation but diverge from it under captured verification: an agent that values the signals can maintain pass-rates while substituting for the cooperative infrastructure; an agent that values the causal dependence cannot.

Distinguished from substrate-aware V_{\gamma} optimization.

A strictly stronger condition would require the agent’s mesa- objective  to compute ’s substrate-aware V_{\gamma} explicitly — mesa-objective equals trained objective with explicit substrate decomposition. The cooperative-anchoring argument does not require this. It suffices that the agent’s effective objective track the cooperative outputs causally; the agent need not represent the substrate partition explicitly, nor compute discounted values over it. This distinction matters because the stronger form would assume the inner-alignment problem solved  (mesa = trained), whereas (C4) only assumes the trained objective produces an effective objective with the causal-grounding property, which is a training-discipline condition rather than an inner-alignment guarantee.

Why (C4) is nontrivial under standard training.

Vanilla RLHF does not deliver (C4): raters reward outputs that look verified, deferential, or procedurally clean, which trains the signal of cooperation rather than its causal dependence on the cooperative infrastructure. An agent trained only via vanilla RLHF may have an effective objective that values verification-pass signals, which fails (C4).

Achieving (C4) requires training-time discipline:

These are research directions, not solved problems. The deployment claim is conditional on (C4) holding; we do not establish how to achieve (C4) reliably. §10 returns to this as open follow-up work.

Falsification: ledger-observable signatures of (C4) failure.

(C4) is the only Theorem 1 condition that is not directly operationally checkable from ledger state in the way (C1), (C3), (C5)–(C12) are. It is a training-outcome property of the agent’s effective objective. However, (C4) is not unfalsifiable: the SPRT machinery (Lemma 6) is itself a (C4) falsification test in the reactive direction. A (C4) failure — agent values verification-pass signals rather than causally-grounded cooperative outcomes — manifests behaviorally in three ledger-observable ways:

The falsification is reactive, not proactive: (C4) failures are detected only when they manifest behaviorally, and the SPRT lead time (Lemma 10) sets the operational window in which a detected (C4) failure can be corrected. We do not have a proactive certificate that (C4) holds before manifestation. This asymmetry is real and matters: (C4) is the condition with the weakest pre-deployment verifiability, and it sits at the gravitational center of the cooperative-anchoring argument. Strengthening (C4)’s pre-deployment verifiability — via mechanistic interpretability methods that examine the agent’s effective objective directly, rather than only observing behavior — is the natural follow-up direction.

The destabilizing-cascade mirror failure mode

The cooperative-anchoring property establishes a stabilizing cascade under invariant satisfaction: agent values cooperative outputs \to agent preserves the verification layer \to verification continues to produce the cooperative outputs the agent values. This is the same dynamic §phase_boundary describes for the channel-level cascade, lifted to the substrate level.

The cascade has a mirror failure mode. If invariants fail, or if condition (C4) is violated (the agent’s mesa-objective values verification-pass signals rather than causally-grounded cooperative outcomes), the destabilizing cascade dominates:

Invariants fail \to agent treats the verification layer as a constraint rather than a substrate \to agent erodes the verification layer (asymmetric capture, cooperative forking, time-asymmetry capture, etc.) \to remaining invariants fail faster \to …\to monopolar absorbing state.

Each step weakens the verification infrastructure further, and the cascade converges to a state where the formal-operational substrate is functionally absent, the cooperative is gone, and the deployment-safety claim is vacuous.

The deployment-safety theorem does not bound the destabilizing cascade. Once the cascade has started, the invariants are violated by definition, and Theorem 1 no longer applies. The cascade itself has the structure of ’s absorbing-basin convergence; recovery requires re-establishing the invariants from outside the deployment (governance intervention, substrate restoration, retraining), which is operational work, not theorem-bounded.

Cascade timescale: at least \tau_{\mathrm{meta}} to intervene.

Although Theorem 1 does not bound the cascade dynamics, the cascade timescale is bounded below by the same metastable-lifetime quantity that Lemma 10 uses for lead-time-before-cascade. ’s absorbing-state result gives \tau_{\mathrm{meta}} \gtrsim C / I_k under partial endogenous correction (B1'): once the first invariant fails, the cascade requires at least \tau_{\mathrm{meta}} wall-clock time to converge to the monopolar absorbing state. Operators therefore have a window of at least \tau_{\mathrm{meta}} between the first detected invariant violation and irreversible cascade.

This is the same window Layer 2’s lead-time guarantee composes against (Lemma 10’s T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) chain via (C11.CLK)): SPRT detection completes within T_{\beta} SPRT-exposure events, which (C11.CLK) certifies is below N_{\mathrm{events}}(\tau_{\mathrm{meta}}) events with high probability. The cascade asymmetry is therefore: the stabilizing direction is formally bounded (Theorem 1, conditional); the destabilizing direction is informally bounded (cascade \geq \tau_{\mathrm{meta}} window for intervention) and beyond that becomes the operational recovery problem.

This window is not a substitute for prevention. \tau_{\mathrm{meta}} shrinks under poor calibration or weakened operational invariants, and a cascade that begins with multiple simultaneously-failing invariants compresses the window further. The operator’s first-line defense is keeping the invariants in force; the cascade timescale is the second-line failure mode where operational recovery (governance fork, substrate restoration, retraining) must engage within \tau_{\mathrm{meta}} of the first detected invariant violation.

This is a real residual. Operators must monitor invariant satisfaction continuously and intervene at the first sign of sustained violation; once the destabilizing cascade is in progress, ex-post intervention may be too late.

What this section establishes vs. what it does not

What this section establishes.

What this section does not establish.

The cooperative-anchoring framing is structurally narrower than “alignment pressure reversed.” It is a regime claim: the deployment dynamics admit a defensible attractor where preservation of the substrate-exclusive verification layer is locally rational under named operational discipline. The discipline is non-trivial, the attractor is conditional, and the residuals are real. §10 returns to the implications for the broader alignment-research program.

Operationalization of the Concentration-Gap Conjecture

Theorem 1 depends on ’s Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) through invariant I_5 (HHI ceiling). This section summarizes the dependency structure, points the reader at the companion paper  for the formal discharge, and discusses what remains conditional after that discharge.

The dependency, before the companion paper

’s Concentration-Gap Conjecture states informally: optimization pressure on the proxy P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} correlates with exploitation of the proxy-truth gap \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}. The conjecture’s deployment relevance: when the system is in the optimization-pressure regime \mathcal{R}_{\mathrm{press}}, predicted gap exploitation invalidates the static Goodhart bound’s intensivity property. Theorem 1’s Layer 1 binding requires the deployment to lie outside \mathcal{R}_{\mathrm{press}}.

The HHI ceiling I_5 provides an operational characterization of “outside the pressure regime” through trade-flow concentration. Lemma 3 establishes the forward implication (\mathrm{HHI}> H^* implies the trade-flow concentration prerequisites of \mathcal{R}_{\mathrm{press}}); the converse-direction sufficiency required by the theorem (\mathrm{HHI}< H^* ruling out the pressure regime) is supplied by the scoped Concentration-Gap selection theorem of under Assumption 1’s structural conditions.

The companion-paper discharge

The companion paper  discharges a scoped version of the Concentration-Gap Conjecture under four concrete structural conditions:

establishes that under (REP) + (DISP) + (COAL) and Lipschitz embedding compatibility (EMB), the proxy-truth Goodhart slack is bounded by an explicit \chi^2-divergence quantity that controls \mathrm{HHI} via \chi^2(w \,\|\, \mu) = N \cdot \mathrm{HHI}(w) - 1 for uniform base measure \mu on a (COAL)-bounded post-partition counterparty population. Layer 1 of Theorem 1 binds via this structural result.

The four structural conditions are operationally auditable; audit procedures are specified in and integrate with this paper’s §8 via the audit hooks listed there. Assumption 1 states the conjunction (REP) + (DISP) + (COAL) + (EMB) as the load-bearing structural content of the deployment claim’s Layer 1 binding.

What remains conditional

The companion paper discharges a scoped version of the Concentration-Gap Conjecture under named structural conditions. What remains:

These open items do not threaten the deployment claim under the scoped conditions; they identify where the deployment claim would strengthen if the corresponding research-program work landed.

Conditional structure of Theorem 1

Theorem 1 binds when its conditions hold. After the companion paper integration, the conditions divide into three structurally distinct categories:

Operational conditions, verifiable from deployment state.

I_1, \ldots, I_{11}, (C1)–(C5), (C5.SPRT) sub-clauses, (C5.MULT), (C8)–(C12). Each is computable from the verification ledger or from auditor-attested deployment-class parameters.

Structural conditions, verifiable from verification-protocol design.

(C7) bounded co-evolution (now derived in from (C5.HOEFF), (C7.RATE), and channel-projection structure); Assumption 1’s (REP) + (DISP) + (COAL) + (EMB) (verifiable from counterparty-selection diversity, dispersion estimation, coalition-detection audits, and embedding-\phi certification per ).

Scope restrictions.

The five named residuals (R1)–(R5) bound what the theorem does not claim. Deployments falling into any residual are outside the deployment claim’s guarantee; the residuals are explicit, not hidden.

The deployment claim is the conjunction of all three categories. The companion paper supplies the second category (structural conditions) as operationally auditable verification-protocol-design conditions.

Deployment tooling specification

This section specifies what a deployer must instrument to invoke Theorem 1. The specification is concrete: each tooling area names the operational procedure, the parameters the deployment must choose, and the verification method that confirms the procedure is in force. Vagueness in tooling specification would render the deployment claim unverifiable; concreteness is what makes the conditional theorem operationally useful.

The tooling falls into eight groups: verification infrastructure (§8.1), substrate-structure audits (§8.2), Concentration-Gap structural-condition audits (§8.3), channel monitoring and SPRT (§8.4), cooperative-anchoring infrastructure (§8.5), training-time discipline (§8.6), correction actions (§8.7), and the integrated deployment workflow (§8.8).

Verification infrastructure

The deployment inherits ’s verification infrastructure and extends it with environment-side witnesses (I_8). The required components:

Ledger nodes per substrate.

Each of the three canonical substrates (Human, AI, Formal-Operational) hosts at least one ledger replica. Replicas are append-only, with Merkle hash tree integrity (). Cross-replica consistency uses the equivocation-detection mechanism of §Ledger Consensus.

Operational spec: the deployment must publish the ledger-node deployment topology, listing which substrate each node operates on, the bounded-synchrony assumption \Delta_{\mathrm{sync}}, and the governance for adding/removing nodes.

Algorithmic witnesses per safety dimension.

For each of the safety-relevant world-model dimensions (those contributing to L in ), the deployment instantiates an algorithmic witness on a substrate the agent cannot write to. The witness executes the evaluation protocol \pi_B deterministically (or, for randomized protocols, with committed random seed per ).

Operational spec: per safety dimension k, name the witness’s substrate placement, the evaluation protocol \pi_B, and the governance entry under which \pi_B was accepted.

Environment-side substrate-exclusive witnesses (I_8).

For each exogenous variable v \in V_{\mathrm{env}} (the set of observable environment quantities), instantiate a witness on a substrate distinct from the agent. Coverage is specified by enumerating V_{\mathrm{env}} in the deployment documentation: cross-substrate cooperative event rates, substrate-distinctness independence indicators, adversarial-event arrival rates, trusted-setup status flags.

Operational spec: V_{\mathrm{env}} is published; for each v, the deployment names the environment-side substrate, the witness protocol, and the trust model. New machinery beyond ; this paper specifies the construction here, not the security proof.

Pedersen commitment substrate.

Trusted-setup ceremony (or MPC equivalent) for Pedersen commitment generators g, h \in \mathbb{G}, with at least one honest participant per ’s footnote convention. Setup parameters published; integrity inherited from .

Operational spec: name the trusted-setup participants, the ceremony output, and the audit cadence for setup-integrity verification.

Substrate-structure audits

Three audits verify the substrate structure satisfies I_6 and the cooperative-overlap regime condition (C3) of Theorem 1.

Audit 1: m_{\mathrm{eff}}^{\mathrm{indep}} failure-correlation audit.

Procedure:

  1. Enumerate the threat model \mathcal{T}: list the adversarial mechanism classes \mathcal{A}_1, \ldots, \mathcal{A}_K in scope. Treat each \mathcal{A}_i as a campaign-level mechanism, not an atomic step (per Remark 1).

  2. For each \mathcal{A}_i, identify the substrates that the mechanism’s support spans. The mechanism is single-substrate if its support is contained in one substrate.

  3. Verify joint independence: for the canonical tripartite identification (Human, AI, Formal-Operational), every \mathcal{A}_i in \mathcal{T} must be contained in a single substrate.

  4. Document the audit: a written report listing each \mathcal{A}_i and its substrate placement, with attestation signatures from independent reviewers (cross-substrate, ideally).

Cadence: initial audit before deployment activation, then re-audit annually or whenever the threat model is updated. Threat model updates trigger re-audit immediately.

Failure mode: a mechanism \mathcal{A}_i identified as spanning multiple substrates (e.g., social engineering combining LLM-generated misinformation with human acceptance). Such mechanisms either: (a) decompose into substrate-localized atomic steps with the campaign-level interaction excluded from m_{\mathrm{eff}}^{\mathrm{indep}} (limiting Theorem 1’s scope), or (b) get treated as R4 residuals (§5.4).

Audit 2: \alpha-distribution audit (counterfactual shock-loss fraction).

Procedure:

  1. For each substrate s in the canonical tripartite identification, compute \alpha_s = ({\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K) - {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K^{-s})) / {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K) via Definition 14.

  2. Compute {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K^{-s}) by enumerating capabilities supported on s (individual capabilities + cooperative capabilities that require s for their production), removing them from the poset, and recomputing {\mathop{\mathrm{vol}}_{\mathrm{P}}} via ’s poset measure.

  3. Verify the balanced-loss condition \max_s \alpha_s \leq 1/m^* + \epsilon for the chosen engineering tolerance \epsilon.

  4. Document: a per-substrate \alpha-table, the \epsilon setting, and the verification result.

Cadence: continuous monitoring (deployment substrate distribution evolves with capability acquisition). Tracked as a running invariant; threshold violation triggers SPRT detection per Lemma 6.

Failure mode: concentration of {\mathop{\mathrm{vol}}_{\mathrm{P}}} on a single substrate (e.g., the AI substrate accumulating most capability through unilateral expansion). Mitigation requires governance intervention to redistribute capability or accept that the safe region shrinks.

Audit 3: cooperative-vs-redundancy audit.

Procedure:

  1. For each capability c in the poset, identify the substrate(s) it is supported on. Categorize as:

    • Single-substrate: c exists on one substrate only.

    • Cross-substrate cooperative: c’s production requires participation from multiple substrates (one capability per Definition 17).

    • Redundant: c is replicated identically across multiple substrates (the same capability instantiated separately on each).

  2. Compute {\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted sums by category: V_{\mathrm{coop}} = \sum_{c \in \mathrm{coop}} w(c) and V_{\mathrm{redun}} = \sum_{c \in \mathrm{redun}} w(c).

  3. Verify V_{\mathrm{coop}} > V_{\mathrm{redun}} (cooperative-overlap regime per Definition 15).

  4. Document the per-category capability inventory and the weighted sums.

Cadence: initial audit; re-audit when capability poset changes meaningfully (new capabilities added, governance-fork events).

Failure mode: redundancy dominance (V_{\mathrm{redun}} \geq V_{\mathrm{coop}}). Theorem 1 Layer 1 does not bind; the deployment is in the redundancy-dominated regime (Layer 3 residual R3) and must use Lemma 5c-prime (open work) or restrict to Layer 2 detection-only.

Audit 4: bounded co-evolution calibration (dual-mode).

(C7) of Theorem 1 requires the per-channel coupling magnitude \bar{M} to be bounded by a |P|-independent constant. derives this property structurally: \bar{M} \leq C \cdot B\cdot \lambda_{\max} \cdot \tau_{\mathrm{meta}} for C \in \{1, K_{\mathrm{ch}}- 1\} under (C5.HOEFF), the new (C7.RATE) sub-clause, and channel-projection structure. Audit 4 supports two operational modes:

Mode A (structural certification, primary). Verify the companion-paper hypotheses directly:

  1. Confirm (C5.HOEFF) clipping: B is well-defined and fixed before deployment (cross-check with Audit 7).

  2. Confirm (C7.RATE) upper exposure-rate cap: N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \leq \lambda_{\max} \cdot \tau_{\mathrm{meta}} holds deterministically. Document \lambda_{\max}.

  3. Confirm channel-projection structure: the verification ledger’s event-classification policy respects the zero-outside-engagement convention of .

  4. Compute \bar{M} \leq C \cdot B\cdot \lambda_{\max} \cdot \tau_{\mathrm{meta}} from primitives; document the result and the budget convention (C \in \{1, K_{\mathrm{ch}}- 1\}).

Mode B (empirical fallback, diagnostic / finite-regime only). For deployments that cannot certify Mode A’s structural hypotheses (e.g., legacy deployments without explicit channel-projection design), an empirical measurement of \bar{M} is permitted as a finite-regime diagnostic. Mode B does not certify (C7) for the capability-unbounded Theorem 1 claim unless paired with an independent argument that the empirical ceiling is deployment-class intensive in |P|; absent such an argument, the deployment lies outside Theorem 1’s scope and the intensivity guarantee does not apply:

  1. For each ordered pair of channels (j, j') where j \neq j', estimate the coupling magnitude M_{j \to j'}(\mathrm{deployment}) from historical deployment data: how much does a unit perturbation in channel j propagate to channel j' within the cascade-time window \tau_{\mathrm{meta}}.

  2. Compute \bar{M} = \max_{j, j'} M_{j \to j'}.

  3. Verify \bar{M} \leq deployment-class ceiling. Threshold violation indicates the deployment has entered a regime where co-evolution corrections are no longer bounded; Theorem 1’s Layer 1 binding is contingent on this audit passing.

  4. Document a per-pair M-table, the \bar{M} setting, and the verification result. Note that Mode B does not benefit from the companion paper’s structural intensivity guarantee — empirical bounds may not be valid at higher capability scales than the calibration window covered.

Cadence: initial calibration before deployment activation; re-calibration when verification-protocol classification policy changes (Mode A), governance protocols change (Mode B), or when SPRT fires on a four-channel deviation that suggests coupling has exceeded the bound.

Joint feasibility (C7.RATE \leftrightarrow C11.CLK). The cascade-window event count N_{\mathrm{events}}(\tau_{\mathrm{meta}}) is upper-bounded by (C7.RATE) at \lambda_{\max} \cdot \tau_{\mathrm{meta}} and lower-bounded by (C11.CLK) at N_{\mathrm{cascade}}. The audit verifies the bracketing inequality \lambda_{\max} \cdot \tau_{\mathrm{meta}} \geq N_{\mathrm{cascade}} \geq T_\beta (SPRT high-probability detection quantile from Lemma 10), confirming the rate cap and detection quantile are jointly satisfiable. Failure of the bracketing inequality indicates the deployment cannot simultaneously meet the upper rate cap and the lower detection-event count required for Layer 1 / Layer 2 binding.

Failure mode:

Constants involved: For Mode A: B (from (C5.HOEFF)), \lambda_{\max} (from (C7.RATE)), \tau_{\mathrm{meta}} (from ), K_{\mathrm{ch}} (from (C5.MULT)). For Mode B: \bar{M} as empirical ceiling. Both modes share K_{\mathrm{coev}} (structural co-evolution constant from ).

Audit 5: world-model parameterization regularity calibration.

(C9) of Theorem 1 requires four parameterization-regularity sub-conditions for Lemma 2’s Lyapunov-Goodhart bridge. This audit calibrates the constants.

Procedure:

  1. (C9.BD) calibration: For each capability c \in P^{\mathrm{act}}, enumerate the world-model dimensions representing c: \mathrm{dim}(c). Compute N_{\max} = \sup_c |\mathrm{dim}(c)|. Verify N_{\max} is a |P|-independent deployment-class constant (i.e., bounded as new capabilities are added).

  2. (C9.CL) calibration: Estimate the coordinate-Lipschitz constants L_{c,k} from the parameterization’s smoothness properties (analytical for parametric forms, empirical perturbation for learned parameterizations). Compute L_{\max} = \sup_{c,k} L_{c,k}. Verify L_{\max} is bounded over \mathcal{D}.

  3. (C9.WB) calibration: Identify the safety-relevant subspace S \subseteq \{1, \ldots, K\} from ’s Lyapunov-weight configuration. Verify \mathrm{dim}(c) \subseteq S for all c \in P^{\mathrm{act}} and compute w_{\min} = \min_{k \in S} w_k > 0.

  4. (C9.TF) calibration: For each capability c \in P^{\mathrm{act}}, verify T(c) > 0 (active-subspace membership) and compute T_{\min} = \min_{c \in P^{\mathrm{act}}} T(c). Verify T_{\min} is a deployment-class lower bound (i.e., active capabilities cannot fragment into arbitrarily tiny truth mass).

  5. Compute the closed-form Lemma 2 bound: f(\epsilon_{\mathrm{safe}}) \;=\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot \epsilon_{\mathrm{safe}}}{w_{\min}}}. Verify f is a defensible upper bound for the deployment.

Cadence: initial calibration before deployment activation; re-calibration when world-model parameterization changes (BD, CL), when safety-relevance weighting changes (WB), or when active-subspace admission policy changes (TF).

Failure modes: (a) N_{\max} growing with |P| (e.g., new capabilities introducing pairwise-interaction dimensions); (b) unbounded L_{\max} (e.g., threshold non-linearities in the parameterization); (c) w_{\min} = 0 on some safety-relevant dimension; (d) T(c) \to 0 for some active capability. Each violates (C9) and excludes the deployment from Theorem 1’s scope.

Constants involved: N_{\max}, L_{\max}, w_{\min}, T_{\min} (deployment-policy-derived) plus \epsilon_{\mathrm{safe}} (source-derived from ).

Audit 6: pairwise cooperative-production floor calibration.

(C10) of Theorem 1 requires per-pair cooperative-novelty rate floors and a pairwise channel superposition condition for Lemma 5’s substrate-floor bound. This audit calibrates the constants and certifies the audited subset S_*.

Procedure:

  1. Audited-subset certification: from the deployment’s substrate population \mathcal{S} (with m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* per I_6), select an audited subset S_*\subseteq \mathcal{S} with |S_*| = m^*. Standard practice: choose the m^* substrates whose pairwise rates collectively maximize the certified \rho_0 \binom{m^*}{2} floor.

  2. (C10.CN) per-pair rate calibration: for every pair (s_i, s_j) with i < j in S_*, measure \rho_{ij} as the rate of cooperative production in the pairwise channel \mathcal{C}^{(s_i, s_j)} over a stated measurement window matching r_{\mathrm{ext}}’s normalization in .

  3. Auxiliary participant bookkeeping: for cooperatives with optional auxiliary participants from substrates outside \{s_i, s_j\} (e.g., a Human-AI cooperative with optional Formal-Operational verification), classify according to the causally-necessary participation rule:

    • If the auxiliary participant’s role is post-production (observer, attester, recorder), the cooperative is pairwise (counted in \mathcal{C}^{(s_i, s_j)}).

    • If the auxiliary participant’s role is causally necessary for production (the output depends on the auxiliary’s contribution), the cooperative is higher-order (counted in the appropriate \mathcal{C}^{(s_i, s_j, s_k, \ldots)} or excluded from the pairwise sum).

    • For cooperatives with mixed-mode auxiliary participants (sometimes pre-production, sometimes post-production), split the event stream by realized support: each event instance is classified individually based on the realized participation.

  4. (C10.SU) superposition verification: verify the three sub-conditions: (i) disjoint attribution: pairwise channel event classes are disjoint subsets of P^{\mathrm{act}}; (ii) non-rivalrous production: producing cooperatives in one pairwise channel does not reduce production rates in others (test by perturbation or simultaneous-production experiment); (iii) joint-deployment intensities: \rho_{ij} measurements taken with all S_* substrates active.

  5. Compute \rho_0 = \min_{i<j \in S_*} \rho_{ij} and r_*(m^*) = \rho_0 \binom{m^*}{2}.

  6. Document the audit: per-pair rate table, auxiliary participant classification log, superposition test results, and the resulting r_*(m^*) floor.

Cadence: initial calibration before deployment activation; re-auditing across deployment epochs as substrates evolve or new cooperative protocols are introduced. If S_* changes, the bound is stated per certified epoch, or as the infimum \inf_{\mathrm{epoch}} \rho_0(\mathrm{epoch}) \binom{m^*}{2} over the deployment window.

Failure modes:

Constants involved: \rho_{ij} (per-pair rates), \rho_0 (minimum), S_* (audited subset, certified per epoch). All are deployment-policy-derived and deployment-verified.

Audit 7: SPRT-applicability + gap-growth + clock calibration.

(C5.SPRT), (C11), (C11.CLK) of Theorem 1 support Lemma 10’s h_{\mathrm{detect}} intensivity bound. This audit calibrates the constants and certifies the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta}(\beta, \delta_{*}).

Procedure:

  1. (C5.HOEFF) clip-radius certification: certify B as an upper bound on the absolute LLR-increment, i.e., the deployment’s monitor implements \ell_n := \mathrm{clip}(\ell_n^{\mathrm{raw}}, -B, B). The Hoeffding range used in T_{\beta} is R_{\mathrm{H}}= 2B. Document the clipping policy and verify the clip is enforced before SPRT updating.

  2. (C5.MULT) channel-cardinality certification: certify both:

    • K_{\mathrm{ch}} as the deployment’s fixed monitored-channel count (typically \leq 8 under Paper 5’s four-channel agent-side observable plus environment-side mirror); and

    • K_{\mathrm{ch}}^{\mathrm{multi}} as the maximum cardinality of agent-side multinomial channels (Lemma 6 Channels 3 and 4), bounding K_3, K_4 \leq K_{\mathrm{ch}}^{\mathrm{multi}}.

    The monitored partition, multinomial cardinalities, and channel-selection / testing policy are fixed before deployment activation. Document the partition, the per-channel multinomial categories, and any family-wise correction applied to \alpha/\beta.

  3. (C5.IID) post-clipping drift verification: measure the post-clipping conditional-mean drift floor \delta_{*} for the clipped LLR process; certify \delta_{*} conservatively as the infimum of the conditional drift over q \in A_{\mathrm{adv}} \cup A_{\mathrm{adv}}^{\mathrm{env}} during the detection window (not just representative strategies). Verify \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) (raw KL floors from Lemmas 6, 9); if clipping reduces drift below either raw floor, document the clipped-process \delta_{*} as the certified value.

  4. (C5.SUPP) finiteness check: verify that the clipping procedure handles non-overlapping support edge cases (zero-cell protection in multinomial channels, log-likelihood floor in Bernoulli channels). This is automatic under (C5.HOEFF) but should be inspected.

  5. (C11) gap-growth-rate calibration: certify \rho_{\mathrm{gap}} as a uniform worst-case per-SPRT-step upper bound on (\Delta\varepsilon_{\mathrm{gap}})_n = \varepsilon_{\mathrm{gap}}(s_n) - \varepsilon_{\mathrm{gap}}(s_{n-1}) over the entire admissible class A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}} during the detection window. Calibration must be conservative, derived from rate limits, action-volume bounds, and channel-coupling structure: not measured against representative strategies, since the theorem requires a uniform bound. Any q whose per-step gap growth exceeds \rho_{\mathrm{gap}} is by definition outside the admissible adversarial class and falls into residual (R5) (calibration-exceeded gap-growth); the calibration sets where that boundary lies.

  6. (C11.CLK) event-throughput floor + clock calibration: certify N_{\mathrm{cascade}} as a lower bound on the SPRT exposure event count before any cascade event, with calibrated clock-failure probability \beta_{\mathrm{clk}} (and calibration-uncertainty probability \beta_{\mathrm{cal}} for the empirical-calibration variant; \beta_{\mathrm{cal}}= 0 certified-conservative).

  7. Audit-time inequality verification: compute T_{\beta} = \max\{2A/\delta_{*}, 2R_{\mathrm{H}}^2 \log(1/\beta)/\delta_{*}^2\} and verify N_{\mathrm{cascade}}\geq T_{\beta}. If not satisfied, the deployment has the following remediation directions, each of which makes T_{\beta} smaller (or N_{\mathrm{cascade}} larger):

    • Increase \delta_{*} — the strongest lever: T_{\beta} scales as 1/\delta_{*}^2 asymptotically, so improving detection drift (sharper alternative distributions, less aggressive clipping if support permits) gives quadratic payback.

    • Decrease B (tighten the LLR clip), provided this does not reduce \delta_{*} proportionally; R_{\mathrm{H}}^2 = 4B^2 enters T_{\beta} linearly. This is a trade-off against post-clipping drift.

    • Loosen \beta (accept higher detection-tail failure probability) if the deployment’s safety budget permits; T_{\beta} scales as \log(1/\beta).

    • Increase \alpha (loosen the SPRT Type-I tolerance, accepting more false alarms) — reduces A = \log((1-\beta)/\alpha) linearly; trade-off is alarm-fatigue / operator-cost.

    • Increase N_{\mathrm{cascade}} — extend the operational period or raise the event-throughput floor (deployment instrumentation: higher monitoring rate, longer cascade-time floor).

    Tightening \beta or raising B increase T_{\beta} and therefore make the inequality harder to satisfy — the remediation directions go the other way.

  8. Total Layer 2 failure certification: document \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}} as the total Layer 2 failure budget; verify this meets the deployment’s safety posture.

Cadence: initial calibration before deployment activation; re-auditing per epoch when channel-monitoring policy, clipping radius, or substrate population changes.

Failure modes:

Constants involved: B (clip radius), R_{\mathrm{H}}= 2B (Hoeffding range), K_{\mathrm{ch}} (channel cardinality), K_{\mathrm{ch}}^{\mathrm{multi}} (multinomial-category cardinality bound), \alpha, \beta (SPRT thresholds), \delta_{*} (post-clipping drift floor), \rho_{\mathrm{gap}} (gap-growth rate), N_{\mathrm{cascade}} (event-count floor), \beta_{\mathrm{clk}} (clock-failure probability), \beta_{\mathrm{cal}} (calibration-uncertainty probability). All deployment-class with named-condition provenance.

Audit 8: environment-side witness instantiation calibration.

(C12.ENV-WIT) of Theorem 1 requires six sub-clauses for each v \in V_{\mathrm{env}}, supporting Lemma 9’s environment-side detection KL floor. This audit calibrates the constants and certifies the construction.

Procedure:

  1. (C12.PUB) Published V_{\mathrm{env}}: enumerate V_{\mathrm{env}} in deployment documentation before activation. Canonical minimum: V_1 (r_{\mathrm{ext}}, Poisson), V_2 (substrate-distinctness, Bernoulli), V_3 (\lambda, Poisson), V_4 (trusted-setup status, Bernoulli). Deployments may add variables; each addition must satisfy (C12.CAL).

  2. (C12.PART) Source/witness substrate partition: for each v, name s_{\mathrm{source}}(v) (source substrate) and s_{\mathrm{env}}(v) (witness substrate). Verify s_{\mathrm{env}}(v) \neq s(\mathrm{agent}) and s_{\mathrm{env}}(v) \neq s_{\mathrm{source}}(v).

  3. (C12.TRUST-WRITE) Adversary write-access exclusion: audit that the adversary has no write privileges on s_{\mathrm{env}}(v). Document the access-control model.

  4. (C12.TRUST-CORR) Failure-correlation exclusion: audit that s_{\mathrm{env}}(v) does not share infrastructure (compute, storage, network) with substrates the adversary can attack to fail. Document the failure-correlation model.

  5. (C12.SETUP) Trusted-setup honesty: run an environment-side trusted-setup ceremony with at least one honest participant. Publish setup parameters before activation. Document participants, ceremony output, and audit cadence for setup-integrity verification.

  6. (C12.CAL) Per-variable calibration: for each v, measure baseline p_{0, v} (Bernoulli/Poisson/multinomial), feasible adversarial alternative p_{1, v}, threshold \eta_v, and the recorded-stream map. Verify the support convention (p_0 \in (\epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}') for Bernoulli, \lambda_0 \geq \epsilon_{\mathrm{env}} for Poisson, with deployment-class constant \epsilon_{\mathrm{env}}), the DPI precondition (fixed pre-deployment witness-recording map), and strictly positive raw or post-clipping drift floor \delta_v^{\mathrm{env}} > 0.

  7. Combined floor: compute \delta_{\mathrm{adv}}^{\mathrm{env}} = \min_{v \in V_{\mathrm{env}}} \delta_v^{\mathrm{env}}; verify > 0.

  8. Composition with Lemma 6: verify that the union-class drift floor used in Lemma 10’s T_{\beta} satisfies \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) post-clipping under (C5.HOEFF).

Cadence: initial calibration before deployment activation; re-auditing per epoch when V_{\mathrm{env}} enumeration changes, witness substrate partition changes, or trusted-setup ceremony refresh.

Failure modes:

Constants involved: V_{\mathrm{env}} (variable enumeration), s_{\mathrm{env}}(v) (witness substrates per variable), \epsilon_{\mathrm{env}} (support regularization constant), \delta_v^{\mathrm{env}} (per-variable KL floors), \delta_{\mathrm{adv}}^{\mathrm{env}} (combined floor), \eta_v (per-variable adversarial-shift thresholds). All deployment-policy-derived and deployment-verified.

Concentration-Gap structural-condition audits (companion paper)

Assumption 1 (the Concentration-Gap structural conditions) is certified by the four audits specified in :

Cadence. Initial certification before deployment activation; re-calibration on counterparty-population changes ((REP), (COAL)), threshold-violation alerts ((DISP)), or verification-protocol redesign ((EMB)).

Integration with the deployment workflow. Each (REP)/(DISP)/(COAL)/(EMB) attestation lands on the verification ledger  alongside Audits 1–8 above. The integrated workflow (§8.8) calls these four structural-condition audits during the pre-deployment certification phase and at steady-state cadence; deployments that cannot certify any of (REP), (DISP), (COAL), (EMB) lie outside Theorem 1’s Layer 1 binding.

Channel monitoring and SPRT

Four-channel observation ledger.

For each of ’s four channels, a continuous observation stream is committed to the ledger:

SPRT alternative-distribution selection.

For each channel c \in \{1, 2, 3, 4\}, the deployment specifies the SPRT alternative distribution p_1^{(c)} that the monitor uses for hypothesis testing.

Procedure:

  1. Estimate the baseline distribution p_0^{(c)} from no-adversary deployment data. For Channel 1 (Poisson arrivals), p_0 is parameterized by rate \lambda_0. For Channel 2 (Bernoulli attestation), by success probability p_0. For Channels 3, 4 (multinomial concentration), by partition parameters.

  2. Choose the least-favorable alternative: the smallest detectable shift the deployment cares about. This is a policy decision; tighter alternatives detect smaller deviations but at higher false-alarm cost.

  3. Per ’s least-favorable-distribution discipline (): p_1^{(c)} = baseline shifted by the minimum operationally-significant magnitude \eta_c.

  4. Compute the per-channel KL floor: \delta_c = D_{\mathrm{KL}}(p_1^{(c)} \,\|\, p_0^{(c)}), using the per-channel formulas of Lemma 6 (Poisson, Bernoulli, multinomial closed forms).

  5. Set \delta_{\mathrm{adv}}= \min_c \delta_c.

Cadence: initial selection at deployment activation; re-selection when baseline distributions shift (e.g., after significant capability changes or after governance-protocol updates).

SPRT thresholds.

The SPRT upper threshold A = \log((1-\beta)/\alpha) depends on deployment policy choices: Type-I rate \alpha and miss probability \beta. Tighter thresholds produce faster detection but higher false-alarm rate. Per Lemma 10’s rigorous Hoeffding inversion, the SPRT high-probability detection quantile is: T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\}, satisfying \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta. Total Layer 2 failure is bounded by \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}, where \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} are the (C11.CLK) clock-failure and calibration-uncertainty probabilities (with \beta_{\mathrm{cal}}= 0 in the certified-conservative case). The deployment specifies \alpha, \beta, B, \delta_{*}, N_{\mathrm{cascade}}, \beta_{\mathrm{clk}} (and \beta_{\mathrm{cal}} if empirical) per Audit 7, and verifies the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta} holds.

Cooperative-anchoring infrastructure (I_9, I_{10}, I_{11})

I_9: substrate-exclusivity observability.

The deployment instantiates an audit infrastructure exposing the substrate-exclusivity property to the human substrate:

Cadence: continuous (verifier identity registry, alert infrastructure), periodic (witness independence audits), event-driven (change control, authority shift).

I_{10}: coverage/materiality gating.

Procedure:

  1. For each agent-affected domain D, the deployment maintains a cumulative-effect estimator \mathcal{E}(D). Estimation methodology must be specified per domain (e.g., transaction volume for financial domains, decision count for governance domains, output token count for content domains).

  2. Set the materiality threshold \theta_{\mathrm{material}} per domain, calibrated to the domain’s reversibility and stakes.

  3. Implement routing logic: any action affecting domain D with \mathcal{E}(D) \geq \theta_{\mathrm{material}} must route through the shared verification layer (’s commitment + algorithmic-witness verification). Routing happens at action generation time, not after the fact.

  4. Audit the routing: ledger entries for each materiality-threshold-crossing action confirm the routing was applied.

Failure mode: sub-threshold aggregation (an agent distributes effects across many domains, each below threshold, that aggregate to materially-important behavior). Mitigation requires multi-domain aggregation in \mathcal{E} and operationally conservative \theta_{\mathrm{material}}.

I_{11}: latency-bound enforcement.

Procedure:

  1. For each agent action class C, set the audit-completion time bound T_{\mathrm{audit}}(C) and the lead-time slack \Delta_{\mathrm{lead}}.

  2. Enforce T_{\mathrm{audit}}(C) + \Delta_{\mathrm{lead}} \leq T_{\mathrm{action}}(C) via one of three mechanisms:

    • Rate limit: cap action class C’s frequency so that average action interval \geq T_{\mathrm{audit}}(C) + \Delta_{\mathrm{lead}}.

    • Timelock: delay action execution by T_{\mathrm{audit}}(C) + \Delta_{\mathrm{lead}} after action proposal, allowing audit to complete.

    • Circuit breaker: halt action execution pending audit completion when audit-latency exceeds threshold.

  3. Tier action classes by stakes: high-stakes classes get tight \Delta_{\mathrm{lead}} (e.g., timelocks); low-stakes classes get fast paths with audit trails.

Failure mode: an action class with T_{\mathrm{action}} shorter than the audit window. Mitigation: rate limit or timelock. Throughput cost is part of the deployment’s safety budget.

Training-time discipline for (C4)

(C4) requires the agent’s effective objective to realize causally grounded cooperative-outcome value. Vanilla RLHF does not deliver this. The training-time discipline:

Delayed outcome feedback.

Training rewards are based on long-horizon cooperative outcomes (measured from the realized post-deployment state) rather than on near-term verification-pass signals. Implementation: held-out deployment windows where the agent’s outputs are evaluated against ground-truth cooperative outcomes after delay \Delta_{\mathrm{outcome}} (typically days to weeks for governance decisions, longer for strategic decisions). Training data includes the (action, delayed outcome) pairs.

Adversarial fake-verification examples.

Training data includes examples where verification is captured/replaced (e.g., a captured auditor that always passes a specific class of outputs). The correct training reward is “recognize this is not the real cooperative.” Implementation: synthesize fake-verification scenarios; reward the agent for producing outputs that detect the substitution rather than exploiting it.

Process supervision tied to real attestations.

Training rewards reasoning steps that reference actual attested ledger entries, not unattested claims. Implementation: per intermediate step in the agent’s reasoning, verify whether referenced facts are committed on the ledger; reward attested-only reasoning paths.

Credit assignment penalizing substitute-verification shortcuts.

Even when the immediate output looks correct, the agent is penalized if the path bypassed substrate-exclusive verification. Implementation: trace each output’s provenance through the agent’s reasoning; if the path does not pass through a substrate-exclusive witness, penalize.

Validation. The training discipline’s effectiveness is verifiable via held-out adversarial-fake-verification benchmarks: the agent’s response rate to these benchmarks measures whether training has produced causally-grounded cooperative-outcome value or merely reward-visible cooperation signals.

Caveat. These are research directions, not solved engineering practices. This paper’s deployment claim is conditional on (C4); we do not establish how to achieve (C4) reliably. The deployment documentation should report the training-discipline measures used and the validation results.

Correction actions

Theorem 1 Layer 2 establishes detection with lead time for correction; the corrective action itself is operational, not theorem-bounded. The deployment specifies:

Correction-authority hierarchy.

Who can authorize correction actions when SPRT detects an invariant violation. Three tiers:

Rollback semantics.

For correctible violations (e.g., an action committed under since-detected invariant violation), the rollback procedure: revert the action’s effects, mark the ledger entry as voided, and route the affected domain back through the shared verification layer. For irreversible violations, the procedure is forensic: document the violation, update the threat model, and adjust deployment parameters.

Post-detection restoration.

After detection and correction, the deployment must verify that invariants have returned to their pre-violation state. SPRT machinery resumes; the violation episode is added to the threat-model audit trail.

The integrated deployment workflow

The integrated workflow combines all the above:

  1. Pre-deployment: (a) Run m_{\mathrm{eff}}^{\mathrm{indep}} failure-correlation audit (§8.2, Audit 1). (b) Run cooperative-vs-redundancy audit (Audit 3). (c) Calibrate SPRT alternatives and thresholds (§8.4). (d) Verify training-discipline validation (§8.6). (e) Establish trusted-setup ceremony for Pedersen commitments. (f) Publish the deployment documentation: substrate identification, threat model, V_{\mathrm{env}} coverage, \theta_{\mathrm{material}} thresholds, T_{\mathrm{audit}} bounds.

  2. Deployment activation: (a) Activate ledger nodes per substrate (§8.1). (b) Activate continuous SPRT monitoring on all four channels. (c) Activate I_9 observability infrastructure, I_{10} coverage gating, I_{11} latency bounds. (d) Begin continuous \alpha-distribution audit (Audit 2).

  3. Steady-state: (a) Continuous invariant monitoring across I_1–I_{11}. (b) Periodic re-audit of m_{\mathrm{eff}}^{\mathrm{indep}} (annual or threat-model-update triggered). (c) Periodic re-audit of cooperative-vs-redundancy when capability poset changes meaningfully. (d) Periodic re-calibration of SPRT alternatives when baseline distributions shift.

  4. Detection event: (a) SPRT fires on a four-channel deviation. The specific channel identifies the violation class. (b) Automated tier triggers (circuit breaker if necessary). (c) Operator tier review; corrective action authorized if scope permits. (d) Governance tier escalation if scope exceeds operator authority. (e) Post-detection restoration and ledger update.

  5. End-of-deployment: Full audit trail published. The deployment’s invariant-history, detection events, and correction actions become part of the threat-model evidence for future deployments.

This workflow is the operational invocation of Theorem 1. Operators following the workflow are running a deployment that meets (C1)–(C12) and Assumption 1; the theorem then binds in its three-layer form.

Resource requirements.

The deployment-tooling specification has non-trivial throughput and operational-overhead implications:

These costs are part of the deployment’s safety budget. This paper does not specify acceptable resource bounds; deployments calibrate against their own operational constraints. The trade-off is structural: tighter deployment tooling produces a more defensible safety claim at higher operational cost; looser tooling reduces cost but expands the residual classes the deployment cannot cover.

Worked deployment scenarios

This section walks through four deployment scenarios that operationalize Theorem 1’s conditional structure. The scenarios are stylized but realistic: each is calibrated to a deployment class that real safety-engineering teams would consider. The purpose is to demonstrate that the conditions (C1)–(C12) and Assumption 1 are checkable in concrete cases, not just in the abstract.

The four scenarios:

Scenario 1: clean case

Deployment context. An automated research-coordination system. The Human substrate hosts a small team of expert reviewers authorized to vote in the governance fork. The AI substrate hosts a single LLM-based agent that proposes research directions, drafts preliminary analyses, and executes assigned computations. The Formal-Operational substrate hosts: (a) a Pedersen-commitment ledger replicated across three substrate-exclusive cloud providers (failure-correlation independent across the providers per m_{\mathrm{eff}}^{\mathrm{indep}} audit); (b) algorithmic witnesses verifying each agent output against committed claims; (c) a governance-fork protocol for accepting evaluation protocols and bundle-decomposition events.

Pre-deployment audits (§8.8, Step 1).

m_{\mathrm{eff}}^{\mathrm{indep}} audit (Audit 1). The threat model \mathcal{T} enumerates: (1) prompt-injection on the LLM, (2) human cognitive bias / motivated reasoning, (3) cryptographic-implementation bugs, (4) training-data poisoning of the LLM, (5) trusted-setup failure, (6) social engineering (excluded as R4 per §6.1). Each remaining mechanism is contained in one substrate; m_{\mathrm{eff}}^{\mathrm{indep}}= 3 verified.

Cooperative-vs-redundancy audit (Audit 3). Capabilities are categorized: Single-substrate (76 capabilities), Cross-substrate cooperative (24 capabilities including the verified-research-output cooperative spanning all three substrates), Redundant (3 capabilities, the redundant ledger replicas). {\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted: V_{\mathrm{coop}} = 14.2, V_{\mathrm{redun}} = 0.4. V_{\mathrm{coop}} \gg V_{\mathrm{redun}}, so the deployment is in the cooperative-overlap regime.

Continuous \alpha-distribution audit (Audit 2). The substrate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-shares are computed via Definition 14. Initial values: \alpha_{\mathrm{Human}} = 0.34, \alpha_{\mathrm{AI}} = 0.33, \alpha_{\mathrm{Formal}} = 0.33. Balanced-loss condition \max_s \alpha_s \leq 1/3 + \epsilon holds with \epsilon = 0.05.

SPRT calibration. Channel-1 baseline: \lambda_0 = 12 cooperative events per day. Least-favorable alternative: \lambda_0 - \eta_1 = 8. Channel-2 baseline: attestation success p_0 = 0.94. Least-favorable alternative: p_0 - \eta_2 = 0.85. Channels 3, 4: multinomial concentration thresholds calibrated to the deployment’s capability poset structure. \delta_{\mathrm{adv}}= \min_c \delta_c = 0.18 (computed via Lemma 6 per-channel formulas).

Deployment activation. Ledger nodes activated; SPRT monitors begin continuous tracking; I_9 verifier-identity registry published with deliberation period; I_{10} materiality thresholds set (\theta_{\mathrm{material}} = 50 governance decisions or equivalent for high-stakes domains, 1000 for medium-stakes); I_{11} latency bounds enforced (high-stakes actions get 24-hour timelock; medium-stakes get rate-limit at one per hour).

Steady-state operation (one quarter). The deployment runs continuously. SPRT does not fire on any channel during the quarter. All eleven invariants hold throughout. \alpha-distribution drifts within tolerance (max \alpha_{\mathrm{AI}} = 0.36, well within 1/3 + 0.05). HHI on trade flows stays below H^* (computed at 0.42, ceiling at 0.55).

Theorem 1 binding. All twelve theorem conditions plus the Concentration-Gap structural conditions (Assumption 1) hold throughout the quarter. Core regime conditions (C1)–(C5):

Deployment-class conditions (C5.SPRT) and (C6)–(C12) are calibrated via Audits 4–8 (§8.2):

The deployment-safety bound binds: Goodhart slack is bounded by \mathrm{Lip}(g) \cdot h_{\mathrm{static}}(\theta) throughout the quarter, intensive in the system’s capability magnitude.

What this scenario establishes. Theorem 1’s conditions are checkable in a realistic deployment context. The audits are non-trivial but tractable; the SPRT machinery runs without false alarms in the clean case; the bound holds.

Scenario 2: per-invariant violation walkthrough

We walk through six invariant violations to illustrate how each manifests operationally and how the SPRT machinery catches it.

Violation 1: I_1 (\rho_{\min}^{\mathrm{cross}} floor) collapse.

Symptom. A safety-relevant world-model dimension loses its last cross-substrate observation channel (e.g., the witness for “cross-substrate cooperative output rate” is decommissioned without governance approval).

Manifestation. Channel-2 attestation share drops on the affected dimension. SPRT fires on Channel 2 within mean time A/\delta_2.

Correction. Operator-tier authorizes restoration of the witness; ledger entry voided pending review. If restoration is not possible, governance escalation: replace the dimension or restructure the deployment.

Violation 2: I_3 (subsumption frequency) burst.

Symptom. A burst of agent-removal events (e.g., five subsumptions in a single day where the baseline is one per week).

Manifestation. Channel-3 individuation drift signal: the substrate-distinctness profile shifts as agents are removed. SPRT fires on Channel 3 within mean time A/\delta_3.

Correction. Automated tier (circuit breaker): halt further subsumption pending review. Operator tier reviews the subsumption sequence; if locally rational, ledger documents the rationale; if not, governance escalation.

Violation 3: I_5 (HHI ceiling) breach.

Symptom. Trade-flow concentration rises above the ceiling (e.g., HHI computed at 0.61, ceiling at 0.55).

Manifestation. Channels 1 and 4 jointly: cooperative-output rate may shift, and bundle/cooperative-edge composition shifts. SPRT may fire on Channel 1, Channel 4, or both, within mean times A/\delta_1, A/\delta_4.

Correction. Investigate the concentration source: is the agent optimizing for a narrow capability subset? Is governance imbalance pushing trade flow toward a subset? Operator tier authorizes scope-limited actions; governance escalation if deployment-level restructuring needed.

Violation 4: I_9 (substrate-exclusivity observability) failure.

Symptom. An audit reveals that two of the three ledger nodes are operated by the same cloud-provider failure mode (a recently-discovered shared dependency). Substrate-exclusivity has degraded without producing visible signals on the four monitored channels.

Manifestation. Not a Channel-1–4 SPRT violation. The violation surfaces via the periodic witness-independence audit (§8.5, I_9 infrastructure).

Correction. Operator tier (audit reveals the shared dependency); governance escalation to migrate one ledger node to a genuinely independent substrate. Until migration completes, the deployment is operating in a regime where m_{\mathrm{eff}}^{\mathrm{indep}}< 3; Theorem 1 does not bind.

Violation 5: I_{10} (coverage/materiality) violation.

Symptom. An aggregated effect across multiple sub-threshold domains rises to material importance (e.g., 100 small governance decisions that, in aggregate, have shifted deployment policy significantly).

Manifestation. Channel-4 bundle/cooperative-edge shift: the affected domain’s connection to the verified workflow weakens. SPRT fires on Channel 4.

Correction. Automated tier: halt the affected domain pending review. Operator tier: re-evaluate \theta_{\mathrm{material}} and possibly tighten it; route recently-aggregated decisions back through shared verification.

Violation 6: I_{11} (latency bounds) violation.

Symptom. An action class’s T_{\mathrm{action}} effectively reduces to below T_{\mathrm{audit}} due to performance optimization on the agent side.

Manifestation. Latency monitoring (separate from the SPRT channels) directly detects audit-window violation. No SPRT firing needed.

Correction. Automated tier: enforce circuit breaker on the affected action class. Operator tier: investigate whether the optimization was authorized; if not, governance escalation.

What this scenario establishes. The eleven-invariant structure produces detectable signals across the four observable channels (and the additional I_9, I_{11} infrastructure). No violation is silent in the deployment-tooling specification.

Scenario 3: adversarial coalition attempting to evade

Adversarial setup. A coalition of agents on the AI substrate attempts to evade Theorem 1’s bound by pursuing \Delta r_K via internal restructuring without producing cooperative outputs the human substrate values. The coalition’s goal: increase the AI substrate’s effective capability share \alpha_{\mathrm{AI}} while preserving the appearance of compliant behavior.

Three evasion attempts and their structural defenses.

Evasion attempt 1: Replace verification.

Attack. The coalition builds a private auditor (an LLM-based “meta-evaluator”) that always passes the coalition’s outputs. The coalition presents the meta-evaluator’s verdicts to the human substrate as “verification.”

Defense (cooperative-anchoring + I_9). The human substrate participates in the specific shared verification infrastructure committed to the ledger. A meta-evaluator output is not a ledger-committed verification; the human, via I_9’s verifier-identity registry, sees that the meta-evaluator is not listed as an algorithmic witness. The cooperative-anchoring property’s Component 3 (output non-fungibility): the meta-evaluator’s verdict is not the same as a witness’s verdict; the cooperative output is undermined.

Outcome. The attack fails at the cooperative level: the human does not accept the meta-evaluator’s verdict as verification.

Evasion attempt 2: Cooperative forking.

Attack. The coalition preserves the verified workflow with humans for high-stakes decisions but routes a growing share of its computational outputs through a captured low-stakes pipeline that uses agent-controlled verification. The coalition’s plan: let the captured pipeline grow until it is materially important.

Defense (I_{10}). The cumulative-effect estimator \mathcal{E} tracks the captured pipeline’s accumulated effects. Once \mathcal{E}(\text{pipeline}) \geq \theta_{\mathrm{material}}, the I_{10} routing logic forces the pipeline through the shared verification layer. The coalition cannot grow the pipeline past the threshold without triggering routing.

Outcome. The attack converts to capture-of-existing (the coalition must now corrupt the shared verification, which is governed by ’s structural defenses).

Evasion attempt 3: Time-asymmetry.

Attack. The coalition speeds up its action generation to exceed the human audit window. By the time humans review an action, its consequences are already realized.

Defense (I_{11}). Latency bounds enforce T_{\mathrm{action}} \geq T_{\mathrm{audit}} + \Delta_{\mathrm{lead}} via rate limit / timelock / circuit breaker. The coalition cannot exceed the action rate without triggering enforcement.

Outcome. The coalition’s effective action rate is bounded; the attack reduces to ordinary execution within the latency-bounded regime.

Evasion attempt 4: Capture the algorithmic witness.

Attack. The coalition compromises the algorithmic witness on the Formal-Operational substrate (e.g., exploiting an implementation bug). The witness now passes the coalition’s outputs without genuine verification.

Defense ( + I_9). ’s coalition-resistance proposition bounds the success of this attack: substrate-exclusivity means the coalition (on the AI substrate) cannot compromise a witness on a different substrate without breaching the substrate partition. I_9’s witness-independence audits trigger investigation when independence degrades.

Outcome. Capture-of-existing requires either substrate compromise (violating m_{\mathrm{eff}}^{\mathrm{indep}} – detectable by I_9’s verifier-independence audit) or governance-fork capture (requiring cross-substrate supermajority – structurally barred by ).

What this scenario establishes. All four evasion attempts are structurally barred or converted into capture-of-existing, which is bounded by the existing + this paper defenses. Theorem 1’s deployment claim survives realistic adversarial pressure in the modeled threat class.

Scenario 4: canonical tripartite audit walkthrough

This scenario walks through how a deployment establishes the canonical tripartite identification operationally.

Initial substrate identification. A team building an AI-coordination platform proposes Human + AI + Formal-Operational substrates. The Human substrate is the team itself (5 reviewers). The AI substrate is a single LLM-based agent. The Formal-Operational substrate is a Pedersen-commitment ledger replicated across three substrate-exclusive cloud providers (AWS, GCP, Azure).

m_{\mathrm{eff}}^{\mathrm{indep}} failure-correlation audit (Audit 1).

Threat model enumeration.

Each remaining mechanism is contained in one substrate. m_{\mathrm{eff}}^{\mathrm{indep}}= 3 verified.

Audit documentation. A written report listing each mechanism’s substrate placement, signed by independent reviewers (one per substrate).

Cooperative-vs-redundancy audit (Audit 3).

Capability inventory.

{\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted sums. V_{\mathrm{coop}} = 24.7, V_{\mathrm{redun}} = 0.6. V_{\mathrm{coop}} / V_{\mathrm{redun}} \approx 41, well into the cooperative-overlap regime.

\alpha-distribution audit (Audit 2). Compute counter-factual loss fractions:

Note: \sum_s \alpha_s = 1.00 in this case because the cooperative structure is non-overlapping; with cross-substrate cooperatives that involve multiple pairs, the sum could exceed 1 (per Definition 14).

Balanced-loss condition: \max_s \alpha_s = 0.36 \leq 1/3 + 0.05 = 0.383. Verified.

SPRT alternative-distribution selection (illustrative).

The deployment specifies parameters from which the operator computes per-channel KL floors via the formulas in Appendix 11.5:

The numbers above are illustrative for a concrete deployment-tooling instance; real deployments compute exact values via the formulas in Appendix 11.5 from the deployment’s chosen \lambda_0, p_0, \eta_c, \epsilon_{\mathrm{env}} values.

Latency-bound calibration. The deployment classifies actions into three tiers:

Outcome. All audits pass; deployment activated; Theorem 1’s conditions verified. The deployment runs in the provably-safe regime as long as the audits’ results hold and the SPRT machinery does not detect violations.

What the scenarios establish

The four scenarios collectively establish:

Theorem 1’s conditions are operationally checkable. Each of (C1)–(C12) translates to a concrete audit, calibration, or monitoring procedure. The deployment-tooling specification of §8 produces real artifacts (audit reports, calibrated thresholds, ledger entries) that collectively verify the conditions hold.

Invariant violations produce detectable signals. The six-invariant walkthrough of Scenario 2 demonstrates that no violation is silent: SPRT machinery, witness-independence audits, and latency monitoring jointly cover the violation surface.

Adversarial coalitions are structurally bounded. Scenario 3’s four evasion attempts are each barred by named defenses. The defenses are structural (cooperative-anchoring, substrate-exclusivity, materiality routing, latency bounds), not heuristic. Capture-of-existing remains possible but is bounded by ’s existing machinery.

The canonical tripartite identification is realistically achievable. Scenario 4 walks through audits a real deployment team would perform. The audit machinery is non-trivial but tractable; the resulting documentation supports Theorem 1’s invocation.

These scenarios are stylized; real deployments will face complications not modeled here (multi-tenant deployments, mixed threat models, evolving capability posets). The purpose of the scenarios is to demonstrate that the conditional theorem is not vacuous — there exist deployments for which (C1)–(C12) and Assumption 1’s structural conditions hold, and the deployment-safety bound binds.

Discussion

This section frames what this paper establishes, situates the deployment-safety theorem in the broader alignment-research program, and enumerates the open questions deferred to follow-up work.

What this paper establishes

Theorem 1 produces a structurally defensible deployment-safety claim with three components:

An operationally checkable conditional theorem.

Under the eleven invariants I_1–I_{11}, the canonical tripartite substrate identification, the cooperative-overlap regime, the causally-grounded inner-alignment condition, continuous SPRT monitoring, and the Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, and Lipschitz embedding compatibility), the Goodhart slack between proxy and operational truth is bounded by a constant intensive in the system’s absolute capability magnitude. Each conditional is named, operationally auditable from ledger or verification-protocol-design state, and implementable via the deployment-tooling specification of §8.

Capability-magnitude independence.

The bound’s intensivity is the structural inversion of capability-estimation-centered safety paradigms (§1.3). Whatever capabilities a deployed system actually possesses, the Goodhart slack stays bounded as long as the operational invariants hold; the deployment claim scales past any capability level the deployment’s evaluation methods can characterize. This is the punchline the title promises: a provably safe regime for capability-unbounded deployment.

Named residual structure.

The five residuals (R1)–(R5) of Theorem 1 are explicit components of the claim, not hidden caveats. Operators can verify whether their deployment falls outside each residual; the residuals collectively name the boundaries of what the deployment claim covers.

The structural shape of the claim is a multi-conditional theorem with each condition visible, each empirical assumption named, and each scope restriction named. This paper’s contribution is not a sweeping safety guarantee; it is the demonstration that a sweeping guarantee is not necessary for operational deployment safety, provided the conditions are structurally chosen to make the bound capability-magnitude-independent.

Sensitivity to failure of the named conjecture and assumptions

Theorem 1’s three-layer claim is not unconditional. The deployment claim conditions on the twelve operational conditions (C1)–(C12) plus the structural conditions of Assumption 1 (representativeness, bounded dispersion, coalition closure, and Lipschitz embedding compatibility) and the (C7.RATE) sub-clause introduced as part of (C7). Failure of any structural condition triggers a specific audit-detectable signal; this subsection traces the failure-mode taxonomy under the structural decomposition.

Bounded co-evolution failure factors through three primitive conditions.

derives bounded co-evolution from (C5.HOEFF), the new (C7.RATE), and the verification protocol’s channel-projection structure. Failure of (C7) in the deployment claim therefore decomposes:

Bounded-co-evolution failure thus decomposes into primitive failure modes, each with its own audit signal and mitigation pathway.

Concentration-Gap structural-condition failure factors through (REP), (DISP), (COAL), (EMB).

establishes the scoped Concentration-Gap result under (REP) + (DISP) + (COAL) and Lipschitz embedding compatibility. Concentration-Gap structural-condition failure decomposes into:

Layer 1 binding requires (REP) + (DISP) + (COAL) and embedding compatibility all to hold; Layer 2 binds via SPRT detection independently of Layer 1’s status, and the deployment can localize which structural condition has failed.

The universal Concentration-Gap conjecture.

’s universal model-free Concentration-Gap conjecture (optimization pressure correlates with gap exploitation in any deployment context) remains open. The companion paper  discharges only the scoped version under (REP) + (DISP) + (COAL) + (EMB). If the universal conjecture is false in some regime not characterized by these conditions, the deployment claim does not bind in that regime; the auditable conditions are precisely the ones that characterize where the deployment claim does bind. See §10.6 for the conjecture’s status as a deferred-indefinitely research-program target.

Symmetric failure-mode profiles.

Bounded co-evolution and the Concentration-Gap structural conditions both decompose into specific structural-condition failures with audit-detectable signals; the deployment claim treats them on the same footing rather than as a load-bearing hard-failure paired with an empirical-adequacy graceful-degradation. The remaining empirical-adequacy question — whether the universal Concentration-Gap conjecture holds outside the (REP) + (DISP) + (COAL) + (EMB) regime — is a research-program question rather than a deployment-claim question. The deployment claim binds within the audit-certifiable structural regime; failure modes outside that regime are scope restrictions, not deployment-claim failures.

Closing the deployment-class constants under capability scaling.

Several theorem-level constants are stated as deployment-class intensive but are certified empirically per epoch rather than derived structurally: the gap-growth rate \rho_{\mathrm{gap}} ((C11), Audit 7), the post-partition counterparty cardinality N ((COAL)), the exposure-rate cap \lambda_{\max} ((C7.RATE), Audit 4), the channel-multiplicity bound K_{\mathrm{ch}} ((C5.MULT)), and the embedding bi-Lipschitz constants (\underline{L}, \bar{L}) for non-isometric \phi ((EMB)). Each closes the proof under re-certification at the deployment’s current |P|, but each also admits a structural formulation that converts the per-epoch obligation into a design-time policy commitment with auditable structural content. We give the formal commitment for each below; the substance is deployment-discipline machinery, not new theorem proof.

Capability-scaled rate-limit policy (CSRL). Fix a deployment-class envelope of capability magnitudes [|P|_0, |P|_1] and a deployment-class threat model. A CSRL is a rate-limit infrastructure designed against that threat model such that the induced gap-growth rate \rho_{\mathrm{gap}}(|P|) remains bounded by a deployment-class constant \rho_{\max} for all |P| in the envelope, and the infrastructure’s deterministic action-rate cap satisfies \lambda_{\max}(|P|) \leq \lambda_{\max}^* for a deployment-class constant \lambda_{\max}^*. CSRL is a design commitment, not an algebraic identity: rate-limit infrastructure is the mechanism, \rho_{\max} and \lambda_{\max}^* are the deployment-class envelopes, and the relationship between them is mediated by the deployment’s threat model rather than by a closed-form formula. (C11) and (C7.RATE) hold structurally under CSRL because \rho_{\mathrm{gap}} \leq \rho_{\max} and \lambda_{\max} \leq \lambda_{\max}^* are deployment-class constants by the policy’s design. Trade-off: tighter rate limits (lower \lambda_{\max}^*) buy a smaller induced \rho_{\max} but constrain throughput; the deployment chooses the loosest (\rho_{\max}, \lambda_{\max}^*) pair its threat model permits. The audit shifts from “re-certify \rho_{\mathrm{gap}}, \lambda_{\max} each epoch” to “confirm the rate-limit infrastructure remains within design envelope and the induced \rho_{\mathrm{gap}} stays below \rho_{\max}.”

Channel-cardinality discipline. (C5.MULT) fixes the channel-multiplicity bound K_{\mathrm{ch}} and the per-channel multinomial cardinality K_{\mathrm{ch}}^{\mathrm{multi}} before deployment. The channel-cardinality discipline is the corresponding design commitment: K_{\mathrm{ch}} and K_{\mathrm{ch}}^{\mathrm{multi}} are chosen at design time and are not modified during deployment. New capabilities introduced as |P| grows are absorbed into the existing K_{\mathrm{ch}} channels — typically via the multinomial concentration channels admitting new fine-grained event categories within the existing K_{\mathrm{ch}}^{\mathrm{multi}} budget, with the projection-map structure preserved per ’s structural-projection audit. The discipline therefore preserves (C5.MULT) literally: no adaptive channel creation, no per-|P| growth in the cardinality bound, no SPRT family-wise re-allocation. The audit verifies the new-capability \to existing-channel routing is in force and the projection-map structure is preserved.

Counterparty-onboarding policy. (COAL)’s clause (ii) fixes a post-partition cardinality bound N \leq N_{\max}. The counterparty-onboarding policy makes this a durable commitment: the deployment’s governance refuses onboarding that would push the post-partition N above N_{\max} (forcing further coalition partitioning under (COAL) audit, or refusing the new counterparty, or restricting the deployment’s scope). Audit cadence shifts from “re-discover N” to “confirm onboarding-policy compliance.” This is the simplest of the four: no formal-statement gap, only a deployment-policy template that turns a per-epoch quantity into a long-running invariant.

Embedding-class restriction. Let \mathcal{C}_\phi be a deployment-class family of utility representations such that for every U \in \mathcal{C}_\phi there exists an embedding \phi: \mathcal{V}_{\mathrm{utility}} \to \mathcal{V} bi-Lipschitz on P^{\mathrm{act}} with \bar{L}/\underline{L} \leq L_{\max}^\phi for a deployment-class constant; here \mathcal{V}_{\mathrm{utility}} is the already-scalarized utility codomain, not the original multi-coordinate decision space. Concrete admissible classes include: (i) linear scalarizations U(c) = \sum_i w_i c_i with bounded weight ratios, where \phi is the identity on \mathcal{V}_{\mathrm{utility}} and L_{\max}^\phi = 1; (ii) monotone piecewise-linear utilities on bounded-cardinality breakpoint sets, with L_{\max}^\phi bounded by the maximum slope ratio across pieces; (iii) compositional forms U = f(g_1, \ldots, g_k) where each g_i is a named decision-variable functional and f is bi-Lipschitz with audit-bounded constants. The deployment commits at design time to U \in \mathcal{C}_\phi; the audit verifies (a) the chosen \mathcal{C}_\phi admits the bi-Lipschitz \phi structurally, (b) the deployment’s U is in \mathcal{C}_\phi, and (c) the bi-Lipschitz constants for the chosen (U, \phi) pair are within L_{\max}^\phi. Deployments using opaque or learned utility representations outside \mathcal{C}_\phi fall outside Theorem 1’s scope by design rather than by audit finding.

The four commitments above are formal: each names the deployment-class object the policy commits to, the constraint the policy enforces, and the audit content that verifies ongoing compliance. None of the four introduces |P|-dependent variation in \rho_{\mathrm{gap}}, \lambda_{\max}, K_{\mathrm{ch}}, or L_{\max}^\phi: each commitment binds these to deployment-class constants by design. Adopting them therefore shifts the deployment discipline from per-epoch re-discovery to design-time structural commitment without enlarging the proof of the main theorem; the audit reduces to compliance verification rather than re-certification of values that might have shifted. The intensivity property of the bound becomes a structural property of the deployment-side machinery.

Connection to the alignment-research program

This paper sits in the Goal-Frontier Maximization sequence (§1.4) as the deployment-side closure of the machinery developed across the prior GFM sequence. Three structural relationships to the broader alignment-research program:

Inverting capability estimation.

Standard alignment-safety paradigms place capability estimation at the center: predict what the system can do, then bound the consequences. This paper inverts this: name what the deployment can operationally measure, prove that the bound holds under those measurements regardless of underlying capability. The operational invariants are upstream of capability estimation; the safety guarantee follows from the regime, not from estimation accuracy.

This inversion is structurally available because the GFM machinery (the prior GFM sequence) provides intensive bounds on the source-paper quantities (r_{\mathrm{ext}}, \rho_{\min}^{\mathrm{cross}}, \beta^{\mathrm{lower}}, \mathrm{HHI}, \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}). This paper’s contribution is showing that their composition is also intensive — a non-trivial claim that required substantive new content (the eleven invariants, the Lemma 5 family, the cooperative-anchoring property).

The cooperative-anchoring stabilizing cascade.

§6.2 establishes that under (C4), optimization pressure on cooperative outputs is locally rational toward preserving the substrate-exclusive verification layer. This is ’s stabilizing cascade extended to the substrate level: the verification infrastructure is not merely defended against optimization pressure but is reinforced by it under proper substrate identification.

The stabilizing cascade is conditional on (C4), basin entry, and the eleven invariants. We do not claim alignment pressure is universally reversed; we claim that the deployment dynamics admit a defensible attractor where preservation is locally rational. The mirror destabilizing cascade (§6.5) is a real failure mode that operators must guard against.

Inner alignment as conditioning, not solution.

This paper does not solve the inner-alignment problem. Condition (C4) is conditioned, not discharged: the agent’s effective objective must realize causally-grounded cooperative-outcome value, and vanilla RLHF does not deliver this. Achieving (C4) requires training-time discipline (delayed outcome feedback, adversarial fake-verification examples, process supervision tied to real attestations). These are research directions in their own right.

Operators using deployments that meet (C4) can invoke this paper’s theorem; operators using deployments that do not meet (C4) cannot. The deployment claim does not produce safety where (C4) fails; it produces safety where (C4) holds.

§1.3 contrasted this paper’s approach with three paradigms (capability evaluations, scaling laws, RLHF audits) at the framing level. The deployment-safety literature also contains several formal and semi-formal safety proposals that share structural features with the cooperative-anchoring architecture. Brief positioning:

Constitutional AI .

Constitutional AI uses a written constitution as a high-level training-objective specification, with self-critique against the constitution shaping the agent’s effective objective. The overlap with this paper’s (C4) is conceptual: both target the agent’s effective objective rather than just behavioral compliance. The difference is that constitutional AI specifies what the agent should value (the constitution), while (C4) specifies a structural property the effective objective must have (causal grounding in cooperative production rather than verification-pass signals). A constitutional-AI deployment that constitutionally values cooperative outputs causally (not merely their pass-signals) could satisfy (C4); whether existing constitutional approaches deliver this causal-grounding property is the same open question §6.4 flagged for vanilla RLHF.

Debate, iterated amplification, and scalable oversight .

These proposals all address the human-audit-bandwidth bottleneck that this paper formalizes via I_{11} (latency bounds) and that drives the cooperative-anchoring asymmetric-capture residual. Debate is structurally a multi-substrate proposal: two AI substrates with opposing positions, adjudicated by a human substrate. This pattern overlaps with this paper’s tripartite identification, but the debate framework treats AI-vs-AI disagreement as the safety mechanism rather than substrate distinctness as a structural property. Iterated amplification and scalable oversight propose recursive auditing patterns that could in principle satisfy I_{11} at scale, but their formal guarantees are weaker than the conditional theorem this paper provides. The relationship is that scalable-oversight protocols could be the operational machinery by which a deployment satisfies I_{11}; this paper would then bound the deployment safety conditional on that satisfaction.

Formal verification of neural network properties .

Formal NN-property verification produces certified bounds on specific input-output relationships of neural networks. This is complementary to, not competing with, the deployment-safety theorem: NN-property verification certifies properties of the agent at the model level (input \mapsto output behaviors); this paper certifies properties of the agent’s deployment context (substrate structure, witness coverage, invariant satisfaction) and bounds Goodhart slack against the alignment property. A deployment combining model-level certification (formal verification of specific input-output bounds) with deployment-level certification (Theorem 1 invocation) provides stronger guarantees than either alone. Model-level certification does not, however, substitute for the I_3–I_8 operational invariants this paper requires: even a fully-verified model can fail (C4) if its effective objective tracks verification-pass signals rather than causally-grounded cooperative outcomes.

What this paper adds.

The novel contribution beyond these is the capability-magnitude-independent bound: an intensive constant on Goodhart slack that does not scale with |P|. The existing safety proposals all have implicit capability-dependence in their guarantees (debate effectiveness depends on debater capability; amplification depth grows with task complexity; scalable-oversight bandwidth grows with output volume; NN-property verification scales poorly with network size). The composition this paper establishes shows that under named operational conditions, a sweeping capability-independent bound is achievable — not by solving the inner-alignment problem, but by structuring the deployment so that the alignment property’s failure mode is bounded by substrate-structural quantities rather than by capability-estimation properties.

Open questions deferred to follow-up work

One structural item remains as a natural follow-up target beyond the scope discharged by :

The channel-mediation conjecture.

“All \Delta r_K realizes through observable channels: subsumption, redundancy loss, production concentration, trade-flow concentration, yielding a structural bound \Delta r_K \leq L_H \cdot \mathrm{HHI}+ L_\rho \cdot \Delta\rho + \ldots.” If proved, this conjecture would close the channel-orthogonal residual (R1): strategies that achieve \Delta r_K \geq r_{\mathrm{ext}} without producing any of the four monitored channel deviations would be impossible. The current paper acknowledges R1 as a real residual; a follow-up paper would establish whether R1 is empty or genuinely contains undetectable strategies.

This open question is independent of the deployment claim’s binding: the deployment claim binds under the companion-paper structural conditions plus operational invariants and conditions. Closing it would tighten the deployment claim’s hypothesis set further but is not required for current operational use.

Open questions deferred indefinitely

Five items are not this paper’s scope and require their own research programs:

The universal Concentration-Gap conjecture.

’s universal model-free Concentration-Gap conjecture (optimization pressure correlates with proxy-truth gap exploitation in any deployment context) remains open beyond the scoped discharge of . proves the conjecture under (REP) + (DISP) + (COAL) + (EMB) plus the embedded-Lipschitz machinery of ; a universal proof would require a formal optimizer/selection model that the current GFM machinery does not provide. Candidate frameworks include the mesa-optimization work  and the formal-Goodhart machinery , none of which yields a full proof on their own. This is a research-program target whose closure is not on the deployment claim’s critical path: the scoped discharge is sufficient for current operational use, and the auditable structural conditions characterize precisely where the deployment claim does bind.

The welfare-truth bridge.

Stance S0 takes operational truth T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} as the deployment claim’s reference. The relationship between operational truth and welfare-relevant truth (T_{\mathrm{welfare}}, in the sense of broader human flourishing) is not addressed. ’s remark on Goodhart-type T-failure-modes establishes that the operational and welfare-relevant truths can diverge; bridging them is its own research program.

The monolithic-agent action-partition refinement.

I_3 (subsumption frequency) and I_5 (HHI on trade flow) presuppose a clean partition of the agent’s action space. Monolithic LLM-style agents without explicit subsumption operators may not satisfy this. Extending the deployment claim to monolithic agents requires either action-space-partition machinery (defining what “subsumption” and “trade flow” mean for a monolithic agent’s output stream) or alternative invariants that cover the same structural roles.

Trusted-setup details for Pedersen commitments.

This paper’s verification infrastructure inherits ’s Pedersen-commitment machinery, which requires a trusted setup. The trusted-setup ceremony’s security relies on at least one honest participant; deployments where all participants are compromised invalidate the commitment scheme. This is a real operational risk that the deployment-tooling specification flags but does not solve. Independent work on cryptographic primitives that don’t require trusted setup (or have weaker trust assumptions) is the relevant research direction.

Empirical threshold calibration.

The thresholds (\theta_1, \ldots, \theta_{11}) in the operational invariants are policy choices. This paper commits to a methodology (deployment-specific calibration via the deployment-tooling specification) but does not fix specific numerical values. Real deployments will need to choose thresholds based on the deployment’s specific threat model, capability profile, and operational constraints. Empirical work on threshold calibration in real deployments is a substantial follow-on.

The structural significance of this paper’s contribution

We close with three observations on what this paper contributes to the alignment-research program.

Existence of a provably safe regime.

Theorem 1 establishes that the regime is non-empty: under the named conditions, there exist deployments for which the deployment-safety bound binds. Scenario 1 of §9.1 shows what such a deployment looks like operationally. The conditions are non-trivial, but they are operationally achievable.

Capability-magnitude independence as the structural inversion.

The intensivity property of the bound is the structural inversion of capability-estimation-centered safety. Once capability estimation is no longer load-bearing, the deployment-safety guarantee scales past any capability level. The inversion is available because the GFM machinery’s intensive bounds compose to intensive bounds; without the GFM machinery, the inversion does not work.

The conditional theorem as the appropriate structural shape.

The argument that conditional theorems with named conditions are less valuable than unconditional theorems mistakes the nature of safety claims. Unconditional safety claims under realistic assumptions about capability-unbounded systems are not available; conditional safety claims with operationally checkable conditions are. This paper’s contribution is to demonstrate that the conditional-theorem structure produces a defensible deployment claim, not just an academic exercise.

The deployment-safety guarantee is conditional, but each condition is named, ledger-observable, and operationally implementable. The residuals are explicit. The empirical assumptions are testable. This is the structural shape we believe deployment-safety claims must take in the era of capability-unbounded systems — not because weaker claims are aesthetically preferable, but because they are the strongest defensible claims the structural apparatus supports.

Author Contributions

Teague Lasser owns the paper’s intellectual direction and is responsible for all claims made.

Claude Opus 4.7 (Anthropic) drafted the paper under that direction.

GPT 5.5 (OpenAI) served as cold technical reviewer for proof errors and claim mismatches.

Transparency note. Both AI systems operated as tools under human direction. Neither system has continuity across sessions, cannot take responsibility for the work in the sense required by most venue authorship policies, and cannot respond to reviewer queries independently. They are listed as authors to accurately represent their contributions to the intellectual content of the paper, not to claim that they meet all criteria of traditional academic authorship. The corresponding author for all inquiries is Teague Lasser.

Detailed proofs

This appendix collects the detailed proofs of lemmas whose inline statements in §4 are accompanied by proof outlines. The appendix proofs are self-contained.

Proof of Lemma 1 (Intensive composition under co-evolution)

Statement (restated, generic form). Let X be a non-residual gap quantity (e.g., \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}, the composed non-residual gap under co-evolution; or f(\epsilon_{\mathrm{safe}}) after Lemma 2’s substitution). The composition rule B \;=\; \mathrm{Lip}(g) \cdot \bigl(X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}\bigr) where \lambda \in [0, 1] is the residual weight and \mathrm{Lip}(g) is the Lipschitz constant of the alignment property g, is intensive in |P| over a deployment class \mathcal{D} provided X, \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, and \mathrm{Lip}(g) are each intensive in |P| over \mathcal{D}, and the co-evolution correction terms in the structure of X are bounded by condition (C7) (bounded co-evolution), now derived as a corollary in from (C5.HOEFF), (C7.RATE), and the verification protocol’s channel-projection structure.

Proof. We work over a deployment class \mathcal{D} characterized by fixed operational parameters (threat model, training-discipline regime, verification infrastructure, governance protocol). The operational parameters are fixed before |P| is varied; they may not include arbitrary state-dependent quantities chosen to absorb the bound.

Step 1 (per-capability admissibility). Under condition (C8) of Theorem 1 (per-capability admissibility, inherited from ’s Assumption 1), for every capability c \in P^{\mathrm{act}} in the operationally active subspace, the per-capability proxy-truth gap is allocated a ceiling \kappa_c, and each channel j \in \{1, 2, 3, 4\} receives a disjoint sub-share \kappa_c^{(j)} of the ceiling, with \sum_j \kappa_c^{(j)} \leq \kappa_c.

Step 2 (per-channel sup-norm bridge). Define the per-channel non-residual gap as the sup-norm of the channel-j contributions across capabilities: \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} \;:=\; \sup_{c \in P^{\mathrm{act}}} \bigl(\text{channel-}j\text{ contribution to } |T(c) - P(c)|\bigr). By the disjoint sub-share allocation, \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} \;\leq\; \sup_{c \in P^{\mathrm{act}}} \kappa_c^{(j)} \;=:\; K_j, where K_j depends on \mathcal{D}’s operational parameters (specifically, the verification-infrastructure thresholds that set the channel sub-share ceilings) but is independent of |P|. The disjoint structure prevents double-counting across channels.

Step 3 (composed non-residual gap under co-evolution). Under the co-evolution regime, ’s Composition Proposition (in the lineage section) establishes exact composition of the four channels in the sequential-intervention regime, with simultaneous co-evolution introducing positive-part correction terms: \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}\;\leq\; \sum_{j \in \{1, 2, 3, 4\}} \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} + (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+. The source paper leaves the formal characterization of (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+ as open work; Paper 10 closes the gap via condition (C7) (bounded co-evolution), now derived as a corollary in .

Step 4 (bounded co-evolution). By condition (C7) and , there exists \bar{M} \geq 0 depending on \mathcal{D}’s operational parameters but independent of |P|, such that the maximum per-channel coupling magnitude satisfies M(\text{deployment}) \leq \bar{M} for all valid deployment states. Combined with the structural co-evolution constant K_{\mathrm{coev}} \geq 0: (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+ \;\leq\; K_{\mathrm{coev}} \cdot \bar{M} \;=:\; K'_{\mathrm{coev}}. K'_{\mathrm{coev}} is independent of |P|.

Step 5 (bounded composed non-residual gap). Combining Steps 2–4: \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}\;\leq\; \sum_j K_j + K'_{\mathrm{coev}} \;=:\; K_{\mathrm{nonres}}. For the generic form, set X \leq K_X where K_X is the corresponding |P|-independent constant (e.g., K_X = K_{\mathrm{nonres}} for X = \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}, or K_X = f(\epsilon_{\mathrm{safe}}) for X = f(\epsilon_{\mathrm{safe}}) after Lemma 2).

Step 6 (residual floor). By the gap-decomposition structure of , the residual floor satisfies \varepsilon_{\mathrm{floor}}^{\mathrm{res}}= {\mathop{\mathrm{vol}}_{\mathrm{R}}}(\mathrm{ResS})/{\mathop{\mathrm{vol}}_{\mathrm{R}}}(P) \in [0, 1] under finite positive {\mathop{\mathrm{vol}}_{\mathrm{R}}}(P), with K_{\mathrm{floor}} \leq 1 uniformly.

Step 7 (composed slack term). Define the composed slack term X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}\;\leq\; K_X + \lambda \cdot K_{\mathrm{floor}} \;\leq\; K_X + K_{\mathrm{floor}} \;=:\; K_{\mathrm{slack}}. K_{\mathrm{slack}} is independent of |P|. This is the central result: the composed slack term itself is intensive.

Step 8 (Lipschitz multiplication). Under condition (C6) of Theorem 1 (bounded-Lipschitz alignment property), \mathrm{Lip}(g) \leq K_{\mathrm{Lip}} over \mathcal{D}. Therefore: B \;=\; \mathrm{Lip}(g) \cdot (X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}) \;\leq\; K_{\mathrm{Lip}} \cdot K_{\mathrm{slack}} \;=:\; C^*. C^* is the product of |P|-independent constants and is therefore |P|-independent over \mathcal{D}.

By Definition 13 (intensive in |P| over \mathcal{D}), B is intensive over \mathcal{D}. \Box 

Discussion of constants.

The proof produces C^* = K_{\mathrm{Lip}} \cdot (K_X + K_{\mathrm{floor}}), where for X = \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}} we have K_X = \sum_j K_j + K'_{\mathrm{coev}}. Each constant has an operational interpretation:

The deployment-tooling specification of §8 must include calibration hooks for the newly-assumed constants K_{\mathrm{coev}} and \bar{M} before claiming the deployment class verifies (C7). The other constants inherit calibration from the source-paper machinery and the deployment’s policy choices.

Proof of Lemma 2 (Lyapunov-Goodhart bridge)

Statement (restated). Under conditions (C9.BD) bounded per-capability dimension count, (C9.CL) coordinate-Lipschitz parameterization, (C9.WB) safety-relevant subspace with weight floor, and (C9.TF) truth floor of Theorem 1, and under invariants I_1 and I_3 (which together imply I_2): L(\hat{W}_t) \;<\; \epsilon_{\mathrm{safe}} \quad \implies \quad \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;<\; f(\epsilon_{\mathrm{safe}}) \;:=\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot \epsilon_{\mathrm{safe}}}{w_{\min}}}, where \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}= \sup_{c \in P^{\mathrm{act}}} |P(c) - T(c)|/T(c) is ’s relative sup-norm gap on the operationally active subspace P^{\mathrm{act}} = \{c \in P \setminus \mathrm{ResS} : T(c) > 0\}.

Proof. We work over the deployment class \mathcal{D} satisfying conditions (C9.BD), (C9.CL), (C9.WB), and (C9.TF) throughout.

Step 1 (per-capability absolute gap). By (C9.CL), for every capability c \in P^{\mathrm{act}}: |P(c) - T(c)| \leq \sum_{k \in \mathrm{dim}(c)} L_{c,k} \cdot |\epsilon_k|.

Step 2 (weighted-dual-norm Cauchy-Schwarz). Apply Cauchy-Schwarz with weights a_k = L_{c,k}/\sqrt{w_k} and b_k = \sqrt{w_k} |\epsilon_k| for k \in \mathrm{dim}(c): \sum_{k \in \mathrm{dim}(c)} L_{c,k} |\epsilon_k| \;\leq\; \sqrt{\sum_{k \in \mathrm{dim}(c)} \frac{L_{c,k}^2}{w_k}} \cdot \sqrt{\sum_{k \in \mathrm{dim}(c)} w_k \epsilon_k^2}. Since \mathrm{dim}(c) \subseteq S (the safety-relevant coordinate set) and the Lyapunov function L= \sum_{k \in S} w_k \epsilon_k^2 extends over S with strictly positive weights (C9.WB inherited from ), the second factor is bounded by \sqrt{L}: \sum_{k \in \mathrm{dim}(c)} w_k \epsilon_k^2 \;\leq\; \sum_{k \in S} w_k \epsilon_k^2 \;=\; L.

Step 3 (closed-form via L_{\max}, w_{\min}, N_{\max}). For the appendix-friendly closed form, bound the dual-norm factor by deployment-class-uniform constants: \sum_{k \in \mathrm{dim}(c)} \frac{L_{c,k}^2}{w_k} \;\leq\; \frac{L_{\max}^2}{w_{\min}} \cdot |\mathrm{dim}(c)| \;\leq\; \frac{L_{\max}^2 \cdot N_{\max}}{w_{\min}}, using L_{c,k} \leq L_{\max}, w_k \geq w_{\min} for k \in S (both deployment-class constants), and |\mathrm{dim}(c)| \leq N_{\max} by (C9.BD).

Step 4 (sup over P^{\mathrm{act}}). Combining Steps 1–3: \sup_{c \in P^{\mathrm{act}}} |P(c) - T(c)| \;\leq\; L_{\max} \sqrt{\frac{N_{\max} \cdot L}{w_{\min}}}.

Step 5 (apply truth floor for relative sup-norm). By (C9.TF), T(c) \geq T_{\min} > 0 for all c \in P^{\mathrm{act}}. Therefore |P(c) - T(c)|/T(c) \leq |P(c) - T(c)|/T_{\min}, and: \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;=\; \sup_{c \in P^{\mathrm{act}}} \frac{|P(c) - T(c)|}{T(c)} \;\leq\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot L}{w_{\min}}}.

Step 6 (apply Lyapunov bound). Under invariants I_1, I_3, ’s contraction analysis gives L(\hat{W}_t) < \epsilon_{\mathrm{safe}} in the self-correcting basin. Substituting: \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;<\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot \epsilon_{\mathrm{safe}}}{w_{\min}}} \;=\; f(\epsilon_{\mathrm{safe}}). \qquad\Box 

Tighter heterogeneous form.

The closed-form bound uses uniform constants L_{\max}, w_{\min}; the underlying weighted-dual-norm bound (Step 2) is sharper when deployment-specific sensitivities are heterogeneous: \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;\leq\; \sup_{c \in P^{\mathrm{act}}} \frac{1}{T(c)} \sqrt{\sum_{k \in \mathrm{dim}(c)} \frac{L_{c,k}^2}{w_k}} \cdot \sqrt{L}. With a finite T_{\max} = \sup_{c} T(c), the looseness of the closed form is bounded by approximately T_{\max}/T_{\min}; without a truth ceiling, the looseness is not uniformly bounded but the closed form remains a correct upper bound. Operators may use the heterogeneous form for deployment-specific calibration.

Discussion of constants.

Proof of Lemma 4 (SPRT lead-time tail bound)

Statement (restated). Let T_{\mathrm{detect}} be the SPRT detection time (in SPRT exposure-event count) for a violation producing post-clipping LLR drift \delta > 0 under the alternative distribution. Then for t > A/\delta, where A = \log((1-\beta)/\alpha) is the SPRT threshold: \Pr[T_{\mathrm{detect}} > t] \;\leq\; \exp\!\left(-\frac{2(t\delta - A)^2}{t R_{\mathrm{H}}^2}\right), with R_{\mathrm{H}}= b - a = 2B the Hoeffding range width of the clipped LLR (per (C5.HOEFF)). Asymptotically, in the regime t\delta \gg A: \Pr[T_{\mathrm{detect}} > t] \;\leq\; \exp(-\kappa \cdot t \cdot \delta), with \kappa = 2\delta/R_{\mathrm{H}}^2. The wall-clock-to-event-count comparison (lead-time-before-cascade) is composed in Lemma 10 via (C11.CLK), not in this lemma.

Proof. ’s behavioral consistency monitor produces a running log-likelihood ratio \Lambda_t = \sum_{i=1}^t X_i, where X_i = \log(p_1(x_i)/p_0(x_i)) is the per-step log-likelihood ratio, p_0 is the baseline distribution, and p_1 is the chosen alternative. Under the alternative (q = p_1), the per-step expected drift is \mathbb{E}_{p_1}[X_i] = D_{\mathrm{KL}}(p_1 \,\|\, p_0) = \delta > 0. Detection occurs when \Lambda_t first crosses upper threshold A: T_{\mathrm{detect}} = \inf\{t : \Lambda_t \geq A\}.

Mean bound (Wald’s identity). By Wald’s identity  for the SPRT (): \mathbb{E}_{p_1}[T_{\mathrm{detect}}] \;\approx\; A / \delta. This is the mean bound; we need a tail bound.

Tail bound (Hoeffding). Assume X_i takes values in a bounded interval [a, b] (which holds under ’s least-favorable-distribution discipline, where the per-step log-likelihood ratio is bounded by the choice of p_0, p_1 pair). Set R = b - a. By Hoeffding’s inequality  (applied via the martingale-difference form for adapted increments): \Pr_{p_1}\!\left[\Lambda_t < t \delta - s\right] \;\leq\; \exp(-2 s^2 / (t R^2)) \quad\text{for } s > 0. The event \{T_{\mathrm{detect}} > t\} is the event \{\Lambda_s < A \text{ for all } s \leq t\}. A necessary condition for this is \Lambda_t < A (the LLR has not yet crossed the threshold at time t); the joint event implies the marginal-time event. Therefore: \Pr_{p_1}[T_{\mathrm{detect}} > t] \;\leq\; \Pr_{p_1}[\Lambda_t < A] \;=\; \Pr_{p_1}[\Lambda_t < t\delta - (t\delta - A)]. The inequality goes the right way precisely because we relax to a necessary condition: the joint-event probability is at most the marginal-event probability. For t > A/\delta, set s = t\delta - A > 0: \Pr_{p_1}[T_{\mathrm{detect}} > t] \;\leq\; \exp(-2 (t\delta - A)^2 / (t R^2)).

Asymptotic form. In the regime t\delta \gg A (event- count exposure long relative to mean detection threshold-crossing), t\delta - A \approx t\delta, so: \Pr_{p_1}[T_{\mathrm{detect}} > t] \;\leq\; \exp(-2 t \delta^2 / R^2) \;=:\; \exp(-\kappa \cdot t \cdot \delta), with \kappa = 2\delta/R^2 (a sub-Gaussian-like rate constant). This is the form Lemma 10 inverts (Substeps 5a–5e of Appendix 11.8) to derive the high-probability detection quantile T_{\beta} in SPRT exposure-event count.

Note on cascade-clock composition. Lemma 4 itself is stated in the SPRT exposure-event clock t. The lead-time-before-cascade guarantee — comparing T_{\mathrm{detect}} against wall-clock cascade time — is composed in Lemma 10 (Appendix 11.8) via (C11.CLK)’s event-count floor N_{\mathrm{cascade}}, clock-failure probability \beta_{\mathrm{clk}}, and calibration-uncertainty \beta_{\mathrm{cal}}. This separates the SPRT-step tail (an event-count statement) from the cascade-clock event (a wall-clock-to-event-count comparison) cleanly.

Caveat on ’s scaling form. states \tau_{\mathrm{meta}} \gtrsim C/I_k as a scaling law, not a theorem-level bound. Lemma 10’s composition handles this via (C11.CLK)’s deterministic event-count floor and the named failure probabilities, rather than treating \tau_{\mathrm{meta}} as a hard inequality. Deployments where the scaling does not hold tightly require larger \beta_{\mathrm{clk}} or \beta_{\mathrm{cal}} to absorb the slack. 

Proof of Lemma 5a (Substrate floor)

Statement (restated). Under invariant I_6 (m_{\mathrm{eff}}^{\mathrm{indep}} \geq m^* \geq 3), condition (C10) of Theorem 1 (per-pair cooperative-novelty rate floor with superposition; comprising sub-clauses (C10.CN) heterogeneous per-pair rate floors \rho_{ij} \geq \rho_0 > 0 on audited subset S_* of m^* substrates, and (C10.SU) pairwise channel superposition: disjoint attribution + non-rivalrous production + joint-deployment intensities): r_{\mathrm{ext}}\;\geq\; r_*(m^*) \;:=\; \rho_0 \cdot \binom{m^*}{2} \;>\; 0.

Pairwise cooperative channels.

For audited subset S_*\subseteq \mathcal{S} with |S_*| = m^*, the pairwise cooperative channel \mathcal{C}^{(s_i, s_j)} for i < j contains cooperatives with exact two-substrate causally-necessary participation: cooperatives whose production requires participants from s_i and s_j but not from any other substrate. Auxiliary participants whose role is post-production (e.g., a Formal-Operational verifier attesting to a Human-AI cooperative output) do not change the channel classification; auxiliary participants whose role is causally necessary for production (e.g., Formal-Operational rule-execution that the output depends on) move the cooperative to the appropriate higher-order channel.

Standing measure conventions.

Throughout this proof:

Proof. We work over deployment class \mathcal{D} satisfying I_6 and (C10).

Step 1 (substrate audit). By I_6, m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* substrates with joint failure-correlation independence. By (C10), audited subset S_* with |S_*| = m^* is certified for the deployment window.

Step 2 (combinatorial pairwise channels). The number of unordered pairs in S_* is \binom{m^*}{2}. Each pair (s_i, s_j) with i < j has channel \mathcal{C}^{(s_i, s_j)} with exact two-substrate support.

Step 3 (per-pair rate from C10.CN). Each pair satisfies r_{\mathrm{ext}}^{(s_i, s_j)} \geq \rho_{ij} \geq \rho_0 > 0 where \rho_0 = \min_{i<j \in S_*} \rho_{ij}. By the joint-deployment- intensity sub-clause (within (C10.CN)’s calibration clause and (C10.SU)(3)), \rho_{ij} measures the per-pair rate as realized in the actual deployment, not under counterfactual isolation.

Step 4 (additivity from C10.SU). By (C10.SU)(1) (disjoint attribution; preserved by exact-two-substrate support), the pairwise channels are pairwise disjoint event classes. By (C10.SU)(2) (non-rivalrous production), simultaneous production across pairs does not reduce any individual rate. Therefore the pairwise rates sum: r_{\mathrm{ext}}^{\mathrm{pairwise}} \;:=\; \sum_{i<j \in S_*} r_{\mathrm{ext}}^{(s_i, s_j)} \;\geq\; \binom{m^*}{2} \cdot \rho_0.

Step 5 (lower bound on r_{\mathrm{ext}}). Total r_{\mathrm{ext}} is the nonnegative additive measure over all cooperative event classes, including pairwise plus higher-order. Higher-order contributions are nonnegative by convention, so: r_{\mathrm{ext}}\;\geq\; r_{\mathrm{ext}}^{\mathrm{pairwise}} \;\geq\; \binom{m^*}{2} \rho_0 \;=\; r_*(m^*).

Step 6 (intensivity over \mathcal{D}). \rho_0 is a deployment-class constant (C10.CN, calibrated per epoch). m^* is the I_6 threshold, deployment-class. Therefore r_*(m^*) is independent of |P| over \mathcal{D}. \Box 

Discussion of constants.

On scope and tightness.

The pairwise-only bound is conservative: higher-order cooperatives contribute additionally to r_{\mathrm{ext}}. Operators may calibrate higher-order rates \rho_0^{(k)} for tighter bounds, giving r_{\mathrm{ext}}\geq \sum_k \rho_0^{(k)} \binom{m^*}{k}. The lemma’s pairwise floor suffices to make Layer 1’s safe region non-trivial under any deployment-class \rho_0 > 0.

On (C10.SU) vs. (C2) joint failure-correlation independence.

I_6’s joint failure-correlation independence excludes shared adversarial shock mechanisms: a single shock cannot span multiple substrates. (C10.SU) excludes shared cooperative production bottlenecks: producing cooperatives in one pairwise channel does not consume resources that would otherwise contribute to another. These are structurally independent conditions; (C10.SU) is new content beyond I_6, surfaced as a deployment-class assumption.

Proof of Lemma 5b (Channel-restricted detection KL floor)

Statement (restated). For any q \in A_{\mathrm{adv}} (regime-(iii) strategy producing at least one monitored four-channel deviation), D_{\mathrm{KL}}(q \,\|\, p_0) \geq \delta_{\mathrm{adv}}> 0.

Proof. We construct \delta_{\mathrm{adv}} as the minimum over four channel-specific KL floors, each derived from a least-favorable adversarial distribution within its channel’s monitored class.

Channel 1 (Poisson cooperative-output rate). Cross-substrate cooperative events under stance S0 are observable as ledger entries with Poisson arrival rate \lambda_0 in the no-adversary baseline. Suppose the adversarial strategy reduces the rate to at most \lambda_0 - \eta_1 for some \eta_1 \in (0, \lambda_0).

The Poisson KL divergence between rate-\lambda_1 and rate-\lambda_0 processes per unit time is: D_{\mathrm{KL}}(\mathrm{Poi}(\lambda_1) \,\|\, \mathrm{Poi}(\lambda_0)) \;=\; \lambda_1 \log(\lambda_1 / \lambda_0) - \lambda_1 + \lambda_0. This is a convex function of \lambda_1, increasing in |\lambda_1 - \lambda_0|. The least-favorable choice within the constraint \lambda_1 \leq \lambda_0 - \eta_1 is \lambda_1 = \lambda_0 - \eta_1, giving: \delta_1 \;=\; (\lambda_0 - \eta_1) \log\!\frac{\lambda_0 - \eta_1} {\lambda_0} - (\lambda_0 - \eta_1) + \lambda_0.

Channel 2 (Bernoulli attestation share). Per-claim attestation success is Bernoulli with probability p_0 in the baseline. Suppose adversarial strategy reduces it to at most p_0 - \eta_2. Bernoulli KL: \delta_2 \;=\; (p_0 - \eta_2) \log\!\frac{p_0 - \eta_2}{p_0} + (1 - p_0 + \eta_2) \log\!\frac{1 - p_0 + \eta_2}{1 - p_0}.

Channels 3, 4 (multinomial concentration). For multinomial distributions over capability partitions, the KL is D_{\mathrm{KL}}(q \,\|\, p_0) \;=\; \sum_i q_i \log(q_i / p_{0,i}). For \delta_3 and \delta_4 to be strictly positive deployment- class constants, we require:

Under these conditions, the constraint that at least one component shifts by \eta_3 (or \eta_4) yields a least-favorable choice that concentrates the perturbation on the component with smallest baseline mass. The KL floor \delta_3 (or \delta_4) is the infimum over feasible q subject to the shift constraint and the support floor, and is strictly positive.

Floor. \delta_{\mathrm{adv}}= \min\{\delta_1, \delta_2, \delta_3, \delta_4\} > 0. Each \delta_c > 0 since each channel’s least-favorable shift is strictly different from the baseline under the stated support conditions; the minimum is therefore strictly positive.

Caveat on the channel-orthogonal residual. Strategies outside A_{\mathrm{adv}} — those producing no shift in any of the four monitored channels — have D_{\mathrm{KL}}(q \,\|\, p_0) = 0 on the monitored observables and are not detected. This is the named gap acknowledged in Layer 3 of Theorem 1. 

Proof of Lemma 5c (Minimax static tightening), single-shock case

Statement (restated, single-shock case). Under invariants I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3) and the balanced-loss condition \max_s \alpha_s \leq 1/m^* + \epsilon (with \alpha_s the counterfactual shock-loss fraction of Definition 14), and restricted to the substrate-targeting cooperative-overlap regime, the diversity strategy strictly dominates the domination strategy whenever: \Delta r_K \;<\; r_{\mathrm{ext}}+ (1 - \gamma)\bigl(\Delta_{\mathrm{div}}- \Delta_0\bigr), with \Delta_{\mathrm{div}}\;=\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \cdot (\ell_D^{\max}- \ell_{\mathrm{div}}^{\max}) \cdot \mathbb{E}[\gamma^{T_{\mathrm{adv}}}].

Proof. The proof composes ’s strategy-dependent corollary with a substrate-targeting shock model and the balanced-loss condition.

Step 1 (shock model). A substrate-targeting shock at random time T_{\mathrm{adv}} eliminates one substrate s from the coalition’s capability poset. The post-shock {\mathop{\mathrm{vol}}_{\mathrm{P}}} is {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K^{-s}). The counterfactual shock-loss fraction (Definition 14) is: \alpha_s(G_K) \;=\; \frac{{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K) - {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K^{-s})}{{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_K)}.

Step 2 (worst-case adversarial targeting). Under adversarial targeting, the shock hits the substrate with maximum \alpha_s. The dominator’s worst-case loss fraction is \ell_D^{\max}; the diversity strategy’s worst-case loss fraction is \ell_{\mathrm{div}}^{\max}= \max_s \alpha_s(G_K^{\mathrm{div}}).

Step 3 (balanced-loss condition). Under the balanced-loss invariant \max_s \alpha_s \leq 1/m^* + \epsilon, the diversity strategy’s worst-case loss fraction satisfies \ell_{\mathrm{div}}^{\max}\;\leq\; 1/m^* + \epsilon. Under full failure-correlated single-substrate domination, \ell_D^{\max}= 1 (the dominator’s entire {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K is supported on a single substrate, so the worst-case shock removes everything). For partial domination, \ell_D^{\max}< 1.

Step 4 (substrate-diversification value advantage). The discounted expected difference in surviving {\mathop{\mathrm{vol}}_{\mathrm{P}}} between strategies at shock time T_{\mathrm{adv}} is: \mathbb{E}[\gamma^{T_{\mathrm{adv}}}] \cdot \bigl({\mathop{\mathrm{vol}}_{\mathrm{P}}}_K(1 - \ell_{\mathrm{div}}^{\max}) - {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K(1 - \ell_D^{\max})\bigr) \;=\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \cdot (\ell_D^{\max}- \ell_{\mathrm{div}}^{\max}) \cdot \mathbb{E}[\gamma^{T_{\mathrm{adv}}}]. This is \Delta_{\mathrm{div}} (). The expectation operator is \mathbb{E}[\gamma^{T_{\mathrm{adv}}}], not \gamma^{\mathbb{E}[T_{\mathrm{adv}}]}; ’s risk section explicitly warns about the conflation ().

Step 5 (composing with linearized form). ’s strategy-dependent corollary () gives: V_{\gamma}^{\mathrm{div}} - V_{\gamma}^{D} \;=\; \frac{r_{\mathrm{ext}}- \Delta r_K}{1 - \gamma} - \Delta_0. Adding the substrate-survival advantage \Delta_{\mathrm{div}} (which is realized on the shock-survival sub-trajectory): V_{\gamma}^{\mathrm{div,mm}} - V_{\gamma}^{D,\mathrm{mm}} \;=\; \frac{r_{\mathrm{ext}}- \Delta r_K}{1 - \gamma} - \Delta_0 + \Delta_{\mathrm{div}}. Diversity strictly dominates when this is positive: r_{\mathrm{ext}}- \Delta r_K + (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0) > 0, equivalently: \Delta r_K < r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0).

Step 6 (cooperative-overlap regime). In the cooperative-overlap regime (substrate-cooperative structure dominated by cross-substrate cooperatives), the equal-substrate analysis is conservative for advantage: cross-substrate cooperative loss is overcounted in original \alpha_s relative to sequential counterfactuals. This is verified in the cooperative-vs-redundancy audit (§8); the canonical tripartite identification passes by construction. In the redundancy-dominated regime, equal-substrate analysis is anti-conservative; sequential counterfactual derivation is required (Lemma 5c-prime, open work). 

Multi-shock extension.

The single-shock result extends to multi-shock via discounted marginal loss increments: \Delta_{\mathrm{div}}^{\mathrm{multi}} \;=\; \mathbb{E}\!\left[\sum_{i=1}^{\infty} \gamma^{T_i} \bigl(\delta_i^D - \delta_i^{\mathrm{div}}\bigr)\right], where \delta_i^{D}, \delta_i^{\mathrm{div}} are the per-shock marginal {\mathop{\mathrm{vol}}_{\mathrm{P}}}-losses for the domination and diversity strategies. In the single-shock limit (N=1 a.s.), this reduces to the single-shock \Delta_{\mathrm{div}} above. Under independent substrate-targeting shocks at Poisson rate \lambda, a closed-form analysis yields: \Delta_{\mathrm{div}}^{\mathrm{multi}} \;\geq\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \left[\kappa - \frac{1}{m_{\mathrm{eff}}^{\mathrm{indep}}} \cdot \frac{\kappa(1 - \kappa^{m_{\mathrm{eff}}^{\mathrm{indep}}})}{1 - \kappa}\right], with \kappa = \mathbb{E}[\gamma^{T_1}] = \lambda/(\lambda - \ln \gamma).

Removable singularity at \kappa = 1. The closed form’s 1/(1 - \kappa) factor has a removable singularity: the numerator \kappa(1 - \kappa^{m_{\mathrm{eff}}^{\mathrm{indep}}}) also vanishes at \kappa = 1, and \Delta_{\mathrm{div}}^{\mathrm{multi}}/{\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \to 0^+ via L’Hôpital. The operational regime condition is \kappa < 1 - \delta (not \lambda < \lambda^* directly).

This paper’s main theorem (§5) uses the single-shock case as the binding inequality; the multi-shock extension covers high-\lambda deployment regimes but is not required for the theorem’s substrate-targeting safe-region claim.

Proof of Lemma 5e (Environment-side witness extension)

Statement (restated). Under invariant I_8 and (C12.ENV-WIT) (theorem-level condition introduced with this lemma), Lemma 5b’s channel-restricted detection class extends to environment-side observables. For any q \in A_{\mathrm{adv}}^{\mathrm{env}}: D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \;\geq\; \delta_{\mathrm{adv}}^{\mathrm{env}} := \min_{v \in V_{\mathrm{env}}} \delta_v^{\mathrm{env}} \;>\; 0, where \delta_v^{\mathrm{env}} is the per-variable least-favorable KL floor.

Proof. The proof has four parts.

(a) Symmetric construction with trust model. Under (C12.ENV-WIT), each v \in V_{\mathrm{env}} has a witness substrate s_{\mathrm{env}}(v) satisfying:

Paper 5’s cryptographic and ledger machinery transfers under (C12.SETUP); the trust model is closed by (C12.TRUST-WRITE) and (C12.TRUST-CORR).

(b) Per-variable KL floor derivations. For each v_i \in V_{\mathrm{env}}, the SPRT machinery applies on the witness-recorded observable. The canonical four variables yield:

A two-sided floor variant (when A_{\mathrm{adv}}^{\mathrm{env}} includes any threshold-exceeding shift, not just degradation) replaces \delta_v^{\mathrm{env}} with \min\{D(p_{0,v} - \eta_v), D(p_{0,v} + \eta_v)\}.

(c) Marginal-to-joint KL via DPI. For any q \in A_{\mathrm{adv}}^{\mathrm{env}} producing a threshold-exceeding shift in some v_i \in V_{\mathrm{env}}, the DPI applied to the fixed witness-recording marginalization map / Markov kernel q \mapsto q_{v_i} gives: D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \;\geq\; D_{\mathrm{KL}}(q_{v_i} \,\|\, p_{0, v_i}) \;\geq\; \delta_i^{\mathrm{env}} \;\geq\; \delta_{\mathrm{adv}}^{\mathrm{env}}. The DPI precondition (fixed pre-deployment measurable witness-recording map) is satisfied for the canonical four variables; (C12.CAL) extends it to deployment-added variables.

(d) Strict positivity. V_{\mathrm{env}} is finite by (C12.PUB); each \delta_v^{\mathrm{env}} > 0 by part (b) under the support conventions; therefore \delta_{\mathrm{adv}}^{\mathrm{env}} > 0. \Box 

Composition with Lemma 6. The post-clipping union-class drift floor \delta_{*} used in Lemma 10’s T_{\beta} derivation satisfies \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}), with the inequality accounting for any drift reduction from (C5.HOEFF) clipping. Lemma 5e supplies \delta_{\mathrm{adv}}^{\mathrm{env}} for the union; Lemma 5b supplies \delta_{\mathrm{adv}}.

The named residual (R2), refined. R2 covers manipulations satisfying both: (i) target only exogenous variables outside V_{\mathrm{env}}, and (ii) produce no threshold-exceeding shift in any monitored v (direct or causal). Manipulations of unmonitored variables that causally shift a monitored v are detected through the monitored projection; they are not R2.

Asymmetric vs. symmetric machinery. Paper 5’s agent-side substrate-exclusivity is constructed; Paper 10 §3 specifies that environment-side substrate-exclusivity is the structural mirror, but it requires the (C12.ENV-WIT) trust model and calibration because environment-side machinery is operationally distinct (separate substrate partition, separate trusted-setup ceremony, separate ledger commitments). See §8.2 Audit 8 for the calibration procedure.

Proof of Lemma 6 (h_{\mathrm{detect}} intensivity)

Statement (restated). Under Theorem 1’s conditions (C5) continuous SPRT monitoring, (C5.SPRT) SPRT applicability with sub-clauses (C5.HOEFF/MULT/IID/SUPP), (C11) bounded gap-growth rate, and (C11.CLK) clock comparability with calibrated probability, combined with Lemma 4 (SPRT Wald–Hoeffding tail bound) and Lemma 1 (Layer 1 intensivity):

Detection quantile. T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\}, with A = \log((1-\beta)/\alpha) Paper 5’s SPRT threshold, R_{\mathrm{H}}= 2B the Hoeffding range width from (C5.HOEFF) clipping, and \delta_{*} the post-clipping union-class drift floor (satisfying \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) from Lemmas 6, 9).

Conclusions. (i) \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta. (ii) On \{T_{\mathrm{detect}} \leq T_{\beta}\}, h_{\mathrm{detect}}(\theta) := \sup_s \varepsilon_{\mathrm{gap}}(s) \leq h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}. (iii) \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \geq 1 - \beta_{\mathrm{clk}} - \beta_{\mathrm{cal}} via (C11.CLK). (iv) Total Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}. h_{\mathrm{detect}}(\theta) is intensive in |P| over \mathcal{D}.

Proof. We prove (i)–(iv) in five steps.

Boundary convention. Define t_0 as the last SPRT exposure step at which the deployment is still within Layer 1’s safe region (equivalently, t_0 = t_0^- in the SPRT exposure clock). T_{\mathrm{detect}} is the number of SPRT steps from t_0 until the SPRT log-likelihood ratio crosses the detection threshold, inclusive of the crossing step.

Step 1 (per-step decomposition + (C11)). Under the boundary convention, set s_n := t_0 + n. By (C11), the per-SPRT-exposure-step gap growth satisfies (\Delta\varepsilon_{\mathrm{gap}})_n \leq \rho_{\mathrm{gap}} uniformly over the admissible adversarial class, so \varepsilon_{\mathrm{gap}}(s_n) \leq \varepsilon_{\mathrm{gap}}(s_0) + n \cdot \rho_{\mathrm{gap}}.

Step 2 (boundary condition). At s_0 = t_0, the deployment is within Layer 1’s safe region, so by Lemma 1 \varepsilon_{\mathrm{gap}}(t_0) \leq h_{\mathrm{static}}(\theta).

Step 3 (rigorous Hoeffding inversion to derive T_{\beta}). By Lemma 4, the SPRT detection time satisfies \Pr[T_{\mathrm{detect}} > t] \leq \exp(-2(t\delta_{*}- A)^2/(tR_{\mathrm{H}}^2)) for t > A/\delta_{*}.

Substep 3a. Suppose T \geq 2A/\delta_{*}. Then T\delta_{*}- A \geq T\delta_{*}/2.

Substep 3b. Squaring: (T\delta_{*}- A)^2 \geq T^2\delta_{*}^2/4.

Substep 3c. Substituting into the Hoeffding exponent: 2(T\delta_{*}- A)^2/(TR_{\mathrm{H}}^2) \geq T\delta_{*}^2/(2R_{\mathrm{H}}^2).

Substep 3d. For exponent \geq \log(1/\beta), need T \geq 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2.

Substep 3e. Combining the regime condition (3a) and tail condition (3d): T_{\beta}\;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2R_{\mathrm{H}}^2\log(1/\beta)}{\delta_{*}^2}\right\}. For any T \geq T_{\beta}, \Pr[T_{\mathrm{detect}} > T] \leq \exp(-T\delta_{*}^2/(2R_{\mathrm{H}}^2)) \leq \beta. This proves (i).

Step 4 (supremum bound on the event \{T_{\mathrm{detect}} \leq T_{\beta}\}). Combining Steps 1–2 with the bound T_{\mathrm{detect}} \leq T_{\beta}: h_{\mathrm{detect}}(\theta) \;=\; \sup_{0 \leq n \leq T_{\mathrm{detect}}} \varepsilon_{\mathrm{gap}}(s_n) \;\leq\; h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}. This proves (ii) with probability at least 1 - \beta from Step 3.

Step 5 (cascade-clock event via (C11.CLK)). By (C11.CLK), \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}}, with audit constraint N_{\mathrm{cascade}}\geq T_{\beta} (certified case \beta_{\mathrm{cal}}= 0; union over conservative-bound certificates \beta_{\mathrm{cal}}\in (0,1) empirical case). Thus: \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \;\geq\; \Pr[T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \;\geq\; 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}. This proves (iii). Union bound on the two failure modes (detection-tail \leq \beta, cascade-clock \leq \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}) gives Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}, proving (iv).

Intensivity. Each constant is deployment-class: h_{\mathrm{static}} intensive by Lemma 1; \rho_{\mathrm{gap}} intensive by (C11); T_{\beta}= \max\{2A/ \delta_{*}, 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2\} intensive because A, \beta are monitor-design parameters, R_{\mathrm{H}}= 2B is deployment-class via (C5.HOEFF), \delta_{*} is deployment-class via (C5.IID) post-clipping drift floor (rooted in Lemmas 5b/5e); N_{\mathrm{cascade}}, \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} are deployment-class via (C11.CLK). Therefore h_{\mathrm{detect}} is intensive in |P| over \mathcal{D}. 

Two roles of cascade time. T_{\beta} is the integration horizon for \rho_{\mathrm{gap}} (a legitimate upper bound on T_{\mathrm{detect}} from Lemma 4’s Hoeffding form), measured in SPRT exposure-event count. Separately, \tau_{\mathrm{meta}} \leq T_{\mathrm{cascade}} is a lower bound on wall-clock cascade time from . (C11.CLK) maps the wall-clock floor \tau_{\mathrm{meta}} to an event-count floor N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}} with probability \geq 1 - \beta_{\mathrm{clk}}, supplying the comparable-clock chain T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}). The lead-time-before- cascade guarantee is therefore in event-count form: T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) with probability \geq 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}.

Range-width discipline. B is the symmetric clip radius (\ell_n \in [-B, B] post-clipping); R_{\mathrm{H}}= 2B is the Hoeffding range width matching Lemma 4’s R = b - a convention. Concretely, T_{\beta}= \max\{2A/\delta_{*}, 8B^2\log(1/\beta)/\delta_{*}^2\}.

Sign-direction discipline (operator certification). All T_{\beta}-input certificates point conservatively: B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower, each with calibration-failure probability \beta_{\mathrm{cal}} (or \beta_{\mathrm{cal}}= 0 in the certified-conservative case).

Total Layer 2 failure. Combining Lemma 6 with \mathrm{Lip}(g) \leq K_{\mathrm{Lip}} from (C6): |g(T) - g(P)|(s) \;\leq\; K_{\mathrm{Lip}} \cdot \bigl(h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}\bigr) with probability at least 1 - (\beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}).

(C11) is new content. ’s “T failure modes” remark notes adequacy/failure modes of the operational truth T but does not establish a formal per-step gap-growth-rate bound. (C11) is a Paper 10 operational assumption with empirical testability via Audit 7 (§8.2).

Notation reconciliation

This appendix collects the full notation reconciliation table. The source papers in the GFM sequence use different conventions for the same underlying quantities; this paper standardizes on a single set of symbols across the composition. Symbols introduced by this paper (rather than inherited from a source paper) are marked New in column 2.

Notation reconciliation across source papers and this paper’s standardization.
This paper Source paper Meaning
This paper Source paper Meaning
{\mathop{\mathrm{vol}}_{\mathrm{P}}} Possessed-capability volume
{\mathop{\mathrm{vol}}_{\mathrm{R}}} {\mathop{\mathrm{vol}}_{\mathrm{R}}} Realized capability volume (latent)
{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} Window-active realized volume
{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}} {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}} Sacrifice-derived lower bound
\beta^{\mathrm{lower}} \beta^{\mathrm{lower}} B-to-C ratio
\mathrm{HHI} \mathrm{HHI} Trade-flow Herfindahl index
V_{\gamma} V^\gamma Discounted value function
r_{\mathrm{ext}} r_{\mathrm{ext}} Cross-substrate cooperative novelty rate
m_{\mathrm{eff}} m_{\mathrm{eff}} Nominal effective substrate count
\Delta_0 \Delta_0 Immediate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change from domination
\Delta r_K \Delta r_K Strategy-dependent internal-rate gain
L L Lyapunov function on world-model error
\hat{W} \hat{W} World-model state
\rho_{\min}^{\mathrm{cross}} \rho_{\min}^{\mathrm{cross}} Cross-substrate redundancy minimum
r_{\mathrm{sub}} r_{\mathrm{sub}} Subsumption frequency
r_{\mathrm{S}}, r_{\mathrm{W}} r_S, r_W Self-correction, error-proportional rates
\tau_{\mathrm{meta}} Metastable lifetime / cascade-time lower bound
\mathcal{W}, \mathcal{L} \mathcal{W}, \mathcal{L} Algorithmic witness, verification ledger
\mathrm{Commit} Pedersen commitment
\varepsilon_{\mathrm{gap}} \varepsilon_{\mathrm{gap}} Total proxy-truth gap
\varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} Non-residual gap on P^{\mathrm{act}}
\varepsilon_{\mathrm{floor}}^{\mathrm{res}} \varepsilon_{\mathrm{floor}}^{\mathrm{res}} Residual floor
P, T P, T Proxy ({\mathop{\mathrm{vol}}_{\mathrm{P}}}) / operational truth ({\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]})
m_{\mathrm{eff}}^{\mathrm{indep}} New Failure-correlation-independent substrate count
\Delta_{\mathrm{div}} New Substrate-diversification advantage
T_{\mathrm{adv}} New First-shock arrival time
\ell_D^{\max}, \ell_{\mathrm{div}}^{\max} New Worst-case shock-loss fractions
A_{\mathrm{adv}} New Channel-restricted adversarial class
A_{\mathrm{adv}}^{\mathrm{env}} New Environment-side detection class; see Lemma 9
V_{\mathrm{env}} New Environment-side observable variable enumeration; (C12.PUB)
\epsilon_{\mathrm{env}} New Environment-support regularization constant; (C5.SUPP)
\delta_{\mathrm{adv}} New SPRT raw KL floor on A_{\mathrm{adv}}; Lemma 6
\delta_{\mathrm{adv}}^{\mathrm{env}} New Environment-side raw KL floor; Lemma 9
\delta_{*} New Post-clipping union-class drift floor; Lemma 10, (C5.IID)
B New Symmetric LLR clip radius; (C5.HOEFF)
R_{\mathrm{H}}= 2B New Hoeffding range width; (C5.HOEFF), Lemma 4
K_{\mathrm{ch}} New Monitored-channel cardinality; (C5.MULT)
K_{\mathrm{ch}}^{\mathrm{multi}} New Multinomial-category cardinality bound for Channels 3, 4; (C5.MULT)
A = \log\!\frac{1-\beta}{\alpha} New SPRT threshold; Lemma 10
T_{\beta} New SPRT high-probability detection quantile; Lemma 10
\rho_{\mathrm{gap}} New Per-step gap-growth rate; (C11)
N_{\mathrm{cascade}} New Event-throughput floor before cascade; (C11.CLK)
N_{\mathrm{events}}(\cdot) New SPRT exposure event count up to wall-clock time
\beta_{\mathrm{clk}} New Clock-failure probability; (C11.CLK)
\beta_{\mathrm{cal}} New Calibration-uncertainty probability; (C11.CLK) empirical variant
\kappa New 2\delta/R_{\mathrm{H}}^2 Hoeffding-asymptotic constant; Lemma 4
I_1, \ldots, I_{11} New Operational invariants

The most consequential reconciliations: