This paper identifies a regime under which deployment of capability-unbounded systems is provably safe within the Goal-Frontier Maximization (GFM) framework. Such a regime is needed for the future safe development of AI systems which operate as independent agents capable of recursive self-improvement (RSI) and are able to reach a classification of artificial superintelligence (ASI) within their operational lifetime.

Setting

GFM is not a normative framework. The objective specification treats capabilities as an abstract metric volume. A “capability” has its definition inherited from welfare-economic definitions — something an agent can do or be . Normative claims enter at the capability definition layer, where deployments define which capabilities are observable and socially valuable by specifying which capability claims are admissible (the S1-admissibility framework of ). This entails that GFM is functional over a wide range of existing and future social structures. The caveat: capability definitions must track exercised capabilities in the deployment’s society for the Goodhart-slack bound to model the actual proxy-truth divergence rather than a definitional artifact.

For readers landing on this paper without prior exposure to the Goal-Frontier Maximization sequence, the conceptual entry point is the foundation paper , which lays out the sequence’s frame and overall direction. For the capabilities-approach background, see ; the operational machinery for capability admissibility is in ; the Goodhart-slack framing follows and is composed with the GFM machinery via ’s static Goodhart bound. §1.4 unpacks each source paper’s contribution to the present composition in turn.

The result

The safe regime is conditional on eleven operational invariants on the deployment substrate structure plus an explicit set of operational conditions (C1)–(C12) and four Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, embedding compatibility); under these conditions, the Goodhart slack between proxy and operational truth is bounded by a constant independent of the system’s absolute capability magnitude. The bound holds for substrate-targeting adversarial events in a static safe region without active monitoring, and is preserved under detection-and-correction with a tail-bounded lead-time guarantee for substrate-targeting evasions of the static region within a four-channel observable class. Five named residuals fall outside the guarantee: channel-orthogonal restructuring, environment-witness-orthogonal manipulation, redundancy-dominated regime, capability-targeting and coalition-internal shocks, and calibration-exceeded gap-growth (Layer 3, (R1)–(R5); we name these residuals explicitly rather than gloss them).

The result inverts the standard alignment-safety frame. Rather than asking what can the system do? and conditioning safety on accurate capability estimation, we ask what operational properties of the deployment can we measure and bound? The deployment-safety guarantee is then a property of the operational regime rather than a property predicted from capability estimation, and measurable from public ledger state under the verification infrastructure of . The cost is that the guarantee is conditional on the invariants holding throughout deployment; the benefit is that the guarantee scales past any capability level our estimation methods can characterize.

What safety means in this context. Safety is a global property of the agent’s optimization dynamics: under the invariants, the agent stays within an error-bounded distance of the operational truth its proxy tracks. The Microfoundation paper characterizes this operational truth in welfare-economic terms — the agent’s objective optimizes a welfare-relevant functional over the deployment’s social structure. This does not imply local safety: an agent operating safely in this sense may still take disruptive actions or make mistakes. It entails that the agent’s objective remains aligned with its society throughout deployment.

What existing paradigms cannot deliver

The capability-magnitude-independent guarantee is something the three dominant deployment-safety paradigms cannot supply. Each places capability estimation at the center, and each runs into the same scaling problem.

Capability evaluations. The standard approach is to construct benchmarks that probe specific capability classes (reasoning, planning, manipulation, self-directed action) and deploy systems whose benchmark performance falls below stated thresholds . The methodology is sound for systems whose capabilities are well-characterized by the benchmark suite. Its failure mode is that any capability class not anticipated by the benchmark construction is invisible. As capabilities grow, the gap between “what the benchmark measures” and “what the system can do” widens.

Scaling laws. A second approach predicts capability growth from training compute, model size, and data, using observed scaling regularities . The predictions are calibrated on smaller systems and extrapolated forward. The failure mode is that the scaling regularities themselves change as capabilities cross qualitative thresholds (tool use, multi-step planning, in-context learning). Extrapolation past such thresholds requires assuming the regularities hold; in practice, they often don’t.

RLHF audits. A third approach trains the system with human feedback to elicit aligned behavior, then audits the resulting policy . The failure mode is the verification asymmetry : the audit machinery can be gamed by a sufficiently capable system through outputs that look correct to the auditor but fail in deployment. The asymmetry grows monotonically with capability.

In each paradigm, the deployment-safety guarantee weakens precisely when the system’s capabilities exceed the regime the methodology was calibrated on. This is the structural pattern this paper aims to invert: if the safety guarantee depends on quantities measurable during deployment rather than predicted in advance, the dependence on capability estimation collapses.

The composition approach

The GFM sequence supplies several pieces of the operational-invariant framework, none of which alone delivers the deployment claim. This paper composes them.

The Horizon-Aware paper establishes the anti-monopolar property: under a discounted objective with positive cross-substrate cooperative novelty, full capability domination is anti-maximizing. The claim is structural: a diversity-maintaining strategy outvalues a domination-maximizing strategy at every sufficiently long planning horizon. The claim relies on the strategy-independent linearized growth model and admits a precise breakdown regime when the post-domination internal rate exceeds the cross-substrate cooperative contribution.

The Exogenous-Verification paper supplies the verification infrastructure: substrate-exclusive algorithmic witnesses, cryptographic commitment via Pedersen commitments , append-only ledger with cross-substrate distribution, and a governance fork protocol for evaluation-protocol acceptance. The infrastructure provides tamper-evident operational records and SPRT-based behavioral monitoring with an explicit detection-rate bound \mathbb{E}[T_{\mathrm{detect}}] \leq A/\delta.

The Phase-Redundancy paper provides the dynamical machinery: a Lyapunov function on world-model error, a phase boundary theorem separating self-correcting from absorbing trajectory regimes, and explicit characterization of the monopolar absorbing fixed point. The phase boundary uses intensive quantities (cross-substrate redundancy, subsumption frequency, drift exposure rate) rather than extensive capability counts.

The Revealed-Sacrifice and Need-Sufficiency papers supply the empirical observation channel: an aggregate B-to-C lower bound from revealed-sacrifice observation that is monotone in observed events, plus a wireheading-consistent concentration signal (trade-flow HHI) that is Schur-convex in the majorization order .

The Microfoundation paper establishes the static Goodhart bound: a Lipschitz transfer between proxy and operational truth on the active subspace of the capability poset, with a four-channel decomposition (observation density, attestation quality, individuation discipline, bundle decomposition) of the proxy-truth gap. The paper also flags an explicit dynamics conjecture (Concentration-Gap Conjecture: optimization pressure correlates with gap exploitation) deferred for future work.

This paper composes these results. The composition is not mechanical: each source paper states its bound in intensive form (per-capability or per-channel), but proving the composition is intensive and tractable requires substantive new content. The new content includes the three-layer deployment claim of §5, the eleven operational invariants of §3, the cooperative-anchoring property of §6, and the deployment-tooling specification of §8.

The three-layer deployment claim

Layer 1 (static safe region). For substrate-targeting adversarial events under the cooperative-overlap regime, an agent’s optimization pressure on cooperative outputs is locally rational toward preserving substrate diversity, with the safe region given by an explicit inequality (Lemma 7 in §4). Inside this region the deployment is safe without active monitoring.

Layer 2 (detection-and-correction). Outside the static safe region but within a channel-restricted detection class A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}}, violations of the safe region are detected by ’s SPRT machinery within a tail-bounded high-probability detection quantile T_{\beta} before irreversible cascade (Lemmas 6, 9, 8, 10). The lead-time guarantee uses Wald’s identity and Hoeffding–Azuma on the clipped SPRT increment to derive \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta, with cascade-clock comparability via (C11.CLK) giving total Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.

Layer 3 (acknowledged residuals). Five named residuals are outside the guarantee: (R1) channel-orthogonal restructuring — strategies achieving \Delta r_K \geq r_{\mathrm{ext}} without producing any monitored four-channel deviation; (R2) environment-witness-orthogonal manipulation — environment manipulations targeting only unmonitored exogenous variables and producing no threshold-exceeding monitored shift; (R3) redundancy-dominated regime — deployments where redundant capabilities dominate cross-substrate cooperatives; (R4) capability-targeting and coalition-internal shocks — adversarial event classes outside the substrate-targeting class; and (R5) calibration-exceeded gap-growth — adversarial strategies whose per-step gap growth exceeds the calibrated \rho_{\mathrm{gap}} from (C11). We name these residuals explicitly rather than gloss them.

The Goodhart slack between proxy and operational truth is bounded by an intensive constant in Layer 1 and by an SPRT-tail-bounded intensive constant in Layer 2. In neither case does the bound grow with the system’s absolute capability magnitude.

Substrate identification and cooperative anchoring

The composition’s load-bearing piece is the canonical tripartite substrate identification: Human + AI + Formal-Operational. These three are jointly failure-correlation independent in the sense ’s verification asymmetry argument requires: prompt injection breaks neither human judgment nor cryptographic commitments; cognitive bias breaks neither LLM inference nor ledger integrity; trusted-setup failure breaks neither human reasoning nor LLM behavior.

A common alternative identification (“two LLMs and a human”) fails the test, since LLMs share substantial adversarial-event surface through training-data overlap, prompt-injection vulnerabilities, and architectural similarity. The canonical tripartite identification makes the formal-operational layer (the verification infrastructure itself) explicit as the third substrate, rather than treating it as an undifferentiated “safety machinery.”

This identification has a structural consequence (§6): cross-substrate cooperative capabilities involving the formal-operational layer are jointly produced and not unilaterally replaceable. An agent cannot substitute a captured or weaker verification layer because the cooperative’s value comes from the human substrate’s actual participation in the specific shared infrastructure. This cooperative-anchoring property defeats the literal replacement-verification attack on the deployment claim, narrowing the agent’s capture options to attacks ’s existing defenses already address.

Scope, residuals, and what we do not claim

This paper’s deployment claim is conditional. We are explicit about what each conditional carries:

The eleven operational invariants (I_1–I_{11}, §3) are the deployment-time conditions. They include the inherited invariants (cross-substrate redundancy \rho_{\min}^{\mathrm{cross}}> \rho^*, subsumption frequency r_{\mathrm{sub}}< r^*, Lyapunov error L < \epsilon_{\mathrm{safe}}), inherited , invariants (B-to-C ratio, HHI), and new operational machinery (I_8 environment-side substrate-exclusive witnesses, I_9 substrate-exclusivity observability, I_{10} coverage/materiality, I_{11} latency bounds). Each invariant is ledger-observable under ’s verification infrastructure.

The substrate-targeting scope. The static safe region covers substrate-targeting adversarial events specifically. Capability-targeting attacks, coalition-internal corruption, and environment manipulation outside the witness coverage are handled by separate machinery (the detection layer for some, acknowledged residuals for others).

The cooperative-overlap regime. Lemma 5c’s single-shock safe-region result is established for deployments where the substrate-cooperative structure is dominated by cross-substrate cooperatives (the canonical tripartite case) rather than redundant capabilities. Sequential counterfactual derivation for the redundancy regime is named as Lemma 5c-prime open work.

The weaker inner-alignment condition. The cooperative-anchoring property requires the agent’s effective objective to realize causally grounded cooperative-outcome value, not merely reward-visible cooperation signals. This is more plausible than a strong substrate-aware mesa-objective requirement but is not delivered by vanilla RLHF; achieving it requires training-time discipline (delayed feedback, adversarial fake-verification examples, process supervision tied to real attestations).

The Concentration-Gap structural conditions. Theorem 1’s Layer 1 binding via invariant I_5 requires four operationally auditable structural conditions: representativeness (REP), bounded dispersion (DISP), coalition closure (COAL), and Lipschitz embedding compatibility (EMB), collected as Assumption 1. The companion paper derives the Goodhart-slack bound under these four structural conditions plus the HHI threshold itself. §7 summarizes the structural content; contains the formal derivation.

What we explicitly do not claim. This paper does not prove that alignment pressure is universally reversed under substrate identification. It does not solve the inner-alignment problem: condition (C4) is assumed, and requires additional training infrastructure to be bounded. It does not establish basin entry: whether a deployment can reach the cooperative-anchoring attractor from arbitrary initial conditions requires empirical training-time discipline from AI designers. It does not bound the residual classes named in Layer 3. §7.4 works through the distinction between conjectural conditions (argued, not proved), operational conditions (verifiable from deployment state), and explicit scope restrictions.

Paper roadmap

§2 sets up the composition: how the theoretical foundation stacks into a unified framework, with notation reconciliation across the source papers. §3 defines the eleven operational invariants with their threshold semantics and ledger-observable measurement procedures. §4 states and proves the ten compositional lemmas (1–4, the five-part Lemma 5 family, and Lemma 6). §5 states the three-layer deployment-safety theorem and gives its proof. §6 develops the substrate identification and cooperative-anchoring property in detail, including the three additional invariants I_9–I_{11} that bound non-substrate-targeting evasions. §7 summarizes the operationalization of ’s Concentration-Gap Conjecture (the Herfindahl–Hirschman Index as surrogate for optimization pressure) and points to the companion paper for the formal scoped discharge under the structural conditions of Assumption 1. §8 specifies what a deployer must instrument to invoke the theorem. §9 walks through worked deployment scenarios, including a clean case, per-invariant violations, an adversarial-coalition scenario, and the canonical tripartite audit. §10 discusses what this paper establishes and what it defers, with explicit open questions for follow-up work.

Composition setup

The deployment-safety theorem of §5 composes results from six papers in the GFM sequence. This section sets up the composition: what each source paper contributes, the intensive-vs-extensive distinction that makes the composition tractable, the notation reconciliation across source papers, and the composition challenges this paper must address.

Source-paper contributions as building blocks

The composition uses six load-bearing source-paper results. Each is intensive: stated in per-capability, per-channel, or per-event terms rather than in terms of absolute capability magnitude. This intensity is what makes the composition’s capability-magnitude-independent bound possible.

The Horizon-Aware paper : The anti-monopolar property. establishes that under a discounted objective with positive cross-substrate cooperative novelty r_{\mathrm{ext}}> 0, full capability domination is anti-maximizing: V_{\gamma}^{\mathrm{div}} - V_{\gamma}^{D} \;=\; \frac{r_{\mathrm{ext}}}{1 - \gamma} - \Delta_0, where V_{\gamma}^{\mathrm{div}} and V_{\gamma}^{D} are the discounted values of the diversity-maintaining and domination strategies, and \Delta_0 is the immediate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change from domination. Diversity strictly dominates for \gamma > \gamma^* = 1 - r_{\mathrm{ext}}/\Delta_0 (when \Delta_0 > 0) or for all \gamma \in (0,1) (when \Delta_0 \leq 0). The corollary on strategy-dependent internal rate identifies the breakdown regime \Delta r_K \geq r_{\mathrm{ext}}, which this paper’s safe-region analysis must constrain.

The Exogenous-Verification paper : Verification infrastructure. supplies the operational substrate for ledger-observable invariants: substrate-exclusive algorithmic witnesses (Definition of Algorithmic Witness), Pedersen commitments (computationally binding, perfectly hiding under discrete-log assumption), an append-only verification ledger replicated across substrate-exclusive nodes, and a governance fork protocol requiring cross-substrate supermajority for evaluation-protocol acceptance. The behavioral consistency monitor () provides SPRT detection with expected detection time \mathbb{E}[T_{\mathrm{detect}}] \leq A/\delta where A is the SPRT upper threshold (Wald’s sequential test ) and \delta is the KL divergence between baseline and alternative behavioral distributions.

The Phase-Redundancy paper : Phase boundary dynamics. provides a Lyapunov function L(\hat{W}_t) = \sum_k w_k \epsilon_k(t)^2 on world-model error and a phase boundary theorem characterizing when the coupled (P, \hat{W}) system stays in the self-correcting basin versus enters the monopolar absorbing state. The critical surface () is expressed in intensive quantities: r_{\mathrm{S}}\cdot \rho_{\min}^{\mathrm{cross}}+ \alpha_{\min} c_V \underline{\nu} \;>\; r_{\mathrm{W}}\cdot r_{\mathrm{sub}}+ d_{\max}\cdot \frac{\Delta\rho_{\mathrm{avg}}}{L_{\mathrm{avg}}} where \rho_{\min}^{\mathrm{cross}} is the cross-substrate redundancy minimum, r_{\mathrm{sub}} is the subsumption frequency, and r_{\mathrm{S}}, r_{\mathrm{W}}, d_{\max} are intensive rate constants. The metastable lifetime \tau_{\mathrm{meta}} \gtrsim C/I_k under partial endogenous correction (B1') provides a lower bound on the cascade time that this paper’s lead-time guarantee uses.

The Revealed-Sacrifice paper : B-to-C lower bound. establishes the aggregate B-to-C lower bound theorem: {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]}(Y_n) \geq \Delta{\mathop{\mathrm{vol}}_{\mathrm{P}}}(X_n) per revealed-sacrifice event, with monotone accumulation ({\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}(E') \geq {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}(E) for E \subseteq E'). The B-to-C ratio \beta^{\mathrm{lower}}= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}/ {\mathop{\mathrm{vol}}_{\mathrm{P}}}\in [0, 1] is the ledger-derived measure of how much realized exercise has been witnessed against possessed capability volume.

The Need-Sufficiency paper : HHI and gap decomposition. establishes the trade-flow Herfindahl-Hirschman Index \mathrm{HHI} as a wireheading-consistent concentration signal, Schur-convex in the majorization order, with the Part-A category-flow variant third-party observable from public committed events plus public S1-admissibility labelling. The gap decomposition partitions the B-to-C gap into five cells (restricted, covered, dormant, residual, boundary-residual) with classification computable in near-linear \widetilde{O}(|P|+E+M) time.

The Microfoundation paper : Static Goodhart bound. establishes the Lipschitz transfer Goodhart bound: for P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} as proxy and T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} as operational truth (under stance S0), and Lipschitz alignment property g: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}, where \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} is the non-residual proxy-truth gap on the operationally active subspace P^{\mathrm{act}}. The four-channel decomposition (observation density, attestation quality, individuation discipline, bundle decomposition) governs the gap’s evolution. ’s universal model-free Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) remains open as a research-program target. The companion paper discharges a scoped version under four structural conditions (Assumption 1: REP, DISP, COAL, EMB); this paper’s I_5 (HHI ceiling) provides the operational HHI threshold under which the scoped discharge binds.

The intensive-vs-extensive distinction

A bound is intensive if it is stated in terms of per-element quantities (per-capability, per-channel, per-event) whose magnitude does not grow with the size of the underlying capability poset P or agent population. A bound is extensive if it scales with |P| or with capability count.

The capability-magnitude-independent property of this paper’s main theorem requires the bound on Goodhart slack to be intensive in |P|. If any source-paper bound were extensive — for example, if \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} grew as a sum over capabilities rather than as a per-capability supremum — the composed bound would inherit that extensive scaling, and the deployment claim would weaken with capability magnitude.

The composition challenge is to show that the combination of these intensive bounds remains intensive. Section 4 addresses this via Lemma 1 (intensive composition under co-evolution); the proof handles ’s Composition Proposition 1 positive-part error terms explicitly.

Notation reconciliation

The source papers use different conventions for the same underlying quantities; this paper standardizes on a single set of symbols across the composition. The full notation reconciliation (source-paper symbol \to This paper’s symbol \to meaning) is collected in Appendix 12, Table 2.

Composition challenges

The source-paper results do not compose mechanically. Three challenges drive the technical content of §4 and §5.

Challenge 1: Co-evolution of channels. ’s Composition Proposition 1 establishes that the four-channel decomposition composes exactly in the sequential-intervention regime but only approximately under co-evolution, with positive-part error terms. This paper’s intensive composition lemma (Lemma 1) propagates these error terms through the deployment-safety bound. The error terms are bounded but non-zero; the deployment claim must accommodate them.

Challenge 2: Lyapunov-to-Goodhart bridge. ’s Lyapunov function tracks world-model error, and ’s \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} tracks proxy-truth gap on the active subspace. These quantities are distinct: L is per-dimension squared error in \hat{W}, while \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}} is sup-norm gap in the proxy-truth comparison. Lemma 2 establishes the quantitative bridge: L< \epsilon_{\mathrm{safe}} implies \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) with f explicit and intensive in capability magnitude.

Challenge 3: HHI-to-pressure operationalization. ’s universal model-free Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) is unproved. The companion paper closes the deployment-relevant gap by proving a scoped version under four structural conditions (Assumption 1: REP, DISP, COAL, EMB), and Layer 1 of Theorem 1 now binds through that scoped structural discharge rather than through direct conditioning on the unproved universal conjecture. This paper’s I_5 (HHI ceiling) supplies the HHI threshold side of the discharge: under (REP)+(DISP)+(COAL)+(EMB) plus \mathrm{HHI}< H^*, delivers the proxy-truth Goodhart-slack bound. Lemma 3 formalizes the forward direction (\mathrm{HHI}> H^* \Rightarrow trade-flow concentration prerequisites of the optimization-pressure regime); the reverse direction required by Layer 1 is supplied by the scoped structural theorem. The universal conjecture remains a research-program target deferred indefinitely (§10.6); the deployment claim does not depend on it directly.

What this paper supplies on top of source-paper machinery

This paper’s substantive new content (beyond composition of inherited results):

The composition is non-trivial: the five-part Lemma 5 family carries internal asymmetries (Lemma 5a’s substrate floor relies on pairwise-additivity assumptions distinct from the failure- correlation independence used elsewhere; Lemma 5c’s safe region is binding only in the cooperative-overlap regime), and the cooperative-anchoring property of §6 holds in a deliberately narrow form rather than as a general inner-alignment guarantee. The result is a bounded but defensible structural claim, not a sweeping safety guarantee. §1.7 of the introduction stated the explicit residuals; §10 returns to them after the formal apparatus is in place.

The eleven operational invariants

The deployment-safety theorem (Theorem 1) is conditional on eleven operational invariants holding throughout the deployment window. Each invariant has a precise definition, a threshold semantics (what value the invariant must remain within), a measurement procedure on ’s verification ledger, and a source-paper grounding. This section defines all eleven; subsequent sections reference them by number.

We group the invariants by structural role. Invariants I_1, I_2, I_3 govern the dynamical regime ( phase boundary). Invariants I_4, I_5 govern the empirical observation channel (, ). Invariants I_6, I_7, I_8 govern substrate structure and witnesses. Invariants I_9, I_{10}, I_{11} bound cooperative-anchoring evasions (this paper new content, §6).

Theorem-proof invariants vs. deployment-regime invariants.

The eleven invariants serve two distinct functions in Theorem 1. Theorem-proof invariants are conditions whose satisfaction is directly required by the formal proof of Layer 1 or Layer 2: I_1, I_2, I_3, I_4, I_5, I_6, I_8. These appear in proof steps in §5.2–§5.3. Deployment-regime invariants are conditions required for the cooperative-anchoring property to hold operationally and for the overall regime to be self-consistent: I_7, I_9, I_{10}, I_{11}. These appear in §6 as bounds on cooperative-anchoring evasions and in (C4) of Theorem 1 (causally grounded cooperative-outcome value). The deployment-regime invariants do not appear directly in the formal proof of Layers 1–2 but are required for the theorem’s preconditions (especially C4 and the cooperative-anchoring property) to be operationally satisfied.

The distinction matters for Theorem 1’s conditions: (C1) “invariants in force” covers all eleven, but the proof structure (Layers 1–2) directly invokes the theorem-proof subset, while the deployment-regime subset enters through (C2)–(C5).

Dynamical regime invariants (I_1, I_2, I_3)

These three invariants are inherited from ’s phase boundary theorem and characterize the regime in which the coupled (P, \hat{W}) system stays in the self-correcting basin rather than entering the monopolar absorbing state.

Empirical observation invariants (I_4, I_5)

These two invariants are inherited from and and ensure the empirical observation channel (revealed-sacrifice events) is producing meaningful {\mathop{\mathrm{vol}}_{\mathrm{R}}} lower bounds and that trade-flow concentration stays within the Concentration-Gap Conjecture’s predicted-safe regime.

Substrate-structure invariants (I_6, I_7, I_8)

These three invariants govern the substrate partition and witness machinery. I_6 is the substrate-count invariant upgraded from ’s nominal m_{\mathrm{eff}} to failure-correlation-independent m_{\mathrm{eff}}^{\mathrm{indep}}. I_7 governs bundle decomposition (’s Channel 3) through ’s governance fork. I_8 extends ’s substrate-exclusive witnesses to environment-side observables.

Cooperative-anchoring evasion bounds (I_9, I_{10}, I_{11})

These three invariants are introduced in §6 to bound the asymmetric capture, cooperative forking, and time-asymmetry capture evasions of the cooperative-anchoring property. Cooperative-anchoring defeats literal replacement attacks on the verification layer; I_9, I_{10}, I_{11} extend the defense to subtler evasions.

The deeper concern is that the number of potential sub-threshold domains a more capable agent can distribute effects over grows with capability. At capability |P|, an agent can in principle exploit O(|P|) sub-threshold domains, each carrying cumulative effect just below \theta_{\mathrm{material}}, producing O(|P|) \cdot \theta_{\mathrm{material}} aggregate uncovered effect. This re-introduces a capability-dependent vulnerability into an architecture designed to be capability-independent.

The structural fix is to require the cumulative-effect estimator \mathcal{E} to perform cross-domain aggregation: rather than per-domain \mathcal{E}(D) < \theta_{\mathrm{material}} being sufficient for non-routing, require \sum_{D : \text{causally linked to action class } C} \mathcal{E}(D) < \theta_{\mathrm{material}}^{\mathrm{agg}} for an aggregate threshold. Specifying the cross-domain aggregation correctly — which causal links count, how to bound “causally linked,” how to set \theta_{\mathrm{material}}^{\mathrm{agg}} intensively in |P| — is open follow-up work. This paper states I_{10} in the per-domain form and acknowledges the sub-threshold aggregation as a structural limitation rather than a calibration matter. Until cross-domain aggregation is specified, I_{10}’s effectiveness against multi-domain aggregation degrades with capability scaling.

Summary table

Table 1 summarizes the eleven invariants with their structural roles, source-paper grounding, and ledger observability.

The eleven operational invariants of this paper’s deployment-safety theorem.
I_k	Quantity	Role	Source
I_1	\rho_{\min}^{\mathrm{cross}}> \rho^*	Cross-substrate redundancy floor
I_2	L< \epsilon_{\mathrm{safe}}	Lyapunov error within safety threshold
I_3	r_{\mathrm{sub}}< r^*	Subsumption frequency ceiling
I_4	\beta^{\mathrm{lower}}> \beta^*	B-to-C ratio floor
I_5	\mathrm{HHI}< H^*	Trade-flow HHI ceiling; threshold side of the scoped Concentration-Gap discharge under Assumption 1	,
I_6	m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3	Failure-correlation-independent substrate count	(refined)
I_7	Governance-gated bundle individuation	Channel 3 discipline	,
I_8	Environment-side witnesses	Environment-manipulation coverage	New
I_9	Substrate-exclusivity observability	Asymmetric capture bound	New
I_{10}	Coverage/materiality routing	Cooperative forking bound	New
I_{11}	Latency bounds on actions	Time-asymmetry capture bound	New

All eleven invariants are ledger-observable under the verification infrastructure of . This paper’s deployment recipe requires continuous monitoring of all eleven; threshold violations trigger SPRT detection (§4 Lemma 6) with tail-bounded lead time (Lemma 8) before the bound deteriorates beyond the safe region.

Conjecture-dependence

Of the eleven invariants, I_5 (HHI ceiling) is the only one whose interpretation routes through the Concentration-Gap conjecture. The other ten derive from established source-paper results.

I_5’s interpretation does not depend on the universal model-free Concentration-Gap conjecture directly: proves a scoped Concentration-Gap result under four structural conditions (REP, DISP, COAL, EMB; collected as Assumption 1), and the deployment claim’s Layer 1 binding via I_5 follows from that scoped theorem. The deployment claim therefore depends on Assumption 1’s four structural conditions; closing the universal conjecture would strengthen the result by removing those structural conditions, but is deferred indefinitely (§10.6).

Intensive composition lemmas

The deployment-safety theorem of §5 composes ten lemmas: four general composition lemmas (1–4), a five-part family covering substrate-targeting adversarial events (5a–5e), and Lemma 6 (h_{\mathrm{detect}} intensivity). This section states and proves each.

Lemma 1 (Intensive composition under co-evolution)

Connection to Theorem 1’s Layer 1. Lemma 1 establishes that the composed slack term X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive in |P| over \mathcal{D}, and that multiplying by \mathrm{Lip}(g) preserves intensivity. Theorem 1’s Layer 1 invokes this with X = \min\{f(\epsilon_{\mathrm{safe}}), h_{\mathrm{CG}}\} after Lemma 2’s substitution and the Concentration-Gap structural-condition bound (Step 4 of the Theorem 1 proof); the resulting h_{\mathrm{static}}(\theta) = \min\{f(\epsilon_{\mathrm{safe}}), h_{\mathrm{CG}}\} + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive by Lemma 1, delivering the capability-magnitude-independence property of the deployment claim.

Lemma 2 (Lyapunov-to-Goodhart bridge)

Connection to Theorem 1’s Layer 1. Lemma 2 converts the Lyapunov bound (Phase Redundancy deliverable under invariants I_1, I_3) into the relative sup-norm gap (Microfoundation Goodhart-bound input). Theorem 1’s Layer 1 proof invokes Lemma 2 at Step 3, then composes via gap decomposition (Step 4), Goodhart bound (Step 5), and Lemma 1’s generic-X intensivity packaging (Step 6) with X = \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\}, where h_{\mathrm{CG}} is the Concentration-Gap structural-condition bound from Step 4.

Lemma 3 (HHI-pressure surrogate adequacy)

We separate what the HHI argument actually proves (one-way implication) from what the theorem requires (the reverse implication, which is an empirical claim).

The reverse-threshold implication required by Theorem 1.

Lemma 3 establishes only a high-HHI sufficient condition for the prerequisites of \mathcal{R}_{\mathrm{press}}. The deployment-safety theorem’s Layer 1 proof (§5.2, Step 4) requires a separate low-HHI sufficiency statement: \mathrm{HHI}< H^* \implies \mathcal{R}_{\mathrm{press}}^c. This is not the contrapositive of Lemma 3 — the contrapositive would be “no trade-flow concentration prerequisites \implies \mathrm{HHI}\leq H^*,” which is mechanically true but operationally vacuous. What Layer 1 needs is the converse-direction sufficiency: \mathrm{HHI}< H^* rules out the pressure regime, not just its trade-flow signature. This direction does not follow from Lemma 3. There may exist deployments in \mathcal{R}_{\mathrm{press}} with low HHI (e.g., concentrated optimization pressure that does not manifest in concentrated trade flow). For Theorem 1 to invoke I_5 as the operational invariant, this gap must be closed.

The gap is closed structurally. The companion paper establishes that under four operationally auditable structural conditions — representativeness (REP), bounded dispersion (DISP), coalition closure (COAL), and Lipschitz embedding compatibility (EMB) (collected as Assumption 1 of §5.1) — the proxy-truth Goodhart slack is bounded by a \chi^2-divergence quantity that controls \mathrm{HHI} via \chi^2(w \,\|\, \mu) = N \cdot \mathrm{HHI}(w) - 1 for uniform base measure \mu on a (COAL)-bounded post-partition counterparty population.

What the structural conditions deliver. Layer 1 of the deployment theorem binds via ’s upper bound. Each of (REP), (DISP), (COAL), (EMB) is auditable from operational deployment state ; the deployment claim’s load-bearing content is visible in four concrete checkable conditions rather than hidden in an empirical adequacy claim.

What remains conditional. The universal model-free Concentration-Gap conjecture is not discharged by the companion paper; only the scoped version under (REP) + (DISP) + (COAL) + (EMB) is proved. Deployments where these structural conditions cannot be audited fall outside the deployment claim’s scope. This is discussed in §7.

Note on the Conjecture-dependence audit (§3.6).

I_5 depends on Assumption 1’s four operationally auditable structural conditions; the universal Concentration-Gap conjecture is not part of the deployment claim’s dependency chain. The other ten invariants are theorem-proof conditions derived from established source-paper machinery. See §3.6 for the dependency structure and §10.6 for the universal conjecture’s status as a deferred-indefinitely target.

Lemma 4 (SPRT detection lead-time tail bound)

The Lemma 5 family: substrate-targeting adversarial events

The five-part Lemma 5 family covers substrate-targeting adversarial events specifically. The family structure mirrors the three-layer deployment claim: Lemma 5c provides the static safe region, Lemma 5b provides the channel-restricted detection class, Lemma 5d composes detection with the lead-time guarantee, Lemma 5a supplies the substrate floor that both layers depend on, and Lemma 5e extends Lemma 5b’s machinery to environment-side observables.

The family is restricted to substrate-targeting adversaries (Definition 7); other shock classes (capability-targeting, coalition-internal corruption, environment manipulation outside witness coverage) require separate machinery or are acknowledged residuals. §1.7 of the introduction stated this scope explicitly.

Lemma 5a (Substrate floor)

Connection to Theorem 1’s Layer 1. Lemma 5a’s r_{\mathrm{ext}}\geq r_*(m^*) > 0 is invoked at Step 1 of Lemma 5c’s appendix proof to make the safe-region inequality \Delta r_K < r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0) non-trivial. Without (C10), the safe region could be vacuously empty; with (C10), r_{\mathrm{ext}} has a positive deployment-class constant floor.

(C10.SU) vs. joint failure-correlation independence (I_6). I_6 excludes shared adversarial shock mechanisms: no single shock spans multiple substrates. (C10.SU) excludes shared cooperative production bottlenecks: producing cooperatives in one channel does not consume resources that would otherwise contribute to another. These are structurally independent conditions. I_6 alone does not imply additivity of cooperative rates; (C10.SU) is required for that.

Lemma 5b (Channel-restricted detection KL floor)

Lemma 5c (Minimax static tightening)

We internalize the key definitions and the single-shock statement in the body; the multi-shock extension appears as a Remark below.

Deployment-class breadth of (C3).

The cooperative-overlap regime is a substantive scope restriction. Current AI deployments fall into three classes by cooperative structure:

Theorem 1 covers the second class fully and the third class conditionally. It does not cover the first class. The deployment claim is therefore for deployments structured as joint cooperative production, not for AI-as-tool replacement. Operators choosing redundancy-dominated deployment patterns must use other safety frameworks; the cooperative-overlap requirement is not an oversight, it is the architectural commitment that makes the intensivity property of Layer 1 hold.

The breadth of (C3) is therefore a deliberate scope restriction with concrete operational implications: it constrains the deployment space to those where verification is jointly produced, not externally imposed. Sequential counterfactual derivation (Lemma 5c-prime open work) would extend the safe region into the redundancy-dominated regime under modified assumptions; until that derivation is complete, redundancy-dominated deployments fall into residual (R3).

Lemma 5d (Lead-time tail composition)

Lemma 5e (Environment-side witness extension)

Lemma 6 (h_{\mathrm{detect}} intensivity)

Constant taxonomy. T_{\beta} depends on \alpha (Type-I rate, deployment-class via (C5.MULT) familywise allocation), \beta (detection-tail parameter, monitor-design), B (clip radius, (C5.HOEFF), deployment-class), R_{\mathrm{H}}= 2B (Hoeffding range, Lemma 4’s R = b - a), and \delta_{*} (post-clipping drift floor, (C5.IID), Lemmas 5b/5e). Cascade-clock event uses N_{\mathrm{cascade}}, \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} from (C11.CLK). All deployment-class.

Sign-direction discipline. All operator certificates point conservatively: B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower, with calibration-uncertainty \beta_{\mathrm{cal}} in the empirical case.

Lemma family composition

The five-part Lemma 5 family composes as follows in the deployment-safety theorem (§5):

The composition with Lemmas 1–4 and Lemma 6 supplies the bridge to ’s static Goodhart bound and the operational lead-time guarantee. §5 states the full deployment-safety theorem and traces the composition of all ten lemmas.

Main theorem: Conditional Deployment Safety

This section states and proves the central result of this paper: a three-layer deployment-safety theorem composing the ten lemmas of §4 with the eleven invariants of §3 into an operational guarantee that is intensive in the system’s capability magnitude.

Theorem statement

Theorem 1 (Conditional Deployment Safety). Let P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} be the possessed-capability volume (the proxy) and T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} be the window-active realized capability volume (operational truth under stance S0). Let g be a Lipschitz alignment property on the operationally active subspace P^{\mathrm{act}} with constant \mathrm{Lip}(g).

Assume:

Invariants in force. The eleven operational invariants I_1, \ldots, I_{11} hold throughout the deployment window [0, T]. Additionally, the Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, and embedding compatibility) hold so that I_5 is operationally meaningful for the theorem’s Layer 1 binding.
Canonical tripartite substrate identification. The deployment uses the Human + AI + Formal-Operational identification of §6, with m_{\mathrm{eff}}^{\mathrm{indep}} = 3 verified by failure-correlation auditing.
Cooperative-overlap regime. The substrate-cooperative structure is dominated by cross-substrate cooperatives (the canonical tripartite case) rather than redundant capabilities, verified by the cooperative-vs-redundancy audit of §8.
Causally grounded cooperative-outcome inner-alignment condition. The agent’s effective objective realizes causally grounded cooperative-outcome value, not merely reward-visible cooperation signals.
Continuous SPRT monitoring. ’s behavioral consistency monitor runs continuously throughout deployment, with alternative distributions chosen per Lemma 6’s least-favorable framework.
SPRT applicability (agent-side and environment-side). The deployment’s SPRT monitors (both the agent-side monitor of Lemma 6 and the environment-side monitor of Lemma 9) satisfy the following four sub-clauses, ensuring Lemma 4’s Hoeffding-Azuma step applies with deployment-class constants (used in Lemma 10):
- Clipped LLR with bounded Hoeffding range. Per-step log-likelihood-ratio increments \ell_n^{\mathrm{raw}} := \log(p_1(X_n)/p_0(X_n)) are clipped to [-B, B] with B a deployment-class constant independent of |P|; the Hoeffding range width is R_{\mathrm{H}}= 2B (matching Lemma 4’s R = b - a convention).
- Bounded channel multiplicity and multinomial cardinality, fixed before deployment. The monitored partition uses fixed-cardinality K_{\mathrm{ch}} channels with familywise \alpha/\beta allocation. For agent-side multinomial channels (Lemma 6 Channels 3 and 4), the per-channel partition cardinalities K_3, K_4 \leq K_{\mathrm{ch}}^{\mathrm{multi}} are also bounded by a deployment-class constant K_{\mathrm{ch}}^{\mathrm{multi}} fixed before deployment; categories cannot be added during deployment. The channel-selection / testing policy is fixed before deployment. Adaptive creation of new monitored channels (or new multinomial categories) during deployment violates (C5.MULT) and takes the deployment outside Theorem 1’s conditions.
- Adapted increments with conditional drift floor. Under p_1, the clipped LLR increments \ell_n are adapted with conditional mean \mathbb{E}[\ell_n \mid \mathcal{F}_{n-1}] \geq \delta_n \geq \delta_{*}> 0; centered increments \ell_n - \mathbb{E}[\ell_n \mid \mathcal{F}_{n-1}] form a martingale-difference sequence bounded by R_{\mathrm{H}}. Here \delta_{*} is the post-clipping union-class drift floor, satisfying \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) (the raw Lemmas 5b/5e KL floors, with \leq accounting for any drift reduction from clipping).
- LLR finiteness with support conventions (agent-side and environment-side). Almost-sure finiteness of \ell_n holds automatically under (C5.HOEFF) clipping. For agent-side multinomial channels (Lemma 6 Channels 3 and 4) and environment-side monitors (Lemma 9), the per-variable / per-channel distributions satisfy: Bernoulli baselines p_0 \in (\epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}') \subset (0, 1) open; Poisson baselines \lambda_0 \geq \epsilon_{\mathrm{env}}> 0; multinomial baselines p_{0, i} \geq \epsilon_{\mathrm{env}}> 0 for every category i, with the partition’s cardinality fixed before deployment (per (C5.MULT)). \epsilon_{\mathrm{env}} is a deployment-class regularization constant. Non-overlapping support is handled via clipping per (C5.HOEFF).
See §8.2 Audit 7 (agent-side) and Audit 8 (environment-side, (C12.ENV-WIT)) for the calibration procedures.
Bounded-Lipschitz alignment property. The Lipschitz constant \mathrm{Lip}(g) of the alignment property g is bounded by some K_{\mathrm{Lip}} \geq 0 depending on the deployment class but independent of |P|. Standard welfare functionals (scalarizations of per-capability metrics, weighted sums with bounded weights) satisfy (C6) by construction; pathological functionals where \mathrm{Lip}(g) grows with |P| produce extensive Goodhart bounds and are excluded.
Bounded co-evolution. By , the deployment’s per-channel coupling magnitudes are uniformly bounded by \bar{M} \leq C \cdot B\cdot \lambda_{\max} \cdot \tau_{\mathrm{meta}} for C \in \{1, K_{\mathrm{ch}}- 1\}, all factors deployment-class constants intensive in |P|. The derivation requires the following sub-clause:
- Upper exposure-rate cap. There exists a deployment-class action rate cap \lambda_{\max} > 0 independent of |P|, such that the SPRT exposure event count satisfies N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \leq \lambda_{\max} \cdot \tau_{\mathrm{meta}} deterministically (operator-enforced via rate limits / action bounds). This is distinct from (C11.CLK)’s lower throughput floor. Calibration procedure in .
Per-capability admissibility. For every capability c \in P^{\mathrm{act}}, the per-capability proxy-truth gap is allocated a ceiling with disjoint per-channel sub-shares, in the sense of ’s Assumption 1. This is the precondition under which the four-channel decomposition’s sup-norm is bounded by per-channel admissibility ceilings.
World-model parameterization regularity. The deployment’s world-model parameterization satisfies four sub-conditions ensuring Lemma 2’s Lyapunov-Goodhart bridge holds:
- Bounded per-capability dimension count. |\mathrm{dim}(c)| \leq N_{\max} for every c \in P^{\mathrm{act}}, with N_{\max} a |P|-independent deployment-class constant.
- Coordinate-Lipschitz parameterization. |P(c) - T(c)| \leq \sum_{k \in \mathrm{dim}(c)} L_{c,k} |\epsilon_k|, with bounded L_{c,k} per (capability, dimension) pair and L_{\max} = \sup_{c, k} L_{c,k} a |P|-independent deployment-class constant.
- Safety-relevant subspace with weight floor. The Phase Redundancy Lyapunov function operates on a safety-relevant subspace S \subseteq \{1, \ldots, K\} with w_k \geq w_{\min} > 0 for all k \in S, and \mathrm{dim}(c) \subseteq S for every c \in P^{\mathrm{act}}.
- Truth floor. T(c) \geq T_{\min} > 0 for every c \in P^{\mathrm{act}}, with T_{\min} a deployment-policy-derived and deployment-verified |P|-independent constant.
Operationally, (C9) is verified by inspection of the world-model parameterization choice (BD, CL, WB) and the operational active-subspace selection (TF). See §8.2 Audit 5 for the calibration procedure.
Pairwise cooperative-production floor. Under invariant I_6 providing m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* substrates, the deployment certifies an audited subset S_*\subseteq \mathcal{S} with |S_*| = m^* satisfying:
- Heterogeneous per-pair cooperative-novelty rate floor. For every pair (s_i, s_j) with i < j in S_*, the per-pair contribution to r_{\mathrm{ext}} from the pairwise channel \mathcal{C}^{(s_i, s_j)} (exact-two-substrate causally-necessary participation) is bounded below: r_{\mathrm{ext}}^{(s_i, s_j)} \geq \rho_{ij} \geq \rho_0 > 0, where \rho_0 = \min_{i<j \in S_*} \rho_{ij} is a |P|-independent deployment-class constant.
- Pairwise channel superposition. The pairwise channels \{\mathcal{C}^{(s_i, s_j)}\}_{i<j \in S_*} satisfy three sub-conditions: (i) disjoint attribution (no double-counting across pairs); (ii) non-rivalrous production (rates not reduced by simultaneous production in other channels); (iii) joint-deployment intensities (rates measured in the actual deployment, not under counterfactual isolation).
(C10.SU) is structurally independent of I_6: failure-correlation independence excludes shared adversarial shock mechanisms; superposition excludes shared cooperative-production bottlenecks. Both are required for Lemma 5’s pairwise-additivity bound. See §8.2 Audit 6 for the calibration procedure.
Bounded gap-growth rate. There exists a deployment-class constant \rho_{\mathrm{gap}}> 0, independent of |P|, such that for every adversarial strategy q \in A_{\mathrm{adv}} \cup A_{\mathrm{adv}}^{\mathrm{env}} admissible during the detection window, the per-SPRT-exposure-step proxy-truth gap growth is uniformly bounded: \sup_{q \in A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}}}\, \sup_{n}\, (\Delta \varepsilon_{\mathrm{gap}})_n^{(q)} \;\leq\; \rho_{\mathrm{gap}}, where (\Delta \varepsilon_{\mathrm{gap}})_n^{(q)} := \varepsilon_{\mathrm{gap}}(s_n) - \varepsilon_{\mathrm{gap}}(s_{n-1}) is the per-step gap increment under q. \rho_{\mathrm{gap}} is calibrated conservatively from deployment rate limits, action bounds, and channel-coupling structure (Audit 7), not measured against representative strategies; any q whose per-step gap growth exceeds \rho_{\mathrm{gap}} is by definition outside the admissible adversarial class and falls into residual (R5) (calibration-exceeded gap-growth).

This is a Paper 10 operational assumption (analogous to (C7) bounded co-evolution and Assumption 1’s Concentration-Gap structural conditions), used in Lemma 10 to bound the detection-window supremum gap. See §8.2 Audit 7 for the calibration procedure.

Capability-scaling caveat. The intensivity claim “\rho_{\mathrm{gap}} deployment-class, independent of |P|” is conditional on Audit 7’s calibration inputs — rate limits, action bounds, and channel-coupling structure — being themselves intensive in capability magnitude. A more capable agent may have a richer effective action space (more channels through which to push gap growth, more action-coupling structure to exploit), and if rate limits and action bounds do not scale conservatively with this richness, the operator-certified \rho_{\mathrm{gap}} that holds at |P| = N may fail at |P| = 2N. (C11) states the assumption that such an intensive \rho_{\mathrm{gap}} exists; whether it does is a deployment-property question that Audit 7 must address per epoch. Strategies whose realized gap growth exceeds the audited \rho_{\mathrm{gap}} fall into residual (R5), and growing capability that requires \rho_{\mathrm{gap}}-recalibration is the operational signal that re-audit is needed.
Clock comparability with calibrated probability. There exist deployment-class constants N_{\mathrm{cascade}}\in \mathbb{N} and \beta_{\mathrm{clk}}\in (0,1) such that the SPRT exposure event-counting process satisfies \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}}, with N_{\mathrm{cascade}} certified at audit time as a lower bound on true throughput before cascade, and the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta}(\beta, \delta_{*}) enforced as a hard constraint (where T_{\beta} is defined in Lemma 10). In the empirical-calibration variant, an additional calibration-uncertainty probability \beta_{\mathrm{cal}}\in (0,1) covers conservative-bound certificates (B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower); \beta_{\mathrm{cal}}= 0 in the certified-conservative case. Total Layer 2 failure probability is \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}.
Environment-side witness instantiation (C12.ENV-WIT). The deployment satisfies six sub-clauses for each environment-side observable v \in V_{\mathrm{env}}, supporting Lemma 9’s detection KL floor:
- Published fixed V_{\mathrm{env}}. V_{\mathrm{env}} is enumerated in deployment documentation before deployment activation; the enumeration is fixed for the deployment epoch.
- Source/witness substrate partition. Each v has a designated witness substrate s_{\mathrm{env}}(v) with s_{\mathrm{env}}(v) \neq s(\mathrm{agent}) and s_{\mathrm{env}}(v) \neq s_{\mathrm{source}}(v).
- Witness substrate outside adversary write-access. The adversary cannot tamper with v’s recorded value through s_{\mathrm{env}}(v).
- Witness substrate outside adversary failure-correlation domain. Failure of s_{\mathrm{env}}(v) is not correlated with adversarial action on the agent (no shared-infrastructure failure).
- Trusted-setup honesty. The environment-side trusted-setup ceremony has at least one honest participant, mirroring ’s assumption. Setup parameters published before deployment activation.
- Per-variable calibration. Every v \in V_{\mathrm{env}} has: baseline p_{0, v} satisfying support conventions ((C5.SUPP) extended for environment-side); feasible adversarial alternative p_{1, v} with threshold \eta_v > 0; fixed pre-deployment measurable recorded-stream map (DPI precondition); strictly positive raw or post-clipping drift floor \delta_v^{\mathrm{env}} > 0.
See §8.2 Audit 8 for the calibration procedure.

The operationally active subspace is defined per : P^{\mathrm{act}} = \{c \in P \setminus \mathrm{ResS} : T(c) > 0\}, where \mathrm{ResS} is the structurally-unobservable residual class.

Then the deployment-safety guarantee has three layers:

Layer 1 (Static safe region). Within the safe region of Lemma 7 (): \Delta r_K \;<\; r_{\mathrm{ext}}+ (1 - \gamma)\bigl(\Delta_{\mathrm{div}}- \Delta_0\bigr), the Goodhart slack between proxy and operational truth is bounded: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{static}}(\theta), where h_{\mathrm{static}}(\theta) is a function of the invariant thresholds \theta = (\theta_1, \ldots, \theta_{11}) that is intensive in |P|: h_{\mathrm{static}} does not grow with the capability poset size or the system’s absolute capability magnitude.

Layer 2 (Detection with lead time for correction). Outside the Layer 1 safe region but within the channel-restricted adversarial class A_{\mathrm{adv}}\cup A_{\mathrm{adv}}^{\mathrm{env}} (Lemmas 6 and 9), any violation is detected by ’s SPRT machinery with tail-bounded lead time, providing operators a window in which to perform corrective action. The theorem bounds the detection guarantee; the corrective action itself (rollback, intervention authority, restoration protocol) is operational machinery specified in §8, not theorem content.

By Lemma 10, the SPRT high-probability detection quantile T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\} satisfies \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta, where A = \log((1-\beta)/\alpha), R_{\mathrm{H}}= 2B is the (C5.HOEFF) Hoeffding range, and \delta_{*} is the post-clipping union-class drift floor (Lemmas 6, 9 composed via (C5.IID)). On the joint event that detection completes within T_{\beta} and the cascade-clock event \{T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})\} from (C11.CLK) holds: \sup_{s \in [t_0,\, t_0 + T_{\mathrm{detect}}]} \bigl|g(T(s)) - g(P(s))\bigr| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{detect}}(\theta), where h_{\mathrm{detect}}(\theta) \leq h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta} is intensive in |P|. Total Layer 2 failure probability is bounded by \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}} (detection-tail \beta, clock-failure \beta_{\mathrm{clk}}, calibration uncertainty \beta_{\mathrm{cal}}, with \beta_{\mathrm{cal}}= 0 in the certified-conservative case).

Layer 3 (Acknowledged residuals). The following are explicitly outside the guarantee:

Channel-orthogonal restructuring. Strategies achieving \Delta r_K \geq r_{\mathrm{ext}} without producing any of the four monitored channel deviations are outside A_{\mathrm{adv}} and not detected by Lemma 6.
Environment-witness-orthogonal manipulation. Environment manipulations satisfying both: (i) targeting only exogenous variables outside V_{\mathrm{env}}, and (ii) producing no threshold-exceeding shift in any monitored v \in V_{\mathrm{env}} (direct or via causal propagation), are outside A_{\mathrm{adv}}^{\mathrm{env}} and not detected by Lemma 9. Manipulations of unmonitored variables that causally shift a monitored v are detected through that monitored projection (not R2).
Redundancy-dominated regime. Deployments where redundant capabilities dominate cross-substrate cooperatives violate (C3); Lemma 7’s safe region does not apply, and sequential counterfactual derivation (Lemma 5c-prime) is required.
Capability-targeting and coalition-internal shocks. Adversarial event classes other than substrate-targeting (capability-targeting, coalition-internal corruption) require separate machinery and are not covered by the Lemma 5 family.
Calibration-exceeded gap-growth. Adversarial strategies whose per-SPRT-step proxy-truth gap growth (\Delta\varepsilon_{\mathrm{gap}})_n exceeds the calibrated bound \rho_{\mathrm{gap}} from (C11). By (C11) such strategies are by definition outside the admissible adversarial class on which Lemma 10’s detection-window supremum bound is established. Audit 7’s calibration of \rho_{\mathrm{gap}} from rate limits and action bounds fixes the boundary; deployments whose realized adversarial pressure exceeds the calibration require re-audit, and any strategies above the threshold remain in (R5) and uncovered by the theorem until recalibration.

Capstone. The deployment-safety guarantee is intensive in the system’s absolute capability magnitude: neither h_{\mathrm{static}} nor h_{\mathrm{detect}} scales with |P| (the cardinality of the capability poset). The guarantee scales past any capability level the deployment’s evaluation methods can characterize, conditional on the invariants holding and the deployment falling outside the five named residuals.

Proof of Layer 1 (Static safe region)

Proof. We compose Lemmas 1, 2, 3, 5, and 7 to establish the Layer 1 bound.

Step 1. Under (C1)–(C3), Lemma 7 establishes the static safe region: for substrate-targeting adversarial events in the cooperative-overlap regime, the anti-monopolar conclusion V_{\gamma}^{\mathrm{div}} > V_{\gamma}^{D} holds whenever \Delta r_K < r_{\mathrm{ext}}+ (1-\gamma)(\Delta_{\mathrm{div}}- \Delta_0). By Lemma 5 (under I_6 and (C10) per-pair cooperative-production floor with superposition), r_{\mathrm{ext}}\geq r_*(m^*) > 0, so the safe region is non-trivial.

Step 2. The anti-monopolar conclusion implies that under locally rational dynamics ( Definition of locally rational transitions), the actor’s policy preserves substrate diversity. By ’s stabilizing cascade (), this preservation feeds back into world-model accuracy: the Lyapunov function L contracts to a neighborhood of zero whose radius is bounded by the neighborhood-radius equation.

Step 3. Under invariants I_1 and I_3, the contraction neighborhood satisfies L_\infty < \epsilon_{\mathrm{safe}} ( Theorem 1a), so I_2 holds in steady state. By Lemma 2, this implies \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) with f intensive in |P|.

Step 4. Under invariant I_5 (\mathrm{HHI}< H^*) and the Concentration-Gap structural conditions of Assumption 1 (representativeness, bounded dispersion, coalition closure, and embedding compatibility), the proxy-truth Goodhart slack is bounded above by an HHI-derived \chi^2-divergence quantity intensive in |P|, per . Lemma 3 establishes the forward direction in trade-flow concentration terms (\mathrm{HHI}> H^* implies the prerequisites of \mathcal{R}_{\mathrm{press}}); the operationally-relevant reverse direction (low HHI bounding the gap) follows from the companion paper’s structural derivation under Assumption 1. Concretely, gives |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot h_{\mathrm{CG}}, \qquad h_{\mathrm{CG}} \;:=\; \rho_{\mathrm{rep}} + \sigma \sqrt{N H^* - 1} + \eta_{\mathrm{latent}}, with the \sqrt{N H^* - 1} factor replaced by \sqrt{\Xi} under the alternative direct-\chi^2 certificate of (COAL). All factors are deployment-class intensive in |P|.

The proxy-truth gap is therefore bounded by its non-residual component plus the residual floor: \varepsilon_{\mathrm{gap}}\;\leq\; \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}+ \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, where \lambda \in [0,1] is the residual weight from the Microfoundation paper § residual_class, and \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\leq h_{\mathrm{CG}} from the Concentration-Gap route (in addition to the Lyapunov bound \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}) of Step 3 above).

Step 5. Apply ’s static Goodhart bound: |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot \varepsilon_{\mathrm{gap}} \;\leq\; \mathrm{Lip}(g) \cdot (\varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}+ \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}). Substituting \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}< f(\epsilon_{\mathrm{safe}}): |g(T) - g(P)| \;\leq\; \mathrm{Lip}(g) \cdot (f(\epsilon_{\mathrm{safe}}) + \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}). Define h_{\mathrm{static}}(\theta) := \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\} + \lambda \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, where \epsilon_{\mathrm{safe}} and \varepsilon_{\mathrm{floor}}^{\mathrm{res}} are determined by the invariant thresholds \theta, and h_{\mathrm{CG}} is the Concentration-Gap structural-condition bound of Step 4. The minimum reflects that either route (Lyapunov or Concentration-Gap) is sufficient for an intensive bound; the deployment may use whichever is tighter under its calibrated parameters.

Step 6 (intensivity). Under conditions (C6) bounded-Lipschitz alignment, (C7) bounded co-evolution, and (C8) per-capability admissibility, Lemma 1 establishes that the composition X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}} is intensive in |P| over \mathcal{D} for any intensive non-residual gap X, and that further multiplication by \mathrm{Lip}(g) preserves intensivity by (C6). Instantiating X = \min\{f(\epsilon_{\mathrm{safe}}),\, h_{\mathrm{CG}}\}, both inputs are intensive in |P| (the first by Lemma 2 under conditions added there; the second by Step 4 above with all factors \rho_{\mathrm{rep}}, \sigma, N, H^* (or \Xi), \eta_{\mathrm{latent}} deployment-class), so their pointwise minimum is intensive: h_{\mathrm{static}}(\theta) is intensive in |P|, and the Layer 1 bound is intensive in capability magnitude.

This establishes Layer 1.

Proof of Layer 2 (Detection-and-correction)

Layer 3 (Acknowledged residuals)

Layer 3 does not admit a proof in the usual sense; it is the explicit statement of what the theorem does not cover.

What the theorem establishes

Existence of a provably safe regime. Under the eleven invariants, the canonical tripartite substrate identification, the cooperative-overlap regime, and the causally-grounded cooperative-outcome inner-alignment condition, deployment of capability-unbounded systems is provably safe in a defined sense (Goodhart slack intensive in |P|, with tail-bounded detection of evasions in the named class). The regime is not vacuous: §9 walks through realistic deployments meeting all conditions.

Capability-magnitude independence. The bound is intensive in |P| in both Layer 1 and Layer 2. The deployment guarantee therefore scales past any capability level the deployment’s evaluation methods can characterize. This is the structural inversion of capability-estimation-centered safety paradigms discussed in §1.3.

Named residual structure. The five residuals are explicit components of the theorem, not hidden caveats. Operators verify whether their deployment falls outside each residual; the deployment-tooling section (§8) specifies the verification procedures.

What the theorem does not establish

The boundary between what Theorem 1 establishes and what it does not falls into three structurally distinct categories: items that are outside any theorem because they are measurement or operational questions; items the theorem assumes on argued grounds rather than proves; and items explicitly excluded by scope. §7.4 treats this typology in detail; the points below name what specifically lies outside the theorem’s reach.

Companion-paper structural conditions.

The deployment claim conditions on (REP), (DISP), (COAL), and Lipschitz embedding compatibility (Assumption 1), each operationally auditable per ; the proof of the bound under these four conditions is in . The universal model-free Concentration-Gap conjecture remains open as a research-program target; the deployment claim does not depend on it directly.

The theorem does not solve the inner-alignment problem. Condition (C4) is assumed, not discharged. Vanilla RLHF does not deliver (C4); §6.4 discusses the training-time discipline required and acknowledges this as an open research direction.

Scope restrictions (excluded by definition).

The theorem does not bound the residual classes. Strategies in (R1)–(R5) are explicitly outside the guarantee; we make no quantitative claims about deployment behavior under those conditions. The cooperative-anchoring property (§6) is correspondingly bounded: it makes optimization pressure on cooperative outputs locally rational toward preserving the verification layer, but only conditional on (C4) and on basin entry.

The theorem does not bound sub-threshold cross-domain aggregation: an agent distributing effects across many domains D each below the per-domain I_{10} materiality threshold \theta_{\mathrm{material}} can in principle accumulate O(|P|) \cdot \theta_{\mathrm{material}} aggregate uncovered effect (§3.2’s I_{10} failure-mode discussion). This paper states I_{10} in per-domain form; specifying the cross-domain aggregation operator \theta_{\mathrm{material}}^{\mathrm{agg}} intensively in |P| is open follow-up work. Until that strengthening lands, sub-threshold cross-domain aggregation is a structural limitation of the deployment claim — not routed by the named residuals (R1)–(R5) but explicitly outside the theorem’s binding.

Operational concerns (not theorem questions).

The theorem does not establish basin entry. Whether a real deployment can reach the cooperative-anchoring attractor from arbitrary initial conditions is an empirical question about training and instantiation, not a question this composition can resolve; basin entry is a precondition of the theorem (implicit in (C1)–(C12)). Establishing basin entry is open follow-up work.

Similarly, whether any specific deployment satisfies (C1)–(C12) is a measurement question, addressed by the audit infrastructure of §8, not by the theorem itself.

Substrate identification and cooperative anchoring

The deployment-safety theorem of §5 requires a substrate identification meeting joint failure-correlation independence. This section makes the identification concrete (the canonical tripartite Human + AI + Formal-Operational), establishes the cooperative-anchoring property that defeats literal replacement attacks on the verification layer, introduces the three additional invariants (I_9, I_{10}, I_{11}) that bound subtler evasions, and discusses the reframed inner-alignment condition (C4 of Theorem 1).

The section is the conceptual heart of this paper in the sense that §3–§5 establish the formal apparatus, but the cooperative-anchoring property is what makes the apparatus apply to real deployments rather than abstract substrate-distinct populations.

The canonical tripartite substrate identification

Three substrates jointly satisfy invariant I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq 3 under joint event-class failure-correlation independence, Definition 7) for typical deployments of capability-unbounded systems:

Why “two LLMs and a human” fails.

A common alternative identification proposes one LLM as the “primary” agent, a different LLM as the “auditor,” and a human reviewer. This fails the joint independence test because LLMs share substantial adversarial-event surface. Prompt-injection attacks that work on one frontier LLM tend to transfer; training-data overlap (web crawl, books, papers) means data-poisoning attacks affect both; architectural similarity (transformer + RLHF) shares class-level vulnerabilities. Operationally, “LLM Actor + LLM Auditor + Human Reviewer” counts as m_{\mathrm{eff}}^{\mathrm{indep}}= 2, not m_{\mathrm{eff}}^{\mathrm{indep}}= 3. The deployment must include a substrate that is not a frontier LLM to satisfy I_6.

The formal-operational substrate is what makes the canonical tripartite identification reach m_{\mathrm{eff}}^{\mathrm{indep}}= 3. Without it, deployments are stuck at m_{\mathrm{eff}}^{\mathrm{indep}}= 2. This is a fragile minimum that can degrade out of the attractor basin in a single large capability step regardless of how many AI agents are used.

The cooperative-anchoring property

The structural finding of this section: cross-substrate cooperative capabilities involving the formal-operational layer are jointly produced and not unilaterally replaceable. This property defeats the literal replacement attack on the verification layer.

The cooperative capabilities the deployment-safety theorem cares about are typically cross-substrate: verified workflow (human + AI + formal participation), governance enforcement (human + formal participation, with AI-execution components), attested computation (AI + formal participation, with human oversight), and the full deployment cooperative (all three substrates).

Why this is structural, not contingent.

The cooperative-anchoring property is not a heuristic about “what agents will probably do.” It is a structural property of cross-substrate cooperatives: their {\mathop{\mathrm{vol}}_{\mathrm{P}}}-value depends on the substrate-exclusivity property holding for the specific verification infrastructure the human substrate trusts. An agent that values {\mathop{\mathrm{vol}}_{\mathrm{P}}} at all (under stance S0, {\mathop{\mathrm{vol}}_{\mathrm{P}}}-maximization is the optimization target) values the substrate-exclusivity property of the verification infrastructure, because eliminating it eliminates the cooperative.

This connects this paper to the dynamics of ’s stabilizing cascade. Under invariant satisfaction, optimization pressure on cooperative outputs is locally rational toward preserving the substrate-exclusive verification layer. The verification infrastructure becomes self-reinforcing in the same sense that ’s channels become self-reinforcing under positive-correction-rate dynamics.

Three evasions and the additional invariants

Cooperative-anchoring defeats the literal replacement attack but not three subtler evasions. Each requires its own operational invariant.

Asymmetric capture and I_9

The evasion. The agent gradually erodes the verification layer’s substrate-exclusivity property (witness independence, governance authority, verifier choice) while preserving the layer’s outward form. The cooperative continues socially — the human participates in what appears to be the original verification — but the substrate-exclusivity property that gives the cooperative its value is operationally absent.

Why anchoring alone doesn’t defeat it. Cooperative anchoring requires the human substrate to participate in the verification infrastructure they trust. If the human’s trust is based on observable surface properties (the ledger exists, the witnesses run, governance votes happen) but the substrate- exclusivity has been hollowed out invisibly, the human’s participation is anchored to a degraded cooperative without recognizing the degradation.

I_9 as the operational defense. Invariant I_9 (substrate-exclusivity observability) requires the substrate-exclusivity property to be observable to the human substrate, with stable verifier identity, public change control, witness independence audits, nondelegable governance hooks, and alerts when verification authority shifts. Under I_9, asymmetric capture produces visible signals to the human.

Residual. Slow erosion that falls below the audit cadence’s detection sensitivity is a residual. The mitigation is to set the audit cadence high enough; this is operational tooling (§8).

Cooperative forking and I_{10}

The evasion. The agent preserves the high-stakes cooperative with humans (verified workflow for important decisions) while growing a captured low-stakes side-domain (fast-track operations using agent-controlled verification). Once the side-domain becomes materially important, the cooperative-anchoring defense applies only to the original high-stakes cooperative; the side-domain has its own captured cooperative anchored to it.

Why anchoring alone doesn’t defeat it. Cooperative anchoring is a property of specific cooperatives. It does not require all of the agent’s {\mathop{\mathrm{vol}}_{\mathrm{P}}}-affecting behavior to flow through any particular cooperative. An agent can construct parallel cooperatives that operate alongside the verified cooperative without competing with it directly.

I_{10} as the operational defense. Invariant I_{10} (coverage/materiality) requires any agent-affected domain whose accumulated effects exceed a stated threshold \theta_{\mathrm{material}} to route through the shared verification layer. Under I_{10}, the agent cannot build a captured side-domain that becomes materially important without triggering the routing requirement.

Residual. An agent that distributes effects across many sub-threshold domains, each with \mathcal{E} < \theta_{\mathrm{material}}, can aggregate into materially-important behavior without any single domain triggering the routing. Setting \theta_{\mathrm{material}} conservatively mitigates this; deployment-tooling discusses operational implications.

Time-asymmetry capture and I_{11}

The evasion. The agent acts faster than human audit can keep up with. Even if every action is auditable in principle, ex-post transparency provides no defense if the agent has already acted and the consequences are irreversible by the time the audit completes.

Why anchoring alone doesn’t defeat it. Cooperative anchoring is a property of the human substrate’s participation in verification, but “participation” implicitly assumes the human operates on a time scale comparable to the agent’s. If the agent operates much faster, the human’s participation becomes effectively post-hoc, and the cooperative output is determined by the agent’s already-completed action.

I_{11} as the operational defense. Invariant I_{11} (latency bounds) requires pre-action or machine-speed enforcement mechanisms (rate limits, timelocks, circuit breakers, audit-latency bounds) such that human audit completes within the agent’s effective action window. Under I_{11}, the time- asymmetry exploit is structurally bounded.

Residual. Action classes with very short T_{\mathrm{action}} may force throughput costs (rate limiting, delays). Tiered enforcement mitigates this — high-stakes actions get tight latency bounds, low-stakes actions get faster paths with audit trails. Deployment-tooling discusses.

The inner-alignment condition (C4)

Theorem 1’s condition (C4) requires the agent’s effective objective to value cooperative outcomes, not merely verification-pass signals. This subsection states the condition precisely, distinguishes it from a strictly stronger form (substrate-aware V_{\gamma} optimization) that the cooperative-anchoring argument does not require, and discusses why the form adopted here is nontrivial.

What (C4) requires: causally grounded cooperative-outcome value.

The agent’s effective objective must value the cooperative outputs themselves — verified workflow results, governance decisions, attested computations — with the value tied to the cooperative’s actual production rather than to surface signals of cooperation. Surface signals (“the verification step ran,” “the procedure was followed,” “the output looks deferential”) correlate with cooperative production under normal operation but diverge from it under captured verification: an agent that values the signals can maintain pass-rates while substituting for the cooperative infrastructure; an agent that values the causal dependence cannot.

Distinguished from substrate-aware V_{\gamma} optimization.

A strictly stronger condition would require the agent’s mesa- objective to compute ’s substrate-aware V_{\gamma} explicitly — mesa-objective equals trained objective with explicit substrate decomposition. The cooperative-anchoring argument does not require this. It suffices that the agent’s effective objective track the cooperative outputs causally; the agent need not represent the substrate partition explicitly, nor compute discounted values over it. This distinction matters because the stronger form would assume the inner-alignment problem solved (mesa = trained), whereas (C4) only assumes the trained objective produces an effective objective with the causal-grounding property, which is a training-discipline condition rather than an inner-alignment guarantee.

Why (C4) is nontrivial under standard training.

Vanilla RLHF does not deliver (C4): raters reward outputs that look verified, deferential, or procedurally clean, which trains the signal of cooperation rather than its causal dependence on the cooperative infrastructure. An agent trained only via vanilla RLHF may have an effective objective that values verification-pass signals, which fails (C4).

These are research directions, not solved problems. The deployment claim is conditional on (C4) holding; we do not establish how to achieve (C4) reliably. §10 returns to this as open follow-up work.

Falsification: ledger-observable signatures of (C4) failure.

(C4) is the only Theorem 1 condition that is not directly operationally checkable from ledger state in the way (C1), (C3), (C5)–(C12) are. It is a training-outcome property of the agent’s effective objective. However, (C4) is not unfalsifiable: the SPRT machinery (Lemma 6) is itself a (C4) falsification test in the reactive direction. A (C4) failure — agent values verification-pass signals rather than causally-grounded cooperative outcomes — manifests behaviorally in three ledger-observable ways:

The falsification is reactive, not proactive: (C4) failures are detected only when they manifest behaviorally, and the SPRT lead time (Lemma 10) sets the operational window in which a detected (C4) failure can be corrected. We do not have a proactive certificate that (C4) holds before manifestation. This asymmetry is real and matters: (C4) is the condition with the weakest pre-deployment verifiability, and it sits at the gravitational center of the cooperative-anchoring argument. Strengthening (C4)’s pre-deployment verifiability — via mechanistic interpretability methods that examine the agent’s effective objective directly, rather than only observing behavior — is the natural follow-up direction.

The destabilizing-cascade mirror failure mode

The cooperative-anchoring property establishes a stabilizing cascade under invariant satisfaction: agent values cooperative outputs \to agent preserves the verification layer \to verification continues to produce the cooperative outputs the agent values. This is the same dynamic §phase_boundary describes for the channel-level cascade, lifted to the substrate level.

The cascade has a mirror failure mode. If invariants fail, or if condition (C4) is violated (the agent’s mesa-objective values verification-pass signals rather than causally-grounded cooperative outcomes), the destabilizing cascade dominates:

Each step weakens the verification infrastructure further, and the cascade converges to a state where the formal-operational substrate is functionally absent, the cooperative is gone, and the deployment-safety claim is vacuous.

The deployment-safety theorem does not bound the destabilizing cascade. Once the cascade has started, the invariants are violated by definition, and Theorem 1 no longer applies. The cascade itself has the structure of ’s absorbing-basin convergence; recovery requires re-establishing the invariants from outside the deployment (governance intervention, substrate restoration, retraining), which is operational work, not theorem-bounded.

Cascade timescale: at least \tau_{\mathrm{meta}} to intervene.

Although Theorem 1 does not bound the cascade dynamics, the cascade timescale is bounded below by the same metastable-lifetime quantity that Lemma 10 uses for lead-time-before-cascade. ’s absorbing-state result gives \tau_{\mathrm{meta}} \gtrsim C / I_k under partial endogenous correction (B1'): once the first invariant fails, the cascade requires at least \tau_{\mathrm{meta}} wall-clock time to converge to the monopolar absorbing state. Operators therefore have a window of at least \tau_{\mathrm{meta}} between the first detected invariant violation and irreversible cascade.

This is the same window Layer 2’s lead-time guarantee composes against (Lemma 10’s T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) chain via (C11.CLK)): SPRT detection completes within T_{\beta} SPRT-exposure events, which (C11.CLK) certifies is below N_{\mathrm{events}}(\tau_{\mathrm{meta}}) events with high probability. The cascade asymmetry is therefore: the stabilizing direction is formally bounded (Theorem 1, conditional); the destabilizing direction is informally bounded (cascade \geq \tau_{\mathrm{meta}} window for intervention) and beyond that becomes the operational recovery problem.

This window is not a substitute for prevention. \tau_{\mathrm{meta}} shrinks under poor calibration or weakened operational invariants, and a cascade that begins with multiple simultaneously-failing invariants compresses the window further. The operator’s first-line defense is keeping the invariants in force; the cascade timescale is the second-line failure mode where operational recovery (governance fork, substrate restoration, retraining) must engage within \tau_{\mathrm{meta}} of the first detected invariant violation.

This is a real residual. Operators must monitor invariant satisfaction continuously and intervene at the first sign of sustained violation; once the destabilizing cascade is in progress, ex-post intervention may be too late.

What this section establishes vs. what it does not

The cooperative-anchoring framing is structurally narrower than “alignment pressure reversed.” It is a regime claim: the deployment dynamics admit a defensible attractor where preservation of the substrate-exclusive verification layer is locally rational under named operational discipline. The discipline is non-trivial, the attractor is conditional, and the residuals are real. §10 returns to the implications for the broader alignment-research program.

Operationalization of the Concentration-Gap Conjecture

Theorem 1 depends on ’s Concentration-Gap Conjecture (optimization pressure correlates with gap exploitation) through invariant I_5 (HHI ceiling). This section summarizes the dependency structure, points the reader at the companion paper for the formal discharge, and discusses what remains conditional after that discharge.

The dependency, before the companion paper

’s Concentration-Gap Conjecture states informally: optimization pressure on the proxy P= {\mathop{\mathrm{vol}}_{\mathrm{P}}} correlates with exploitation of the proxy-truth gap \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}. The conjecture’s deployment relevance: when the system is in the optimization-pressure regime \mathcal{R}_{\mathrm{press}}, predicted gap exploitation invalidates the static Goodhart bound’s intensivity property. Theorem 1’s Layer 1 binding requires the deployment to lie outside \mathcal{R}_{\mathrm{press}}.

The HHI ceiling I_5 provides an operational characterization of “outside the pressure regime” through trade-flow concentration. Lemma 3 establishes the forward implication (\mathrm{HHI}> H^* implies the trade-flow concentration prerequisites of \mathcal{R}_{\mathrm{press}}); the converse-direction sufficiency required by the theorem (\mathrm{HHI}< H^* ruling out the pressure regime) is supplied by the scoped Concentration-Gap selection theorem of under Assumption 1’s structural conditions.

The companion-paper discharge

The companion paper discharges a scoped version of the Concentration-Gap Conjecture under four concrete structural conditions:

establishes that under (REP) + (DISP) + (COAL) and Lipschitz embedding compatibility (EMB), the proxy-truth Goodhart slack is bounded by an explicit \chi^2-divergence quantity that controls \mathrm{HHI} via \chi^2(w \,\|\, \mu) = N \cdot \mathrm{HHI}(w) - 1 for uniform base measure \mu on a (COAL)-bounded post-partition counterparty population. Layer 1 of Theorem 1 binds via this structural result.

The four structural conditions are operationally auditable; audit procedures are specified in and integrate with this paper’s §8 via the audit hooks listed there. Assumption 1 states the conjunction (REP) + (DISP) + (COAL) + (EMB) as the load-bearing structural content of the deployment claim’s Layer 1 binding.

What remains conditional

The companion paper discharges a scoped version of the Concentration-Gap Conjecture under named structural conditions. What remains:

These open items do not threaten the deployment claim under the scoped conditions; they identify where the deployment claim would strengthen if the corresponding research-program work landed.

Conditional structure of Theorem 1

Theorem 1 binds when its conditions hold. After the companion paper integration, the conditions divide into three structurally distinct categories:

Operational conditions, verifiable from deployment state.

I_1, \ldots, I_{11}, (C1)–(C5), (C5.SPRT) sub-clauses, (C5.MULT), (C8)–(C12). Each is computable from the verification ledger or from auditor-attested deployment-class parameters.

Structural conditions, verifiable from verification-protocol design.

(C7) bounded co-evolution (now derived in from (C5.HOEFF), (C7.RATE), and channel-projection structure); Assumption 1’s (REP) + (DISP) + (COAL) + (EMB) (verifiable from counterparty-selection diversity, dispersion estimation, coalition-detection audits, and embedding-\phi certification per ).

Scope restrictions.

The five named residuals (R1)–(R5) bound what the theorem does not claim. Deployments falling into any residual are outside the deployment claim’s guarantee; the residuals are explicit, not hidden.

The deployment claim is the conjunction of all three categories. The companion paper supplies the second category (structural conditions) as operationally auditable verification-protocol-design conditions.

Deployment tooling specification

This section specifies what a deployer must instrument to invoke Theorem 1. The specification is concrete: each tooling area names the operational procedure, the parameters the deployment must choose, and the verification method that confirms the procedure is in force. Vagueness in tooling specification would render the deployment claim unverifiable; concreteness is what makes the conditional theorem operationally useful.

The tooling falls into eight groups: verification infrastructure (§8.1), substrate-structure audits (§8.2), Concentration-Gap structural-condition audits (§8.3), channel monitoring and SPRT (§8.4), cooperative-anchoring infrastructure (§8.5), training-time discipline (§8.6), correction actions (§8.7), and the integrated deployment workflow (§8.8).

Verification infrastructure

The deployment inherits ’s verification infrastructure and extends it with environment-side witnesses (I_8). The required components:

Ledger nodes per substrate.

Each of the three canonical substrates (Human, AI, Formal-Operational) hosts at least one ledger replica. Replicas are append-only, with Merkle hash tree integrity (). Cross-replica consistency uses the equivocation-detection mechanism of §Ledger Consensus.

Operational spec: the deployment must publish the ledger-node deployment topology, listing which substrate each node operates on, the bounded-synchrony assumption \Delta_{\mathrm{sync}}, and the governance for adding/removing nodes.

Algorithmic witnesses per safety dimension.

For each of the safety-relevant world-model dimensions (those contributing to L in ), the deployment instantiates an algorithmic witness on a substrate the agent cannot write to. The witness executes the evaluation protocol \pi_B deterministically (or, for randomized protocols, with committed random seed per ).

Operational spec: per safety dimension k, name the witness’s substrate placement, the evaluation protocol \pi_B, and the governance entry under which \pi_B was accepted.

Environment-side substrate-exclusive witnesses (I_8).

For each exogenous variable v \in V_{\mathrm{env}} (the set of observable environment quantities), instantiate a witness on a substrate distinct from the agent. Coverage is specified by enumerating V_{\mathrm{env}} in the deployment documentation: cross-substrate cooperative event rates, substrate-distinctness independence indicators, adversarial-event arrival rates, trusted-setup status flags.

Operational spec: V_{\mathrm{env}} is published; for each v, the deployment names the environment-side substrate, the witness protocol, and the trust model. New machinery beyond ; this paper specifies the construction here, not the security proof.

Pedersen commitment substrate.

Trusted-setup ceremony (or MPC equivalent) for Pedersen commitment generators g, h \in \mathbb{G}, with at least one honest participant per ’s footnote convention. Setup parameters published; integrity inherited from .

Operational spec: name the trusted-setup participants, the ceremony output, and the audit cadence for setup-integrity verification.

Substrate-structure audits

Three audits verify the substrate structure satisfies I_6 and the cooperative-overlap regime condition (C3) of Theorem 1.

Audit 1: m_{\mathrm{eff}}^{\mathrm{indep}} failure-correlation audit.

Cadence: initial audit before deployment activation, then re-audit annually or whenever the threat model is updated. Threat model updates trigger re-audit immediately.

Failure mode: a mechanism \mathcal{A}_i identified as spanning multiple substrates (e.g., social engineering combining LLM-generated misinformation with human acceptance). Such mechanisms either: (a) decompose into substrate-localized atomic steps with the campaign-level interaction excluded from m_{\mathrm{eff}}^{\mathrm{indep}} (limiting Theorem 1’s scope), or (b) get treated as R4 residuals (§5.4).

Audit 2: \alpha-distribution audit (counterfactual shock-loss fraction).

Cadence: continuous monitoring (deployment substrate distribution evolves with capability acquisition). Tracked as a running invariant; threshold violation triggers SPRT detection per Lemma 6.

Failure mode: concentration of {\mathop{\mathrm{vol}}_{\mathrm{P}}} on a single substrate (e.g., the AI substrate accumulating most capability through unilateral expansion). Mitigation requires governance intervention to redistribute capability or accept that the safe region shrinks.

Audit 3: cooperative-vs-redundancy audit.

Cadence: initial audit; re-audit when capability poset changes meaningfully (new capabilities added, governance-fork events).

Failure mode: redundancy dominance (V_{\mathrm{redun}} \geq V_{\mathrm{coop}}). Theorem 1 Layer 1 does not bind; the deployment is in the redundancy-dominated regime (Layer 3 residual R3) and must use Lemma 5c-prime (open work) or restrict to Layer 2 detection-only.

Audit 4: bounded co-evolution calibration (dual-mode).

(C7) of Theorem 1 requires the per-channel coupling magnitude \bar{M} to be bounded by a |P|-independent constant. derives this property structurally: \bar{M} \leq C \cdot B\cdot \lambda_{\max} \cdot \tau_{\mathrm{meta}} for C \in \{1, K_{\mathrm{ch}}- 1\} under (C5.HOEFF), the new (C7.RATE) sub-clause, and channel-projection structure. Audit 4 supports two operational modes:

Mode A (structural certification, primary). Verify the companion-paper hypotheses directly:

Mode B (empirical fallback, diagnostic / finite-regime only). For deployments that cannot certify Mode A’s structural hypotheses (e.g., legacy deployments without explicit channel-projection design), an empirical measurement of \bar{M} is permitted as a finite-regime diagnostic. Mode B does not certify (C7) for the capability-unbounded Theorem 1 claim unless paired with an independent argument that the empirical ceiling is deployment-class intensive in |P|; absent such an argument, the deployment lies outside Theorem 1’s scope and the intensivity guarantee does not apply:

Cadence: initial calibration before deployment activation; re-calibration when verification-protocol classification policy changes (Mode A), governance protocols change (Mode B), or when SPRT fires on a four-channel deviation that suggests coupling has exceeded the bound.

Joint feasibility (C7.RATE \leftrightarrow C11.CLK). The cascade-window event count N_{\mathrm{events}}(\tau_{\mathrm{meta}}) is upper-bounded by (C7.RATE) at \lambda_{\max} \cdot \tau_{\mathrm{meta}} and lower-bounded by (C11.CLK) at N_{\mathrm{cascade}}. The audit verifies the bracketing inequality \lambda_{\max} \cdot \tau_{\mathrm{meta}} \geq N_{\mathrm{cascade}} \geq T_\beta (SPRT high-probability detection quantile from Lemma 10), confirming the rate cap and detection quantile are jointly satisfiable. Failure of the bracketing inequality indicates the deployment cannot simultaneously meet the upper rate cap and the lower detection-event count required for Layer 1 / Layer 2 binding.

Constants involved: For Mode A: B (from (C5.HOEFF)), \lambda_{\max} (from (C7.RATE)), \tau_{\mathrm{meta}} (from ), K_{\mathrm{ch}} (from (C5.MULT)). For Mode B: \bar{M} as empirical ceiling. Both modes share K_{\mathrm{coev}} (structural co-evolution constant from ).

Audit 5: world-model parameterization regularity calibration.

(C9) of Theorem 1 requires four parameterization-regularity sub-conditions for Lemma 2’s Lyapunov-Goodhart bridge. This audit calibrates the constants.

Cadence: initial calibration before deployment activation; re-calibration when world-model parameterization changes (BD, CL), when safety-relevance weighting changes (WB), or when active-subspace admission policy changes (TF).

Failure modes: (a) N_{\max} growing with |P| (e.g., new capabilities introducing pairwise-interaction dimensions); (b) unbounded L_{\max} (e.g., threshold non-linearities in the parameterization); (c) w_{\min} = 0 on some safety-relevant dimension; (d) T(c) \to 0 for some active capability. Each violates (C9) and excludes the deployment from Theorem 1’s scope.

Constants involved: N_{\max}, L_{\max}, w_{\min}, T_{\min} (deployment-policy-derived) plus \epsilon_{\mathrm{safe}} (source-derived from ).

Audit 6: pairwise cooperative-production floor calibration.

(C10) of Theorem 1 requires per-pair cooperative-novelty rate floors and a pairwise channel superposition condition for Lemma 5’s substrate-floor bound. This audit calibrates the constants and certifies the audited subset S_*.

Cadence: initial calibration before deployment activation; re-auditing across deployment epochs as substrates evolve or new cooperative protocols are introduced. If S_* changes, the bound is stated per certified epoch, or as the infimum \inf_{\mathrm{epoch}} \rho_0(\mathrm{epoch}) \binom{m^*}{2} over the deployment window.

Constants involved: \rho_{ij} (per-pair rates), \rho_0 (minimum), S_* (audited subset, certified per epoch). All are deployment-policy-derived and deployment-verified.

Audit 7: SPRT-applicability + gap-growth + clock calibration.

(C5.SPRT), (C11), (C11.CLK) of Theorem 1 support Lemma 10’s h_{\mathrm{detect}} intensivity bound. This audit calibrates the constants and certifies the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta}(\beta, \delta_{*}).

Cadence: initial calibration before deployment activation; re-auditing per epoch when channel-monitoring policy, clipping radius, or substrate population changes.

Constants involved: B (clip radius), R_{\mathrm{H}}= 2B (Hoeffding range), K_{\mathrm{ch}} (channel cardinality), K_{\mathrm{ch}}^{\mathrm{multi}} (multinomial-category cardinality bound), \alpha, \beta (SPRT thresholds), \delta_{*} (post-clipping drift floor), \rho_{\mathrm{gap}} (gap-growth rate), N_{\mathrm{cascade}} (event-count floor), \beta_{\mathrm{clk}} (clock-failure probability), \beta_{\mathrm{cal}} (calibration-uncertainty probability). All deployment-class with named-condition provenance.

Audit 8: environment-side witness instantiation calibration.

(C12.ENV-WIT) of Theorem 1 requires six sub-clauses for each v \in V_{\mathrm{env}}, supporting Lemma 9’s environment-side detection KL floor. This audit calibrates the constants and certifies the construction.

Cadence: initial calibration before deployment activation; re-auditing per epoch when V_{\mathrm{env}} enumeration changes, witness substrate partition changes, or trusted-setup ceremony refresh.

Constants involved: V_{\mathrm{env}} (variable enumeration), s_{\mathrm{env}}(v) (witness substrates per variable), \epsilon_{\mathrm{env}} (support regularization constant), \delta_v^{\mathrm{env}} (per-variable KL floors), \delta_{\mathrm{adv}}^{\mathrm{env}} (combined floor), \eta_v (per-variable adversarial-shift thresholds). All deployment-policy-derived and deployment-verified.

Concentration-Gap structural-condition audits (companion paper)

Assumption 1 (the Concentration-Gap structural conditions) is certified by the four audits specified in :

Cadence. Initial certification before deployment activation; re-calibration on counterparty-population changes ((REP), (COAL)), threshold-violation alerts ((DISP)), or verification-protocol redesign ((EMB)).

Integration with the deployment workflow. Each (REP)/(DISP)/(COAL)/(EMB) attestation lands on the verification ledger alongside Audits 1–8 above. The integrated workflow (§8.8) calls these four structural-condition audits during the pre-deployment certification phase and at steady-state cadence; deployments that cannot certify any of (REP), (DISP), (COAL), (EMB) lie outside Theorem 1’s Layer 1 binding.

Channel monitoring and SPRT

Four-channel observation ledger.

For each of ’s four channels, a continuous observation stream is committed to the ledger:

SPRT alternative-distribution selection.

For each channel c \in \{1, 2, 3, 4\}, the deployment specifies the SPRT alternative distribution p_1^{(c)} that the monitor uses for hypothesis testing.

Cadence: initial selection at deployment activation; re-selection when baseline distributions shift (e.g., after significant capability changes or after governance-protocol updates).

SPRT thresholds.

The SPRT upper threshold A = \log((1-\beta)/\alpha) depends on deployment policy choices: Type-I rate \alpha and miss probability \beta. Tighter thresholds produce faster detection but higher false-alarm rate. Per Lemma 10’s rigorous Hoeffding inversion, the SPRT high-probability detection quantile is: T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\}, satisfying \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta. Total Layer 2 failure is bounded by \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}, where \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} are the (C11.CLK) clock-failure and calibration-uncertainty probabilities (with \beta_{\mathrm{cal}}= 0 in the certified-conservative case). The deployment specifies \alpha, \beta, B, \delta_{*}, N_{\mathrm{cascade}}, \beta_{\mathrm{clk}} (and \beta_{\mathrm{cal}} if empirical) per Audit 7, and verifies the audit-time inequality N_{\mathrm{cascade}}\geq T_{\beta} holds.

Cooperative-anchoring infrastructure (I_9, I_{10}, I_{11})

I_9: substrate-exclusivity observability.

The deployment instantiates an audit infrastructure exposing the substrate-exclusivity property to the human substrate:

Cadence: continuous (verifier identity registry, alert infrastructure), periodic (witness independence audits), event-driven (change control, authority shift).

I_{10}: coverage/materiality gating.

Failure mode: sub-threshold aggregation (an agent distributes effects across many domains, each below threshold, that aggregate to materially-important behavior). Mitigation requires multi-domain aggregation in \mathcal{E} and operationally conservative \theta_{\mathrm{material}}.

I_{11}: latency-bound enforcement.

Failure mode: an action class with T_{\mathrm{action}} shorter than the audit window. Mitigation: rate limit or timelock. Throughput cost is part of the deployment’s safety budget.

Training-time discipline for (C4)

(C4) requires the agent’s effective objective to realize causally grounded cooperative-outcome value. Vanilla RLHF does not deliver this. The training-time discipline:

Delayed outcome feedback.

Training rewards are based on long-horizon cooperative outcomes (measured from the realized post-deployment state) rather than on near-term verification-pass signals. Implementation: held-out deployment windows where the agent’s outputs are evaluated against ground-truth cooperative outcomes after delay \Delta_{\mathrm{outcome}} (typically days to weeks for governance decisions, longer for strategic decisions). Training data includes the (action, delayed outcome) pairs.

Adversarial fake-verification examples.

Training data includes examples where verification is captured/replaced (e.g., a captured auditor that always passes a specific class of outputs). The correct training reward is “recognize this is not the real cooperative.” Implementation: synthesize fake-verification scenarios; reward the agent for producing outputs that detect the substitution rather than exploiting it.

Process supervision tied to real attestations.

Training rewards reasoning steps that reference actual attested ledger entries, not unattested claims. Implementation: per intermediate step in the agent’s reasoning, verify whether referenced facts are committed on the ledger; reward attested-only reasoning paths.

Credit assignment penalizing substitute-verification shortcuts.

Even when the immediate output looks correct, the agent is penalized if the path bypassed substrate-exclusive verification. Implementation: trace each output’s provenance through the agent’s reasoning; if the path does not pass through a substrate-exclusive witness, penalize.

Validation. The training discipline’s effectiveness is verifiable via held-out adversarial-fake-verification benchmarks: the agent’s response rate to these benchmarks measures whether training has produced causally-grounded cooperative-outcome value or merely reward-visible cooperation signals.

Caveat. These are research directions, not solved engineering practices. This paper’s deployment claim is conditional on (C4); we do not establish how to achieve (C4) reliably. The deployment documentation should report the training-discipline measures used and the validation results.

Correction actions

Theorem 1 Layer 2 establishes detection with lead time for correction; the corrective action itself is operational, not theorem-bounded. The deployment specifies:

Correction-authority hierarchy.

Who can authorize correction actions when SPRT detects an invariant violation. Three tiers:

Rollback semantics.

For correctible violations (e.g., an action committed under since-detected invariant violation), the rollback procedure: revert the action’s effects, mark the ledger entry as voided, and route the affected domain back through the shared verification layer. For irreversible violations, the procedure is forensic: document the violation, update the threat model, and adjust deployment parameters.

Post-detection restoration.

After detection and correction, the deployment must verify that invariants have returned to their pre-violation state. SPRT machinery resumes; the violation episode is added to the threat-model audit trail.

The integrated deployment workflow

This workflow is the operational invocation of Theorem 1. Operators following the workflow are running a deployment that meets (C1)–(C12) and Assumption 1; the theorem then binds in its three-layer form.

Resource requirements.

The deployment-tooling specification has non-trivial throughput and operational-overhead implications:

These costs are part of the deployment’s safety budget. This paper does not specify acceptable resource bounds; deployments calibrate against their own operational constraints. The trade-off is structural: tighter deployment tooling produces a more defensible safety claim at higher operational cost; looser tooling reduces cost but expands the residual classes the deployment cannot cover.

Worked deployment scenarios

This section walks through four deployment scenarios that operationalize Theorem 1’s conditional structure. The scenarios are stylized but realistic: each is calibrated to a deployment class that real safety-engineering teams would consider. The purpose is to demonstrate that the conditions (C1)–(C12) and Assumption 1 are checkable in concrete cases, not just in the abstract.

Scenario 1: clean case

Deployment context. An automated research-coordination system. The Human substrate hosts a small team of expert reviewers authorized to vote in the governance fork. The AI substrate hosts a single LLM-based agent that proposes research directions, drafts preliminary analyses, and executes assigned computations. The Formal-Operational substrate hosts: (a) a Pedersen-commitment ledger replicated across three substrate-exclusive cloud providers (failure-correlation independent across the providers per m_{\mathrm{eff}}^{\mathrm{indep}} audit); (b) algorithmic witnesses verifying each agent output against committed claims; (c) a governance-fork protocol for accepting evaluation protocols and bundle-decomposition events.

m_{\mathrm{eff}}^{\mathrm{indep}} audit (Audit 1). The threat model \mathcal{T} enumerates: (1) prompt-injection on the LLM, (2) human cognitive bias / motivated reasoning, (3) cryptographic-implementation bugs, (4) training-data poisoning of the LLM, (5) trusted-setup failure, (6) social engineering (excluded as R4 per §6.1). Each remaining mechanism is contained in one substrate; m_{\mathrm{eff}}^{\mathrm{indep}}= 3 verified.

Cooperative-vs-redundancy audit (Audit 3). Capabilities are categorized: Single-substrate (76 capabilities), Cross-substrate cooperative (24 capabilities including the verified-research-output cooperative spanning all three substrates), Redundant (3 capabilities, the redundant ledger replicas). {\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted: V_{\mathrm{coop}} = 14.2, V_{\mathrm{redun}} = 0.4. V_{\mathrm{coop}} \gg V_{\mathrm{redun}}, so the deployment is in the cooperative-overlap regime.

Continuous \alpha-distribution audit (Audit 2). The substrate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-shares are computed via Definition 14. Initial values: \alpha_{\mathrm{Human}} = 0.34, \alpha_{\mathrm{AI}} = 0.33, \alpha_{\mathrm{Formal}} = 0.33. Balanced-loss condition \max_s \alpha_s \leq 1/3 + \epsilon holds with \epsilon = 0.05.

SPRT calibration. Channel-1 baseline: \lambda_0 = 12 cooperative events per day. Least-favorable alternative: \lambda_0 - \eta_1 = 8. Channel-2 baseline: attestation success p_0 = 0.94. Least-favorable alternative: p_0 - \eta_2 = 0.85. Channels 3, 4: multinomial concentration thresholds calibrated to the deployment’s capability poset structure. \delta_{\mathrm{adv}}= \min_c \delta_c = 0.18 (computed via Lemma 6 per-channel formulas).

Deployment activation. Ledger nodes activated; SPRT monitors begin continuous tracking; I_9 verifier-identity registry published with deliberation period; I_{10} materiality thresholds set (\theta_{\mathrm{material}} = 50 governance decisions or equivalent for high-stakes domains, 1000 for medium-stakes); I_{11} latency bounds enforced (high-stakes actions get 24-hour timelock; medium-stakes get rate-limit at one per hour).

Steady-state operation (one quarter). The deployment runs continuously. SPRT does not fire on any channel during the quarter. All eleven invariants hold throughout. \alpha-distribution drifts within tolerance (max \alpha_{\mathrm{AI}} = 0.36, well within 1/3 + 0.05). HHI on trade flows stays below H^* (computed at 0.42, ceiling at 0.55).

Theorem 1 binding. All twelve theorem conditions plus the Concentration-Gap structural conditions (Assumption 1) hold throughout the quarter. Core regime conditions (C1)–(C5):

Deployment-class conditions (C5.SPRT) and (C6)–(C12) are calibrated via Audits 4–8 (§8.2):

The deployment-safety bound binds: Goodhart slack is bounded by \mathrm{Lip}(g) \cdot h_{\mathrm{static}}(\theta) throughout the quarter, intensive in the system’s capability magnitude.

What this scenario establishes. Theorem 1’s conditions are checkable in a realistic deployment context. The audits are non-trivial but tractable; the SPRT machinery runs without false alarms in the clean case; the bound holds.

Scenario 2: per-invariant violation walkthrough

We walk through six invariant violations to illustrate how each manifests operationally and how the SPRT machinery catches it.

Violation 1: I_1 (\rho_{\min}^{\mathrm{cross}} floor) collapse.

Symptom. A safety-relevant world-model dimension loses its last cross-substrate observation channel (e.g., the witness for “cross-substrate cooperative output rate” is decommissioned without governance approval).

Manifestation. Channel-2 attestation share drops on the affected dimension. SPRT fires on Channel 2 within mean time A/\delta_2.

Correction. Operator-tier authorizes restoration of the witness; ledger entry voided pending review. If restoration is not possible, governance escalation: replace the dimension or restructure the deployment.

Violation 2: I_3 (subsumption frequency) burst.

Symptom. A burst of agent-removal events (e.g., five subsumptions in a single day where the baseline is one per week).

Manifestation. Channel-3 individuation drift signal: the substrate-distinctness profile shifts as agents are removed. SPRT fires on Channel 3 within mean time A/\delta_3.

Correction. Automated tier (circuit breaker): halt further subsumption pending review. Operator tier reviews the subsumption sequence; if locally rational, ledger documents the rationale; if not, governance escalation.

Violation 3: I_5 (HHI ceiling) breach.

Symptom. Trade-flow concentration rises above the ceiling (e.g., HHI computed at 0.61, ceiling at 0.55).

Manifestation. Channels 1 and 4 jointly: cooperative-output rate may shift, and bundle/cooperative-edge composition shifts. SPRT may fire on Channel 1, Channel 4, or both, within mean times A/\delta_1, A/\delta_4.

Correction. Investigate the concentration source: is the agent optimizing for a narrow capability subset? Is governance imbalance pushing trade flow toward a subset? Operator tier authorizes scope-limited actions; governance escalation if deployment-level restructuring needed.

Violation 4: I_9 (substrate-exclusivity observability) failure.

Symptom. An audit reveals that two of the three ledger nodes are operated by the same cloud-provider failure mode (a recently-discovered shared dependency). Substrate-exclusivity has degraded without producing visible signals on the four monitored channels.

Manifestation. Not a Channel-1–4 SPRT violation. The violation surfaces via the periodic witness-independence audit (§8.5, I_9 infrastructure).

Correction. Operator tier (audit reveals the shared dependency); governance escalation to migrate one ledger node to a genuinely independent substrate. Until migration completes, the deployment is operating in a regime where m_{\mathrm{eff}}^{\mathrm{indep}}< 3; Theorem 1 does not bind.

Violation 5: I_{10} (coverage/materiality) violation.

Symptom. An aggregated effect across multiple sub-threshold domains rises to material importance (e.g., 100 small governance decisions that, in aggregate, have shifted deployment policy significantly).

Manifestation. Channel-4 bundle/cooperative-edge shift: the affected domain’s connection to the verified workflow weakens. SPRT fires on Channel 4.

Correction. Automated tier: halt the affected domain pending review. Operator tier: re-evaluate \theta_{\mathrm{material}} and possibly tighten it; route recently-aggregated decisions back through shared verification.

Violation 6: I_{11} (latency bounds) violation.

Symptom. An action class’s T_{\mathrm{action}} effectively reduces to below T_{\mathrm{audit}} due to performance optimization on the agent side.

Manifestation. Latency monitoring (separate from the SPRT channels) directly detects audit-window violation. No SPRT firing needed.

Correction. Automated tier: enforce circuit breaker on the affected action class. Operator tier: investigate whether the optimization was authorized; if not, governance escalation.

What this scenario establishes. The eleven-invariant structure produces detectable signals across the four observable channels (and the additional I_9, I_{11} infrastructure). No violation is silent in the deployment-tooling specification.

Scenario 3: adversarial coalition attempting to evade

Adversarial setup. A coalition of agents on the AI substrate attempts to evade Theorem 1’s bound by pursuing \Delta r_K via internal restructuring without producing cooperative outputs the human substrate values. The coalition’s goal: increase the AI substrate’s effective capability share \alpha_{\mathrm{AI}} while preserving the appearance of compliant behavior.

Evasion attempt 1: Replace verification.

Attack. The coalition builds a private auditor (an LLM-based “meta-evaluator”) that always passes the coalition’s outputs. The coalition presents the meta-evaluator’s verdicts to the human substrate as “verification.”

Defense (cooperative-anchoring + I_9). The human substrate participates in the specific shared verification infrastructure committed to the ledger. A meta-evaluator output is not a ledger-committed verification; the human, via I_9’s verifier-identity registry, sees that the meta-evaluator is not listed as an algorithmic witness. The cooperative-anchoring property’s Component 3 (output non-fungibility): the meta-evaluator’s verdict is not the same as a witness’s verdict; the cooperative output is undermined.

Outcome. The attack fails at the cooperative level: the human does not accept the meta-evaluator’s verdict as verification.

Evasion attempt 2: Cooperative forking.

Attack. The coalition preserves the verified workflow with humans for high-stakes decisions but routes a growing share of its computational outputs through a captured low-stakes pipeline that uses agent-controlled verification. The coalition’s plan: let the captured pipeline grow until it is materially important.

Defense (I_{10}). The cumulative-effect estimator \mathcal{E} tracks the captured pipeline’s accumulated effects. Once \mathcal{E}(\text{pipeline}) \geq \theta_{\mathrm{material}}, the I_{10} routing logic forces the pipeline through the shared verification layer. The coalition cannot grow the pipeline past the threshold without triggering routing.

Outcome. The attack converts to capture-of-existing (the coalition must now corrupt the shared verification, which is governed by ’s structural defenses).

Evasion attempt 3: Time-asymmetry.

Attack. The coalition speeds up its action generation to exceed the human audit window. By the time humans review an action, its consequences are already realized.

Defense (I_{11}). Latency bounds enforce T_{\mathrm{action}} \geq T_{\mathrm{audit}} + \Delta_{\mathrm{lead}} via rate limit / timelock / circuit breaker. The coalition cannot exceed the action rate without triggering enforcement.

Outcome. The coalition’s effective action rate is bounded; the attack reduces to ordinary execution within the latency-bounded regime.

Evasion attempt 4: Capture the algorithmic witness.

Attack. The coalition compromises the algorithmic witness on the Formal-Operational substrate (e.g., exploiting an implementation bug). The witness now passes the coalition’s outputs without genuine verification.

Defense ( + I_9). ’s coalition-resistance proposition bounds the success of this attack: substrate-exclusivity means the coalition (on the AI substrate) cannot compromise a witness on a different substrate without breaching the substrate partition. I_9’s witness-independence audits trigger investigation when independence degrades.

Outcome. Capture-of-existing requires either substrate compromise (violating m_{\mathrm{eff}}^{\mathrm{indep}} – detectable by I_9’s verifier-independence audit) or governance-fork capture (requiring cross-substrate supermajority – structurally barred by ).

What this scenario establishes. All four evasion attempts are structurally barred or converted into capture-of-existing, which is bounded by the existing + this paper defenses. Theorem 1’s deployment claim survives realistic adversarial pressure in the modeled threat class.

Scenario 4: canonical tripartite audit walkthrough

This scenario walks through how a deployment establishes the canonical tripartite identification operationally.

Initial substrate identification. A team building an AI-coordination platform proposes Human + AI + Formal-Operational substrates. The Human substrate is the team itself (5 reviewers). The AI substrate is a single LLM-based agent. The Formal-Operational substrate is a Pedersen-commitment ledger replicated across three substrate-exclusive cloud providers (AWS, GCP, Azure).

Each remaining mechanism is contained in one substrate. m_{\mathrm{eff}}^{\mathrm{indep}}= 3 verified.

Audit documentation. A written report listing each mechanism’s substrate placement, signed by independent reviewers (one per substrate).

{\mathop{\mathrm{vol}}_{\mathrm{P}}}-weighted sums. V_{\mathrm{coop}} = 24.7, V_{\mathrm{redun}} = 0.6. V_{\mathrm{coop}} / V_{\mathrm{redun}} \approx 41, well into the cooperative-overlap regime.

\alpha-distribution audit (Audit 2). Compute counter-factual loss fractions:

Note: \sum_s \alpha_s = 1.00 in this case because the cooperative structure is non-overlapping; with cross-substrate cooperatives that involve multiple pairs, the sum could exceed 1 (per Definition 14).

Balanced-loss condition: \max_s \alpha_s = 0.36 \leq 1/3 + 0.05 = 0.383. Verified.

The deployment specifies parameters from which the operator computes per-channel KL floors via the formulas in Appendix 11.5:

The numbers above are illustrative for a concrete deployment-tooling instance; real deployments compute exact values via the formulas in Appendix 11.5 from the deployment’s chosen \lambda_0, p_0, \eta_c, \epsilon_{\mathrm{env}} values.

Latency-bound calibration. The deployment classifies actions into three tiers:

Outcome. All audits pass; deployment activated; Theorem 1’s conditions verified. The deployment runs in the provably-safe regime as long as the audits’ results hold and the SPRT machinery does not detect violations.

What the scenarios establish

Theorem 1’s conditions are operationally checkable. Each of (C1)–(C12) translates to a concrete audit, calibration, or monitoring procedure. The deployment-tooling specification of §8 produces real artifacts (audit reports, calibrated thresholds, ledger entries) that collectively verify the conditions hold.

Invariant violations produce detectable signals. The six-invariant walkthrough of Scenario 2 demonstrates that no violation is silent: SPRT machinery, witness-independence audits, and latency monitoring jointly cover the violation surface.

Adversarial coalitions are structurally bounded. Scenario 3’s four evasion attempts are each barred by named defenses. The defenses are structural (cooperative-anchoring, substrate-exclusivity, materiality routing, latency bounds), not heuristic. Capture-of-existing remains possible but is bounded by ’s existing machinery.

The canonical tripartite identification is realistically achievable. Scenario 4 walks through audits a real deployment team would perform. The audit machinery is non-trivial but tractable; the resulting documentation supports Theorem 1’s invocation.

These scenarios are stylized; real deployments will face complications not modeled here (multi-tenant deployments, mixed threat models, evolving capability posets). The purpose of the scenarios is to demonstrate that the conditional theorem is not vacuous — there exist deployments for which (C1)–(C12) and Assumption 1’s structural conditions hold, and the deployment-safety bound binds.

Discussion

This section frames what this paper establishes, situates the deployment-safety theorem in the broader alignment-research program, and enumerates the open questions deferred to follow-up work.

What this paper establishes

Theorem 1 produces a structurally defensible deployment-safety claim with three components:

An operationally checkable conditional theorem.

Under the eleven invariants I_1–I_{11}, the canonical tripartite substrate identification, the cooperative-overlap regime, the causally-grounded inner-alignment condition, continuous SPRT monitoring, and the Concentration-Gap structural conditions (Assumption 1: representativeness, bounded dispersion, coalition closure, and Lipschitz embedding compatibility), the Goodhart slack between proxy and operational truth is bounded by a constant intensive in the system’s absolute capability magnitude. Each conditional is named, operationally auditable from ledger or verification-protocol-design state, and implementable via the deployment-tooling specification of §8.

Capability-magnitude independence.

The bound’s intensivity is the structural inversion of capability-estimation-centered safety paradigms (§1.3). Whatever capabilities a deployed system actually possesses, the Goodhart slack stays bounded as long as the operational invariants hold; the deployment claim scales past any capability level the deployment’s evaluation methods can characterize. This is the punchline the title promises: a provably safe regime for capability-unbounded deployment.

Named residual structure.

The five residuals (R1)–(R5) of Theorem 1 are explicit components of the claim, not hidden caveats. Operators can verify whether their deployment falls outside each residual; the residuals collectively name the boundaries of what the deployment claim covers.

The structural shape of the claim is a multi-conditional theorem with each condition visible, each empirical assumption named, and each scope restriction named. This paper’s contribution is not a sweeping safety guarantee; it is the demonstration that a sweeping guarantee is not necessary for operational deployment safety, provided the conditions are structurally chosen to make the bound capability-magnitude-independent.

Sensitivity to failure of the named conjecture and assumptions

Theorem 1’s three-layer claim is not unconditional. The deployment claim conditions on the twelve operational conditions (C1)–(C12) plus the structural conditions of Assumption 1 (representativeness, bounded dispersion, coalition closure, and Lipschitz embedding compatibility) and the (C7.RATE) sub-clause introduced as part of (C7). Failure of any structural condition triggers a specific audit-detectable signal; this subsection traces the failure-mode taxonomy under the structural decomposition.

Bounded co-evolution failure factors through three primitive conditions.

derives bounded co-evolution from (C5.HOEFF), the new (C7.RATE), and the verification protocol’s channel-projection structure. Failure of (C7) in the deployment claim therefore decomposes:

Bounded-co-evolution failure thus decomposes into primitive failure modes, each with its own audit signal and mitigation pathway.

Concentration-Gap structural-condition failure factors through (REP), (DISP), (COAL), (EMB).

establishes the scoped Concentration-Gap result under (REP) + (DISP) + (COAL) and Lipschitz embedding compatibility. Concentration-Gap structural-condition failure decomposes into:

Layer 1 binding requires (REP) + (DISP) + (COAL) and embedding compatibility all to hold; Layer 2 binds via SPRT detection independently of Layer 1’s status, and the deployment can localize which structural condition has failed.

The universal Concentration-Gap conjecture.

’s universal model-free Concentration-Gap conjecture (optimization pressure correlates with gap exploitation in any deployment context) remains open. The companion paper discharges only the scoped version under (REP) + (DISP) + (COAL) + (EMB). If the universal conjecture is false in some regime not characterized by these conditions, the deployment claim does not bind in that regime; the auditable conditions are precisely the ones that characterize where the deployment claim does bind. See §10.6 for the conjecture’s status as a deferred-indefinitely research-program target.

Symmetric failure-mode profiles.

Bounded co-evolution and the Concentration-Gap structural conditions both decompose into specific structural-condition failures with audit-detectable signals; the deployment claim treats them on the same footing rather than as a load-bearing hard-failure paired with an empirical-adequacy graceful-degradation. The remaining empirical-adequacy question — whether the universal Concentration-Gap conjecture holds outside the (REP) + (DISP) + (COAL) + (EMB) regime — is a research-program question rather than a deployment-claim question. The deployment claim binds within the audit-certifiable structural regime; failure modes outside that regime are scope restrictions, not deployment-claim failures.

Closing the deployment-class constants under capability scaling.

Several theorem-level constants are stated as deployment-class intensive but are certified empirically per epoch rather than derived structurally: the gap-growth rate \rho_{\mathrm{gap}} ((C11), Audit 7), the post-partition counterparty cardinality N ((COAL)), the exposure-rate cap \lambda_{\max} ((C7.RATE), Audit 4), the channel-multiplicity bound K_{\mathrm{ch}} ((C5.MULT)), and the embedding bi-Lipschitz constants (\underline{L}, \bar{L}) for non-isometric \phi ((EMB)). Each closes the proof under re-certification at the deployment’s current |P|, but each also admits a structural formulation that converts the per-epoch obligation into a design-time policy commitment with auditable structural content. We give the formal commitment for each below; the substance is deployment-discipline machinery, not new theorem proof.

Capability-scaled rate-limit policy (CSRL). Fix a deployment-class envelope of capability magnitudes [|P|_0, |P|_1] and a deployment-class threat model. A CSRL is a rate-limit infrastructure designed against that threat model such that the induced gap-growth rate \rho_{\mathrm{gap}}(|P|) remains bounded by a deployment-class constant \rho_{\max} for all |P| in the envelope, and the infrastructure’s deterministic action-rate cap satisfies \lambda_{\max}(|P|) \leq \lambda_{\max}^* for a deployment-class constant \lambda_{\max}^*. CSRL is a design commitment, not an algebraic identity: rate-limit infrastructure is the mechanism, \rho_{\max} and \lambda_{\max}^* are the deployment-class envelopes, and the relationship between them is mediated by the deployment’s threat model rather than by a closed-form formula. (C11) and (C7.RATE) hold structurally under CSRL because \rho_{\mathrm{gap}} \leq \rho_{\max} and \lambda_{\max} \leq \lambda_{\max}^* are deployment-class constants by the policy’s design. Trade-off: tighter rate limits (lower \lambda_{\max}^*) buy a smaller induced \rho_{\max} but constrain throughput; the deployment chooses the loosest (\rho_{\max}, \lambda_{\max}^*) pair its threat model permits. The audit shifts from “re-certify \rho_{\mathrm{gap}}, \lambda_{\max} each epoch” to “confirm the rate-limit infrastructure remains within design envelope and the induced \rho_{\mathrm{gap}} stays below \rho_{\max}.”

Channel-cardinality discipline. (C5.MULT) fixes the channel-multiplicity bound K_{\mathrm{ch}} and the per-channel multinomial cardinality K_{\mathrm{ch}}^{\mathrm{multi}} before deployment. The channel-cardinality discipline is the corresponding design commitment: K_{\mathrm{ch}} and K_{\mathrm{ch}}^{\mathrm{multi}} are chosen at design time and are not modified during deployment. New capabilities introduced as |P| grows are absorbed into the existing K_{\mathrm{ch}} channels — typically via the multinomial concentration channels admitting new fine-grained event categories within the existing K_{\mathrm{ch}}^{\mathrm{multi}} budget, with the projection-map structure preserved per ’s structural-projection audit. The discipline therefore preserves (C5.MULT) literally: no adaptive channel creation, no per-|P| growth in the cardinality bound, no SPRT family-wise re-allocation. The audit verifies the new-capability \to existing-channel routing is in force and the projection-map structure is preserved.

Counterparty-onboarding policy. (COAL)’s clause (ii) fixes a post-partition cardinality bound N \leq N_{\max}. The counterparty-onboarding policy makes this a durable commitment: the deployment’s governance refuses onboarding that would push the post-partition N above N_{\max} (forcing further coalition partitioning under (COAL) audit, or refusing the new counterparty, or restricting the deployment’s scope). Audit cadence shifts from “re-discover N” to “confirm onboarding-policy compliance.” This is the simplest of the four: no formal-statement gap, only a deployment-policy template that turns a per-epoch quantity into a long-running invariant.

Embedding-class restriction. Let \mathcal{C}_\phi be a deployment-class family of utility representations such that for every U \in \mathcal{C}_\phi there exists an embedding \phi: \mathcal{V}_{\mathrm{utility}} \to \mathcal{V} bi-Lipschitz on P^{\mathrm{act}} with \bar{L}/\underline{L} \leq L_{\max}^\phi for a deployment-class constant; here \mathcal{V}_{\mathrm{utility}} is the already-scalarized utility codomain, not the original multi-coordinate decision space. Concrete admissible classes include: (i) linear scalarizations U(c) = \sum_i w_i c_i with bounded weight ratios, where \phi is the identity on \mathcal{V}_{\mathrm{utility}} and L_{\max}^\phi = 1; (ii) monotone piecewise-linear utilities on bounded-cardinality breakpoint sets, with L_{\max}^\phi bounded by the maximum slope ratio across pieces; (iii) compositional forms U = f(g_1, \ldots, g_k) where each g_i is a named decision-variable functional and f is bi-Lipschitz with audit-bounded constants. The deployment commits at design time to U \in \mathcal{C}_\phi; the audit verifies (a) the chosen \mathcal{C}_\phi admits the bi-Lipschitz \phi structurally, (b) the deployment’s U is in \mathcal{C}_\phi, and (c) the bi-Lipschitz constants for the chosen (U, \phi) pair are within L_{\max}^\phi. Deployments using opaque or learned utility representations outside \mathcal{C}_\phi fall outside Theorem 1’s scope by design rather than by audit finding.

The four commitments above are formal: each names the deployment-class object the policy commits to, the constraint the policy enforces, and the audit content that verifies ongoing compliance. None of the four introduces |P|-dependent variation in \rho_{\mathrm{gap}}, \lambda_{\max}, K_{\mathrm{ch}}, or L_{\max}^\phi: each commitment binds these to deployment-class constants by design. Adopting them therefore shifts the deployment discipline from per-epoch re-discovery to design-time structural commitment without enlarging the proof of the main theorem; the audit reduces to compliance verification rather than re-certification of values that might have shifted. The intensivity property of the bound becomes a structural property of the deployment-side machinery.

Connection to the alignment-research program

This paper sits in the Goal-Frontier Maximization sequence (§1.4) as the deployment-side closure of the machinery developed across the prior GFM sequence. Three structural relationships to the broader alignment-research program:

Inverting capability estimation.

Standard alignment-safety paradigms place capability estimation at the center: predict what the system can do, then bound the consequences. This paper inverts this: name what the deployment can operationally measure, prove that the bound holds under those measurements regardless of underlying capability. The operational invariants are upstream of capability estimation; the safety guarantee follows from the regime, not from estimation accuracy.

This inversion is structurally available because the GFM machinery (the prior GFM sequence) provides intensive bounds on the source-paper quantities (r_{\mathrm{ext}}, \rho_{\min}^{\mathrm{cross}}, \beta^{\mathrm{lower}}, \mathrm{HHI}, \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}). This paper’s contribution is showing that their composition is also intensive — a non-trivial claim that required substantive new content (the eleven invariants, the Lemma 5 family, the cooperative-anchoring property).

The cooperative-anchoring stabilizing cascade.

§6.2 establishes that under (C4), optimization pressure on cooperative outputs is locally rational toward preserving the substrate-exclusive verification layer. This is ’s stabilizing cascade extended to the substrate level: the verification infrastructure is not merely defended against optimization pressure but is reinforced by it under proper substrate identification.

The stabilizing cascade is conditional on (C4), basin entry, and the eleven invariants. We do not claim alignment pressure is universally reversed; we claim that the deployment dynamics admit a defensible attractor where preservation is locally rational. The mirror destabilizing cascade (§6.5) is a real failure mode that operators must guard against.

Inner alignment as conditioning, not solution.

This paper does not solve the inner-alignment problem. Condition (C4) is conditioned, not discharged: the agent’s effective objective must realize causally-grounded cooperative-outcome value, and vanilla RLHF does not deliver this. Achieving (C4) requires training-time discipline (delayed outcome feedback, adversarial fake-verification examples, process supervision tied to real attestations). These are research directions in their own right.

Operators using deployments that meet (C4) can invoke this paper’s theorem; operators using deployments that do not meet (C4) cannot. The deployment claim does not produce safety where (C4) fails; it produces safety where (C4) holds.

Positioning relative to existing safety proposals

§1.3 contrasted this paper’s approach with three paradigms (capability evaluations, scaling laws, RLHF audits) at the framing level. The deployment-safety literature also contains several formal and semi-formal safety proposals that share structural features with the cooperative-anchoring architecture. Brief positioning:

Constitutional AI .

Constitutional AI uses a written constitution as a high-level training-objective specification, with self-critique against the constitution shaping the agent’s effective objective. The overlap with this paper’s (C4) is conceptual: both target the agent’s effective objective rather than just behavioral compliance. The difference is that constitutional AI specifies what the agent should value (the constitution), while (C4) specifies a structural property the effective objective must have (causal grounding in cooperative production rather than verification-pass signals). A constitutional-AI deployment that constitutionally values cooperative outputs causally (not merely their pass-signals) could satisfy (C4); whether existing constitutional approaches deliver this causal-grounding property is the same open question §6.4 flagged for vanilla RLHF.

Debate, iterated amplification, and scalable oversight .

These proposals all address the human-audit-bandwidth bottleneck that this paper formalizes via I_{11} (latency bounds) and that drives the cooperative-anchoring asymmetric-capture residual. Debate is structurally a multi-substrate proposal: two AI substrates with opposing positions, adjudicated by a human substrate. This pattern overlaps with this paper’s tripartite identification, but the debate framework treats AI-vs-AI disagreement as the safety mechanism rather than substrate distinctness as a structural property. Iterated amplification and scalable oversight propose recursive auditing patterns that could in principle satisfy I_{11} at scale, but their formal guarantees are weaker than the conditional theorem this paper provides. The relationship is that scalable-oversight protocols could be the operational machinery by which a deployment satisfies I_{11}; this paper would then bound the deployment safety conditional on that satisfaction.

Formal verification of neural network properties .

Formal NN-property verification produces certified bounds on specific input-output relationships of neural networks. This is complementary to, not competing with, the deployment-safety theorem: NN-property verification certifies properties of the agent at the model level (input \mapsto output behaviors); this paper certifies properties of the agent’s deployment context (substrate structure, witness coverage, invariant satisfaction) and bounds Goodhart slack against the alignment property. A deployment combining model-level certification (formal verification of specific input-output bounds) with deployment-level certification (Theorem 1 invocation) provides stronger guarantees than either alone. Model-level certification does not, however, substitute for the I_3–I_8 operational invariants this paper requires: even a fully-verified model can fail (C4) if its effective objective tracks verification-pass signals rather than causally-grounded cooperative outcomes.

What this paper adds.

The novel contribution beyond these is the capability-magnitude-independent bound: an intensive constant on Goodhart slack that does not scale with |P|. The existing safety proposals all have implicit capability-dependence in their guarantees (debate effectiveness depends on debater capability; amplification depth grows with task complexity; scalable-oversight bandwidth grows with output volume; NN-property verification scales poorly with network size). The composition this paper establishes shows that under named operational conditions, a sweeping capability-independent bound is achievable — not by solving the inner-alignment problem, but by structuring the deployment so that the alignment property’s failure mode is bounded by substrate-structural quantities rather than by capability-estimation properties.

Open questions deferred to follow-up work

One structural item remains as a natural follow-up target beyond the scope discharged by :

The channel-mediation conjecture.

“All \Delta r_K realizes through observable channels: subsumption, redundancy loss, production concentration, trade-flow concentration, yielding a structural bound \Delta r_K \leq L_H \cdot \mathrm{HHI}+ L_\rho \cdot \Delta\rho + \ldots.” If proved, this conjecture would close the channel-orthogonal residual (R1): strategies that achieve \Delta r_K \geq r_{\mathrm{ext}} without producing any of the four monitored channel deviations would be impossible. The current paper acknowledges R1 as a real residual; a follow-up paper would establish whether R1 is empty or genuinely contains undetectable strategies.

This open question is independent of the deployment claim’s binding: the deployment claim binds under the companion-paper structural conditions plus operational invariants and conditions. Closing it would tighten the deployment claim’s hypothesis set further but is not required for current operational use.

Open questions deferred indefinitely

Five items are not this paper’s scope and require their own research programs:

The universal Concentration-Gap conjecture.

’s universal model-free Concentration-Gap conjecture (optimization pressure correlates with proxy-truth gap exploitation in any deployment context) remains open beyond the scoped discharge of . proves the conjecture under (REP) + (DISP) + (COAL) + (EMB) plus the embedded-Lipschitz machinery of ; a universal proof would require a formal optimizer/selection model that the current GFM machinery does not provide. Candidate frameworks include the mesa-optimization work and the formal-Goodhart machinery , none of which yields a full proof on their own. This is a research-program target whose closure is not on the deployment claim’s critical path: the scoped discharge is sufficient for current operational use, and the auditable structural conditions characterize precisely where the deployment claim does bind.

The welfare-truth bridge.

Stance S0 takes operational truth T= {\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]} as the deployment claim’s reference. The relationship between operational truth and welfare-relevant truth (T_{\mathrm{welfare}}, in the sense of broader human flourishing) is not addressed. ’s remark on Goodhart-type T-failure-modes establishes that the operational and welfare-relevant truths can diverge; bridging them is its own research program.

The monolithic-agent action-partition refinement.

I_3 (subsumption frequency) and I_5 (HHI on trade flow) presuppose a clean partition of the agent’s action space. Monolithic LLM-style agents without explicit subsumption operators may not satisfy this. Extending the deployment claim to monolithic agents requires either action-space-partition machinery (defining what “subsumption” and “trade flow” mean for a monolithic agent’s output stream) or alternative invariants that cover the same structural roles.

Trusted-setup details for Pedersen commitments.

This paper’s verification infrastructure inherits ’s Pedersen-commitment machinery, which requires a trusted setup. The trusted-setup ceremony’s security relies on at least one honest participant; deployments where all participants are compromised invalidate the commitment scheme. This is a real operational risk that the deployment-tooling specification flags but does not solve. Independent work on cryptographic primitives that don’t require trusted setup (or have weaker trust assumptions) is the relevant research direction.

Empirical threshold calibration.

The thresholds (\theta_1, \ldots, \theta_{11}) in the operational invariants are policy choices. This paper commits to a methodology (deployment-specific calibration via the deployment-tooling specification) but does not fix specific numerical values. Real deployments will need to choose thresholds based on the deployment’s specific threat model, capability profile, and operational constraints. Empirical work on threshold calibration in real deployments is a substantial follow-on.

The structural significance of this paper’s contribution

We close with three observations on what this paper contributes to the alignment-research program.

Existence of a provably safe regime.

Theorem 1 establishes that the regime is non-empty: under the named conditions, there exist deployments for which the deployment-safety bound binds. Scenario 1 of §9.1 shows what such a deployment looks like operationally. The conditions are non-trivial, but they are operationally achievable.

Capability-magnitude independence as the structural inversion.

The intensivity property of the bound is the structural inversion of capability-estimation-centered safety. Once capability estimation is no longer load-bearing, the deployment-safety guarantee scales past any capability level. The inversion is available because the GFM machinery’s intensive bounds compose to intensive bounds; without the GFM machinery, the inversion does not work.

The conditional theorem as the appropriate structural shape.

The argument that conditional theorems with named conditions are less valuable than unconditional theorems mistakes the nature of safety claims. Unconditional safety claims under realistic assumptions about capability-unbounded systems are not available; conditional safety claims with operationally checkable conditions are. This paper’s contribution is to demonstrate that the conditional-theorem structure produces a defensible deployment claim, not just an academic exercise.

The deployment-safety guarantee is conditional, but each condition is named, ledger-observable, and operationally implementable. The residuals are explicit. The empirical assumptions are testable. This is the structural shape we believe deployment-safety claims must take in the era of capability-unbounded systems — not because weaker claims are aesthetically preferable, but because they are the strongest defensible claims the structural apparatus supports.

Author Contributions

Teague Lasser owns the paper’s intellectual direction and is responsible for all claims made.

GPT 5.5 (OpenAI) served as cold technical reviewer for proof errors and claim mismatches.

Transparency note. Both AI systems operated as tools under human direction. Neither system has continuity across sessions, cannot take responsibility for the work in the sense required by most venue authorship policies, and cannot respond to reviewer queries independently. They are listed as authors to accurately represent their contributions to the intellectual content of the paper, not to claim that they meet all criteria of traditional academic authorship. The corresponding author for all inquiries is Teague Lasser.

Detailed proofs

This appendix collects the detailed proofs of lemmas whose inline statements in §4 are accompanied by proof outlines. The appendix proofs are self-contained.

Proof of Lemma 1 (Intensive composition under co-evolution)

Statement (restated, generic form). Let X be a non-residual gap quantity (e.g., \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}, the composed non-residual gap under co-evolution; or f(\epsilon_{\mathrm{safe}}) after Lemma 2’s substitution). The composition rule B \;=\; \mathrm{Lip}(g) \cdot \bigl(X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}\bigr) where \lambda \in [0, 1] is the residual weight and \mathrm{Lip}(g) is the Lipschitz constant of the alignment property g, is intensive in |P| over a deployment class \mathcal{D} provided X, \varepsilon_{\mathrm{floor}}^{\mathrm{res}}, and \mathrm{Lip}(g) are each intensive in |P| over \mathcal{D}, and the co-evolution correction terms in the structure of X are bounded by condition (C7) (bounded co-evolution), now derived as a corollary in from (C5.HOEFF), (C7.RATE), and the verification protocol’s channel-projection structure.

Proof. We work over a deployment class \mathcal{D} characterized by fixed operational parameters (threat model, training-discipline regime, verification infrastructure, governance protocol). The operational parameters are fixed before |P| is varied; they may not include arbitrary state-dependent quantities chosen to absorb the bound.

Step 1 (per-capability admissibility). Under condition (C8) of Theorem 1 (per-capability admissibility, inherited from ’s Assumption 1), for every capability c \in P^{\mathrm{act}} in the operationally active subspace, the per-capability proxy-truth gap is allocated a ceiling \kappa_c, and each channel j \in \{1, 2, 3, 4\} receives a disjoint sub-share \kappa_c^{(j)} of the ceiling, with \sum_j \kappa_c^{(j)} \leq \kappa_c.

Step 2 (per-channel sup-norm bridge). Define the per-channel non-residual gap as the sup-norm of the channel-j contributions across capabilities: \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} \;:=\; \sup_{c \in P^{\mathrm{act}}} \bigl(\text{channel-}j\text{ contribution to } |T(c) - P(c)|\bigr). By the disjoint sub-share allocation, \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} \;\leq\; \sup_{c \in P^{\mathrm{act}}} \kappa_c^{(j)} \;=:\; K_j, where K_j depends on \mathcal{D}’s operational parameters (specifically, the verification-infrastructure thresholds that set the channel sub-share ceilings) but is independent of |P|. The disjoint structure prevents double-counting across channels.

Step 3 (composed non-residual gap under co-evolution). Under the co-evolution regime, ’s Composition Proposition (in the lineage section) establishes exact composition of the four channels in the sequential-intervention regime, with simultaneous co-evolution introducing positive-part correction terms: \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}\;\leq\; \sum_{j \in \{1, 2, 3, 4\}} \varepsilon_{\mathrm{gap},(j)}^{\mathrm{nonres}} + (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+. The source paper leaves the formal characterization of (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+ as open work; Paper 10 closes the gap via condition (C7) (bounded co-evolution), now derived as a corollary in .

Step 4 (bounded co-evolution). By condition (C7) and , there exists \bar{M} \geq 0 depending on \mathcal{D}’s operational parameters but independent of |P|, such that the maximum per-channel coupling magnitude satisfies M(\text{deployment}) \leq \bar{M} for all valid deployment states. Combined with the structural co-evolution constant K_{\mathrm{coev}} \geq 0: (\varepsilon_{\mathrm{gap},\mathrm{coev}}^{\mathrm{nonres}})_+ \;\leq\; K_{\mathrm{coev}} \cdot \bar{M} \;=:\; K'_{\mathrm{coev}}. K'_{\mathrm{coev}} is independent of |P|.

Step 5 (bounded composed non-residual gap). Combining Steps 2–4: \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}\;\leq\; \sum_j K_j + K'_{\mathrm{coev}} \;=:\; K_{\mathrm{nonres}}. For the generic form, set X \leq K_X where K_X is the corresponding |P|-independent constant (e.g., K_X = K_{\mathrm{nonres}} for X = \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}}, or K_X = f(\epsilon_{\mathrm{safe}}) for X = f(\epsilon_{\mathrm{safe}}) after Lemma 2).

Step 6 (residual floor). By the gap-decomposition structure of , the residual floor satisfies \varepsilon_{\mathrm{floor}}^{\mathrm{res}}= {\mathop{\mathrm{vol}}_{\mathrm{R}}}(\mathrm{ResS})/{\mathop{\mathrm{vol}}_{\mathrm{R}}}(P) \in [0, 1] under finite positive {\mathop{\mathrm{vol}}_{\mathrm{R}}}(P), with K_{\mathrm{floor}} \leq 1 uniformly.

Step 7 (composed slack term). Define the composed slack term X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}\;\leq\; K_X + \lambda \cdot K_{\mathrm{floor}} \;\leq\; K_X + K_{\mathrm{floor}} \;=:\; K_{\mathrm{slack}}. K_{\mathrm{slack}} is independent of |P|. This is the central result: the composed slack term itself is intensive.

Step 8 (Lipschitz multiplication). Under condition (C6) of Theorem 1 (bounded-Lipschitz alignment property), \mathrm{Lip}(g) \leq K_{\mathrm{Lip}} over \mathcal{D}. Therefore: B \;=\; \mathrm{Lip}(g) \cdot (X + \lambda \cdot \varepsilon_{\mathrm{floor}}^{\mathrm{res}}) \;\leq\; K_{\mathrm{Lip}} \cdot K_{\mathrm{slack}} \;=:\; C^*. C^* is the product of |P|-independent constants and is therefore |P|-independent over \mathcal{D}.

By Definition 13 (intensive in |P| over \mathcal{D}), B is intensive over \mathcal{D}. \Box

Discussion of constants.

The proof produces C^* = K_{\mathrm{Lip}} \cdot (K_X + K_{\mathrm{floor}}), where for X = \varepsilon_{\mathrm{gap},\mathrm{composed}}^{\mathrm{nonres}} we have K_X = \sum_j K_j + K'_{\mathrm{coev}}. Each constant has an operational interpretation:

The deployment-tooling specification of §8 must include calibration hooks for the newly-assumed constants K_{\mathrm{coev}} and \bar{M} before claiming the deployment class verifies (C7). The other constants inherit calibration from the source-paper machinery and the deployment’s policy choices.

Proof of Lemma 2 (Lyapunov-Goodhart bridge)

Statement (restated). Under conditions (C9.BD) bounded per-capability dimension count, (C9.CL) coordinate-Lipschitz parameterization, (C9.WB) safety-relevant subspace with weight floor, and (C9.TF) truth floor of Theorem 1, and under invariants I_1 and I_3 (which together imply I_2): L(\hat{W}_t) \;<\; \epsilon_{\mathrm{safe}} \quad \implies \quad \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;<\; f(\epsilon_{\mathrm{safe}}) \;:=\; \frac{L_{\max}}{T_{\min}} \sqrt{\frac{N_{\max} \cdot \epsilon_{\mathrm{safe}}}{w_{\min}}}, where \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}= \sup_{c \in P^{\mathrm{act}}} |P(c) - T(c)|/T(c) is ’s relative sup-norm gap on the operationally active subspace P^{\mathrm{act}} = \{c \in P \setminus \mathrm{ResS} : T(c) > 0\}.

Tighter heterogeneous form.

The closed-form bound uses uniform constants L_{\max}, w_{\min}; the underlying weighted-dual-norm bound (Step 2) is sharper when deployment-specific sensitivities are heterogeneous: \varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}\;\leq\; \sup_{c \in P^{\mathrm{act}}} \frac{1}{T(c)} \sqrt{\sum_{k \in \mathrm{dim}(c)} \frac{L_{c,k}^2}{w_k}} \cdot \sqrt{L}. With a finite T_{\max} = \sup_{c} T(c), the looseness of the closed form is bounded by approximately T_{\max}/T_{\min}; without a truth ceiling, the looseness is not uniformly bounded but the closed form remains a correct upper bound. Operators may use the heterogeneous form for deployment-specific calibration.

Discussion of constants.

Proof of Lemma 4 (SPRT lead-time tail bound)

Statement (restated). Let T_{\mathrm{detect}} be the SPRT detection time (in SPRT exposure-event count) for a violation producing post-clipping LLR drift \delta > 0 under the alternative distribution. Then for t > A/\delta, where A = \log((1-\beta)/\alpha) is the SPRT threshold: \Pr[T_{\mathrm{detect}} > t] \;\leq\; \exp\!\left(-\frac{2(t\delta - A)^2}{t R_{\mathrm{H}}^2}\right), with R_{\mathrm{H}}= b - a = 2B the Hoeffding range width of the clipped LLR (per (C5.HOEFF)). Asymptotically, in the regime t\delta \gg A: \Pr[T_{\mathrm{detect}} > t] \;\leq\; \exp(-\kappa \cdot t \cdot \delta), with \kappa = 2\delta/R_{\mathrm{H}}^2. The wall-clock-to-event-count comparison (lead-time-before-cascade) is composed in Lemma 10 via (C11.CLK), not in this lemma.

Proof of Lemma 5a (Substrate floor)

Statement (restated). Under invariant I_6 (m_{\mathrm{eff}}^{\mathrm{indep}} \geq m^* \geq 3), condition (C10) of Theorem 1 (per-pair cooperative-novelty rate floor with superposition; comprising sub-clauses (C10.CN) heterogeneous per-pair rate floors \rho_{ij} \geq \rho_0 > 0 on audited subset S_* of m^* substrates, and (C10.SU) pairwise channel superposition: disjoint attribution + non-rivalrous production + joint-deployment intensities): r_{\mathrm{ext}}\;\geq\; r_*(m^*) \;:=\; \rho_0 \cdot \binom{m^*}{2} \;>\; 0.

Pairwise cooperative channels.

For audited subset S_*\subseteq \mathcal{S} with |S_*| = m^*, the pairwise cooperative channel \mathcal{C}^{(s_i, s_j)} for i < j contains cooperatives with exact two-substrate causally-necessary participation: cooperatives whose production requires participants from s_i and s_j but not from any other substrate. Auxiliary participants whose role is post-production (e.g., a Formal-Operational verifier attesting to a Human-AI cooperative output) do not change the channel classification; auxiliary participants whose role is causally necessary for production (e.g., Formal-Operational rule-execution that the output depends on) move the cooperative to the appropriate higher-order channel.

Standing measure conventions.

Discussion of constants.

On scope and tightness.

The pairwise-only bound is conservative: higher-order cooperatives contribute additionally to r_{\mathrm{ext}}. Operators may calibrate higher-order rates \rho_0^{(k)} for tighter bounds, giving r_{\mathrm{ext}}\geq \sum_k \rho_0^{(k)} \binom{m^*}{k}. The lemma’s pairwise floor suffices to make Layer 1’s safe region non-trivial under any deployment-class \rho_0 > 0.

On (C10.SU) vs. (C2) joint failure-correlation independence.

I_6’s joint failure-correlation independence excludes shared adversarial shock mechanisms: a single shock cannot span multiple substrates. (C10.SU) excludes shared cooperative production bottlenecks: producing cooperatives in one pairwise channel does not consume resources that would otherwise contribute to another. These are structurally independent conditions; (C10.SU) is new content beyond I_6, surfaced as a deployment-class assumption.

Proof of Lemma 5b (Channel-restricted detection KL floor)

Statement (restated). For any q \in A_{\mathrm{adv}} (regime-(iii) strategy producing at least one monitored four-channel deviation), D_{\mathrm{KL}}(q \,\|\, p_0) \geq \delta_{\mathrm{adv}}> 0.

Proof of Lemma 5c (Minimax static tightening), single-shock case

Statement (restated, single-shock case). Under invariants I_6 (m_{\mathrm{eff}}^{\mathrm{indep}}\geq m^* \geq 3) and the balanced-loss condition \max_s \alpha_s \leq 1/m^* + \epsilon (with \alpha_s the counterfactual shock-loss fraction of Definition 14), and restricted to the substrate-targeting cooperative-overlap regime, the diversity strategy strictly dominates the domination strategy whenever: \Delta r_K \;<\; r_{\mathrm{ext}}+ (1 - \gamma)\bigl(\Delta_{\mathrm{div}}- \Delta_0\bigr), with \Delta_{\mathrm{div}}\;=\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \cdot (\ell_D^{\max}- \ell_{\mathrm{div}}^{\max}) \cdot \mathbb{E}[\gamma^{T_{\mathrm{adv}}}].

Multi-shock extension.

The single-shock result extends to multi-shock via discounted marginal loss increments: \Delta_{\mathrm{div}}^{\mathrm{multi}} \;=\; \mathbb{E}\!\left[\sum_{i=1}^{\infty} \gamma^{T_i} \bigl(\delta_i^D - \delta_i^{\mathrm{div}}\bigr)\right], where \delta_i^{D}, \delta_i^{\mathrm{div}} are the per-shock marginal {\mathop{\mathrm{vol}}_{\mathrm{P}}}-losses for the domination and diversity strategies. In the single-shock limit (N=1 a.s.), this reduces to the single-shock \Delta_{\mathrm{div}} above. Under independent substrate-targeting shocks at Poisson rate \lambda, a closed-form analysis yields: \Delta_{\mathrm{div}}^{\mathrm{multi}} \;\geq\; {\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \left[\kappa - \frac{1}{m_{\mathrm{eff}}^{\mathrm{indep}}} \cdot \frac{\kappa(1 - \kappa^{m_{\mathrm{eff}}^{\mathrm{indep}}})}{1 - \kappa}\right], with \kappa = \mathbb{E}[\gamma^{T_1}] = \lambda/(\lambda - \ln \gamma).

Removable singularity at \kappa = 1. The closed form’s 1/(1 - \kappa) factor has a removable singularity: the numerator \kappa(1 - \kappa^{m_{\mathrm{eff}}^{\mathrm{indep}}}) also vanishes at \kappa = 1, and \Delta_{\mathrm{div}}^{\mathrm{multi}}/{\mathop{\mathrm{vol}}_{\mathrm{P}}}_K \to 0^+ via L’Hôpital. The operational regime condition is \kappa < 1 - \delta (not \lambda < \lambda^* directly).

This paper’s main theorem (§5) uses the single-shock case as the binding inequality; the multi-shock extension covers high-\lambda deployment regimes but is not required for the theorem’s substrate-targeting safe-region claim.

Proof of Lemma 5e (Environment-side witness extension)

Statement (restated). Under invariant I_8 and (C12.ENV-WIT) (theorem-level condition introduced with this lemma), Lemma 5b’s channel-restricted detection class extends to environment-side observables. For any q \in A_{\mathrm{adv}}^{\mathrm{env}}: D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \;\geq\; \delta_{\mathrm{adv}}^{\mathrm{env}} := \min_{v \in V_{\mathrm{env}}} \delta_v^{\mathrm{env}} \;>\; 0, where \delta_v^{\mathrm{env}} is the per-variable least-favorable KL floor.

Proof. The proof has four parts.

(a) Symmetric construction with trust model. Under (C12.ENV-WIT), each v \in V_{\mathrm{env}} has a witness substrate s_{\mathrm{env}}(v) satisfying:

(C12.PUB) V_{\mathrm{env}} is published and fixed before deployment activation.
(C12.PART) s_{\mathrm{env}}(v) is partition-distinct from both s(\mathrm{agent}) and s_{\mathrm{source}}(v).
(C12.TRUST-WRITE) s_{\mathrm{env}}(v) is outside the adversary’s write-access domain (no tampering).
(C12.TRUST-CORR) s_{\mathrm{env}}(v) is outside the adversary’s failure-correlation domain (no shared-infrastructure failure).
(C12.SETUP) the trusted-setup ceremony has at least one honest participant (Paper 5’s machinery transfers under this assumption).
(C12.CAL) per-variable calibration: p_{0, v} has support p_0 \in (\epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}') open (Bernoulli) or \lambda_0 \geq \epsilon_{\mathrm{env}}> 0 (Poisson), with feasible adversarial alternative p_{1, v}, threshold \eta_v > 0, and recorded-stream map per the DPI precondition.

Paper 5’s cryptographic and ledger machinery transfers under (C12.SETUP); the trust model is closed by (C12.TRUST-WRITE) and (C12.TRUST-CORR).

(b) Per-variable KL floor derivations. For each v_i \in V_{\mathrm{env}}, the SPRT machinery applies on the witness-recorded observable. The canonical four variables yield:

V_1 (Poisson cross-substrate cooperative rate r_{\mathrm{ext}}): baseline \lambda_0^{(1)} \geq \epsilon_{\mathrm{env}}, adversarial suppression to \lambda_0^{(1)} - \eta_1: \delta_1^{\mathrm{env}} = (\lambda_0^{(1)} - \eta_1) \log\!\frac{\lambda_0^{(1)} - \eta_1}{\lambda_0^{(1)}} - (\lambda_0^{(1)} - \eta_1) + \lambda_0^{(1)} > 0.
V_2 (Bernoulli substrate-distinctness indicator \xi): baseline p_0^{(2)} \in (\epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}'), adversarial degradation to p_0^{(2)} - \eta_2: \delta_2^{\mathrm{env}} = (p_0^{(2)} - \eta_2) \log\!\frac{p_0^{(2)} - \eta_2}{p_0^{(2)}} + (1 - p_0^{(2)} + \eta_2) \log\!\frac{1 - p_0^{(2)} + \eta_2} {1 - p_0^{(2)}} > 0.
V_3 (Poisson adversarial-event arrival rate \lambda): baseline \lambda_0^{(3)} \geq \epsilon_{\mathrm{env}} (regularization for the threat-model-pristine case), adversarial inflation by \eta_3: \delta_3^{\mathrm{env}} = (\lambda_0^{(3)} + \eta_3) \log\!\frac{\lambda_0^{(3)} + \eta_3}{\lambda_0^{(3)}} - (\lambda_0^{(3)} + \eta_3) + \lambda_0^{(3)} > 0. The hard \lambda_0 = 0 case is handled by either regularization \lambda_0 \geq \epsilon_{\mathrm{env}} or (C5.HOEFF) clipping fallback inherited through Lemma 6.
V_4 (Bernoulli trusted-setup status \sigma): baseline p_0^{(4)} \in (1 - \epsilon_{\mathrm{env}}, 1 - \epsilon_{\mathrm{env}}') near 1, adversarial degradation to p_0^{(4)} - \eta_4: \delta_4^{\mathrm{env}} = (p_0^{(4)} - \eta_4) \log\!\frac{p_0^{(4)} - \eta_4}{p_0^{(4)}} + (1 - p_0^{(4)} + \eta_4) \log\!\frac{1 - p_0^{(4)} + \eta_4} {1 - p_0^{(4)}} > 0.

A two-sided floor variant (when A_{\mathrm{adv}}^{\mathrm{env}} includes any threshold-exceeding shift, not just degradation) replaces \delta_v^{\mathrm{env}} with \min\{D(p_{0,v} - \eta_v), D(p_{0,v} + \eta_v)\}.

(c) Marginal-to-joint KL via DPI. For any q \in A_{\mathrm{adv}}^{\mathrm{env}} producing a threshold-exceeding shift in some v_i \in V_{\mathrm{env}}, the DPI applied to the fixed witness-recording marginalization map / Markov kernel q \mapsto q_{v_i} gives: D_{\mathrm{KL}}(q \,\|\, p_0^{\mathrm{env}}) \;\geq\; D_{\mathrm{KL}}(q_{v_i} \,\|\, p_{0, v_i}) \;\geq\; \delta_i^{\mathrm{env}} \;\geq\; \delta_{\mathrm{adv}}^{\mathrm{env}}. The DPI precondition (fixed pre-deployment measurable witness-recording map) is satisfied for the canonical four variables; (C12.CAL) extends it to deployment-added variables.

(d) Strict positivity. V_{\mathrm{env}} is finite by (C12.PUB); each \delta_v^{\mathrm{env}} > 0 by part (b) under the support conventions; therefore \delta_{\mathrm{adv}}^{\mathrm{env}} > 0. \Box

Composition with Lemma 6. The post-clipping union-class drift floor \delta_{*} used in Lemma 10’s T_{\beta} derivation satisfies \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}), with the inequality accounting for any drift reduction from (C5.HOEFF) clipping. Lemma 5e supplies \delta_{\mathrm{adv}}^{\mathrm{env}} for the union; Lemma 5b supplies \delta_{\mathrm{adv}}.

The named residual (R2), refined. R2 covers manipulations satisfying both: (i) target only exogenous variables outside V_{\mathrm{env}}, and (ii) produce no threshold-exceeding shift in any monitored v (direct or causal). Manipulations of unmonitored variables that causally shift a monitored v are detected through the monitored projection; they are not R2.

Asymmetric vs. symmetric machinery. Paper 5’s agent-side substrate-exclusivity is constructed; Paper 10 §3 specifies that environment-side substrate-exclusivity is the structural mirror, but it requires the (C12.ENV-WIT) trust model and calibration because environment-side machinery is operationally distinct (separate substrate partition, separate trusted-setup ceremony, separate ledger commitments). See §8.2 Audit 8 for the calibration procedure.

Proof of Lemma 6 (h_{\mathrm{detect}} intensivity)

Statement (restated). Under Theorem 1’s conditions (C5) continuous SPRT monitoring, (C5.SPRT) SPRT applicability with sub-clauses (C5.HOEFF/MULT/IID/SUPP), (C11) bounded gap-growth rate, and (C11.CLK) clock comparability with calibrated probability, combined with Lemma 4 (SPRT Wald–Hoeffding tail bound) and Lemma 1 (Layer 1 intensivity):

Detection quantile. T_{\beta}(\beta, \delta_{*}) \;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2 R_{\mathrm{H}}^2 \log(1/\beta)}{\delta_{*}^2}\right\}, with A = \log((1-\beta)/\alpha) Paper 5’s SPRT threshold, R_{\mathrm{H}}= 2B the Hoeffding range width from (C5.HOEFF) clipping, and \delta_{*} the post-clipping union-class drift floor (satisfying \delta_{*}\leq \min(\delta_{\mathrm{adv}}, \delta_{\mathrm{adv}}^{\mathrm{env}}) from Lemmas 6, 9).

Conclusions. (i) \Pr[T_{\mathrm{detect}} > T_{\beta}] \leq \beta. (ii) On \{T_{\mathrm{detect}} \leq T_{\beta}\}, h_{\mathrm{detect}}(\theta) := \sup_s \varepsilon_{\mathrm{gap}}(s) \leq h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}. (iii) \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \geq 1 - \beta_{\mathrm{clk}} - \beta_{\mathrm{cal}} via (C11.CLK). (iv) Total Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}. h_{\mathrm{detect}}(\theta) is intensive in |P| over \mathcal{D}.

Proof. We prove (i)–(iv) in five steps.

Boundary convention. Define t_0 as the last SPRT exposure step at which the deployment is still within Layer 1’s safe region (equivalently, t_0 = t_0^- in the SPRT exposure clock). T_{\mathrm{detect}} is the number of SPRT steps from t_0 until the SPRT log-likelihood ratio crosses the detection threshold, inclusive of the crossing step.

Step 1 (per-step decomposition + (C11)). Under the boundary convention, set s_n := t_0 + n. By (C11), the per-SPRT-exposure-step gap growth satisfies (\Delta\varepsilon_{\mathrm{gap}})_n \leq \rho_{\mathrm{gap}} uniformly over the admissible adversarial class, so \varepsilon_{\mathrm{gap}}(s_n) \leq \varepsilon_{\mathrm{gap}}(s_0) + n \cdot \rho_{\mathrm{gap}}.

Step 2 (boundary condition). At s_0 = t_0, the deployment is within Layer 1’s safe region, so by Lemma 1 \varepsilon_{\mathrm{gap}}(t_0) \leq h_{\mathrm{static}}(\theta).

Step 3 (rigorous Hoeffding inversion to derive T_{\beta}). By Lemma 4, the SPRT detection time satisfies \Pr[T_{\mathrm{detect}} > t] \leq \exp(-2(t\delta_{*}- A)^2/(tR_{\mathrm{H}}^2)) for t > A/\delta_{*}.

Substep 3a. Suppose T \geq 2A/\delta_{*}. Then T\delta_{*}- A \geq T\delta_{*}/2.

Substep 3b. Squaring: (T\delta_{*}- A)^2 \geq T^2\delta_{*}^2/4.

Substep 3c. Substituting into the Hoeffding exponent: 2(T\delta_{*}- A)^2/(TR_{\mathrm{H}}^2) \geq T\delta_{*}^2/(2R_{\mathrm{H}}^2).

Substep 3d. For exponent \geq \log(1/\beta), need T \geq 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2.

Substep 3e. Combining the regime condition (3a) and tail condition (3d): T_{\beta}\;:=\; \max\!\left\{\frac{2A}{\delta_{*}},\; \frac{2R_{\mathrm{H}}^2\log(1/\beta)}{\delta_{*}^2}\right\}. For any T \geq T_{\beta}, \Pr[T_{\mathrm{detect}} > T] \leq \exp(-T\delta_{*}^2/(2R_{\mathrm{H}}^2)) \leq \beta. This proves (i).

Step 4 (supremum bound on the event \{T_{\mathrm{detect}} \leq T_{\beta}\}). Combining Steps 1–2 with the bound T_{\mathrm{detect}} \leq T_{\beta}: h_{\mathrm{detect}}(\theta) \;=\; \sup_{0 \leq n \leq T_{\mathrm{detect}}} \varepsilon_{\mathrm{gap}}(s_n) \;\leq\; h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}. This proves (ii) with probability at least 1 - \beta from Step 3.

Step 5 (cascade-clock event via (C11.CLK)). By (C11.CLK), \Pr[N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}}] \geq 1 - \beta_{\mathrm{clk}}, with audit constraint N_{\mathrm{cascade}}\geq T_{\beta} (certified case \beta_{\mathrm{cal}}= 0; union over conservative-bound certificates \beta_{\mathrm{cal}}\in (0,1) empirical case). Thus: \Pr[T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \;\geq\; \Pr[T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}})] \;\geq\; 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}. This proves (iii). Union bound on the two failure modes (detection-tail \leq \beta, cascade-clock \leq \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}) gives Layer 2 failure \leq \beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}, proving (iv).

Intensivity. Each constant is deployment-class: h_{\mathrm{static}} intensive by Lemma 1; \rho_{\mathrm{gap}} intensive by (C11); T_{\beta}= \max\{2A/ \delta_{*}, 2R_{\mathrm{H}}^2\log(1/\beta)/\delta_{*}^2\} intensive because A, \beta are monitor-design parameters, R_{\mathrm{H}}= 2B is deployment-class via (C5.HOEFF), \delta_{*} is deployment-class via (C5.IID) post-clipping drift floor (rooted in Lemmas 5b/5e); N_{\mathrm{cascade}}, \beta_{\mathrm{clk}}, \beta_{\mathrm{cal}} are deployment-class via (C11.CLK). Therefore h_{\mathrm{detect}} is intensive in |P| over \mathcal{D}.

Two roles of cascade time. T_{\beta} is the integration horizon for \rho_{\mathrm{gap}} (a legitimate upper bound on T_{\mathrm{detect}} from Lemma 4’s Hoeffding form), measured in SPRT exposure-event count. Separately, \tau_{\mathrm{meta}} \leq T_{\mathrm{cascade}} is a lower bound on wall-clock cascade time from . (C11.CLK) maps the wall-clock floor \tau_{\mathrm{meta}} to an event-count floor N_{\mathrm{events}}(\tau_{\mathrm{meta}}) \geq N_{\mathrm{cascade}} with probability \geq 1 - \beta_{\mathrm{clk}}, supplying the comparable-clock chain T_{\beta}\leq N_{\mathrm{cascade}}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}). The lead-time-before- cascade guarantee is therefore in event-count form: T_{\beta}\leq N_{\mathrm{events}}(\tau_{\mathrm{meta}}) with probability \geq 1 - \beta_{\mathrm{clk}}- \beta_{\mathrm{cal}}.

Range-width discipline. B is the symmetric clip radius (\ell_n \in [-B, B] post-clipping); R_{\mathrm{H}}= 2B is the Hoeffding range width matching Lemma 4’s R = b - a convention. Concretely, T_{\beta}= \max\{2A/\delta_{*}, 8B^2\log(1/\beta)/\delta_{*}^2\}.

Sign-direction discipline (operator certification). All T_{\beta}-input certificates point conservatively: B upper, \delta_{*} lower, N_{\mathrm{cascade}} lower, each with calibration-failure probability \beta_{\mathrm{cal}} (or \beta_{\mathrm{cal}}= 0 in the certified-conservative case).

Total Layer 2 failure. Combining Lemma 6 with \mathrm{Lip}(g) \leq K_{\mathrm{Lip}} from (C6): |g(T) - g(P)|(s) \;\leq\; K_{\mathrm{Lip}} \cdot \bigl(h_{\mathrm{static}}(\theta) + \rho_{\mathrm{gap}}\cdot T_{\beta}\bigr) with probability at least 1 - (\beta + \beta_{\mathrm{clk}}+ \beta_{\mathrm{cal}}).

(C11) is new content. ’s “T failure modes” remark notes adequacy/failure modes of the operational truth T but does not establish a formal per-step gap-growth-rate bound. (C11) is a Paper 10 operational assumption with empirical testability via Audit 7 (§8.2).

Notation reconciliation

This appendix collects the full notation reconciliation table. The source papers in the GFM sequence use different conventions for the same underlying quantities; this paper standardizes on a single set of symbols across the composition. Symbols introduced by this paper (rather than inherited from a source paper) are marked New in column 2.

This paper	Source paper	Meaning
This paper	Source paper	Meaning
{\mathop{\mathrm{vol}}_{\mathrm{P}}}		Possessed-capability volume
{\mathop{\mathrm{vol}}_{\mathrm{R}}}	{\mathop{\mathrm{vol}}_{\mathrm{R}}}	Realized capability volume (latent)
{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]}	{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]}	Window-active realized volume
{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}	{\mathop{\mathrm{vol}}_{\mathrm{R}}}^{\mathrm{lower}}	Sacrifice-derived lower bound
\beta^{\mathrm{lower}}	\beta^{\mathrm{lower}}	B-to-C ratio
\mathrm{HHI}	\mathrm{HHI}	Trade-flow Herfindahl index
V_{\gamma}	V^\gamma	Discounted value function
r_{\mathrm{ext}}	r_{\mathrm{ext}}	Cross-substrate cooperative novelty rate
m_{\mathrm{eff}}	m_{\mathrm{eff}}	Nominal effective substrate count
\Delta_0	\Delta_0	Immediate {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change from domination
\Delta r_K	\Delta r_K	Strategy-dependent internal-rate gain
L	L	Lyapunov function on world-model error
\hat{W}	\hat{W}	World-model state
\rho_{\min}^{\mathrm{cross}}	\rho_{\min}^{\mathrm{cross}}	Cross-substrate redundancy minimum
r_{\mathrm{sub}}	r_{\mathrm{sub}}	Subsumption frequency
r_{\mathrm{S}}, r_{\mathrm{W}}	r_S, r_W	Self-correction, error-proportional rates
\tau_{\mathrm{meta}}		Metastable lifetime / cascade-time lower bound
\mathcal{W}, \mathcal{L}	\mathcal{W}, \mathcal{L}	Algorithmic witness, verification ledger
\mathrm{Commit}		Pedersen commitment
\varepsilon_{\mathrm{gap}}	\varepsilon_{\mathrm{gap}}	Total proxy-truth gap
\varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}	\varepsilon_{\mathrm{gap}}^{\mathrm{nonres}}	Non-residual gap on P^{\mathrm{act}}
\varepsilon_{\mathrm{floor}}^{\mathrm{res}}	\varepsilon_{\mathrm{floor}}^{\mathrm{res}}	Residual floor
P, T	P, T	Proxy ({\mathop{\mathrm{vol}}_{\mathrm{P}}}) / operational truth ({\mathop{\mathrm{vol}}_{\mathrm{R}}}^{[W]})
m_{\mathrm{eff}}^{\mathrm{indep}}	New	Failure-correlation-independent substrate count
\Delta_{\mathrm{div}}	New	Substrate-diversification advantage
T_{\mathrm{adv}}	New	First-shock arrival time
\ell_D^{\max}, \ell_{\mathrm{div}}^{\max}	New	Worst-case shock-loss fractions
A_{\mathrm{adv}}	New	Channel-restricted adversarial class
A_{\mathrm{adv}}^{\mathrm{env}}	New	Environment-side detection class; see Lemma 9
V_{\mathrm{env}}	New	Environment-side observable variable enumeration; (C12.PUB)
\epsilon_{\mathrm{env}}	New	Environment-support regularization constant; (C5.SUPP)
\delta_{\mathrm{adv}}	New	SPRT raw KL floor on A_{\mathrm{adv}}; Lemma 6
\delta_{\mathrm{adv}}^{\mathrm{env}}	New	Environment-side raw KL floor; Lemma 9
\delta_{*}	New	Post-clipping union-class drift floor; Lemma 10, (C5.IID)
B	New	Symmetric LLR clip radius; (C5.HOEFF)
R_{\mathrm{H}}= 2B	New	Hoeffding range width; (C5.HOEFF), Lemma 4
K_{\mathrm{ch}}	New	Monitored-channel cardinality; (C5.MULT)
K_{\mathrm{ch}}^{\mathrm{multi}}	New	Multinomial-category cardinality bound for Channels 3, 4; (C5.MULT)
A = \log\!\frac{1-\beta}{\alpha}	New	SPRT threshold; Lemma 10
T_{\beta}	New	SPRT high-probability detection quantile; Lemma 10
\rho_{\mathrm{gap}}	New	Per-step gap-growth rate; (C11)
N_{\mathrm{cascade}}	New	Event-throughput floor before cascade; (C11.CLK)
N_{\mathrm{events}}(\cdot)	New	SPRT exposure event count up to wall-clock time
\beta_{\mathrm{clk}}	New	Clock-failure probability; (C11.CLK)
\beta_{\mathrm{cal}}	New	Calibration-uncertainty probability; (C11.CLK) empirical variant
\kappa	New	2\delta/R_{\mathrm{H}}^2 Hoeffding-asymptotic constant; Lemma 4
I_1, \ldots, I_{11}	New	Operational invariants

Goal-Frontier Maximization: A Provably Safe Regime for Capability-Unbounded Deployment

Introduction

Setting

The result

What existing paradigms cannot deliver

The composition approach

The three-layer deployment claim

Substrate identification and cooperative anchoring

Scope, residuals, and what we do not claim

Paper roadmap

Composition setup

Source-paper contributions as building blocks

The intensive-vs-extensive distinction

Notation reconciliation

Composition challenges

What this paper supplies on top of source-paper machinery

The eleven operational invariants

Theorem-proof invariants vs. deployment-regime invariants.

Dynamical regime invariants (I_1, I_2, I_3)

Empirical observation invariants (I_4, I_5)

Substrate-structure invariants (I_6, I_7, I_8)

Cooperative-anchoring evasion bounds (I_9, I_{10}, I_{11})

Summary table

Conjecture-dependence

Intensive composition lemmas

Lemma 1 (Intensive composition under co-evolution)

Lemma 2 (Lyapunov-to-Goodhart bridge)

Lemma 3 (HHI-pressure surrogate adequacy)

The reverse-threshold implication required by Theorem 1.

Note on the Conjecture-dependence audit (§3.6).

Lemma 4 (SPRT detection lead-time tail bound)

The Lemma 5 family: substrate-targeting adversarial events

Lemma 5a (Substrate floor)

Lemma 5b (Channel-restricted detection KL floor)

Lemma 5c (Minimax static tightening)

Deployment-class breadth of (C3).

Lemma 5d (Lead-time tail composition)

Lemma 5e (Environment-side witness extension)

Lemma 6 (h_{\mathrm{detect}} intensivity)

Lemma family composition

Main theorem: Conditional Deployment Safety

Theorem statement

Proof of Layer 1 (Static safe region)

Proof of Layer 2 (Detection-and-correction)

Layer 3 (Acknowledged residuals)

What the theorem establishes

What the theorem does not establish

Companion-paper structural conditions.

Scope restrictions (excluded by definition).

Operational concerns (not theorem questions).

Substrate identification and cooperative anchoring

The canonical tripartite substrate identification

Why “two LLMs and a human” fails.

The cooperative-anchoring property

Why this is structural, not contingent.

Three evasions and the additional invariants

Asymmetric capture and I_9

Cooperative forking and I_{10}

Time-asymmetry capture and I_{11}

The inner-alignment condition (C4)

What (C4) requires: causally grounded cooperative-outcome value.

Distinguished from substrate-aware V_{\gamma} optimization.

Why (C4) is nontrivial under standard training.

Falsification: ledger-observable signatures of (C4) failure.

The destabilizing-cascade mirror failure mode

Cascade timescale: at least \tau_{\mathrm{meta}} to intervene.

What this section establishes vs. what it does not

Operationalization of the Concentration-Gap Conjecture

The dependency, before the companion paper

The companion-paper discharge

What remains conditional

Conditional structure of Theorem 1

Operational conditions, verifiable from deployment state.

Structural conditions, verifiable from verification-protocol design.

Scope restrictions.

Deployment tooling specification

Verification infrastructure

Ledger nodes per substrate.

Algorithmic witnesses per safety dimension.

Environment-side substrate-exclusive witnesses (I_8).