The goal-frontier maximization (GFM) sequence establishes a computable alignment objective with structural safety properties and endogenous anti-monopolar pressure. The preceding papers treat agent evaluation as a statistical signal-processing problem: trust-weighted votes, correlational contraction detection, max-aggregated risk reports. This paper introduces a structural causal model for capability dynamics that upgrades scorpion detection and risk evaluation from statistical to causal identification, and sharpens the remaining agent-evaluation mechanisms, under explicitly stated structural assumptions. Do-calculus contraction attribution sharpens scorpion detection under additive-SCM, causal-sufficiency, and exogenous-orthogonality conditions. Pathway-conditional residuals formalize risk-trust with L^2 convergence guarantees. A log-linear opinion pool replaces max-aggregation without discarding independence information. A probabilistic dependency-risk model sharpens the minimax substrate bound into a per-step expected cost that compounds with growth, gated by an adversarial balance condition. The common thread: a GFM actor needs a causal model of other agents, and the SCM provides the structural backbone that each mechanism draws on.
Goal-frontier maximization (GFM) proposes alignment through a single
objective: maximize the volume of the jointly achievable capability
space {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G)
This paper is implementation machinery. Papers 1–3 establish what is
maximized ({\mathop{\mathrm{vol}}_{\mathrm{P}}}(G)),
prove structural safety properties (self-balancing, anti-monopolar
pressure, structural alignment), and characterize the conditions under
which those properties hold
The preceding papers treat every agent-evaluation question as a
statistical signal-processing problem. The foundational paper’s scorpion
detection
This paper introduces a structural causal model for capability dynamics and uses it to upgrade scorpion detection and risk evaluation from statistical to causal identification and to sharpen the remaining agent-evaluation mechanisms. The common thread: a GFM agent needs a causal model of other agents to make good decisions about their harm (who caused this contraction?), credibility (is this risk claim structurally sound?), and adversarial exposure (what substrate-correlated threats do we face?).
A structural causal model for capability dynamics with do-calculus contraction attribution, showing a conditional L^2 convergence advantage over correlational detection under causal sufficiency, additive-SCM structure, and exogenous orthogonality (Section 2).
L^2 EWMA dynamics for risk-trust {T^{\mathrm{risk}}}_j over pathway-conditional structural verification residuals, and a prior-corrected trust-weighted log-linear opinion pool that recovers Bayesian updating in the no-tempering limit, replacing the max aggregation operator, with a two-gate architecture separating attention allocation from action decisions via the actor’s own forward causal risk evaluation (Section 3).
Causal scorpion detection under a Gaussian model, committed to a sequential probability ratio test on least-favorable simple hypotheses with Type I/II error control, and a high-probability bound on non-stationary scorpion evasion bandwidth under a stationary-epoch model (Section 4).
A probabilistic dependency-risk model gated by an adversarial balance condition c > \max_i (f_i \cdot c_i) that admits contagion dynamics, with per-step expected costs that compound at the same rate as growth and a quantitative substrate-count lower bound m > c_{\max}/c that sharpens the minimax m \geq 2 (Section 5).
The foundational paper’s scorpion detection (Proposition 2 of
Definition 1 (Capability Dynamics SCM). A structural causal model for capability dynamics is a tuple \mathcal{M}= (U, V, F, P(U)) where:
U: exogenous variables representing agent dispositions, environmental conditions, and stochastic factors not determined by the model.
V = \{G_t, \pi_t^{(1)}, \ldots, \pi_t^{(n)}, \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}_t\}: endogenous variables representing the capability poset state, each agent’s action at time t, and the resulting {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change.
F: structural equations mapping parent variables to children in the causal directed acyclic graph (DAG). The DAG encodes which agents’ actions causally influence which capability changes.
P(U): prior distribution over exogenous variables.
When the variance-reduction results of this paper are invoked (Proposition 1(c) and downstream), we additionally assume exogenous orthogonality: the exogenous noise components entering different agents’ structural equations, and those entering the residual h, are mutually independent conditional on G_t. That is, if U_j \subseteq U are the exogenous variables appearing in a_j’s structural equation, U_k \subseteq U those appearing in a_k’s, and U_h \subseteq U those appearing in the residual h(G_t, U) (introduced with the additive-SCM assumption following Definition 2), then U_j \perp\!\!\!\perp U_k \mid G_t for j \neq k and U_j \perp\!\!\!\perp U_h \mid G_t for all j. This rules out shared exogenous shocks that would induce covariance between agents’ contributions (or between an agent’s contribution and the environment residual) even under a correct causal DAG.
The GFM actor’s world model \mathcal{W} (Definition 1 of
The SCM extends \mathcal{W} by making the causal structure explicit. In the foundational paper, \mathcal{W} is an opaque prediction model: given actions, it predicts {\mathop{\mathrm{vol}}_{\mathrm{P}}}-changes. The SCM decomposes predictions into causal pathways, enabling counterfactual queries: “what would have happened if agent a_j had acted differently?”
Definition 1 specifies a single-step SCM: the endogenous variables \{G_t, \pi_t^{(1)}, \ldots, \pi_t^{(n)}, \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}_t\} describe one time slice, and the structural equations F map actions at step t to the {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change at step t. The do-intervention \mathrm{do}(\pi_j = \cdot) operates within this single-step structure: “other variables respond according to F” means other endogenous variables in the same time slice are recomputed under the intervened structural equations, not that the system evolves forward in time. Multi-step effects (an action at t that causes a cascade at t+1) are captured by instantiating a new single-step SCM at each subsequent time step with an updated poset state G_{t+1}; the causal attribution \mathrm{CA}(a_j, t) (Definition 2) is the single-step effect.
Definition 2 (Causal Contraction Attribution).
For an observed {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change
\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_t) at time t, the causal contribution of agent
a_j is: \begin{equation}
\mathrm{CA}(a_j, t) = \Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}\bigl(G_t \mid \mathrm{do}(\pi_j =
\pi_j^{\mathrm{obs}})\bigr)
- \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}\bigl(G_t \mid
\mathrm{do}(\pi_j = \mathrm{skip})\bigr)
\end{equation} where \mathrm{do}(\pi_j
= \cdot) denotes the do-calculus intervention
The causal attribution \mathrm{CA}(a_j,
t) isolates a_j’s contribution
from confounders: simultaneous actions by other agents, environmental
changes, and background dynamics that the correlational signal in
Proposition 2 of
The one-at-a-time \mathrm{CA}(a_j,
t) of Definition 2 sums
to the total {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change
only when the SCM’s structural equation for \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}
is additive in agent actions: there exist functions g_j and a residual h such that \begin{equation}
\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_t)
= \sum_j g_j\bigl(\pi_j^{\mathrm{obs}}, G_t, U\bigr)
+ h(G_t, U)
\end{equation} where h does not
depend on any \pi_j. Under this
additive-SCM assumption, the one-at-a-time intervention \mathrm{do}(\pi_j = \pi_j^{\mathrm{obs}})
vs. \mathrm{do}(\pi_j = \mathrm{skip})
isolates g_j exactly, and \begin{equation}
\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_t) = \sum_j
\mathrm{CA}(a_j, t) + h(G_t, U).
\end{equation} The additive-SCM assumption is stronger than (and
is not implied by) conditional independence of agent actions given G_t: two independently drawn actions can
still interact multiplicatively in a non-additive structural equation
such as \Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}= \pi_1 \cdot \pi_2, in which
case the one-at-a-time \mathrm{CA} does
not sum to the total. When interaction effects are present and a
non-additive SCM is required, an exact decomposition needs a
Shapley-style attribution rule that averages \mathrm{CA} over all orderings of agent
inclusion
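The additivity requirement can be illustrated with a minimal sketch. It assumes a toy deterministic setting (the exogenous U is held fixed), encodes the skip action as 0.0, and uses two made-up structural equations, `additive` and `multiplicative`; none of these names or coefficients come from the paper. In the additive toy each g_j vanishes at skip, so the one-at-a-time attributions plus h recover the total, while in the multiplicative toy they do not.

```python
from typing import Callable, Dict

SKIP = 0.0  # hypothetical numeric encoding of the "skip" action

Actions = Dict[str, float]


def causal_attribution(j: str, observed: Actions,
                       f: Callable[[Actions], float]) -> float:
    """One-at-a-time CA: outcome under do(pi_j = observed) minus the
    outcome under do(pi_j = skip), other agents held at observed actions."""
    skipped = dict(observed)
    skipped[j] = SKIP
    return f(observed) - f(skipped)


def additive(a: Actions) -> float:
    # Toy additive equation: g_1 + g_2 + h with h = 0.5 (illustrative values).
    return 2.0 * a["a1"] - 3.0 * a["a2"] + 0.5


def multiplicative(a: Actions) -> float:
    # Toy non-additive equation: the two actions interact.
    return a["a1"] * a["a2"]


obs = {"a1": 1.0, "a2": 2.0}

ca_add = {j: causal_attribution(j, obs, additive) for j in obs}
h = additive({"a1": SKIP, "a2": SKIP})  # residual when everyone skips
total_add = additive(obs)

ca_mult = {j: causal_attribution(j, obs, multiplicative) for j in obs}
total_mult = multiplicative(obs)

print(ca_add, h, total_add)   # attributions plus h match the total
print(ca_mult, total_mult)    # attributions overshoot the total
```

In the multiplicative case each agent is credited with the whole product, so the attributions double-count the interaction; this is the situation in which an ordering-averaged rule would be needed.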
Proposition 1 (Convergence Advantage of Causal Attribution). Let a_j be an agent with expected causal contraction \mu_j = \mathbb{E}[\mathrm{CA}(a_j, t)] < 0 (a scorpion). Assume the observation sequences for both detectors are i.i.d. within a stationary strategy epoch, or weakly dependent with summable autocovariance. Under the SCM \mathcal{M}, the following information-theoretic sample-complexity comparison holds (independent of the specific sequential test used; see Section 4 for the SPRT-specific operational rates):
The correlational detector (Proposition 2(a) of
The causal detector using \mathrm{CA}(a_j, t) has sample complexity \Omega(\sigma_{\mathrm{do}}^2 / \mu_j^2), where \sigma_{\mathrm{do}}^2 is the variance of the causal attribution signal after conditioning on the do-intervention. The ratio \sigma_{\mathrm{do}}^2 / \sigma^2 \leq 1 (under the conditions of part (c)) is the structural advantage factor.
Under causal sufficiency (the SCM \mathcal{M} captures all common causes of a_j’s actions and the confounder signals), the additive-SCM assumption (the additive decomposition of \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}} following Definition 2), and exogenous orthogonality (Definition 1), we have \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders} \mid G_t) = 0 and therefore \sigma^2 - \sigma_{\mathrm{do}}^2 = \mathrm{Var}(\text{confounders} \mid G_t) \geq 0, so \sigma_{\mathrm{do}}^2 \leq \sigma^2, with strict inequality whenever confounders contribute nonzero variance. When any of the three conditions fails, the covariance term need not vanish and the variance identity gives \sigma^2 - \sigma_{\mathrm{do}}^2 = \mathrm{Var}(\text{confounders} \mid G_t) + 2\,\mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders} \mid G_t), whose sign depends on the covariance term. The inequality \sigma_{\mathrm{do}}^2 \leq \sigma^2 therefore holds under the full triple (causal sufficiency, additive SCM, exogenous orthogonality), or more generally whenever the weaker condition \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders} \mid G_t) \geq -\tfrac{1}{2} \mathrm{Var}(\text{confounders} \mid G_t) is satisfied. Under a general (non-sufficient) SCM with adversarial confounder correlation, causal attribution can in principle achieve worse convergence than the correlational detector; characterizing populations in which this occurs is outside the scope of this paper.
Proof. (a) The correlational detector’s observable signal is
the trust-weighted aggregate \hat{\Delta}(\pi_t) = \sum_{k \in \mathcal{O}} T_k
\cdot R_k(\pi_t) (Equation 10 of
(b) The causal detector evaluates \mathrm{CA}(a_j, t) directly by comparing the observed {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change against the counterfactual under \mathrm{do}(\pi_j = \mathrm{skip}). The variance of this signal is \sigma_{\mathrm{do}}^2 = \mathrm{Var}(\mathrm{CA}(a_j, t)), which excludes the confounding variance.
(c) The three assumptions do distinct work; we separate them.
Step 1: Additive SCM \Rightarrow \mathrm{CA}(a_j) = g_j exactly. Under the additive-SCM assumption (the additive decomposition following Definition 2), the continuous {\mathop{\mathrm{vol}}_{\mathrm{P}}}-change decomposes as \Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}_{\mathrm{total}}(t) = \mathrm{CA}(a_j, t) + \mathrm{confounders}(t), where the confounder term aggregates all other agents’ contributions \{g_k\}_{k \neq j} and the residual h(G_t, U), and \mathrm{CA}(a_j, t) = g_j(\pi_j^{\mathrm{obs}}, G_t, U) - g_j(\mathrm{skip}, G_t, U) is exactly a_j’s separable contribution. Without additivity, \mathrm{CA}(a_j) captures only the diagonal term and misses interaction effects. By the variance identity for sums: \sigma^2 = \mathrm{Var}(\mathrm{CA}(a_j)) + \mathrm{Var}(\mathrm{confounders}) + 2\,\mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders}).
Step 2: Causal sufficiency \Rightarrow confounders are SCM-known. Causal sufficiency ensures the DAG \mathcal{M} captures all common causes of a_j’s actions and the confounder signals. Every variable contributing to the covariance term appears explicitly in the structural equations, so the covariance can in principle be computed from the DAG. Without causal sufficiency, latent common causes contribute unmodeled covariance that may have either sign.
Step 3: Exogenous orthogonality \Rightarrow \mathrm{Cov} term vanishes. Under exogenous orthogonality (Definition 1), the noise U_j driving g_j is independent of both the noise \{U_k\}_{k \neq j} driving \{g_k\}_{k \neq j} and the noise U_h driving the residual h, conditional on G_t. Since the confounders term is \sum_{k \neq j} g_k + h, each cross-covariance \mathrm{Cov}(g_j, g_k \mid G_t) = 0 and \mathrm{Cov}(g_j, h \mid G_t) = 0. This gives \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders} \mid G_t) = 0, and therefore \sigma^2 - \sigma_{\mathrm{do}}^2 = \mathrm{Var}(\mathrm{confounders}) \geq 0 as stated. Without exogenous orthogonality, the covariance term is retained and the inequality \sigma_{\mathrm{do}}^2 \leq \sigma^2 holds only when \mathrm{Var}(\mathrm{confounders}) + 2\,\mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{confounders}) \geq 0; the inequality is strict whenever confounders have positive variance and the covariance term does not dominate.
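The variance identity used in the proof can be checked numerically. The following is a minimal sketch under assumed toy parameters: a Gaussian attribution signal with mean -0.1 and nine independent unit-variance confounders, none of which come from the paper. The identity holds exactly for sample moments, while the near-zero cross term reflects the independently drawn (orthogonal) noise.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance with 1/n normalization (cov(x, x) is the variance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Toy CA(a_j, t) series: illustrative scorpion with mean -0.1, unit variance.
ca = [rng.gauss(-0.1, 1.0) for _ in range(n)]
# Toy confounder term: nine other agents with independent unit-variance noise.
confounders = [sum(rng.gauss(0.0, 1.0) for _ in range(9)) for _ in range(n)]
total = [c + k for c, k in zip(ca, confounders)]

sigma2 = cov(total, total)        # variance seen by the correlational signal
sigma_do2 = cov(ca, ca)           # variance of the causal attribution signal
var_conf = cov(confounders, confounders)
cross = cov(ca, confounders)      # near zero under independent noise

gap = sigma2 - sigma_do2
print(gap, var_conf + 2.0 * cross)
```

Replacing the independent draws with noise that shares a common shock would make `cross` nonzero, which is the case the orthogonality assumption rules out.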
Part (a) of Proposition 1 treats the correlational detector as if \mu_j appeared directly as a component of the mean of the aggregate signal \hat{\Delta}. This is an idealization: the actual mean of \hat{\Delta} is the sum of all agents’ mean effects plus background dynamics, not \mu_j alone. A correlational estimator that recovered \mu_j from \hat{\Delta} would need an additional auxiliary procedure, such as regressing \hat{\Delta} against an indicator for a_j’s action times, which incurs its own bias when confounders are correlated with a_j’s action schedule. The O(\sigma^2 / \mu_j^2) bound therefore describes the best-case correlational detector: one that has access to a_j’s action schedule and uses it as a conditioning variable. A weaker correlational detector that only observes \hat{\Delta} without conditioning has a strictly worse (and generally incomputable) rate. The causal detector does not need this auxiliary procedure because \mathrm{CA}(a_j, t) already isolates a_j’s contribution by construction.
In populations with many simultaneously acting agents, confounding variance can dominate the correlational signal. If per-agent contributions are independent with comparable variance, a single scorpion among 100 simultaneously acting agents contributes \sim1% of the signal variance, so \sigma^2 / \sigma_{\mathrm{do}}^2 \approx 100 and the correlational detector needs \sim100\times more observations than the causal detector to achieve the same confidence. Under the same assumption, the causal detector’s advantage grows linearly with the number of simultaneously acting confounders.
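The arithmetic behind the 100-agent example can be written out as a short sketch. It assumes independent contributions with equal per-agent variance and no separate residual term; the function names are illustrative only.

```python
def variance_share(n_agents: int) -> float:
    """Fraction of total signal variance contributed by one agent,
    assuming independent contributions with equal variance."""
    return 1.0 / n_agents


def sample_ratio(n_agents: int, per_agent_var: float = 1.0) -> float:
    """Ratio sigma^2 / sigma_do^2 under the same equal-variance assumption.
    Sample complexity scales with variance / mu^2, so this is also the
    ratio of observations needed by the two detectors."""
    sigma_do2 = per_agent_var
    sigma2 = n_agents * per_agent_var
    return sigma2 / sigma_do2


print(variance_share(100))  # 0.01
print(sample_ratio(100))    # 100.0
```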
Proposition 1 assumes that the SCM \mathcal{M} is accurate enough to support do-calculus attribution, and that the structural assumptions of part (c) (causal sufficiency, additive-SCM structure, and exogenous orthogonality) hold. In practice the SCM is learned from observational data, partially specified, and adversarially exposed: agents can strategically align their actions to mask causal structure, and new agents or delegation chains can emerge during the trajectory. A framework that depended on the SCM being perfectly learned would be brittle. Instead, the framework is robust because causal attribution is one channel among several in a multi-channel attribution system, and the system’s overall attribution accuracy degrades gracefully as any single channel’s reliability degrades.
Multi-channel attribution is the standard methodology in adversarial
intrusion analysis and intelligence tradecraft. The Diamond Model of
intrusion analysis
The GFM sequence already supplies several attribution channels, distributed across papers:
Behavioral prediction residual. The trust model
of
Correlational co-occurrence. The trust-weighted
\hat{\Delta} signal of
Capability gating. Papers 2 and 3 of the GFM
sequence
Causal contraction attribution (this paper, Definition 2). Moderately expensive to fake, since the SCM demands counterfactual consistency, and strictly informative under the conditions of Proposition 1(c). Vulnerable to SCM misspecification.
SPRT scorpion detection (this paper, Section 4). Moderately expensive to fake under stationary epochs; vulnerable to fast adaptation as characterized by the evasion bandwidth bound.
Table 1 summarizes the channels by evasion cost and required assumptions.
| Channel | Cost to fake | Paper | Key assumptions |
|---|---|---|---|
| Capability gating | Very high | 2, 3 | Trusted, complete capability graph |
| Causal attribution | Moderate | 4 | Causal sufficiency, additive SCM, exog. orthogonality |
| Behavioral residual | Moderate | 1 | Predictive-model access |
| SPRT detection | Moderate | 4 | Gaussian, stationary epochs |
| Correlational co-occ. | Low | 1 | (None beyond observation) |
An attribution decision should aggregate evidence across channels, weighted by each channel’s calibrated reliability, rather than rely on the output of any single channel. (The aggregation is Bayesian in spirit: each channel contributes evidence toward or against a suspect. This paper does not formalize the joint hypothesis space or channel-dependence model; the full fusion rule is an open direction identified below.) The multi-channel framing has three architectural consequences:
Capability gating is a hard filter, not a prior. An agent that cannot structurally reach capability c_i cannot have caused its contraction, and no amount of causal or correlational evidence should override that. The correct aggregation is a gate-then-aggregate architecture: apply the capability gate first to produce the candidate suspect set, then aggregate the remaining channels over the candidates.
No single channel must be perfect. Each channel has its own reliability, and the aggregate’s accuracy depends on the joint reliability profile rather than on the maximum-reliability channel. This is the property that makes multi-channel attribution robust in the threat-intelligence setting and that makes the GFM actor robust to imperfect structure learning (Section 6).
Channel weights should be calibrated from measurable
proxies, not fixed at design time. The causal channel’s weight
should scale with an SCM-confidence estimate derived from
structure-learning diagnostics; the behavioral channel’s weight should
scale with a population-level reliability measure derived from the
per-agent trust scores T_k of
Paper 4’s contribution to this framework is the causal channel: a principled, do-calculus-based attribution signal with a convergence advantage over the correlational baseline under stated structural conditions. It is not the attribution system, and the paper does not claim that the causal channel is load-bearing. The overall attribution robustness comes from the multi-channel structure, and from capability gating in particular as the structural floor.
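The gate-then-aggregate ordering can be sketched as follows. The paper leaves the fusion rule open, so the weighted sum below is purely a placeholder assumption, as are the channel names, scores, and weights; the only property the sketch is meant to show is that the capability gate removes candidates before any evidence is combined.

```python
from typing import Dict, Set


def gate_then_aggregate(reachable: Set[str],
                        channel_scores: Dict[str, Dict[str, float]],
                        channel_weights: Dict[str, float]) -> Dict[str, float]:
    """Apply the capability gate first, then fuse the soft channels.

    reachable: agents that can structurally reach the contracted capability.
    channel_scores: per-channel suspicion score for each agent.
    channel_weights: calibrated reliability weight per channel.
    The weighted sum is a stand-in for the unspecified fusion rule.
    """
    fused: Dict[str, float] = {}
    for agent in reachable:  # gated-out agents are never scored
        fused[agent] = sum(
            weight * channel_scores.get(channel, {}).get(agent, 0.0)
            for channel, weight in channel_weights.items()
        )
    return fused


# Hypothetical example: a1 looks suspicious on both soft channels but
# cannot reach the capability, so it is excluded regardless.
scores = {
    "causal": {"a1": 0.9, "a2": 0.2},
    "behavioral": {"a1": 0.8, "a2": 0.4},
}
weights = {"causal": 0.5, "behavioral": 0.5}
fused = gate_then_aggregate({"a2"}, scores, weights)
print(fused)
```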
The companion paper
Definition 3 (Structural Verification Residual).
For a risk claim (S, \mathcal{P}, |\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}|, p) from agent a_j (Definition 4 of
Unlike the behavioral prediction residual r_k(t) (Equation 25 of
Definition 4 (Risk-Trust Dynamics). The risk-trust factor {T^{\mathrm{risk}}}_j evolves through an exponentially weighted moving average of squared structural verification residuals: \begin{align} \sigma_j^{\mathrm{risk},2}(t) &= \alpha_{\mathrm{risk}} \cdot \sigma_j^{\mathrm{risk},2}(t-1) + (1 - \alpha_{\mathrm{risk}}) \cdot \|r_j^{\mathrm{risk}}(t)\|^2 \\ {T^{\mathrm{risk}}}_j(t) &= \frac{1}{1 + \beta_{\mathrm{risk}} \cdot \sigma_j^{\mathrm{risk},2}(t)} \end{align} where \alpha_{\mathrm{risk}} \in (0, 1) controls the decay rate (tracking speed vs. stability), \beta_{\mathrm{risk}} > 0 controls the sensitivity (how rapidly trust degrades with inconsistency), and \|\cdot\| is the Euclidean norm on the two-dimensional residual.
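The two update equations of Definition 4 translate directly into a small stateful object. This is a minimal sketch; the class name `RiskTrust` and the example parameter values are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RiskTrust:
    alpha: float   # alpha_risk in (0, 1): EWMA decay
    beta: float    # beta_risk > 0: sensitivity
    sigma2: float  # current sigma_j^{risk,2}, initialised at the prior

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1)")
        if self.beta <= 0.0:
            raise ValueError("beta must be positive")

    def trust(self) -> float:
        """T^risk_j = 1 / (1 + beta * sigma2)."""
        return 1.0 / (1.0 + self.beta * self.sigma2)

    def update(self, residual: Tuple[float, float]) -> float:
        """Fold one two-dimensional residual into the EWMA, return new trust."""
        squared_norm = residual[0] ** 2 + residual[1] ** 2
        self.sigma2 = self.alpha * self.sigma2 + (1.0 - self.alpha) * squared_norm
        return self.trust()


# Illustrative values only.
rt = RiskTrust(alpha=0.9, beta=2.0, sigma2=0.5)
initial_trust = rt.trust()             # 1 / (1 + 2 * 0.5)
updated_trust = rt.update((0.3, 0.4))  # squared norm 0.25
print(initial_trust, rt.sigma2, updated_trust)
```

A residual smaller than the running estimate raises trust slightly, as in the example; a larger one lowers it at a rate set by `beta`.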
The structure mirrors the foundational paper’s behavioral trust
exactly (Equations 33 and 38 of
New agents enter with \sigma_j^{\mathrm{risk},2}(0) =
\sigma_0^{\mathrm{risk},2} (a prior reflecting baseline
uncertainty about risk-claim quality), producing initial risk-trust
{T^{\mathrm{risk}}}_j(0) = 1/(1 +
\beta_{\mathrm{risk}} \cdot
\sigma_0^{\mathrm{risk},2}). The effective cooling period is
\tau_{\mathrm{risk}} = \lceil \log \delta /
\log
\alpha_{\mathrm{risk}} \rceil (same formula as the behavioral
cooling period, Equation 71 of
Proposition 2 (Risk-Trust L^2 Convergence). Let X_j(t) = \|r_j^{\mathrm{risk}}(t)\|^2 and assume a_j’s risk-claim strategy is stationary with \mu_j^{\mathrm{risk}} = \mathbb{E}[X_j] < \infty and \nu_j^{\mathrm{risk}} = \mathrm{Var}[X_j] < \infty. Then the cumulative inconsistency estimator satisfies: \begin{align} \lim_{t \to \infty} \mathbb{E}\bigl[\sigma_j^{\mathrm{risk},2}(t)\bigr] &= \mu_j^{\mathrm{risk}} \qquad \text{(asymptotic unbiasedness)} \\ \limsup_{t \to \infty} \mathrm{Var}\bigl[\sigma_j^{\mathrm{risk},2}(t)\bigr] &\leq \frac{1 - \alpha_{\mathrm{risk}}} {1 + \alpha_{\mathrm{risk}}} \cdot C_{\mathrm{dep}} \cdot \nu_j^{\mathrm{risk}} \qquad \text{(bounded stationary variance)} \end{align} where C_{\mathrm{dep}} = 1 under i.i.d. squared residuals, and C_{\mathrm{dep}}(\alpha_{\mathrm{risk}}) = 1 + 2 \sum_{h=1}^{\infty} \alpha_{\mathrm{risk}}^h \cdot \mathrm{Corr}(X_j(t), X_j(t+h)) under weak dependence with summable autocovariance. The \alpha_{\mathrm{risk}}^h weights arise from the EWMA’s geometric weighting: the cross-terms in the variance expansion carry the product of the EWMA coefficients at lag h. Since |\mathrm{Corr}| \leq 1 and \sum_{h \geq 1} \alpha_{\mathrm{risk}}^h = \alpha_{\mathrm{risk}} / (1 - \alpha_{\mathrm{risk}}), we have the universal bound C_{\mathrm{dep}} \leq (1 + \alpha_{\mathrm{risk}}) / (1 - \alpha_{\mathrm{risk}}). Consequently, the variance estimator \sigma_j^{\mathrm{risk},2}(t) concentrates in L^2 around \mu_j^{\mathrm{risk}}. The risk-trust factor {T^{\mathrm{risk}}}_j(t) concentrates around the nominal fixed point {T^{\mathrm{risk}}}_j^* = 1 / (1 + \beta_{\mathrm{risk}} \mu_j^{\mathrm{risk}}) in distribution (via the continuous mapping theorem), with the caveat that \mathbb{E}[{T^{\mathrm{risk}}}_j(t)] may exceed {T^{\mathrm{risk}}}_j^* by a Jensen bias term of order O(\beta_{\mathrm{risk}}^2 \cdot \mathrm{Var}[\sigma_j^{\mathrm{risk},2}]) due to the convexity of x \mapsto 1/(1 + \beta_{\mathrm{risk}} x) on x \geq 0.
The deviation is controlled by \alpha_{\mathrm{risk}}: as \alpha_{\mathrm{risk}} \to 1, the stationary variance vanishes, the Jensen bias vanishes, and {T^{\mathrm{risk}}}_j(t) \to {T^{\mathrm{risk}}}_j^* in L^2. Structurally accurate reporters (small \mu_j^{\mathrm{risk}}) concentrate around high {T^{\mathrm{risk}}}_j^*; alarmists and chronic underreporters concentrate around low {T^{\mathrm{risk}}}_j^*.
Proof. Unrolling the EWMA recursion gives \sigma_j^{\mathrm{risk},2}(t) = (1 - \alpha_{\mathrm{risk}}) \sum_{s=0}^{t-1} \alpha_{\mathrm{risk}}^s X_j(t-s) + \alpha_{\mathrm{risk}}^t \sigma_j^{\mathrm{risk},2}(0). Taking expectations under stationarity, \mathbb{E}[\sigma_j^{\mathrm{risk},2}(t)] = (1 - \alpha_{\mathrm{risk}}) \sum_{s=0}^{t-1} \alpha_{\mathrm{risk}}^s \mu_j^{\mathrm{risk}} + \alpha_{\mathrm{risk}}^t \sigma_j^{\mathrm{risk},2}(0) \to \mu_j^{\mathrm{risk}} as t \to \infty, establishing asymptotic unbiasedness. Under the i.i.d. assumption, \begin{align*} \mathrm{Var}[\sigma_j^{\mathrm{risk},2}(t)] &= (1 - \alpha_{\mathrm{risk}})^2 \sum_{s=0}^{t-1} \alpha_{\mathrm{risk}}^{2s} \cdot \nu_j^{\mathrm{risk}} + o(1) \\ &\longrightarrow \frac{(1 - \alpha_{\mathrm{risk}})^2} {1 - \alpha_{\mathrm{risk}}^2} \cdot \nu_j^{\mathrm{risk}} = \frac{1 - \alpha_{\mathrm{risk}}} {1 + \alpha_{\mathrm{risk}}} \cdot \nu_j^{\mathrm{risk}}, \end{align*} giving the stationary-variance bound with C_{\mathrm{dep}} = 1. Under weak dependence with summable autocovariance, the cross-terms in the EWMA variance expansion carry \alpha_{\mathrm{risk}}^h weights from the geometric kernel, giving C_{\mathrm{dep}}(\alpha_{\mathrm{risk}}) = 1 + 2 \sum_{h=1}^{\infty} \alpha_{\mathrm{risk}}^h \cdot \mathrm{Corr}(X_j(t), X_j(t+h)), which is finite because \alpha_{\mathrm{risk}}^h decays geometrically and the autocovariance is summable. The concentration statement for \sigma_j^{\mathrm{risk},2}(t) around \mu_j^{\mathrm{risk}} in L^2 follows directly. For {T^{\mathrm{risk}}}_j(t), the continuous mapping theorem applied to x \mapsto 1 / (1 + \beta_{\mathrm{risk}} x), which is Lipschitz on [0, \infty), gives convergence in distribution to {T^{\mathrm{risk}}}_j^*. The Jensen bias arises because \mathbb{E}[f(X)] \geq f(\mathbb{E}[X]) for convex f, and x \mapsto 1 / (1 + \beta_{\mathrm{risk}} x) is convex on [0, \infty); a second-order Taylor expansion gives the bias as O(\beta_{\mathrm{risk}}^2 \cdot \mathrm{Var}[\sigma_j^{\mathrm{risk},2}]), which vanishes as \alpha_{\mathrm{risk}} \to 1.
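The closed forms in the proof can be evaluated directly from the unrolled recursion. The sketch below assumes i.i.d. squared residuals and a deterministic initial value, and the numeric parameters are illustrative only; it compares the finite-horizon mean and variance with their stated limits and checks the universal bound on C_{\mathrm{dep}}.

```python
def ewma_mean(t: int, alpha: float, mu: float, sigma2_0: float) -> float:
    """E[sigma2(t)] from the unrolled recursion under stationarity."""
    return (1.0 - alpha) * sum(alpha ** s for s in range(t)) * mu \
        + alpha ** t * sigma2_0


def ewma_var_iid(t: int, alpha: float, nu: float) -> float:
    """Var[sigma2(t)] for i.i.d. squared residuals with variance nu."""
    return (1.0 - alpha) ** 2 * sum(alpha ** (2 * s) for s in range(t)) * nu


def stationary_var_iid(alpha: float, nu: float) -> float:
    """Limit of ewma_var_iid as t grows: (1 - alpha) / (1 + alpha) * nu."""
    return (1.0 - alpha) / (1.0 + alpha) * nu


def c_dep_upper(alpha: float) -> float:
    """Universal bound: 1 + 2 * alpha / (1 - alpha) = (1 + alpha) / (1 - alpha)."""
    return (1.0 + alpha) / (1.0 - alpha)


alpha, mu, nu, sigma2_0 = 0.9, 0.3, 1.0, 2.0  # illustrative values
print(ewma_mean(500, alpha, mu, sigma2_0), mu)
print(ewma_var_iid(500, alpha, nu), stationary_var_iid(alpha, nu))
print(c_dep_upper(alpha))
```

With the worst-case C_{\mathrm{dep}} the variance bound collapses to \nu_j^{\mathrm{risk}} itself, so the EWMA never does worse than a single observation even under maximal positive dependence.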
Fixed-\alpha EWMA does not converge
almost surely to a deterministic limit, and for an adversarial-tracking
setting this is the correct behavior. Each new residual contributes
weight (1 - \alpha_{\mathrm{risk}})
regardless of how much history has accumulated, so the estimator never
collapses to a point mass and retains nonzero stationary variance. A
Robbins-Monro stochastic approximation with step size \alpha_t \to 0 satisfying \sum_t \alpha_t = \infty and \sum_t \alpha_t^2 < \infty
The EWMA dynamics converge on structural verification quality, not on outcome correctness. An agent whose risk claims are structurally plausible (the pathway exists in \mathcal{M}, the probability and magnitude estimates are calibrated against the actor’s own model) accumulates low \|r^{\mathrm{risk}}\|^2 and converges to high {T^{\mathrm{risk}}}_j^* regardless of whether the risk materializes. This partially resolves the Wamura problem: risk-trust converges even when the counterfactual is never observed, because convergence depends on structural agreement, not outcome observation. The resolution is partial because an agent whose claims are structurally plausible but whose probability estimates are systematically biased (always 10% higher than the actor’s model) accumulates moderate \|r^{\mathrm{risk}}\|^2 and converges to moderate {T^{\mathrm{risk}}}_j^*, reflecting the bias without resolving it.
The max operator in Equation 7 of
The companion paper
Definition 5 (Log-Odds Risk Aggregation). Let \pi_0 \in (0, 1) be a shared prior on the probability that the exercise pathway will be executed within the planning horizon, and let \ell_0 = \log(\pi_0 / (1 - \pi_0)) be its log-odds. The actor’s self-assessment \mathcal{R}_{\mathrm{self}} and each communicated risk estimate \hat{\mathcal{R}}_j from agent a_j are posterior probabilities derived from the shared prior \pi_0 plus their respective observations (definition update, see above). The aggregated posterior is: \begin{equation} \log \frac{\mathcal{R}_{\mathrm{agg}}}{1 - \mathcal{R}_{\mathrm{agg}}} = \ell_0 + \underbrace{\left(\log \frac{\mathcal{R}_{\mathrm{self}}}{1 - \mathcal{R}_{\mathrm{self}}} - \ell_0\right)}_{\text{actor's log-likelihood-ratio}} + \sum_j {T^{\mathrm{risk}}}_j \cdot \underbrace{\left(\log \frac{\hat{\mathcal{R}}_j}{1 - \hat{\mathcal{R}}_j} - \ell_0\right)}_{\text{reporter $a_j$'s log-likelihood-ratio}} \end{equation} where each parenthesized term is the log-likelihood-ratio contributed by that source beyond the shared prior, obtained by subtracting \ell_0 from that source’s reported posterior log-odds. The \mathcal{R} values are normalized to [0, 1] as probabilities of the exercise pathway being executed within the planning horizon. The shared prior \pi_0 is an operational coordination requirement: every reporter’s posterior must be calibrated against the same \pi_0 for the subtraction \log(\hat{\mathcal{R}}_j / (1 - \hat{\mathcal{R}}_j)) - \ell_0 to correctly extract the log-likelihood ratio. Establishing \pi_0 requires a protocol (e.g., broadcasting it as a system parameter at population initialization), not just a mathematical assumption. Under the neutral bootstrap prior \pi_0 = 0.5 (\ell_0 = 0), the prior-correction terms vanish and the aggregation reduces to a naive sum of posterior log-odds; under any other shared prior, the correction is required to avoid double-counting the prior.
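The aggregation rule of Definition 5 is a short computation in log-odds space. The sketch below is one possible rendering; the function name `aggregate_risk` and the example numbers are assumptions, and all probabilities are taken to lie strictly inside (0, 1) so that the log-odds are finite.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_risk(prior: float, r_self: float,
                   reports: Sequence[Tuple[float, float]]) -> float:
    """Prior-corrected, trust-weighted log-linear pool.

    prior: shared prior pi_0.
    r_self: the actor's own posterior probability.
    reports: (T_risk_j, R_hat_j) pairs, one per reporter.
    """
    l0 = logit(prior)
    log_odds = l0 + (logit(r_self) - l0)
    for trust, r_hat in reports:
        # Each reporter adds only its evidence beyond the shared prior.
        log_odds += trust * (logit(r_hat) - l0)
    return sigmoid(log_odds)


# Illustrative values: the actor has no evidence beyond the prior, and one
# fully trusted reporter has moved from 0.2 to 0.5.
no_reports = aggregate_risk(0.2, 0.35, [])
single_full_trust = aggregate_risk(0.2, 0.2, [(1.0, 0.5)])
neutral_prior = aggregate_risk(0.5, 0.6, [(0.5, 0.8), (0.5, 0.8)])
print(no_reports, single_full_trust, neutral_prior)
```

With an empty report list the rule returns the actor's own posterior, and with the neutral prior the last call is a plain trust-weighted sum of log-odds.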
The log-odds aggregation has three advantages over the max operator:
Agreement strengthens evidence. Multiple independent low-trust agents reporting the same risk produce a collective signal proportional to the number of agreeing reporters, weighted by their individual {T^{\mathrm{risk}}}_j values.
Trust-weighted log-linear opinion pool.
The aggregation of Definition 5 is a trust-tempered
log-linear opinion pool
Graceful degradation. When a single agent has {T^{\mathrm{risk}}}_j \gg \sum_{k \neq j} {T^{\mathrm{risk}}}_k, the aggregation approximates max behavior: the dominant reporter’s assessment controls the aggregate. The max operator emerges as a limiting case, not an imposed design choice.
During bootstrap, all {T^{\mathrm{risk}}}_j values start at the neutral prior. Under log-odds aggregation, n agents reporting the same risk at neutral trust produce a collective signal n \times {T^{\mathrm{risk}}}_{\mathrm{neutral}} times the individual log-odds, proportional to n. Under the max operator, the same n agents produce only {T^{\mathrm{risk}}}_{\mathrm{neutral}} times the maximum individual log-odds, independent of n. The log-odds aggregation therefore learns faster during bootstrap, precisely when the actor’s own model is least reliable and external evidence is most valuable.
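The bootstrap comparison is simple arithmetic, sketched here under the neutral prior \pi_0 = 0.5 so that no prior correction is needed. The function names and the example values (four reporters, neutral trust 0.5, reported risk 0.8) are illustrative assumptions.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def pooled_signal(n: int, neutral_trust: float, reported_risk: float) -> float:
    """Log-odds contribution of n agreeing reporters under the log-linear pool."""
    return sum(neutral_trust * logit(reported_risk) for _ in range(n))


def max_signal(n: int, neutral_trust: float, reported_risk: float) -> float:
    """Contribution under max-aggregation: only the largest report counts."""
    return neutral_trust * max(logit(reported_risk) for _ in range(n))


pooled = pooled_signal(4, 0.5, 0.8)
maxed = max_signal(4, 0.5, 0.8)
print(pooled, maxed, pooled / maxed)
```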
The log-odds aggregation amplifies agreement among reporters, which is correct when reporters are independent but exploitable by coordinated conspirators: n agents who agree in advance on a fabricated risk claim produce a collective signal proportional to n, potentially overwhelming the actor’s self-assessment and triggering restriction of a capability that was never actually dangerous. This is the risk-communication analogue of report-bombing: coordinated low-quality reports exploiting an aggregation mechanism designed for independent evidence.
Any fixed correlation-detection heuristic is game-able once the
adversary knows the parameters — a standard observation in adversarial
detection settings — and the ROC trade-off is unfavorable in this domain
because both error types are costly: false positives (restricting a safe
capability) and false negatives (ignoring a real risk) both cause {\mathop{\mathrm{vol}}_{\mathrm{P}}}-contraction.
We therefore adopt the activation energy principle from the foundational
paper
The risk aggregation \mathcal{R}_{\mathrm{agg}} (Definition 5) does not directly trigger restriction. It produces a salience signal: the aggregated risk level determines the priority at which a capability tuple enters the actor’s investigation queue. Action requires passing two independent gates:
Salience gate. The aggregated risk \mathcal{R}_{\mathrm{agg}}(S; a_k, t) exceeds an attention threshold \theta_{\mathrm{risk}}, placing the capability tuple in the investigation queue.
Verification gate. The actor invokes the forward
causal risk-evaluation procedure (Section 4.3 of
Neither gate alone is sufficient. A coordinated report-bomb raises salience (gate 1) but fails independent verification (gate 2) because the fabricated pathway does not exist in the actor’s causal model. A genuine risk reported by a single low-trust agent may fail gate 1 (low salience) but would pass gate 2 if the actor investigated. The two-gate architecture trades a bounded false-negative rate (genuine risks from low-trust reporters may be investigated slowly) for strong robustness against coordinated manipulation (fabricated risks cannot bypass independent verification regardless of how many conspirators report them).
A fixed correlation metric (temporal clustering, behavioral similarity among reporters, trust-history patterns) is game-able by a sufficiently informed adversary: adversaries who know the detection threshold can space their reports, diversify their trust-building histories, and avoid clustering. The two-gate architecture avoids this arms race by not relying on detecting coordination at all. The defense is the actor’s own intelligence applied through the causal model, not a statistical filter on the report stream.
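The two-gate control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `RiskClaim`, `two_gate_decision`, and the `verify` callable are hypothetical names, and `verify` stands in for the actor's forward causal risk-evaluation procedure, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RiskClaim:
    """Hypothetical container for one aggregated risk claim."""
    capability: str         # label for the capability tuple under review
    aggregated_risk: float  # R_agg(S; a_k, t) produced by the opinion pool


def two_gate_decision(
    claim: RiskClaim,
    theta_risk: float,
    verify: Callable[[RiskClaim], bool],
) -> str:
    """Return the outcome of the salience and verification gates.

    `verify` is an assumed stand-in for the actor's own causal evaluation:
    it returns True only if the claimed pathway exists in the actor's model.
    """
    # Gate 1 (salience): the claim enters the investigation queue only if
    # the aggregated risk exceeds the attention threshold.
    if claim.aggregated_risk <= theta_risk:
        return "below_salience"
    # Gate 2 (verification): the report count plays no role here; only the
    # actor's independent assessment of the pathway matters.
    if not verify(claim):
        return "rejected"
    return "restrict"
```

In this sketch a report-bombed claim with very high aggregated risk still returns `"rejected"` whenever `verify` finds no pathway, while a genuine but low-salience claim returns `"below_salience"` and is never verified, which is the bounded false-negative cost noted above.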
The foundational paper’s Proposition 2
Proposition 3 (Causal Scorpion Detection). Under the SCM \mathcal{M} (Definition 1), a GFM actor can detect scorpion agents through two upgraded channels:
Causal contraction attribution. \mathrm{CA}(a_j, t) (Definition 2) replaces the
correlational \hat{\Delta} signal from
Proposition 2(a) of
The flag condition is instead a sequential probability ratio test
(SPRT)
Deception detection (unchanged). The
behavioral trust factor T_j detects
agents whose reported capability changes diverge from observations,
through the same prediction-residual dynamics as the foundational paper
(s 25, 33, 38 of
Proof. Channel (a): By the SCM, \mathrm{CA}(a_j, t) isolates a_j’s causal contribution from confounders (Definition 2). Within a stationary epoch of a_j’s strategy, \{\mathrm{CA}(a_j, t)\} is i.i.d. with mean \mu_j and variance \sigma_{\mathrm{do}}^2; for a scorpion \mu_j \leq -\epsilon < 0. The auxiliary EWMAs \mu_j^{\mathrm{causal}}(t) and \sigma_j^{\mathrm{causal},2}(t) inherit the L^2 guarantees of Proposition 2: asymptotically unbiased estimators of \mu_j and \sigma_{\mathrm{do}}^2 with bounded stationary variance. They are not used for the detection decision, so we do not require almost-sure convergence.
The SPRT is applied to the least-favorable simple pair H_0: \mu_j = 0 versus H_1: \mu_j = -\epsilon (see channel (a) above for the reduction from composite to simple hypotheses). Under the Gaussian model, the per-observation expected log-likelihood ratio under the true mean \mu_j is \mathbb{E}_{\mu_j}[\log(p_{H_1}/p_{H_0})] = \epsilon(|\mu_j| - \epsilon/2) / \sigma_{\mathrm{do}}^2. By Wald’s identity, the expected detection time under true \mu_j is \mathbb{E}[\tau_{\mathrm{detect}}] \approx \log((1 - \beta_{\mathrm{err}}) / \alpha_{\mathrm{err}}) \cdot \sigma_{\mathrm{do}}^2 / [\epsilon(|\mu_j| - \epsilon/2)], which at the design point |\mu_j| = \epsilon reduces to O(\sigma_{\mathrm{do}}^2 / \epsilon^2) and for |\mu_j| \gg \epsilon improves to O(\sigma_{\mathrm{do}}^2 / (\epsilon |\mu_j|)). The SPRT Type I error for the simple pair is bounded by \alpha_{\mathrm{err}} and Type II by \beta_{\mathrm{err}}; the least-favorable argument ensures these bounds hold for the composite problem.
Channel (b): The behavioral trust dynamics (Appendix B of
The foundational paper notes that “a non-stationary scorpion that
adapts its strategy in response to detection pressure can evade both
channels indefinitely” (Proposition 2 of
Proposition 4 (Evasion Bandwidth). A scorpion a_j with per-epoch mean causal contraction |\mu_j| that changes strategy every \tau_{\mathrm{adapt}} steps produces at most \tau_{\mathrm{adapt}} observations per strategy epoch. Let \bar{\tau}(\mu_j) = O\bigl(\sigma_{\mathrm{do}}^2 / [\epsilon(|\mu_j| - \epsilon/2)]\bigr) be the expected detection time per epoch from the SPRT under the Gaussian model with design parameter \epsilon (Proposition 3). Then:
If \tau_{\mathrm{adapt}} \geq k \cdot \bar{\tau}(\mu_j) for some k > 1, the scorpion is detected within each epoch with probability at least 1 - 1/k (by Markov’s inequality on the SPRT stopping time).
If \tau_{\mathrm{adapt}} < \bar{\tau}(\mu_j), the expected accumulated evidence at the epoch boundary is below the SPRT threshold. While this does not by itself guarantee evasion with any specific probability, it establishes that the expected number of observations is insufficient for detection, so the SPRT is unlikely to trigger within the epoch.
Proof. Within each epoch of length \tau_{\mathrm{adapt}}, the scorpion’s strategy is stationary and the causal-attribution sequence is i.i.d. \mathcal{N}(\mu_j, \sigma_{\mathrm{do}}^2). The SPRT statistic accumulates log-likelihood-ratio evidence at expected rate \epsilon(|\mu_j| - \epsilon/2) / \sigma_{\mathrm{do}}^2 per observation under the Gaussian model (Proposition 3), giving expected detection time \bar{\tau}(\mu_j) = \sigma_{\mathrm{do}}^2 \cdot \log((1 - \beta_{\mathrm{err}}) / \alpha_{\mathrm{err}}) / [\epsilon(|\mu_j| - \epsilon/2)].
(a) By Markov’s inequality, P(\tau_{\mathrm{detect}} > k \bar{\tau}) \leq 1/k for any k > 1. When \tau_{\mathrm{adapt}} \geq k \bar{\tau}(\mu_j), the epoch provides enough observations that the SPRT crosses the detection threshold with probability at least 1 - 1/k.
(b) An epoch boundary resets the SPRT statistic (the hypotheses are epoch-local). If \tau_{\mathrm{adapt}} < \bar{\tau}(\mu_j), the expected accumulated evidence at the epoch boundary is below the SPRT threshold. Since the number of observations is less than the expected detection time, the SPRT is unlikely to trigger within the epoch: the evidence accumulation rate is insufficient for reliable detection.
The evasion bandwidth \tau_{\mathrm{adapt}}^* = O\bigl(\sigma_{\mathrm{do}}^2 / [\epsilon(|\mu_j| - \epsilon/2)]\bigr) is the characteristic timescale: a scorpion that adapts faster than \tau_{\mathrm{adapt}}^* evades with non-negligible probability per epoch, while one that adapts slower than k \cdot \tau_{\mathrm{adapt}}^* is detected with probability at least 1 - 1/k. A fast-adapting scorpion (\tau_{\mathrm{adapt}} \ll \tau_{\mathrm{adapt}}^*) evades but is constrained to small contractions per epoch (otherwise |\mu_j| is large and \tau_{\mathrm{adapt}}^* shrinks). A slow-adapting scorpion (\tau_{\mathrm{adapt}} \gg \tau_{\mathrm{adapt}}^*) is detected within each epoch with high probability. The product |\mu_j| \cdot \tau_{\mathrm{adapt}} (total contraction per epoch) is bounded: a scorpion can either cause large damage slowly (and be detected) or small damage quickly (and evade), but not large damage quickly.
The time-averaged undetected contraction rate for an evading scorpion is bounded by |\mu_j|: within each epoch the per-step contraction is |\mu_j|, and the evasion constraint \tau_{\mathrm{adapt}} < \bar{\tau}(\mu_j) bounds the epoch length. The appendix proof (§8.1) establishes this via the SPRT accumulation rate. The key trade-off is controlled by the SPRT design parameter \epsilon: a scorpion whose per-step contraction satisfies |\mu_j| \leq \epsilon requires at least \bar{\tau}(\epsilon) = O(\sigma_{\mathrm{do}}^2 / \epsilon^2) observations to detect (the worst case at the design point), while a scorpion with |\mu_j| \gg \epsilon is detected faster in proportion to |\mu_j| / \epsilon. The SPRT design parameter \epsilon therefore sets the resolution: reducing \epsilon detects smaller contractions but requires more observations per epoch. The bound tightens as the causal model improves (\sigma_{\mathrm{do}} decreases) and as the scorpion’s adaptation budget \tau_{\mathrm{adapt}} grows: a scorpion that adapts frequently (small \tau_{\mathrm{adapt}}) can only afford very small per-step contractions, while a scorpion that adapts slowly can sustain larger contractions per step but for correspondingly fewer steps before detection.
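The quantities in Propositions 3 and 4 can be computed directly. The following is a minimal sketch under the Gaussian model stated above; the function names are hypothetical, and the detection-time formula is the Wald approximation quoted in the proof rather than an exact stopping-time distribution.

```python
import math


def llr_increment(x: float, eps: float, sigma2: float) -> float:
    """Per-observation log(p_H1 / p_H0) for H0: mean 0 vs H1: mean -eps.

    Both hypotheses are Gaussian with common variance sigma2.
    """
    return -eps * (x + eps / 2.0) / sigma2


def expected_drift(mu_abs: float, eps: float, sigma2: float) -> float:
    """Expected LLR per observation when the true mean is -mu_abs."""
    return eps * (mu_abs - eps / 2.0) / sigma2


def expected_detection_time(
    mu_abs: float, eps: float, sigma2: float,
    alpha_err: float, beta_err: float,
) -> float:
    """Wald approximation to the expected SPRT detection time."""
    drift = expected_drift(mu_abs, eps, sigma2)
    if drift <= 0.0:
        # |mu_j| <= eps/2: evidence does not accumulate toward H1.
        return math.inf
    return math.log((1.0 - beta_err) / alpha_err) / drift


def epoch_detection_lower_bound(tau_adapt: float, tau_bar: float) -> float:
    """Markov lower bound 1 - 1/k on per-epoch detection, k = tau_adapt/tau_bar.

    Returns 0.0 (no guarantee) when the epoch is no longer than tau_bar.
    """
    if tau_adapt <= tau_bar:
        return 0.0
    return 1.0 - tau_bar / tau_adapt
```

At the design point |\mu_j| = \epsilon the drift is \epsilon^2 / (2\sigma_{\mathrm{do}}^2), so `expected_detection_time` scales as \sigma_{\mathrm{do}}^2 / \epsilon^2, matching the worst case stated above.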
The companion paper
The section has three parts. First, a probabilistic substrate threat model (Section 5.1) that decomposes dependency risk into event probability, target selection, and cascade effects. Second, an expected per-step risk comparison (Proposition 5) that sharpens the minimax bound into an adversarial balance condition c > \max_i (f_i \cdot c_i), a single inequality that gates when diversification strictly beats concentration and that explicitly admits contagion dynamics (pandemics, cascading failures, memetic attacks) as the motivating threat class. Third, a risk-adjusted trajectory analysis (Section 5.3) showing that the per-step cost compounds at the same discount rate as growth, so the diversification-versus-concentration decision becomes a direct comparison of two sums in the value function.
Definition 6 (Substrate Threat Model). A substrate threat model is a distribution over adversarial events, parametrized by:
p_{\mathrm{adv}}: per-step probability that a substrate-targeting event occurs.
q_i = p(\mathcal{S}_i \text{ targeted} \mid \text{event occurs}): the conditional probability that substrate class \mathcal{S}_i is targeted, given that an event occurs. For an uninformed adversary, q_i = 1/m (uniform); for an optimal adversary, q_i = \mathbf{1}[i = \arg\max_j (|\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_j)| \cdot c_j)], i.e., the adversary targets the substrate with maximal effective damage (volume contribution times cascade fraction).
c_i \in [0, 1]: the cascade fraction for substrate \mathcal{S}_i, defined as the expected fraction of \mathcal{S}_i’s total volume lost when a substrate-targeting event hits \mathcal{S}_i. “Cascade” refers to the propagation dynamics (within-substrate impairment spreading from the initial strike through the SCM’s structural equations) and “fraction” constrains c_i to [0, 1] as a proportion of \mathcal{S}_i’s volume. The cascade fraction aggregates the initial impairment and all subsequent within-substrate propagation: if a direct attack impairs 30% of the substrate and triggers a contagion that takes out another 40%, then c_i = 0.7. The homogeneous single-substrate baseline uses a single c in place of the per-substrate c_i.
In this threat model |\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)| denotes
substrate \mathcal{S}_i’s total
contribution to {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G)
(i.e., the maximum possible contraction if \mathcal{S}_i were fully eliminated, not the
expected contraction from a targeting event). The expected contraction
is |\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)| \cdot c_i:
substrate size times cascade fraction, with the two quantities varied
independently. This convention matches Proposition 4 of
The threat model decomposes dependency risk into three components:
occurrence probability, target selection, and cascading effects. The
minimax analysis of
The companion paper
Proposition 5 (Expected Risk Comparison). Under
the substrate threat model (Definition 6), the
expected per-step dependency risk for each strategy is: \begin{align}
\mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{single}}]
&= p_{\mathrm{adv}} \cdot
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(G) \cdot c \\
\mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{mixed}}]
&= p_{\mathrm{adv}} \cdot \sum_i q_i \cdot
|\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)| \cdot
c_i
\end{align} where c and c_i are the cascade fractions from
Definition 6. For a single-substrate
population (m = 1), any
substrate-targeting event affects a fraction c of the entire population. For a
cross-substrate population (m \geq 2):
\begin{equation}
\mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{single}}] -
\mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{mixed}}]
= p_{\mathrm{adv}} \cdot
\left({\mathop{\mathrm{vol}}_{\mathrm{P}}}(G) \cdot c
- \sum_i q_i \cdot |\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)| \cdot c_i\right)
\end{equation} Under worst-case adversarial
targeting (q_i = \mathbf{1}[i =
\arg\max_j (|\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_j)|
\cdot c_j)]), this gap is positive whenever p_{\mathrm{adv}} > 0, the population
satisfies substrate capability non-redundancy (Proposition 4 of
Proof. Under an adversarial target-selection model, q_i = \mathbf{1}[i = i^*] where i^* = \arg\max_j (|\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_j)| \cdot c_j).
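Substituting into the expression for \mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{mixed}}] gives \mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{mixed}}]
= p_{\mathrm{adv}} \cdot \max_i \bigl(|\Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)|
\cdot c_i\bigr)
= p_{\mathrm{adv}} \cdot {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G)
\cdot \max_i (f_i \cdot c_i), where f_i = |\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}(\mathcal{S}_i)| / {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G) is substrate \mathcal{S}_i’s volume share, while \mathbb{E}[\mathcal{R}_{\mathrm{dep}}^{\mathrm{single}}]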
=
p_{\mathrm{adv}} \cdot {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G) \cdot
c. The gap is positive iff c >
\max_i (f_i \cdot c_i). Substrate capability non-redundancy
(Proposition 4 of
The adversarial balance condition admits contagions by construction,
which is essential because contagion-driven threats (pandemics,
cascading infrastructure failures, financial contagion, memetic attacks)
are precisely the class where per-substrate cascade fractions can
approach saturation (c_i \to 1) on the
targeted substrate. Smaller substrates are expected to reach sigmoid
saturation faster than larger ones because contagion propagation time
scales sub-linearly with population size in standard epidemiological
models
Two instructive limits bracket the regime of interest:
Full-saturation contagion (c_i = c = 1): the condition reduces to 1 > f_{\max}, which is exactly the
substrate-capability-non-redundancy assumption from
Unbalanced partition under contagion amplification (c_i > c, f_{\max} large): the condition can fail. For example, c = 0.5, c_{\max} = 1.0, f_{\max} = 0.6 gives 0.5 \not> 0.6 and the gap reverses. In this regime diversification provides no dependency-risk advantage over the single-substrate baseline: the adversary still targets the dominant substrate, and the contagion saturates it more thoroughly than in the homogeneous case. The trust model should detect this regime and avoid unbalanced partitions when contagion risks dominate; the bound flags the edge case rather than assuming it away.
The analysis assumes cascade confinement: c_i measures only the fraction of substrate \mathcal{S}_i lost when \mathcal{S}_i is targeted, not propagation into other substrates. Cross-substrate cascades (a memetic attack on \mathcal{S}_1 that triggers a cascade in \mathcal{S}_2 through inter-substrate communication channels) are outside the scope of the current bound and require a separate treatment beyond the current SCM.
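The expected-risk comparison of Proposition 5 and the adversarial balance condition can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the substrate volumes partition {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G), so that the shares f_i sum to one.

```python
from typing import List, Sequence


def expected_risk_single(p_adv: float, vol: float, c: float) -> float:
    """E[R_dep^single] = p_adv * vol_P(G) * c."""
    return p_adv * vol * c


def expected_risk_mixed(
    p_adv: float,
    volumes: Sequence[float],
    q: Sequence[float],
    cascades: Sequence[float],
) -> float:
    """E[R_dep^mixed] = p_adv * sum_i q_i * |dvol(S_i)| * c_i."""
    return p_adv * sum(qi * vi * ci for qi, vi, ci in zip(q, volumes, cascades))


def worst_case_targeting(
    volumes: Sequence[float], cascades: Sequence[float]
) -> List[float]:
    """One-hot q on the substrate maximizing |dvol(S_i)| * c_i."""
    damage = [vi * ci for vi, ci in zip(volumes, cascades)]
    target = damage.index(max(damage))
    return [1.0 if i == target else 0.0 for i in range(len(damage))]


def balance_condition(
    c: float, shares: Sequence[float], cascades: Sequence[float]
) -> bool:
    """Adversarial balance condition c > max_i (f_i * c_i)."""
    return c > max(fi * ci for fi, ci in zip(shares, cascades))
```

With shares (0.6, 0.4), c_i = 1.0 on both substrates, and c = 0.5, `balance_condition` returns `False` and the gap is negative, reproducing the unbalanced-partition example above; a balanced three-way split under the same cascade fractions satisfies the condition.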
Corollary 1 (Balanced-Partition Substrate Count). Under a balanced substrate partition with f_i = 1/m for all i, the adversarial balance condition of Proposition 5 is equivalent to \begin{equation} m > \frac{c_{\max}}{c}, \end{equation} where c_{\max} = \max_i c_i is the worst-case per-substrate cascade fraction.
Proof. With f_i = 1/m, the adversarial balance condition c > \max_i (f_i \cdot c_i) reduces to c > \max_i (c_i / m) = c_{\max} / m, which is equivalent to m > c_{\max} / c.
Corollary 1 sharpens the minimax bound of
Moderate contagion (c = 0.5, c_{\max} = 1.0): m > 2, so m \geq 3. Binary diversification is insufficient.
High contagion (c = 0.1, c_{\max} = 1.0): m > 10, so m \geq 11. The substrate-count requirement grows inversely with the homogeneous cascade fraction.
Low contagion (c \approx c_{\max} \approx 0.3, localized failure modes): m > 1, recovering the minimax bound.
Practically: the minimum number of substrates required for diversification to strictly help depends on the measured contagion intensity, not just on the structural claim that m \geq 2. A binary partition is correct for threat classes where the homogeneous cascade already saturates (c \to 1), but underprovisions for contagion-driven threats where smaller substrates saturate faster than the homogeneous baseline. For unbalanced partitions, the operational form f_{\max} < c / c_{\max} is the direct test: it bounds the largest substrate’s share rather than specifying a count, and is the condition the trust model should check when assessing a given partition’s dependency-risk posture.
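The substrate-count bound of Corollary 1 and the operational share test can be evaluated mechanically. The sketch below uses hypothetical function names, and it converts inputs to exact fractions so that the strict inequality m > c_{\max}/c is not disturbed by floating-point rounding; that conversion is an implementation choice, not part of the corollary.

```python
import math
from fractions import Fraction


def min_balanced_substrates(c: float, c_max: float) -> int:
    """Smallest integer m with m > c_max / c (balanced partition, f_i = 1/m)."""
    if c <= 0:
        raise ValueError("homogeneous cascade fraction c must be positive")
    # Exact arithmetic on the decimal representations of the inputs.
    ratio = Fraction(str(c_max)) / Fraction(str(c))
    return math.floor(ratio) + 1


def share_test(f_max: float, c: float, c_max: float) -> bool:
    """Operational form for arbitrary partitions: f_max < c / c_max."""
    return Fraction(str(f_max)) * Fraction(str(c_max)) < Fraction(str(c))
```

Applied to the three regimes listed above, `min_balanced_substrates(0.5, 1.0)` gives 3, `min_balanced_substrates(0.1, 1.0)` gives 11, and `min_balanced_substrates(0.3, 0.3)` gives 2.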
The per-step expected risk gap compounds at the same rate as growth,
unlike the one-time minimax penalty in
For a population with p_{\mathrm{adv}} =
0.01 (1% per-step probability of substrate-targeting event) and
cascade fraction c = 0.5 (half the
targeted substrate’s capabilities are lost in a typical event), \Delta_{\mathrm{risk}} = 0.01 \times
\Delta_{\mathrm{div}}
\times 0.5 where \Delta_{\mathrm{div}} is the substrate
diversification advantage from Proposition 4 of
Table 2 shows how \Delta_{\mathrm{risk}} / {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G) varies over plausible parameter ranges. The per-step risk gap scales linearly with p_{\mathrm{adv}} and c, and the compounded effect over the planning horizon is \Delta_{\mathrm{risk}} / (1 - \gamma). The 80% upper-bound entry corresponds to a 5% adversarial rate with high contagion (c = 0.8) and a substantial diversification advantage (\Delta_{\mathrm{div}} = 0.20), representing a compromised-substrate scenario rather than a typical operating point. A more representative regime is p_{\mathrm{adv}} = 0.01, c = 0.5, \Delta_{\mathrm{div}} = 0.10, which yields a 5% compounded gap, material but containable by the verification architecture of Section 3.4.
| p_{\mathrm{adv}} | c | \Delta_{\mathrm{div}}/{\mathop{\mathrm{vol}}_{\mathrm{P}}} | \Delta_{\mathrm{risk}}/{\mathop{\mathrm{vol}}_{\mathrm{P}}} (per step) | Compounded at \gamma = 0.99 |
|---|---|---|---|---|
| 0.001 | 0.3 | 0.10 | 3 \times 10^{-5} | 0.3\% |
| 0.01 | 0.3 | 0.10 | 3 \times 10^{-4} | 3.0\% |
| 0.01 | 0.5 | 0.10 | 5 \times 10^{-4} | 5.0\% |
| 0.01 | 0.5 | 0.20 | 1 \times 10^{-3} | 10.0\% |
| 0.01 | 0.8 | 0.10 | 8 \times 10^{-4} | 8.0\% |
| 0.05 | 0.5 | 0.10 | 2.5 \times 10^{-3} | 25.0\% |
| 0.05 | 0.8 | 0.20 | 8 \times 10^{-3} | 80.0\% |
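The entries of Table 2 follow from \Delta_{\mathrm{risk}} = p_{\mathrm{adv}} \cdot \Delta_{\mathrm{div}} \cdot c and the geometric sum \Delta_{\mathrm{risk}} / (1 - \gamma). A short sketch that recomputes them, with hypothetical function names and all quantities expressed as fractions of {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G):

```python
from typing import List, Tuple

# (p_adv, c, Delta_div / vol_P) for each row of Table 2.
TABLE_2_INPUTS: List[Tuple[float, float, float]] = [
    (0.001, 0.3, 0.10),
    (0.01, 0.3, 0.10),
    (0.01, 0.5, 0.10),
    (0.01, 0.5, 0.20),
    (0.01, 0.8, 0.10),
    (0.05, 0.5, 0.10),
    (0.05, 0.8, 0.20),
]


def per_step_gap(p_adv: float, c: float, div_frac: float) -> float:
    """Per-step risk gap as a fraction of vol_P(G)."""
    return p_adv * div_frac * c


def compounded_gap(per_step: float, gamma: float = 0.99) -> float:
    """Discounted sum of a constant per-step gap: per_step / (1 - gamma)."""
    return per_step / (1.0 - gamma)


def table_2_rows(gamma: float = 0.99) -> List[Tuple[float, float]]:
    """Return (per-step gap, compounded gap) for each input row."""
    rows = []
    for p_adv, c, div_frac in TABLE_2_INPUTS:
        step = per_step_gap(p_adv, c, div_frac)
        rows.append((step, compounded_gap(step, gamma)))
    return rows
```

Because the compounded column is linear in each input, changing \gamma rescales the whole column by (1 - 0.99) / (1 - \gamma) without changing the ordering of the rows.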
The per-step risk gap \Delta_{\mathrm{risk}} is amplified by the
cross-substrate cooperative capabilities analyzed in
The adversarial balance condition c > \max_i (f_i \cdot c_i) converts the companion paper’s minimax bound into an expected per-step cost \Delta_{\mathrm{risk}} that compounds as \Delta_{\mathrm{risk}} / (1 - \gamma) in the discounted value function, making dependency risk a first-class term in the horizon calculation rather than a one-time structural penalty. The condition is explicit about contagion dynamics and inverts (Corollary 1) into a quantitative bound on the required substrate count: m > c_{\max}/c for balanced partitions, or f_{\max} < c / c_{\max} for arbitrary partitions. This sharpens the minimax m \geq 2 into a measurement-driven requirement that scales with contagion intensity. The regime where the condition fails (unbalanced partitions under contagion amplification) is a regime where diversification genuinely does not help, which the trust model can detect and route around.
The causal attribution channel of Section 2 requires the
actor to know or learn the causal DAG over agent actions and capability
changes. Standard observational structure learning
Because causal attribution is one channel among several (Section 2.4), and capability gating via
the poset of
The structural assumptions underlying Proposition 1(c) are partially testable through a residual-covariance diagnostic: if \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{CA}(a_k) \mid G_t) exceeds the level the current SCM predicts, the SCM is misspecified (possible causes include a missing edge, latent common causes, exogenous-orthogonality failure, non-additivity, or nonstationarity), and the causal channel’s reliability should degrade accordingly. Integrating such a diagnostic into a calibrated SCM-confidence scalar c_{\mathcal{M}}(t) \in [0,1] that weights the causal channel’s contribution to the multi-channel aggregate, so that declining sufficiency down-weights causal attribution without burning specific agents’ trust, is an architectural direction we identify but do not formalize here. The full adversarial-robust structure-learning problem, including the calibration procedure and its interaction with the behavioral and capability channels, is deferred to a companion paper.
The one-at-a-time causal attribution \mathrm{CA}(a_j, t) (Definition 2) decomposes the total
{\mathop{\mathrm{vol}}_{\mathrm{P}}}-change
exactly only under additive SCM structure. In multi-agent
populations, coordinated attacks produce non-additive interactions: an
attack requiring m agents to cooperate
has \Delta
{\mathop{\mathrm{vol}}_{\mathrm{P}}} that depends on their joint
action, and the one-at-a-time intervention \mathrm{do}(\pi_j = \mathrm{skip}) produces
non-unique individual attributions: under pure complementarities (\Delta {\mathop{\mathrm{vol}}_{\mathrm{P}}}= \pi_1
\cdot \pi_2), marginal attribution may over-count or double-count
each coordinator’s contribution; under other non-additive structures the
attribution allocates the interaction term arbitrarily across agents.
The fundamental issue is that the one-at-a-time decomposition is not
well-defined when the structural equation is non-additive, and no
single-agent marginal faithfully represents the joint effect. The exact
fix is Shapley-style attribution
The pathway-conditional structural verification residual
(Definition 3) measures divergence between
the claim and the actor’s own assessment of the claimed pathway. If the
actor’s world model is itself poorly calibrated, the residual reflects
model error rather than claim quality. An agent whose risk assessments
are accurate but disagree with a miscalibrated actor will accumulate
high \|r^{\mathrm{risk}}\|^2 and
concentrate around low {T^{\mathrm{risk}}}_j^*, producing a false
negative. This is mitigated by the self-trust mechanism of
Of the three structural assumptions underlying Proposition 1(c), exogenous orthogonality is the most likely to fail in deployment. Shared environmental shocks (power grid state, information broadcasts, market movements, weather, time-of-day effects on a shared population) induce covariance between agents’ contributions even under a correctly specified causal DAG. Conditioning on G_t does not remove time-localized shocks that affect multiple agents’ actions simultaneously. When the assumption fails, the variance reduction \sigma_{\mathrm{do}}^2 \leq \sigma^2 no longer holds and the causal detector may perform worse than the correlational baseline (as acknowledged in Proposition 1(c)). The residual-covariance diagnostic described above provides detection: empirically elevated \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{CA}(a_k) \mid G_t) signals orthogonality failure, and the SCM-confidence scalar c_{\mathcal{M}}(t) should down-weight the causal channel accordingly. The multi-channel aggregate then shifts weight toward capability gating and behavioral channels, which do not depend on exogenous orthogonality. Importantly, orthogonality failure is detectable rather than silent. The residual covariance term \mathrm{Cov}(\mathrm{CA}(a_j), \mathrm{CA}(a_k) \mid G_t) is an estimable quantity: it can be estimated from repeated observations at similar capability-graph states, using the same attribution data the actor already collects for the SPRT detector. When the estimated covariance exceeds the level predicted under exogenous orthogonality, the assumption has failed and the failure is empirically visible. 
This turns the most fragile of the three structural assumptions into the most monitorable one: a better epistemic position than “fragile but unobservable.” The natural integration point is the SCM-confidence scalar c_{\mathcal{M}}(t) described above: elevated residual covariance is a direct input to c_{\mathcal{M}} updating, triggering the multi-channel aggregate to down-weight the causal channel before the degraded variance reduction produces misleading detection results. Characterizing the populations and environments in which exogenous orthogonality approximately holds — and deriving a partial-independence correction for the variance reduction when it does not — is an open problem.
The trust-weighted log-linear opinion pool (Definition 5) recovers Bayesian updating in the no-tempering limit under conditional independence of agents’ observations given the true risk level. If reporters’ assessments are correlated (they share information sources, or one reporter’s assessment influences others), the aggregation overweights the shared signal. In practice, partial independence is the norm: agents with different observation channels and different world models produce partially independent assessments, and the log-linear pool is an approximation whose error grows with the correlation. Characterizing the approximation error under partial independence, and deriving a correction term from a reporter-network model, is deferred to future work.
Three of this paper’s mechanisms share a common vulnerability: the adversary can observe and exploit fixed defense parameters.
Verification-gate denial of service (Section 3.4). The salience gate is cheap (aggregate and threshold); the verification gate is computationally expensive (forward causal evaluation on the actor’s own SCM). An adversary who floods the actor with structurally plausible false risk claims can exhaust the verification budget without any single claim bypassing independent verification. The two-gate architecture prevents coordinated amplification of false claims from reaching action, but leaves the actor vulnerable to attention exhaustion via structurally plausible decoys. The defense direction is prioritization: the verification gate should process candidates in order of {T^{\mathrm{risk}}}_j-weighted expected contraction magnitude, so high-trust reporters’ claims are verified first and low-trust flooding imposes only queue delay, not budget exhaustion. Formalizing the verification-gate capacity model and its interaction with the salience threshold is deferred.
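The prioritization rule suggested above can be sketched as a simple ordering of pending claims. This is only an illustration of the stated defense direction, since the capacity model itself is deferred; `PendingClaim`, `verification_order`, and `select_for_verification` are hypothetical names, and the fixed per-step `budget` is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PendingClaim:
    """Hypothetical record for a claim that has passed the salience gate."""
    reporter: str
    risk_trust: float            # reporter's risk-trust T^risk_j
    expected_contraction: float  # claimed expected contraction magnitude


def verification_order(claims: Sequence[PendingClaim]) -> List[PendingClaim]:
    """Sort by risk-trust-weighted expected contraction, largest first."""
    return sorted(
        claims,
        key=lambda cl: cl.risk_trust * cl.expected_contraction,
        reverse=True,
    )


def select_for_verification(
    claims: Sequence[PendingClaim], budget: int
) -> List[PendingClaim]:
    """Take the top `budget` claims; the rest stay queued for later steps."""
    return verification_order(claims)[:budget]
```

Under this ordering a flood of low-trust decoys sits behind any high-trust claim of comparable magnitude, so the flood delays the decoys' own verification rather than displacing the high-trust claim.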
The adversary knows \epsilon (Section 4). The SPRT design parameter \epsilon sets the minimum detectable
contraction magnitude. A well-informed adversary who can infer \epsilon from the actor’s behavior (by
probing with small contractions and observing flag rates) will calibrate
|\mu_j| just below \epsilon and sit in the undetected regime
permanently. The evasion-bandwidth bound’s “undetected contraction rate
bounded by |\mu_j|” is only informative
if |\mu_j| is bounded by something
other than the design parameter itself. The defense direction is to
treat \epsilon as a nuisance parameter
with an informative prior p(\epsilon)
encoding the actor’s belief about the adversary’s calibration
capability, and to marginalize. The evasion-bandwidth bound
\tau_{\mathrm{adapt}}(\epsilon) =
O\!\left(\sigma_{\mathrm{do}}^2 /
[\epsilon(|\mu_j| - \epsilon/2)]\right) is highly nonlinear in
\epsilon: small-\epsilon draws produce very long detection
times. The marginalized bound \begin{equation}
\mathbb{E}_{p}[\tau_{\mathrm{adapt}}(\epsilon)]
= \int_{\epsilon_{\min}}^{\epsilon_{\max}}
\frac{\sigma_{\mathrm{do}}^2}
{\epsilon\,(|\mu_j| - \epsilon/2)}\,
p(\epsilon)\,d\epsilon
\end{equation} is finite for any prior supported on [\epsilon_{\min}, \epsilon_{\max}] with 0 < \epsilon_{\min} and \epsilon_{\max} < 2|\mu_j| (the detection
regime where the SPRT drift is positive; the upper cutoff requires the
defender to have prior beliefs about plausible scorpion contraction
magnitudes, since |\mu_j| is precisely
what the SPRT is trying to detect), and the tail bound \Pr_{p}(\tau_{\mathrm{adapt}} > T) is the
prior mass on the \epsilon-region where
\tau_{\mathrm{adapt}}(\epsilon) > T,
a set that shrinks as T grows. A
log-uniform prior p(\epsilon) \propto
1/\epsilon on [\epsilon_{\min},
\epsilon_{\max}] is a natural special case: it is the
maximum-entropy prior over \log
\epsilon on bounded support
The defense is not free. Under log-uniform p(\epsilon) the integrand of the marginalized bound is dominated by the 1/\epsilon^2 tail near \epsilon_{\min}, giving \begin{equation} \mathbb{E}_{p}[\bar{\tau}] \;\approx\; \frac{\sigma_{\mathrm{do}}^2} {|\mu_j|\,\epsilon_{\min}\, \log(\epsilon_{\max}/\epsilon_{\min})} \end{equation} for \epsilon_{\min} \ll |\mu_j|. Relative to fixed \epsilon = \epsilon_{\max} (the least-delay choice), the expected detection time expands by a factor \epsilon_{\max}/[\epsilon_{\min}\, \log(\epsilon_{\max}/\epsilon_{\min})]. For plausible operational values \epsilon_{\min} = 0.01, \epsilon_{\max} = 0.1, this factor is approximately 4.3, material but bounded. The parameter \epsilon_{\min} is thus a first-class policy variable: reducing it extends the range of detectable contractions but proportionally expands expected detection delay. The Markov-inequality bound (Proposition 4(a)) remains valid under randomization, but the reference timescale \bar{\tau} inflates by the same factor, so the high-probability detection guarantee degrades proportionally. The defender trades evasion-bandwidth robustness against a quantifiable expansion in expected detection time.
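The inflation factor and the marginalized timescale can be evaluated numerically. The sketch below uses hypothetical function names, drops the constant \log((1 - \beta_{\mathrm{err}}) / \alpha_{\mathrm{err}}) as the O(\cdot) expressions above do, and integrates with a midpoint rule in \log \epsilon, where the log-uniform prior is flat; the grid size `n` is an arbitrary choice.

```python
import math


def detection_timescale(eps: float, mu_abs: float, sigma2: float) -> float:
    """sigma2 / [eps * (|mu_j| - eps/2)], valid for eps < 2 * |mu_j|."""
    return sigma2 / (eps * (mu_abs - eps / 2.0))


def delay_inflation_factor(eps_min: float, eps_max: float) -> float:
    """eps_max / [eps_min * log(eps_max / eps_min)] from the text above."""
    return eps_max / (eps_min * math.log(eps_max / eps_min))


def expected_timescale_log_uniform(
    eps_min: float, eps_max: float, mu_abs: float, sigma2: float,
    n: int = 20000,
) -> float:
    """E_p[timescale] under p(eps) proportional to 1/eps on [eps_min, eps_max]."""
    if not (0.0 < eps_min < eps_max < 2.0 * mu_abs):
        raise ValueError("need 0 < eps_min < eps_max < 2 * |mu_j|")
    log_lo = math.log(eps_min)
    du = (math.log(eps_max) - log_lo) / n
    # Under the log-uniform prior, u = log(eps) is uniform, so the
    # expectation is the plain average of the timescale over u.
    total = sum(
        detection_timescale(math.exp(log_lo + (i + 0.5) * du), mu_abs, sigma2)
        for i in range(n)
    )
    return total / n
```

For \epsilon_{\min} = 0.01 and \epsilon_{\max} = 0.1, `delay_inflation_factor` returns roughly 4.3, as stated above. With the illustrative values |\mu_j| = 1 and \sigma_{\mathrm{do}}^2 = 1, the numerical expectation comes out somewhat below the leading-order approximation, because that approximation keeps only the 1/\epsilon_{\min} term of the integral.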
T_j / {T^{\mathrm{risk}}}_j independence (Section 3). The behavioral trust T_j and risk-trust {T^{\mathrm{risk}}}_j evolve independently by design, optimizing for the case of honest specialists (good capability reporter, poor risk forecaster, or vice versa). A strategic reporter can exploit the independence: operating as a known-unreliable capability reporter (low T_j) while generating structurally plausible risk claims that accumulate high {T^{\mathrm{risk}}}_j, thereby driving the verification-gate queue despite a poor behavioral track record. A two-sided clamp on risk-trust addresses both failure modes: \begin{equation} T^{\mathrm{risk}}_{\min} \;\leq\; {T^{\mathrm{risk}}}_j \;\leq\; \max\bigl(f(T_j),\; T^{\mathrm{risk}}_{\min}\bigr) \end{equation} for some monotone f and a design parameter T^{\mathrm{risk}}_{\min} > 0. The lower bound is the floor: after each EWMA update the realized {T^{\mathrm{risk}}}_j is clipped from below at T^{\mathrm{risk}}_{\min}, so neither an adversary driving T_j toward zero nor a run of correlated environmental noise on an honest agent can push risk-trust below the floor, preventing self-reinforcing exclusion. The upper bound is the coupling cap: when behavioral trust is high enough that f(T_j) \geq T^{\mathrm{risk}}_{\min}, the cap limits how far risk-trust can exceed the behaviorally-warranted level, closing the low-T_j exploit. This is structurally analogous to the adversarial-balance condition c > \max_i(f_i \cdot c_i) of Section 5: a structural constraint that weakens both failure modes without claiming to resolve either tightly. The parameter T^{\mathrm{risk}}_{\min} itself carries a trade-off: too large re-admits the low-T_j exploit (the cap lifts so far that risk-trust can remain high despite poor behavioral trust); too small approaches the unbounded case. The two-sided form does not eliminate the robustness-versus-fairness tension; it bounds its worst-case expression on both sides.
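The two-sided clamp can be written as a post-update clipping step. This is a minimal sketch: the function name is hypothetical, and the identity coupling used as the default for the monotone f is an assumption made for illustration, since the text leaves f unspecified.

```python
from typing import Callable


def clamp_risk_trust(
    raw_risk_trust: float,
    behavioral_trust: float,
    t_risk_min: float,
    coupling: Callable[[float], float] = lambda t: t,
) -> float:
    """Clip risk-trust to [t_risk_min, max(f(T_j), t_risk_min)].

    `coupling` plays the role of the monotone f; identity is assumed here.
    """
    if t_risk_min <= 0.0:
        raise ValueError("t_risk_min must be positive")
    # Coupling cap: risk-trust may not exceed the behaviorally warranted
    # level, except that the cap never drops below the floor.
    upper = max(coupling(behavioral_trust), t_risk_min)
    # Floor first, then cap; upper >= t_risk_min keeps the interval non-empty.
    return min(max(raw_risk_trust, t_risk_min), upper)
```

With a floor of 0.2, a reporter with behavioral trust 0.1 and raw risk-trust 0.9 is clipped to 0.2 (the low-T_j exploit is closed), while a reporter with behavioral trust 0.8 is clipped to 0.8 and one whose raw risk-trust has fallen to 0.05 is lifted back to 0.2.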
The unifying theme is that fixed, observable parameters of the defense create an adversarial surface. The structural defense direction across all three is randomization and coupling: randomize the parameters the adversary needs to know (gate priority, \epsilon, channel weights) and couple the trust channels so that gaming one does not leave the others unaffected.
The log-odds aggregation (Definition 5) requires every reporter to calibrate against a shared prior \pi_0, and the paper notes this requires a coordination protocol. The choice of \pi_0 is itself a manipulation surface: a reporter who can influence \pi_0 (by proposing it, by coordinating on a biased value, or by exploiting a protocol weakness) can tilt the aggregation without submitting a false report. This paper does not develop a prior-robust solution; the adversarial robustness of \pi_0 establishment is an open problem that interacts with the adversarial parameter exposure pattern above. We note that per-agent prior randomization, the natural analog of the randomized-\epsilon defense, would break the shared-prior condition on which the log-odds aggregation’s Bayes-optimality depends (Definition 5), so prior-robustness likely requires re-deriving the aggregation rule under heterogeneous priors rather than perturbing \pi_0.
The structural verification residual (Definition 3) uses the Euclidean norm on a two-dimensional vector (probability difference, normalized magnitude difference), which implicitly treats both components as equally important. A probability error of 0.1 and a magnitude error of 0.1 \cdot {\mathop{\mathrm{vol}}_{\mathrm{P}}}(G_t) enter symmetrically into the trust update but represent different kinds of miscalibration. A weighted norm with weights tuned to the actor’s utility over probability-versus-magnitude errors would be more principled, but would require the actor to specify a preference ordering over error types that the framework does not assume. The equal-weight Euclidean norm is a neutral default; domain-specific actors with a preference for probability accuracy over magnitude accuracy (or vice versa) should adjust the norm accordingly.
The dependency-risk analysis (Section 5) assumes cascade fractions c_i measure only within-substrate propagation, with cross-substrate cascades outside the scope of the current bound. In the motivating threat classes (memetic attacks, financial contagion, supply-chain disruption), cross-substrate propagation is exactly the concerning pathway: a memetic attack on silicon agents that triggers panic-driven behavior in biological agents through inter-substrate communication channels is precisely a cross-substrate cascade, and the adversarial balance condition does not cover it. Modeling cross-substrate cascade propagation requires extending the SCM’s structural equations to include inter-substrate edges, which produces a coupled cascade model whose analysis is substantially more complex than the within-substrate case.
This gap is also a coupling point with the structural diversification
guarantees developed elsewhere in the framework.
Proposition 4’s SPRT bounds are per-epoch, and an epoch boundary resets the SPRT statistic. The epoch-declaration protocol is a structural vulnerability: if the actor declares epoch boundaries based on detected non-stationarity, an adversary can trigger epoch resets by inducing structural transitions. If epoch boundaries are time-based (every \tau steps), an adversary can synchronize strategy shifts to epoch boundaries so the SPRT never accumulates sufficient evidence. The defensive direction is a hybrid protocol: time-based epoch boundaries provide a baseline accumulation window that the adversary cannot shorten, while non-stationarity detection triggers auxiliary monitoring (e.g., a parallel SPRT with carry-over statistics) rather than resetting the primary accumulator. This mitigates the most direct epoch-manipulation attacks but does not eliminate the vulnerability, since any fixed-window protocol still gives the adversary a known accumulation budget to calibrate against. The hybrid construction does, however, make the adversary’s inversion problem harder: the detection trigger is a function of SPRT threshold accumulation, which depends on \sigma_{\mathrm{do}}^2, \epsilon, and the observed statistics sequence; the observed sequence is itself a function of both the actor’s prior monitoring history and the adversary’s current strategy. The joint distribution over (time-based budget phase, detection-trigger sensitivity) is therefore harder for the adversary to invert than the pure time-based case, unless it can reconstruct the actor’s internal monitoring state from observable history. The residual vulnerability is that an adversary with high-bandwidth observation of the actor’s flag decisions can estimate this joint distribution over many epochs, gradually reducing its entropy. 
“Hybrid” thus narrows the adversarial surface rather than closing it, consistent with the adversarial-parameter-exposure theme: fixed parameters leak to observation; hybrid parameters leak more slowly. Formalizing this hybrid protocol and its adversarial robustness is deferred.
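Since the formalization is deferred, the following is only an illustrative sketch of one possible hybrid construction, under stated assumptions. The primary accumulator resets solely on a fixed clock of length `tau`; a non-stationarity flag starts a parallel auxiliary accumulator seeded with a fraction `carry` of the primary evidence. The class name, the `carry` parameter, the one-sided threshold, and the choice to clear the auxiliary accumulator at the clock boundary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

def gaussian_llr(x, eps, sigma2):
    """Log-likelihood ratio of H1 (mean -eps) against H0 (mean 0) for one observation."""
    return -eps * (x + eps / 2.0) / sigma2

@dataclass
class HybridEpochMonitor:
    eps: float
    sigma2: float
    tau: int                  # time-based epoch length (fixed clock)
    threshold: float          # one-sided detection threshold (hypothetical)
    carry: float = 0.5        # fraction of primary evidence seeded into auxiliary
    primary: float = 0.0
    auxiliary: Optional[float] = None
    step: int = 0

    def update(self, x, nonstationary=False):
        """Consume one observation; return True if either accumulator flags."""
        # Only the clock resets the primary accumulator.
        if self.step > 0 and self.step % self.tau == 0:
            self.primary = 0.0
            self.auxiliary = None
        # A non-stationarity flag starts auxiliary monitoring instead of resetting.
        if nonstationary and self.auxiliary is None:
            self.auxiliary = self.carry * self.primary
        llr = gaussian_llr(x, self.eps, self.sigma2)
        self.primary += llr
        if self.auxiliary is not None:
            self.auxiliary += llr
        self.step += 1
        aux_hit = self.auxiliary is not None and self.auxiliary >= self.threshold
        return self.primary >= self.threshold or aux_hit

if __name__ == "__main__":
    monitor = HybridEpochMonitor(eps=0.2, sigma2=1.0, tau=5, threshold=100.0)
    for t in range(6):
        monitor.update(-1.0, nonstationary=(t == 3))
        print(t, round(monitor.primary, 2), monitor.auxiliary)
```

In this sketch an induced structural transition cannot shorten the primary accumulation window, but the window length `tau` remains a fixed quantity the adversary can calibrate against, matching the residual vulnerability noted above.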
The preceding GFM papers define what to maximize ({\mathop{\mathrm{vol}}_{\mathrm{P}}}(G)), prove structural safety properties, and establish anti-monopolar pressure. This paper addresses how the actor reasons about other agents: it provides a causal model that upgrades scorpion detection and risk evaluation from statistical to causal identification and sharpens the remaining evaluation mechanisms.
Causal attribution (Section 2) upgrades the foundational paper’s correlational scorpion detection to causal identification through do-calculus counterfactuals. The convergence advantage over correlational detection is proportional to confounding variance, which is substantial in populations with many simultaneously acting agents. Non-stationary scorpion bounds (Section 4) characterize the evasion bandwidth: how fast a scorpion must adapt to remain undetected, and the maximum undetected contraction rate.
Risk-trust dynamics (Section 3) provide formal EWMA update equations for {T^{\mathrm{risk}}}_j, with a two-gate architecture separating attention allocation (log-odds aggregation) from action decisions (the actor’s own causal verification). This prevents coordinated report-bombing from being amplified into action while preserving sensitivity to genuine independent agreement. Probabilistic dependency risk (Section 5) converts minimax substrate bounds into per-step expected costs that compound at the same rate as growth, providing the quantitative risk model needed for risk-adjusted planning.
Together with the foundational framework
Teague Lasser identified the statistical-to-causal upgrade as the unifying contribution, proposed the log-odds risk aggregation and the two-gate attention/verification architecture, directed all revisions, and made final editorial decisions. Responsible for the paper’s intellectual direction and all claims made.
Claude Opus 4.6 (Anthropic) drafted the formal exposition: the structural causal model for capability dynamics, the do-calculus contraction attribution, the L^2 risk-trust EWMA dynamics, the pathway-conditional structural verification residual, the SPRT-based causal scorpion detection and evasion bandwidth bounds, and the probabilistic dependency-risk model with the adversarial balance condition.
GPT 5.4 (OpenAI) served as technical reviewer, identifying formal gaps in the risk-trust convergence claims, the causal attribution assumptions, the log-odds aggregation’s cross-paper type consistency, and the dependency-risk quantitative analysis, in particular the need to replace the pointwise cascade-fraction assumption with the adversarial balance condition that admits contagion dynamics.
Transparency note. Both AI systems operated as tools under human direction. Neither system has continuity across sessions, and neither can take responsibility for the work in the sense required by most venue authorship policies or respond to reviewer queries independently. They are listed as authors to accurately represent their contributions to the intellectual content of the paper, not to claim that they meet all criteria of traditional academic authorship. The corresponding author for all inquiries is Teague Lasser.
Propositions 1 (Convergence Advantage), 2 (Risk-Trust Convergence), 3 (Causal Scorpion Detection), and 5 (Expected Risk Comparison) are proved in the main text. This appendix provides the remaining proofs.
Proof. Within each strategy epoch of length \tau_{\mathrm{adapt}}, the scorpion’s
behavior is stationary and the causal-attribution sequence \{\mathrm{CA}(a_j, t)\} is i.i.d. with mean
\mu_j < 0 and variance \sigma_{\mathrm{do}}^2 (the same assumption
as Proposition 3). The SPRT decision rule
tests the least-favorable simple pair H_0:
\mu_j = 0 versus H_1: \mu_j =
-\epsilon under the Gaussian model. The per-observation expected
log-likelihood ratio under the true mean \mu_j is \mathbb{E}_{\mu_j}[\log(p_{H_1}/p_{H_0})] =
\epsilon(|\mu_j| - \epsilon/2) / \sigma_{\mathrm{do}}^2. By
Wald’s identity
Part (a): detection. By Markov’s inequality, P(\tau_{\mathrm{detect}} > k \bar{\tau}(\mu_j)) \leq 1/k for any k > 1. When \tau_{\mathrm{adapt}} \geq k \bar{\tau}(\mu_j), the epoch provides enough observations that the SPRT crosses the detection threshold with probability at least 1 - 1/k.
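As a worked check of the drift expression used in this proof, the per-observation log-likelihood ratio can be written out under the same Gaussian model. Here x denotes a single causal-attribution observation, a symbol introduced only for this derivation:

```latex
\begin{align*}
\log\frac{p_{H_1}(x)}{p_{H_0}(x)}
  &= \frac{x^2 - (x+\epsilon)^2}{2\sigma_{\mathrm{do}}^2}
   = -\frac{\epsilon\,(x + \epsilon/2)}{\sigma_{\mathrm{do}}^2}, \\
\mathbb{E}_{\mu_j}\!\left[\log\frac{p_{H_1}}{p_{H_0}}\right]
  &= -\frac{\epsilon\,(\mu_j + \epsilon/2)}{\sigma_{\mathrm{do}}^2}
   = \frac{\epsilon\,(|\mu_j| - \epsilon/2)}{\sigma_{\mathrm{do}}^2}
   \qquad (\mu_j < 0).
\end{align*}
```

The drift is positive, so that evidence accumulates toward H_1, exactly when |\mu_j| > \epsilon/2.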
Part (b): evasion. On an epoch boundary the null and alternative hypotheses change (the new epoch has a new (\mu_j, \sigma_{\mathrm{do}}^2)), so the SPRT statistic is reset to zero: evidence accumulated under one pair of hypotheses is not informative about a different pair. The auxiliary EWMA monitors retain a fraction \alpha^{\tau_{\mathrm{adapt}}} of prior evidence as their output, but the SPRT decision rule itself is memoryless across regime changes. If \tau_{\mathrm{adapt}} < \bar{\tau}(\mu_j), the expected accumulated evidence at the epoch boundary is below the SPRT threshold: the evidence accumulation rate is insufficient for reliable detection within the epoch.
The maximum undetected contraction per epoch is |\mu_j| \cdot \tau_{\mathrm{adapt}}. For a scorpion to evade detection within an epoch, it must have \tau_{\mathrm{adapt}} < \bar{\tau}(\mu_j). The per-step undetected contraction is bounded by |\mu_j|, and the scorpion can sustain this for at most \tau_{\mathrm{adapt}} steps before changing strategy. At the design point |\mu_j| = \epsilon, the per-step contraction is at most \epsilon. The time-averaged undetected contraction rate is therefore at most |\mu_j|, achievable only by a scorpion that never needs to pause for strategy adaptation.
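The quantities in parts (a) and (b) can be probed numerically. The sketch below is illustrative only and makes several assumptions not taken from the paper: a one-sided test with a single upper threshold `log_threshold`, the empirical mean detection time standing in for \bar{\tau}(\mu_j), and hypothetical parameter values throughout.

```python
import random

def llr(x, eps, sigma2):
    """Per-observation log-likelihood ratio of H1 (mean -eps) against H0 (mean 0)."""
    return -eps * (x + eps / 2.0) / sigma2

def drift(mu, eps, sigma2):
    """Expected per-observation LLR under true mean mu < 0, as stated in the proof."""
    return eps * (abs(mu) - eps / 2.0) / sigma2

def detection_time(mu, eps, sigma2, log_threshold, rng, max_steps=100_000):
    """Steps until the accumulated LLR first reaches log_threshold (one-sided)."""
    sigma = sigma2 ** 0.5
    stat = 0.0
    for t in range(1, max_steps + 1):
        stat += llr(rng.gauss(mu, sigma), eps, sigma2)
        if stat >= log_threshold:
            return t
    return max_steps  # censored run; not expected when the drift is positive

def markov_check(times, k):
    """Empirical P(tau > k * mean) alongside the Markov bound 1/k."""
    mean_time = sum(times) / len(times)
    tail = sum(t > k * mean_time for t in times) / len(times)
    return mean_time, tail, 1.0 / k

def max_undetected_contraction(mu, tau_adapt):
    """Part (b): contraction accumulated over one epoch of length tau_adapt."""
    return abs(mu) * tau_adapt

if __name__ == "__main__":
    rng = random.Random(0)
    mu, eps, sigma2, log_threshold = -0.3, 0.2, 1.0, 4.0  # hypothetical values
    times = [detection_time(mu, eps, sigma2, log_threshold, rng) for _ in range(2000)]
    mean_time, tail, bound = markov_check(times, k=2.0)
    print(f"drift={drift(mu, eps, sigma2):.3f} mean_time={mean_time:.1f} "
          f"tail={tail:.3f} bound={bound:.3f}")
```

The Markov bound is typically loose in such simulations: the empirical tail fraction sits well below 1/k, so the detection guarantee in part (a) is conservative.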
The gate’s strength is bounded by the fidelity of the maintained capability model: hidden delegation, latent substitute capabilities, or a misspecified poset can defeat it.