Multi-Agent Ensemble Intelligence

Abstract

The trading system runs 72 signal-emitting agents across three timescales, each agent producing a directional view, conviction, and horizon for every traded pair on a fixed cadence. Agents are organised as features, not as standalone strategies — they feed into the architecture's three-tier doctrine: Tier 1 standalone strategies that pass a deflation gauntlet trade live capital; Tier 2 features (state-encoded signals like volatility regime and CME positioning extremity) gate or size those strategies; Tier 3 is a meta-labeler that, in principle, allocates across Tier 1 sleeves. The current production state is three Tier 1 strategies (FV-Fast residual mean reversion, Elasticity-Reversal, and Range Trading), per-system attribution at the execution layer, and a Tier 3 meta-labeler artifact running in observational shadow only after losing a head-to-head A/B test against simpler HRP weighting. This paper documents what is live, what is shadow, what has been killed, and the doctrine that decides which class anything belongs to.

What replaced the April architecture

The April 2026 picture of this system was "23 agents fed into a Bayesian learner that produced one ensemble score per pair, conditioned on regime classification". That picture is no longer accurate in any of its three load-bearing parts:

The agent count grew to 72 after the 2026-05-06 retrain (which added per-pair state agents like vol_regime and cot_extremity) and the subsequent kill campaign that retired regime-broadcast lanes.
The Bayesian learner as a single output was replaced by a three-tier architecture in which agents do not directly compete for the trading decision. The Tier 1 strategies trade their own signals; agents inform Tier 2 (feature engineering) and Tier 3 (meta-labeler / portfolio overlay).
Regime classification as an entry gate was retired 2026-05-03. The original 30–40% Sharpe lift attributed to per-pair vol / VIX / DXY / dispersion gating did not survive the deflation gauntlet under the new doctrine. State-encoded successors (vol_regime, cot_extremity) survive as feature-only signals; entry gating and per-regime parameter tables are gone.

See the archived [Regime Classification](./regime-classification) whitepaper for the retired design.

Three-tier architecture

Tier 1 — Standalone strategies (trade their own signal)

A Tier 1 strategy has a single mechanism, a single set of parameters, and survives the v3 deflation gauntlet on out-of-sample data. It trades its own signal directly through the central execution router. The current live roster:

Strategy	Mechanism	Horizon	Status
`fv_fast_30m`	Residual mean reversion on USD-pair vs DXY basket	Intraday (30m bar, 1m intra-bar stops)	Armed live 2026-05-15
`er_1h`	Cross-asset elasticity reversal (FX residual to ES driver)	Hourly	Pending re-arm
`rt_1h`	Range / Donchian fade	Hourly	Pending re-arm

The re-arm of ER and RT is gated on a 24-hour clean window for FV-Fast under the per-system attribution architecture (see §Per-system attribution).

Tier 2 — Features (gate or size Tier 1)

A Tier 2 candidate is an agent that does not stand alone but improves a Tier 1 strategy when included as a feature. The current admitted Tier 2 set has two members:

vol_regime — per-pair ATR percentile rank as a state. Admitted 2026-05-05 after passing all four falsifiers in state-encoded form. Directional encoding was killed (per-pair sign flipped 24/25); the surviving encoding is atr_pct_rank only.
cot_extremity — CME positioning extremity per pair. Admitted 2026-05-06 with seven-pair effective universe (NZD and ZAR dropped on disaggregated COT gap).
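
As a rough illustration of the atr_pct_rank encoding, the sketch below computes an ATR percentile rank from (high, low, close) bars. The 14-bar ATR period, the 100-value lookback, the simple-average ATR (rather than Wilder smoothing), and all function names are assumptions made for the example, not the production configuration.

```python
from typing import List, Sequence, Tuple

Bar = Tuple[float, float, float]  # (high, low, close); hypothetical bar layout


def true_ranges(bars: Sequence[Bar]) -> List[float]:
    """True range per bar, starting at the second bar (a prior close is needed)."""
    out = []
    for (_, _, prev_close), (high, low, _) in zip(bars, bars[1:]):
        out.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
    return out


def atr_series(bars: Sequence[Bar], period: int = 14) -> List[float]:
    """Simple-average ATR over a rolling window of `period` true ranges."""
    tr = true_ranges(bars)
    return [sum(tr[i - period + 1 : i + 1]) / period for i in range(period - 1, len(tr))]


def atr_pct_rank(bars: Sequence[Bar], period: int = 14, lookback: int = 100) -> float:
    """Fraction of the last `lookback` ATR values that are <= the latest ATR."""
    atr = atr_series(bars, period)
    if not atr:
        raise ValueError("not enough bars for the requested ATR period")
    window = atr[-lookback:]
    latest = window[-1]
    return sum(1 for v in window if v <= latest) / len(window)
```

The output is a state in (0, 1] with no sign, which matches the idea of a state encoding rather than a directional one: a value near 1 says the pair is in a high-volatility state relative to its own recent history, and says nothing about which way to trade.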

A further candidate, supertanker_squeeze, is pre-registered as a Tier 1 directional candidate (theory-locked single config), with admission gated on 2026-05-06-forward shadow data only; no in-sample tuning is permitted.

Tier 3 — Meta-labeler (portfolio overlay)

The Tier 3 meta-labeler reads conviction streams from all 72 agents plus the Tier 2 features and produces a per-bar P(positive return) for each candidate trade. A Phase 1 model — frozen artifact meta_labeler_v1_2026_05_08 — was trained on 21 active agents and tested against an HRP (Hierarchical Risk Parity) baseline applied directly to the three Tier 1 sleeves.

Result of the 2026-05-11 A/B test (annualised Sharpe, two windows):

Strategy	85-day window	248-day window
Equal-Weight Tier 1	+4.83	+3.97
Inverse-Vol Tier 1	+7.94	+4.76
HRP-static Tier 1	+10.04	+5.29
HRP-rolling Tier 1 (warm)	n/a	+5.83
Meta-labeler v1	+2.22	+2.19

The meta-labeler lost on both windows. Under the architecture's Tier 3 admittance rule (must produce a marginal Sharpe lift of ≥ +1.5 over the best simpler allocator), the v1 artifact fails by a wide margin and is not promoted to live capital. It continues to run on mini in shadow mode for observational research; the dashboard at /shadow/meta-labeler makes its research-only status explicit.
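
The admittance arithmetic can be checked directly against the table. The sketch below applies the ≥ +1.5 marginal-lift rule to each window separately; the function names are illustrative, and whether the rule must hold on every window or only on one is an assumption the text does not settle.

```python
from typing import Dict

MIN_LIFT = 1.5  # marginal Sharpe lift required by the Tier 3 admittance rule


def tier3_lift(candidate_sharpe: float, simpler: Dict[str, float]) -> float:
    """Candidate Sharpe minus the best Sharpe among the simpler allocators."""
    return candidate_sharpe - max(simpler.values())


def admits(candidate_sharpe: float, simpler: Dict[str, float],
           min_lift: float = MIN_LIFT) -> bool:
    """True when the candidate clears the best simpler allocator by min_lift."""
    return tier3_lift(candidate_sharpe, simpler) >= min_lift


# Annualised Sharpe values copied from the 2026-05-11 A/B table.
window_85 = {"equal_weight": 4.83, "inverse_vol": 7.94, "hrp_static": 10.04}
window_248 = {"equal_weight": 3.97, "inverse_vol": 4.76,
              "hrp_static": 5.29, "hrp_rolling": 5.83}

for name, simpler, meta in (("85d", window_85, 2.22), ("248d", window_248, 2.19)):
    print(f"{name}: lift {tier3_lift(meta, simpler):+.2f}, admitted={admits(meta, simpler)}")
```

On these numbers the lift is -7.82 on the 85-day window and -3.64 on the 248-day window, so the v1 artifact does not come close to the threshold on either.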

A Phase 2 retrain is scheduled for 2026-06-06 with the canonical Lopez de Prado meta-labeling role: Tier 1 conviction streams as features (the obvious gap in v1, which only saw the 21-agent panel). The retrain is research-track, not deployment-track.

The 72-agent roster

Agents are organised by emission cadence:

Fast (sub-hourly, ~44 agents) — Bollinger bands at multiple parameterisations, RSI variants, MACD variants, MA crossovers, ROC, stochastics, residual / OFI / jump detectors, vol asymmetry, ER flow.
Medium (hourly, ~26 agents) — fair-value model, CME volume, CME deep, DTCC flow, DTCC put-call, DTCC deep, gold/USD, copper/AUD, oil/CAD, oil/rates divergence, gold/DXY divergence, equity/FX, risk appetite, risk parity, VIX, bonds, DXY, candlestick patterns, demark, doji star, head-and-shoulders, pivots, fibonacci, donchian, keltner, ichimoku.
Slow / state (~2 agents) — vol_regime, cot_positioning, trend_state, corr_regime, regime_persistence, returns_dist, mr_probability, ou_params, momentum_factor, jump_detector, variance_ratio, cojump.

The list is intentionally heavy on technical-analysis-style agents at the fast end (those provide diverse short-horizon signals at near-zero compute cost) and weighted toward microstructure / cross-asset / positioning agents at the slow end (those carry the structural information the Tier 1 strategies actually trade against).
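
To make the shape of an agent emission concrete, here is a minimal sketch of a signal record carrying the three fields every agent produces (directional view, conviction, horizon), plus a per-cadence average of signed conviction for one pair. The class, field names, the [0, 1] conviction scale, and the aggregation are all hypothetical; the production schema is not described in this paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Cadence(Enum):
    FAST = "fast"      # sub-hourly
    MEDIUM = "medium"  # hourly
    SLOW = "slow"      # slow / state


@dataclass(frozen=True)
class AgentSignal:
    agent: str
    pair: str
    cadence: Cadence
    direction: int        # -1 short, 0 flat, +1 long
    conviction: float     # assumed to lie in [0, 1]
    horizon_minutes: int


def mean_signed_conviction(signals: Iterable[AgentSignal], pair: str) -> Dict[Cadence, float]:
    """Average of direction * conviction per cadence group for one pair."""
    grouped = defaultdict(list)
    for s in signals:
        if s.pair == pair:
            grouped[s.cadence].append(s.direction * s.conviction)
    return {cadence: sum(vals) / len(vals) for cadence, vals in grouped.items()}
```

A per-cadence summary of this kind is descriptive only; under the tiered doctrine it would inform features and dashboards rather than place trades.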

The roster is not stable. Since 2026-04, agents have been added (vol_regime, cot_extremity, supertanker_squeeze, gold_dxy_divergence, oil_rates_divergence) and others have been killed by the deflation gauntlet (hmm_regime, bocpd_regime, trend_state_v2, nfci_state, lev_money_extremity, es_momentum, residual_acceleration_1m, eigenvalue_spread, adx_pct_rank, sleeve_dispersion). The kill record matters as much as the live record: most candidates do not survive the v3 gauntlet, and that is the point of the doctrine.

The v3 deflation doctrine

A candidate strategy or feature is not admitted on point Sharpe. It must pass an 8-gate gauntlet, all on out-of-sample data with no parameter tuning visible to the gauntlet:

1. DSR (Deflated Sharpe Ratio) — Sharpe inflation-corrected for the number of trials in the search.

2. PBO (Probability of Backtest Overfitting) — combinatorially purged cross-validation; a candidate that ranks well in-sample but poorly out-of-sample fails.

3. Monte Carlo permutation test — the observed Sharpe must fall outside the null distribution of Sharpe ratios built from block-resampled returns.

4. GARCH residual test — Sharpe must survive conditional-vol-aware residual analysis.

5. HMM regime null — Sharpe must not be an artifact of one regime in a hidden-Markov decomposition of the price series.

6. Walk-forward — multi-fold expanding-window Sharpe must be uniformly positive across folds with a minimum per-fold threshold.

7. Holdout replication — final held-out year (~25% of data) must replicate the Sharpe seen in the gauntlet folds.

8. DSR at K=3 — repeat (1) with the candidate's strongest three sub-strategies treated as a multiple-testing family.
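
Gates 1 and 8 both rest on the Deflated Sharpe Ratio. The sketch below follows the published Bailey and López de Prado form: the observed Sharpe is compared with the expected maximum Sharpe under the null across the number of trials, then adjusted for sample length, skewness, and kurtosis. Sharpe values here are per-period (not annualised), the variance of Sharpe across trials is taken as an input, and the function names are illustrative; the pass thresholds used by the gauntlet are not stated in this paper and are not assumed here.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected maximum Sharpe across n_trials when the true Sharpe is zero."""
    if n_trials < 2:
        return 0.0  # a single trial needs no multiple-testing deflation
    a = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr: float, n_obs: int, n_trials: int, sr_variance: float,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability that the true Sharpe exceeds the trial-deflated benchmark."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr0) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)
```

Calling `deflated_sharpe(..., n_trials=3, ...)` corresponds to treating three sub-strategies as the multiple-testing family, as in Gate 8. For a fixed observed Sharpe, raising `n_trials` raises the benchmark and lowers the resulting probability, which is the mechanism that penalises a wide search.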

A single gate failure kills a candidate. The current Tier 1 admits (FV-Fast 30m, ER 1h, RT 1h) pass all eight on the post-cutover frozen substrate. The state-encoded Tier 2 admits (vol_regime, cot_extremity) pass the relevant subset (the four-gate set for features). The April-era regime conditioning that produced the 30–40% Sharpe claims fails Gate 2 (PBO) and Gate 7 (holdout) under the current substrate and is no longer live.
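
The any-single-failure rule is simple enough to state as code. In this sketch the gate identifiers are hypothetical short names for the eight gates listed above, a missing result is treated as a failure, and `required` can be narrowed to a subset for features (the paper does not say which four gates that subset contains, so none is hard-coded).

```python
from typing import Dict, Iterable, List

# Hypothetical short names for the eight gates, in gauntlet order.
GATES = (
    "dsr", "pbo", "mc_permutation", "garch_residual",
    "hmm_regime_null", "walk_forward", "holdout", "dsr_k3",
)


def failed_gates(results: Dict[str, bool], required: Iterable[str] = GATES) -> List[str]:
    """Required gates that did not pass; an absent result counts as a failure."""
    return [gate for gate in required if not results.get(gate, False)]


def admitted(results: Dict[str, bool], required: Iterable[str] = GATES) -> bool:
    """A candidate is admitted only when no required gate fails."""
    return not failed_gates(results, required)
```

For example, a result set that fails only `pbo` and `holdout` returns `["pbo", "holdout"]` from `failed_gates` and is rejected, which mirrors the outcome reported for the April-era regime conditioning.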

Per-system attribution

The execution layer treats each Tier 1 strategy as an independent system with its own attribution. Every order placed through the central router carries an orderRef='{system}:{signal_id}' tag. The reconciler reads order status via a per-permId order registry — not IB's per-pair net position — so cross-system overlap on the same pair (which previously caused a zombie-position incident) is impossible by construction.
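
A minimal sketch of the tagging and registry idea follows. It builds and parses the `'{system}:{signal_id}'` tag and keeps fills keyed by permId so that positions are summed per system instead of per pair. The class names, fields, and the use of a "Filled" status string are assumptions for illustration; no broker API is called here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


def make_order_ref(system: str, signal_id: str) -> str:
    """Build the '{system}:{signal_id}' tag carried by every routed order."""
    if not system or not signal_id or ":" in system:
        raise ValueError("system and signal_id must be non-empty; system may not contain ':'")
    return f"{system}:{signal_id}"


def parse_order_ref(order_ref: str) -> Tuple[str, str]:
    """Split a tag back into (system, signal_id)."""
    system, sep, signal_id = order_ref.partition(":")
    if not sep or not system or not signal_id:
        raise ValueError(f"malformed orderRef: {order_ref!r}")
    return system, signal_id


@dataclass
class OrderRecord:
    perm_id: int
    order_ref: str
    pair: str
    signed_qty: float          # positive = long, negative = short
    status: str = "Submitted"


class OrderRegistry:
    """Orders keyed by permId; positions are attributed to the owning system."""

    def __init__(self) -> None:
        self._by_perm_id: Dict[int, OrderRecord] = {}

    def record(self, rec: OrderRecord) -> None:
        self._by_perm_id[rec.perm_id] = rec

    def position_by_system(self, pair: str) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for rec in self._by_perm_id.values():
            if rec.pair == pair and rec.status == "Filled":
                system, _ = parse_order_ref(rec.order_ref)
                totals[system] += rec.signed_qty
        return dict(totals)
```

The design point is visible when two systems hold opposite positions on the same pair: a per-pair net view would show zero and hide both, while the per-system view keeps each position attributable to the strategy that opened it.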

This architecture, shipped 2026-05-15, is what allowed FV-Fast to be re-armed without re-arming ER and RT. Under the old per-pair netting, they had to move as a block; under per-system attribution, each strategy graduates from shadow to live on its own clock.

What the dashboards show

Three pieces of this stack are externally visible:

Currency Heatmap — a per-pair short / medium / long bias derived from the agent ensemble. Useful as a pulse-of-the-stack indicator; not the trading decision.
Agent Alignment — a conviction grid showing which agent groups agree or disagree per pair. Disagreement across groups is itself a feature; high agreement driven by a low-Sharpe agent is a red flag that is fed back to the gauntlet.
Meta-Labeler Phase 1 Shadow — the Tier 3 v1 artifact's per-tick recommendations and ledger, marked research-only after the 2026-05-11 A/B. The dashboard exists so the artifact's behaviour can be inspected; it does not size or gate live capital.

The live Tier 1 dashboards (FV-Fast, Elasticity-Reversal, Range Trading) and the Tier 1 paper shadow are separate; they show what the strategies are actually doing, not what the agent ensemble believes.

Open work

Phase 2 meta-labeler retrain (2026-06-06) with Tier 1 conviction as features. The Lopez de Prado canonical role: primary mechanism produces side, meta-labeler produces size.
Re-arm of ER 1h and RT 1h after the FV-Fast 24-hour clean window.
supertanker_squeeze pre-registered forward-only shadow continues to accumulate; no admission until ≥ 30 days of post-2026-05-06 ticks.
HRP-rolling (60-day window) is the preferred capital-allocation overlay once the Tier 1 paper shadow has produced enough continuous history to fill the warmup. Estimated mid-July.
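
The side/size split named in the first item can be sketched in a few lines: the primary Tier 1 mechanism supplies the side, and the meta-labeler's P(positive return) only scales the size. The linear map from probability to size used here is an assumption chosen for simplicity, not the sizing rule planned for Phase 2, and `bet_size` is a hypothetical name.

```python
def bet_size(side: int, p_positive: float, max_size: float = 1.0) -> float:
    """Signed position size: the side comes from the primary strategy, the size from P."""
    if side not in (-1, 0, 1):
        raise ValueError("side must be -1, 0, or +1")
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability")
    # No position unless the meta-labeler sees better-than-even odds.
    edge = max(0.0, 2.0 * p_positive - 1.0)
    return side * edge * max_size
```

Under this map the meta-labeler can shrink or veto a trade but can never flip its direction, which is what keeps the primary mechanism and the overlay separable.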

Conclusion

The system is not a single Bayesian learner over 23 agents anymore. It is a tiered architecture in which 72 agents inform features, three Tier 1 strategies trade their own signals, and a Tier 3 meta-labeler tries — and so far fails — to beat the simplest possible weighting of the three sleeves. The deflation doctrine that decides what gets admitted is the load-bearing piece; without it, every one of the killed agents in the recent history would still be quietly siphoning capital.

EricL Analytics — updated May 2026