Multi-Agent Ensemble Intelligence


Abstract


The trading system runs 72 signal-emitting agents across three timescales. On a fixed cadence, each agent produces a directional view, a conviction, and a horizon for every traded pair. Agents are organised as features, not as standalone strategies, and they feed the architecture's three-tier doctrine. Tier 1 consists of standalone strategies that pass a deflation gauntlet and trade live capital. Tier 2 consists of features (state-encoded signals such as volatility regime and CME positioning extremity) that gate or size those strategies. Tier 3 is a meta-labeler that, in principle, allocates across the Tier 1 sleeves. The current production state is three Tier 1 strategies (FV-Fast residual mean reversion, Elasticity-Reversal, and Range Trading), per-system attribution at the execution layer, and a Tier 3 meta-labeler artifact that runs only in observational shadow after losing a head-to-head A/B test against simpler HRP weighting. This paper documents what is live, what is in shadow, what has been killed, and the doctrine that decides which class anything belongs to.


What replaced the April architecture


The April 2026 picture of this system was "23 agents fed into a Bayesian learner that produced one ensemble score per pair, conditioned on regime classification". That picture is no longer accurate in any of its three load-bearing parts:



See the archived [Regime Classification](./regime-classification) whitepaper for the retired design.


Three-tier architecture


Tier 1 — Standalone strategies (trade their own signal)


A Tier 1 strategy has a single mechanism and a single set of parameters, and it survives the v3 deflation gauntlet on out-of-sample data. It trades its own signal directly through the central execution router. The current roster:


| Strategy | Mechanism | Horizon | Status |
|---|---|---|---|
| `fv_fast_30m` | Residual mean reversion on USD pairs vs a DXY basket | Intraday (30m bars, 1m intra-bar stops) | **Armed live 2026-05-15** |
| `er_1h` | Cross-asset elasticity reversal (FX residual to ES driver) | Hourly | Pending re-arm |
| `rt_1h` | Range / Donchian fade | Hourly | Pending re-arm |
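The FV-Fast mechanism in the table (residual mean reversion of a USD pair against a DXY basket) can be illustrated with a minimal sketch. It is not the production model: it assumes a single pair regressed on a single basket series by trailing OLS on log prices, and the function names, the window length, and the entry threshold of 2.0 are all hypothetical.

```python
import numpy as np


def residual_zscore(pair_log: np.ndarray, basket_log: np.ndarray, window: int) -> np.ndarray:
    """Trailing-window z-score of a pair's residual against a basket.

    For each bar t >= window - 1, fit pair = a + b * basket by OLS over the
    trailing `window` bars, then divide the bar-t residual by the residual
    standard deviation inside that window. Warm-up bars are NaN.
    """
    n = len(pair_log)
    z = np.full(n, np.nan)
    for t in range(window - 1, n):
        y = pair_log[t - window + 1 : t + 1]
        x = basket_log[t - window + 1 : t + 1]
        b, a = np.polyfit(x, y, 1)  # slope, intercept
        resid = y - (a + b * x)
        sd = resid.std(ddof=1)
        z[t] = resid[-1] / sd if sd > 0 else 0.0
    return z


def mean_reversion_signal(z: np.ndarray, entry: float = 2.0) -> np.ndarray:
    """Fade stretched residuals: -1 above +entry, +1 below -entry, else flat.

    `entry` is a hypothetical threshold. NaN warm-up bars compare False and stay flat.
    """
    signal = np.zeros_like(z)
    signal[z > entry] = -1.0
    signal[z < -entry] = 1.0
    return signal
```

A residual stretched well above its recent dispersion produces a short signal and one stretched below produces a long; intra-bar stops and the 30m/1m bar handling are outside the scope of this sketch.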

The re-arm of ER and RT is gated on a 24-hour clean window for FV-Fast under the per-system attribution architecture (see §Per-system attribution).


Tier 2 — Features (gate or size Tier 1)


A Tier 2 candidate is an agent that does not stand alone but improves a Tier 1 strategy when included as a feature. The current admitted Tier 2 set has two members:



A third candidate, supertanker_squeeze, is pre-registered as a Tier 1 directional candidate rather than as a Tier 2 feature, with a single theory-locked configuration. Its admission is gated on shadow data collected from 2026-05-06 onward only; no in-sample tuning is permitted.
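To make "gate or size" concrete, here is a hedged sketch of how two state-encoded features could act on a Tier 1 signal. The encodings are assumptions, not the production definitions: `vol_regime` is treated as a categorical label with hypothetical size multipliers, and `cot_extremity` as a hypothetical signed positioning percentile in [-1, 1] whose extreme values veto trades that would join the crowd.

```python
from dataclasses import dataclass

# Hypothetical regime labels and size multipliers; not production values.
VOL_SIZE_MULTIPLIER = {"low": 1.0, "normal": 1.0, "high": 0.5}


@dataclass
class Tier2Features:
    vol_regime: str        # hypothetical categorical encoding
    cot_extremity: float   # hypothetical signed positioning percentile in [-1, 1]


def apply_tier2(tier1_side: int, base_size: float, f: Tier2Features,
                crowd_cutoff: float = 0.9) -> float:
    """Return a signed position size after Tier 2 gating and sizing."""
    if tier1_side == 0:
        return 0.0
    # Gate: skip a trade that would join an extremely crowded positioning side.
    if f.cot_extremity * tier1_side > crowd_cutoff:
        return 0.0
    # Size: scale the Tier 1 position by the volatility-regime multiplier.
    return tier1_side * base_size * VOL_SIZE_MULTIPLIER[f.vol_regime]
```

The design point is that neither feature originates a trade: both only reduce or veto what a Tier 1 strategy already wants to do.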


Tier 3 — Meta-labeler (portfolio overlay)


The Tier 3 meta-labeler reads conviction streams from all 72 agents plus the Tier 2 features and produces a per-bar P(positive return) for each candidate trade. A Phase 1 model — frozen artifact meta_labeler_v1_2026_05_08 — was trained on 21 active agents and tested against an HRP (Hierarchical Risk Parity) baseline applied directly to the three Tier 1 sleeves.
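HRP (López de Prado's Hierarchical Risk Parity) is the baseline the meta-labeler had to beat. The following is a minimal numpy-only sketch of a static HRP allocator over sleeve covariance, written for illustration rather than taken from this system. It simplifies the original algorithm in one stated way: single linkage is applied directly to the correlation distance sqrt((1 - rho) / 2) instead of to a distance-of-distances matrix. The naive clustering loop is adequate for three sleeves.

```python
import numpy as np


def _seriate(dist: np.ndarray) -> list[int]:
    """Single-linkage agglomeration; returns the leaf order of the final cluster."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def _cluster_var(cov: np.ndarray, idx: list[int]) -> float:
    """Variance of an inverse-variance-weighted portfolio over the assets in idx."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)


def hrp_weights(cov: np.ndarray) -> np.ndarray:
    """Static HRP: seriate by correlation distance, then recursively bisect."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    order = _seriate(dist)
    w = np.ones(len(cov))
    stack = [order]
    while stack:
        items = stack.pop()
        if len(items) < 2:
            continue
        half = len(items) // 2
        left, right = items[:half], items[half:]
        v_l, v_r = _cluster_var(cov, left), _cluster_var(cov, right)
        alpha = 1.0 - v_l / (v_l + v_r)  # lower-variance half gets more weight
        w[left] *= alpha
        w[right] *= 1.0 - alpha
        stack.extend([left, right])
    return w / w.sum()
```

A useful sanity check: with a diagonal covariance, recursive bisection reproduces plain inverse-variance weighting, so `hrp_weights(np.diag([1.0, 4.0, 16.0]))` returns weights proportional to 1, 1/4 and 1/16. A "rolling" variant would simply recompute the covariance on a trailing window.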


Result of the 2026-05-11 A/B test (annualised Sharpe, two windows):


| Allocator | 85-day window | 248-day window |
|---|---|---|
| Equal-Weight Tier 1 | +4.83 | +3.97 |
| Inverse-Vol Tier 1 | +7.94 | +4.76 |
| HRP-static Tier 1 | +10.04 | +5.29 |
| HRP-rolling Tier 1 (warm) | n/a | +5.83 |
| Meta-labeler v1 | +2.22 | +2.19 |

The meta-labeler lost on both windows. Under the architecture's Tier 3 admittance rule (must produce a marginal Sharpe lift of ≥ +1.5 over the best simpler allocator), the v1 artifact fails by a wide margin and is not promoted to live capital. It continues to run on mini in shadow mode for observational research; the dashboard at /shadow/meta-labeler makes its research-only status explicit.
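The Tier 3 admittance arithmetic can be checked directly against the A/B numbers. This sketch copies the Sharpe ratios from the table; the function names are hypothetical, and it assumes the +1.5 lift must be cleared in every evaluation window, which the text implies but does not state as a formal rule.

```python
# Annualised Sharpe ratios copied from the 2026-05-11 A/B table.
AB_RESULTS = {
    "85-day": {
        "Equal-Weight Tier 1": 4.83,
        "Inverse-Vol Tier 1": 7.94,
        "HRP-static Tier 1": 10.04,
        "Meta-labeler v1": 2.22,
    },
    "248-day": {
        "Equal-Weight Tier 1": 3.97,
        "Inverse-Vol Tier 1": 4.76,
        "HRP-static Tier 1": 5.29,
        "HRP-rolling Tier 1 (warm)": 5.83,
        "Meta-labeler v1": 2.19,
    },
}
MIN_MARGINAL_LIFT = 1.5  # Tier 3 admittance rule from the text


def marginal_lift(window: dict[str, float], candidate: str) -> tuple[str, float]:
    """Candidate Sharpe minus the best simpler allocator's Sharpe in one window."""
    baselines = {name: sr for name, sr in window.items() if name != candidate}
    best = max(baselines, key=baselines.get)
    return best, window[candidate] - baselines[best]


def admit_tier3(results: dict[str, dict[str, float]], candidate: str) -> bool:
    """Assumption: the lift must clear the bar in every evaluation window."""
    return all(
        marginal_lift(window, candidate)[1] >= MIN_MARGINAL_LIFT
        for window in results.values()
    )
```

On these numbers the meta-labeler trails the best simpler allocator by 7.82 Sharpe units on the 85-day window (against HRP-static) and by 3.64 on the 248-day window (against HRP-rolling), so the lift is negative rather than merely short of +1.5.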


A Phase 2 retrain, scheduled for 2026-06-06, gives the model the canonical López de Prado meta-labeling role: the Tier 1 conviction streams become input features (the obvious gap in v1, which saw only the 21-agent panel). The retrain is research-track, not deployment-track.
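In the meta-labeling setup described by López de Prado, a primary model chooses the side of a trade and a secondary model learns only whether that call should be acted on. A hedged sketch of the data preparation under that framing follows: the labels come from whether each Tier 1 call was profitable, and the Tier 1 conviction stream is stacked next to the agent panel as features. The function names and the sizing rule are hypothetical, and the secondary classifier itself is left out.

```python
import numpy as np


def meta_labels(side: np.ndarray, fwd_ret: np.ndarray) -> np.ndarray:
    """Label 1 where the primary (Tier 1) call was profitable, 0 where it was not.

    Only bars on which the primary model took a side (side != 0) are labelled.
    """
    active = side != 0
    return (side[active] * fwd_ret[active] > 0).astype(int)


def meta_feature_matrix(tier1_conviction: np.ndarray, agent_panel: np.ndarray) -> np.ndarray:
    """Stack the Tier 1 conviction stream next to the agent panel (rows = bars)."""
    return np.column_stack([tier1_conviction, agent_panel])


def sized_position(side: np.ndarray, p_correct: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Act on the primary side only when the meta-model is confident enough.

    Scaling by p_correct is a simple hypothetical sizing rule.
    """
    return np.where(p_correct > threshold, side * p_correct, 0.0)
```

Before training, the rows of the feature matrix would be filtered with the same `side != 0` mask so that features and labels line up; any probabilistic classifier can then supply `p_correct`.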


The 72-agent roster


Agents are organised by emission cadence:



The list is intentionally heavy on technical-analysis-style agents at the fast end (those provide diverse short-horizon signals at near-zero compute cost) and weighted toward microstructure / cross-asset / positioning agents at the slow end (those carry the structural information the Tier 1 strategies actually trade against).


The roster is not stable. Since April 2026 it has churned in both directions: agents have been added (vol_regime, cot_extremity, supertanker_squeeze, gold_dxy_divergence, oil_rates_divergence) and agents have been killed by the deflation gauntlet (hmm_regime, bocpd_regime, trend_state_v2, nfci_state, lev_money_extremity, es_momentum, residual_acceleration_1m, eigenvalue_spread, adx_pct_rank, sleeve_dispersion). The kill record matters as much as the live record: most candidates do not survive the v3 gauntlet, and that is the point of the doctrine.


The v3 deflation doctrine


A candidate strategy or feature is not admitted on point Sharpe. It must pass an 8-gate gauntlet, all on out-of-sample data with no parameter tuning visible to the gauntlet:


1. DSR (Deflated Sharpe Ratio) — Sharpe inflation-corrected for the number of trials in the search.

2. PBO (Probability of Backtest Overfitting) — combinatorially purged cross-validation; a candidate that ranks well in-sample but poorly out-of-sample fails.

3. Monte Carlo permutation test — the observed Sharpe must lie beyond the upper tail of the null distribution generated from block-resampled returns.

4. GARCH residual test — Sharpe must survive conditional-vol-aware residual analysis.

5. HMM regime null — Sharpe must not be an artifact of one regime in a hidden-Markov decomposition of the price series.

6. Walk-forward — multi-fold expanding-window Sharpe must be positive in every fold and clear a minimum per-fold threshold.

7. Holdout replication — final held-out year (~25% of data) must replicate the Sharpe seen in the gauntlet folds.

8. DSR at K=3 — repeat (1) with the candidate's strongest three sub-strategies treated as a multiple-testing family.
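Gates 1 and 8 both rest on the Deflated Sharpe Ratio of Bailey and López de Prado. The sketch below implements that published formula with the standard library only; the function names are hypothetical, and the pass threshold applied to the returned probability is not given in this paper, so none is hard-coded.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_N = NormalDist()


def expected_max_sharpe(sr_variance: float, n_trials: int) -> float:
    """SR0: expected maximum Sharpe across n_trials skill-less trials (needs n_trials >= 2)."""
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * _N.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _N.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, kurtosis: float,
                    sr_variance: float, n_trials: int) -> float:
    """Probability that the true Sharpe exceeds SR0, allowing for non-normal returns.

    sr is the per-period (not annualised) Sharpe of the selected trial,
    kurtosis is raw (3.0 for normal returns), and sr_variance is the
    variance of the Sharpe ratios across all trials in the search.
    """
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return _N.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

Gate 1 would pass the size of the full search as `n_trials`; Gate 8 repeats the calculation with `n_trials=3` and the Sharpe variance of the candidate's three strongest sub-strategies. More trials raise SR0 and therefore lower the deflated probability.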


Kills come from any single gate failure. The current Tier 1 admits (`fv_fast_30m`, `er_1h`, `rt_1h`) pass all eight gates on the post-cutover frozen substrate. The state-encoded Tier 2 admits (vol_regime, cot_extremity) pass the four-gate subset that applies to features. The April-era regime conditioning that produced the 30–40% Sharpe claims fails Gate 2 (PBO) and Gate 7 (holdout) under the current substrate and is no longer live.
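Because a kill comes from any single gate failure, the gauntlet behaves as an ordered conjunction that can stop at the first failure. The sketch below shows that control flow with one concrete gate, a simplified walk-forward check. It assumes the candidate is frozen (no refitting between folds), so each fold is just a consecutive out-of-sample segment; the fold count, the per-fold Sharpe floor, and all names are hypothetical. The other gates would plug in as further `(name, predicate)` pairs.

```python
from typing import Callable, Optional, Sequence

import numpy as np

Gate = tuple[str, Callable[[np.ndarray], bool]]


def run_gauntlet(oos_returns: np.ndarray, gates: Sequence[Gate]) -> Optional[str]:
    """Return the name of the first failing gate, or None if every gate passes."""
    for name, passes in gates:
        if not passes(oos_returns):
            return name  # any single failure is a kill
    return None


def walk_forward_gate(n_folds: int = 4,
                      min_fold_sharpe: float = 0.0) -> Callable[[np.ndarray], bool]:
    """Every consecutive out-of-sample fold must beat a per-fold Sharpe floor."""
    def check(returns: np.ndarray) -> bool:
        for fold in np.array_split(returns, n_folds):
            sd = fold.std(ddof=1)
            if sd == 0 or fold.mean() / sd <= min_fold_sharpe:
                return False
        return True
    return check
```

Returning the name of the failing gate, rather than a bare boolean, keeps the kill record explicit about which test each rejected candidate failed.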


Per-system attribution


The execution layer treats each Tier 1 strategy as an independent system with its own attribution. Every order placed through the central router carries an orderRef='{system}:{signal_id}' tag. The reconciler reads order status through a per-permId order registry rather than IB's per-pair net position, so when two systems hold overlapping positions on the same pair, neither system's position can be mistaken for the other's. The netting confusion behind an earlier zombie-position incident is ruled out by construction.


This architecture, shipped 2026-05-15, is what allowed FV-Fast to be re-armed without re-arming ER and RT. Under the old per-pair netting, the three strategies had to move as a block; under per-system attribution, each strategy graduates from shadow to live on its own clock.
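The attribution idea can be sketched in a few lines. The field names `orderRef` and `permId` mirror fields on the Interactive Brokers API order object; everything else here (the class and method names, the EURUSD example, the permId values) is hypothetical. The point it illustrates is that fills are booked to the system named in the tag, so each system's position survives even when the broker's per-pair net position reads flat.

```python
from dataclasses import dataclass


def make_order_ref(system: str, signal_id: str) -> str:
    """Build the '{system}:{signal_id}' tag carried on every routed order."""
    return f"{system}:{signal_id}"


def parse_order_ref(order_ref: str) -> tuple[str, str]:
    """Split a tag at the first ':' into (system, signal_id)."""
    system, sep, signal_id = order_ref.partition(":")
    if not sep or not system or not signal_id:
        raise ValueError(f"malformed orderRef: {order_ref!r}")
    return system, signal_id


@dataclass
class OrderRecord:
    perm_id: int
    system: str
    signal_id: str
    pair: str
    filled_qty: float = 0.0  # signed: positive = bought, negative = sold


class OrderRegistry:
    """Hypothetical per-permId registry: fills are attributed to the owning system."""

    def __init__(self) -> None:
        self._by_perm: dict[int, OrderRecord] = {}

    def register(self, perm_id: int, order_ref: str, pair: str) -> None:
        system, signal_id = parse_order_ref(order_ref)
        self._by_perm[perm_id] = OrderRecord(perm_id, system, signal_id, pair)

    def on_fill(self, perm_id: int, signed_qty: float) -> None:
        self._by_perm[perm_id].filled_qty += signed_qty

    def position(self, system: str, pair: str) -> float:
        """Net position of one system on one pair, ignoring every other system."""
        return sum(r.filled_qty for r in self._by_perm.values()
                   if r.system == system and r.pair == pair)
```

For example, if `fv_fast_30m` buys 100,000 EURUSD while `er_1h` sells 100,000, the per-pair net is zero, yet `position("fv_fast_30m", "EURUSD")` still reports +100,000 and `position("er_1h", "EURUSD")` reports -100,000.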


What the dashboards show


Three pieces of this stack are externally visible:



The live Tier 1 dashboards (FV-Fast, Elasticity-Reversal, Range Trading) and the Tier 1 paper shadow are separate from these; they show what the strategies are actually doing, not what the agent ensemble believes.


Open work



Conclusion


The system is no longer a single Bayesian learner over 23 agents. It is a tiered architecture in which 72 agents inform features, three Tier 1 strategies trade their own signals, and a Tier 3 meta-labeler tries, and so far fails, to beat even the simplest possible weighting of the three sleeves. The deflation doctrine that decides what gets admitted is the load-bearing piece; without it, every agent killed in recent history would still be quietly siphoning capital.


EricL Analytics — updated May 2026