Agent Pipeline · Research Orchestration
How a small DAG of single-purpose agents turns the current macro regime into a reviewable strategy proposal — fully audited, offline, and stopped at a human gate.
This is the offline research pipeline (vN.3) behind the portfolio — NOT an auto-trader. It reuses the tested backtest engine (vN.1) and bounded search (vN.2), then layers a red-team critic and a human gate on top. Every provider call is logged to an audit trail; the pipeline writes ONLY under research/proposals/ and never touches the live book in data/.
The pipeline
Read it like an orchestrator-workers system: a control plane (the Orchestrator) drives a data plane of single-purpose agents left to right, every call is logged, and the only decision point is a human gate — the machine never writes to the live book itself.
Step by step
- 1. RegimeAgent: PURE read of data/regime.yaml → coarse-class views (no LLM, no RNG).
- 2. Signals: REAL cross-asset factor z-scores (momentum, defensive) from the price panel.
- 3. HypothesisAgent: regime + signals → an explicit, falsifiable thesis (orientation, statement, falsification).
- 4. Generator + Search: build a bounded vN.2 search space from the thesis, then run the OOS walk-forward search.
- 5. CriticAgent: DSR gate (deflated Sharpe) + REAL asset-shock stress re-sim against the finalist.
- 6. Curator + Orchestrator: compile drafts, write proposal + audit; hand off to a HUMAN.
Single-purpose agents
RegimeAgent
owns: the numbers
Aggregates the fine tactical matrix (OW=+1/N=0/UW=−1) up to coarse-class scores. Deterministic; owns no prose.
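The aggregation described above can be sketched in a few lines. This is a minimal illustration, assuming the tactical matrix is a mapping from asset to an OW/N/UW label and that a second mapping assigns each asset to a coarse class; the function name, the asset names, and both input shapes are hypothetical, and only the OW=+1/N=0/UW=−1 scoring and the per-class mean come from the text.

```python
from collections import defaultdict
from statistics import mean

# Scoring stated in the text: overweight, neutral, underweight.
VIEW_SCORE = {"OW": 1, "N": 0, "UW": -1}


def coarse_class_scores(tactical_matrix, coarse_class_of):
    """Return the mean fine-view score per coarse class (no LLM, no RNG)."""
    buckets = defaultdict(list)
    for asset, view in tactical_matrix.items():
        buckets[coarse_class_of[asset]].append(VIEW_SCORE[view])
    # Sort the classes so the output order is stable across runs.
    return {cls: mean(scores) for cls, scores in sorted(buckets.items())}


# Hypothetical fine views, for illustration only (not the committed regime file).
matrix = {
    "us_large_cap": "UW",
    "em_equity": "UW",
    "gold": "OW",
    "energy": "N",
    "treasuries": "OW",
}
classes = {
    "us_large_cap": "equities",
    "em_equity": "equities",
    "gold": "commodities",
    "energy": "commodities",
    "treasuries": "rates",
}
scores = coarse_class_scores(matrix, classes)
```

With these made-up inputs, equities average to a negative score and commodities and rates to positive ones, which is the kind of sign pattern the downstream search uses as its orientation.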
HypothesisAgent
owns: the thesis
Turns the regime view + real signal z-scores into an explicit orientation per class, a statement, and 4–6 falsification conditions. Audited.
Generator + Search
owns: the candidates
Derives a bounded search space from the thesis (sign fixed, magnitude searched), then ranks trials by an out-of-sample objective.
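One way to read "sign fixed, magnitude searched" is that each class gets a grid of tilt magnitudes, all carrying the sign the thesis assigned. The sketch below assumes that reading; the function name, the `max_tilt` and `steps` parameters, and the per-class grid layout are hypothetical and are not the vN.2 search spec itself.

```python
def bounded_search_space(orientation, max_tilt=1.0, steps=4):
    """Build per-class tilt grids whose sign is pinned by the thesis.

    orientation maps class -> +1 (overweight), 0 (neutral), -1 (underweight).
    """
    magnitudes = [round(max_tilt * k / steps, 6) for k in range(1, steps + 1)]
    space = {}
    for cls, sign in orientation.items():
        if sign not in (-1, 0, 1):
            raise ValueError(f"orientation for {cls!r} must be -1, 0 or +1")
        # A neutral class is not searched; a tilted class only varies in size.
        space[cls] = [0.0] if sign == 0 else [sign * m for m in magnitudes]
    return space


# Orientation taken from the worked example on this page:
# overweight commodities and rates, underweight equities.
space = bounded_search_space({"commodities": 1, "rates": 1, "equities": -1})
```

Because the grid never crosses zero, no trial can contradict the thesis direction; the search can only decide how strongly to express it.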
CriticAgent
owns: the red team
A strict DSR gate (null/negative evidence is rejected) plus a real stress re-sim: r_group = Σ wᵢ·shockᵢ on the finalist’s own weights.
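The stress formula r_group = Σ wᵢ·shockᵢ is a plain weighted sum, sketched below. The weights, the shock sizes, and the flag threshold are made-up illustration values, and the function name is hypothetical; only the formula comes from the text.

```python
def stress_return(weights, shocks):
    """r_group = sum_i w_i * shock_i, evaluated on the finalist's own weights."""
    missing = set(weights) - set(shocks)
    if missing:
        # Refuse to silently treat a missing shock as zero.
        raise KeyError(f"no shock defined for: {sorted(missing)}")
    return sum(w * shocks[asset] for asset, w in weights.items())


# Hypothetical finalist weights and scenario shocks (illustration only).
weights = {"equities": 0.5, "rates": 0.3, "commodities": 0.2}
shocks = {"equities": -0.20, "rates": -0.10, "commodities": 0.15}

r_group = stress_return(weights, shocks)

# Hypothetical flag rule: mark the scenario if the loss exceeds 5%.
flagged = r_group < -0.05
```

Running the shock against the finalist's actual weights, and not against a generic benchmark portfolio, is what makes the result specific to the candidate under review.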
CuratorAgent
owns: the drafts
Compiles base/bull/bear weights (each sums to 1, passes the constraints) and decision-time fields. Reuses the hypothesis’ statement + falsification.
Orchestrator
owns: provenance
Hashes a reproducible proposal_id, replays the audit log, writes the 5 artifacts, appends the leaderboard. Writes ONLY under research/proposals/.
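A reproducible identifier can be obtained by hashing a canonical serialization of the run inputs. The sketch below assumes SHA-256 over sorted-key JSON, truncated to 12 hex characters to match the length of the ids shown on this page; the actual hash function, the input fields, and the truncation used by the Orchestrator are not stated here, so all three are assumptions.

```python
import hashlib
import json


def proposal_id(inputs, length=12):
    """Hash a canonical JSON form of the inputs into a short hex id."""
    # sort_keys and fixed separators make the bytes independent of dict order.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical input fields; the code and data revisions echo the worked example.
run_inputs = {
    "provider": "rulebased",
    "code": "c2baf21",
    "data": "4def167",
    "deterministic": True,
}
pid = proposal_id(run_inputs)
same_pid = proposal_id(dict(reversed(list(run_inputs.items()))))
```

Here `pid` and `same_pid` are equal because key order does not affect the canonical form, while changing any input value yields a different id.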
The pipeline writes ONLY under research/proposals/ — it never creates or edits anything under data/. A test asserts `git status data/` is unchanged across a run. A human reviews the drafts and, if accepted, manually copies weights into the live book.
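A write guard of this kind can also be enforced at the point where output paths are built. The sketch below is one possible approach, assuming every artifact path is resolved through a single helper; the helper name and the example file names are hypothetical, and it complements the `git status data/` test described above without replacing it. It relies on `Path.is_relative_to`, available from Python 3.9.

```python
from pathlib import Path

# The only directory the pipeline is allowed to write to, per the text.
ALLOWED_ROOT = Path("research/proposals")


def resolve_output_path(repo_root, relative_name):
    """Resolve an artifact path and refuse anything outside research/proposals/."""
    root = (Path(repo_root) / ALLOWED_ROOT).resolve()
    target = (root / relative_name).resolve()
    # resolve() collapses ".." segments, so traversal attempts are caught here.
    if not target.is_relative_to(root):
        raise PermissionError(f"refusing to write outside {root}: {target}")
    return target


# Hypothetical artifact name under a proposal directory.
ok_path = resolve_output_path("/tmp/repo", "cc02400ae096/proposal.md")
```

A request such as `resolve_output_path("/tmp/repo", "../../data/book.yaml")` raises `PermissionError`, because the resolved target lands under `data/` instead of `research/proposals/`.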
No network is required. The default provider is deterministic and rule-based; the Anthropic SDK is imported only inside the Claude provider when a key is present. CI runs the deterministic path, so a proposal_id is reproducible.
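The lazy-import pattern described above might look like the following. This is a sketch under stated assumptions: the class names, the `complete` method, and the selection function are hypothetical, and `ANTHROPIC_API_KEY` is assumed to be the environment variable checked. The only SDK call used is the `anthropic.Anthropic(api_key=...)` client constructor, and it runs only when a key is present.

```python
import os


class RuleBasedProvider:
    """Deterministic default: no network, no SDK import."""

    name = "rulebased"
    model = ""  # empty, matching the audit trail for rule-based runs

    def complete(self, task, prompt):
        # Same input always yields the same output.
        return f"[{task}] {prompt}"


class ClaudeProvider:
    """Optional provider; the SDK is imported only when this is constructed."""

    name = "claude"

    def __init__(self, api_key):
        import anthropic  # deferred so the default path never needs the package

        self.client = anthropic.Anthropic(api_key=api_key)


def select_provider(env=None):
    """Pick the Claude provider only when a key is configured."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY")
    return ClaudeProvider(key) if key else RuleBasedProvider()


provider = select_provider(env={})  # no key, so the rule-based path is chosen
```

Keeping the import inside the constructor means CI can run without the SDK installed, which is what makes the deterministic path the one that fixes the proposal_id.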
Worked example (latest proposal)
proposal_id cc02400ae096 · provider rulebased · grid/seed · deterministic=true · code c2baf21 · data 4def167
Regime quadrant Q4 (growth momentum -0.519, inflation momentum +0.275). Overweight tilt orientation: commodities, rates. Underweight tilt orientation: equities. Coarse-class views are aggregated from the fine tactical_matrix (OW=+1/N=0/UW=-1, mean per coarse class); the sign sets the search bound orientation, the magnitude is searched.
- base_allocator: 60_40
- tilt_strength: 0.5
- OOS Sharpe: 0.4870
- on: 77 obs · 3 splits · 162 trials
- Deflated Sharpe: 0.0000 (SR0 2.70)
- accept: false
- stress basis: finalist_asset_shock_resim
- stress flags: inf2022
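The rejection above can be sanity-checked with the standard deflated Sharpe formula, DSR = Φ((SR − SR0)·√(T−1) / √(1 − γ₃·SR + ((γ₄−1)/4)·SR²)). The sketch below plugs in the quoted numbers as they appear, assumes normal returns (skew 0, kurtosis 3), and ignores any annualization; whether the pipeline uses exactly this form, these moments, or this acceptance threshold is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def deflated_sharpe(sr, sr0, n_obs, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe exceeds the benchmark SR0.

    Returns None when the estimate is undefined (too few observations
    or a non-positive variance term).
    """
    variance_term = 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2
    if n_obs < 2 or variance_term <= 0:
        return None
    z = (sr - sr0) * sqrt(n_obs - 1) / sqrt(variance_term)
    return NormalDist().cdf(z)


def dsr_gate(dsr, threshold=0.95):
    """Strict gate: missing evidence (None) is rejected, not waved through."""
    return dsr is not None and dsr >= threshold


# Values quoted in the worked example; the threshold is a hypothetical default.
dsr = deflated_sharpe(sr=0.4870, sr0=2.70, n_obs=77)
accept = dsr_gate(dsr)
```

With an observed Sharpe far below SR0, the z-score is strongly negative, so the DSR rounds to 0.0000 and the gate returns false, in line with the figures listed above.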
Audit trail · what the orchestrator actually ran
Every provider call the orchestrator drove for this proposal, in order, replayed from audit.jsonl. model is empty because this run used the deterministic rule-based provider — the Claude provider is only imported when a key is configured. RegimeAgent (a pure read) and the Orchestrator (writes artifacts only) make no provider call, so they do not appear here.
The logged calls
- 01 · hypothesis · regime_summary · provider=rulebased · model=(empty): state the macro hypothesis from the regime view
- 02 · hypothesis · falsification · provider=rulebased · model=(empty): falsification conditions for the hypothesis
- 03 · generator · search_space · provider=rulebased · model=(empty): generate vN.2 search_spec for the current regime
- 04 · critic · critique · provider=rulebased · model=(empty): critique the finalist from DSR + stress context
- 05 · curator · rationale · provider=rulebased · model=(empty): decision rationale prose
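An append-only JSON Lines file is a simple way to get an ordered, replayable record like the one above. The sketch below assumes one JSON object per provider call; the record fields `agent`, `task`, and `prompt` are hypothetical names (`provider` and `model` mirror the labels shown in the list), and the real audit.jsonl schema may differ.

```python
import json
from pathlib import Path


def log_call(path, agent, task, provider, model, prompt):
    """Append one provider call to the audit file as a single JSON line."""
    record = {
        "agent": agent,
        "task": task,
        "provider": provider,
        "model": model,
        "prompt": prompt,
    }
    # Append mode preserves call order and never rewrites earlier entries.
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def replay(path):
    """Read the audit file back in the order the calls were made."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A caller would invoke `log_call` once per provider call and later render `replay(path)` as a numbered list; agents that make no provider call never reach `log_call`, so they leave no entry.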
Honesty
- The committed price history is thin and single-regime (~120 trading days, one Q4 macro regime). There is no bull/bear transition in-sample, so the regime tilt cannot be validated out-of-regime.
- Stress shocks are window-magnitude estimates (from each scenario’s benchmark line), not per-ETF actuals — framework validation only.
- These proposals are illustrative, NOT robust. Do not deploy on this evidence alone — which is exactly why everything stops at a human gate.