x-forecast · paper portfolio
Research · Beta · vN.3 · offline · human-gated

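AI Agent Research Pipeline

How a set of single-purpose AI agents turns the current macro regime, step by step, into a reviewable strategy proposal: fully audited, run offline, and forced to stop at the human gate.

This is the offline research pipeline (vN.3) behind the portfolio, not an automated trader. It reuses the tested backtest engine (vN.1) and the bounded search (vN.2), then layers a red-team critique and a human gate on top. Every provider call is written to the audit log; the pipeline writes only to research/proposals/ and never touches the live book in data/.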

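Pipeline flow

Read it as an orchestrator-workers system: a control plane (the Orchestrator) drives a set of single-purpose agents in the data plane from left to right, every call is logged, and the only decision point is the human gate. The machine itself never writes to the live book.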

%%{init: {"theme":"base","themeVariables":{"fontFamily":"ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif","fontSize":"14px","background":"#faf9f5","lineColor":"#8a8576","primaryTextColor":"#141413"},"flowchart":{"htmlLabels":true,"nodeSpacing":50,"rankSpacing":58,"padding":16,"useMaxWidth":true,"curve":"basis"}}}%%
flowchart TB
  IN(["data/regime.yaml + price panel<br/>offline inputs"]):::store
  subgraph CTRL["control plane"]
    ORCH["Orchestrator · the conductor<br/>drives every call<br/>logs audit · hashes proposal_id"]:::det
  end
  subgraph DP["single-purpose agents · data plane (offline)"]
    direction LR
    A1["RegimeAgent<br/>pure read → views<br/>no LLM · no RNG"]:::det
    A2["Signals<br/>factor z-scores"]:::det
    A3["HypothesisAgent<br/>falsifiable thesis<br/>LLM-optional"]:::llm
    A4["Generator + Search<br/>bounded OOS search"]:::det
    A5["CriticAgent · red team<br/>DSR gate + stress re-sim<br/>produces accept flag"]:::redteam
    A6["CuratorAgent<br/>base / bull / bear drafts"]:::det
    A1 --> A2 --> A3 --> A4 --> A5 --> A6
  end
  ART[("research/proposals/ID/<br/>5 artifacts + audit.jsonl<br/>never writes data/")]:::store
  GATE{"HUMAN GATE<br/>reviews drafts + verdict"}:::human
  LIVE(["human manually copies<br/>weights → data/ live book"]):::term
  ARCH(["archived · zero deploy"]):::term
  IN --> A1
  A6 --> ART --> GATE
  GATE -- approve --> LIVE
  GATE -- decline --> ARCH
  ORCH -. drives + logs .-> DP
  classDef det fill:#efede4,stroke:#a39e8f,color:#141413;
  classDef llm fill:#dbe8f4,stroke:#6a9bcc,color:#234a68,stroke-width:2px,stroke-dasharray:5 3;
  classDef redteam fill:#f6ddd0,stroke:#d97757,color:#8a3a1d,stroke-width:2px;
  classDef store fill:#ece8dc,stroke:#b0aea5,color:#57534b;
  classDef human fill:#e1e8d3,stroke:#788c5d,color:#3c4a28,stroke-width:2px;
  classDef term fill:#faf9f5,stroke:#cdc9bc,color:#57534b;
  style CTRL fill:#f4f2ea,stroke:#dcd8cb,color:#57534b;
  style DP fill:#f4f2ea,stroke:#dcd8cb,color:#57534b;
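Figure A · Orchestration topology. Node color marks the trust level of each step (see the legend below); dashed arrows are control plus logging, solid arrows are data flow.
  • Deterministic: rule engine, no LLM, no randomness
  • LLM-optional: rule engine by default; Claude is used only when a key is configured
  • Red team: a critic that looks for every reason to reject
  • Storage / artifacts: fully audited, writes only to research/proposals/
  • Human gate: the only decision point; the machine never auto-accepts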

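Step by step

  1. RegimeAgent
    Pure read of data/regime.yaml → coarse asset-class views (no LLM, no randomness).
  2. Signals
    Computes real cross-asset factor z-scores (momentum, defensive) from the price panel.
  3. HypothesisAgent
    regime + signals → one explicit, falsifiable hypothesis (direction / statement / falsification conditions).
  4. Generator + Search
    Builds a bounded vN.2 search space from the hypothesis and runs an out-of-sample walk-forward search.
  5. CriticAgent · red team
    DSR gate (Deflated Sharpe Ratio) plus a real asset-shock stress re-simulation using the finalist's own weights.
  6. Curator + orchestration
    Compiles the drafts, writes the proposal and the audit log, and hands off to a human for review.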

Single-purpose agents

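RegimeAgent

Owns: numbers

Aggregates the fine-grained tactical matrix (OW=+1/N=0/UW=−1) into coarse asset-class scores by taking the mean per coarse class. Fully deterministic; produces no prose.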
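
The aggregation can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `coarse_views`, the shape of the input mapping, and the sample calls are assumptions. Only the OW=+1/N=0/UW=−1 coding and the per-class mean come from this page.

```python
from collections import defaultdict
from statistics import mean

# Coding taken from the page: OW=+1, N=0, UW=-1.
SCORE = {"OW": 1, "N": 0, "UW": -1}


def coarse_views(tactical_matrix):
    """Average fine-grained OW/N/UW calls into one score per coarse class.

    `tactical_matrix` maps (coarse_class, fine_bucket) -> "OW" | "N" | "UW".
    The shape of this mapping is hypothetical.
    """
    by_class = defaultdict(list)
    for (coarse, _fine), call in tactical_matrix.items():
        by_class[coarse].append(SCORE[call])
    # Sorted keys keep the output order deterministic.
    return {coarse: mean(scores) for coarse, scores in sorted(by_class.items())}


# Hypothetical fine-grained calls, for illustration only.
example = {
    ("equities", "us_large"): "UW",
    ("equities", "em"): "UW",
    ("rates", "long_duration"): "OW",
    ("rates", "short_duration"): "N",
    ("credit", "ig"): "N",
}
views = coarse_views(example)
```

Because the function has no randomness and no I/O, the same matrix always yields the same views, which is the property the page attributes to this agent.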

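HypothesisAgent

Owns: hypothesis

Turns the regime views plus the real signal z-scores into an explicit direction per asset class, a statement, and 4–6 falsification conditions. Every call is logged.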

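Generator + Search

Owns: candidate strategies

Derives a bounded search space from the hypothesis (direction fixed, magnitude searched), then ranks trials by an out-of-sample objective.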
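
One way to read "direction fixed, magnitude searched" is as one-sided bounds per asset class. The sketch below is an assumption about how that could look, not the vN.2 search itself: `tilt_bounds`, `magnitude_grid`, the `max_tilt` cap, and the grid size are all hypothetical names and values.

```python
def tilt_bounds(views, max_tilt=0.10):
    """Turn signed coarse views into one-sided search bounds.

    The sign of each view fixes the direction; only the size of the tilt
    is left free. `max_tilt` is a hypothetical cap, not a project setting.
    """
    bounds = {}
    for asset_class, score in views.items():
        if score > 0:
            bounds[asset_class] = (0.0, max_tilt)    # overweight only
        elif score < 0:
            bounds[asset_class] = (-max_tilt, 0.0)   # underweight only
        else:
            bounds[asset_class] = (0.0, 0.0)         # neutral: no tilt
    return bounds


def magnitude_grid(bound, steps=5):
    """Evenly spaced candidate tilts inside one bound (endpoints included)."""
    lo, hi = bound
    if lo == hi:
        return [lo]
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


# Signs follow the worked example on this page (commodities/rates OW, equities UW).
bounds = tilt_bounds({"commodities": 1, "credit": 0, "equities": -1, "rates": 1})
```

With bounds like these, a search can never flip an underweight view into an overweight position; it can only decide how far to lean.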

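CriticAgent

Owns: red team

A strict DSR gate (zero or negative evidence is rejected outright) plus a real stress re-simulation that uses the finalist's own weights to compute r_group = Σ wᵢ·shockᵢ.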
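
The stress formula is a weighted sum, which a short sketch makes concrete. The function name `stress_return` and all weights and shock values below are hypothetical and are not taken from the repository.

```python
def stress_return(weights, shocks):
    """r_group = sum_i w_i * shock_i over the assets the finalist holds."""
    missing = set(weights) - set(shocks)
    if missing:
        raise KeyError(f"no shock defined for: {sorted(missing)}")
    return sum(w * shocks[asset] for asset, w in weights.items())


# Hypothetical finalist weights and scenario shocks (not values from the repo).
weights = {"equities": 0.5, "rates": 0.3, "commodities": 0.2}
shocks = {"equities": -0.20, "rates": -0.10, "commodities": 0.25}
r_group = stress_return(weights, shocks)  # about -0.08
```

Failing loudly on a missing shock is a design choice in this sketch: silently treating an unshocked asset as zero would understate the loss.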

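CuratorAgent

Owns: drafts

Compiles base/bull/bear weights (each set sums to 1 and passes the constraints) and the decision-time fields. Reuses the hypothesis statement and falsification conditions.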

Orchestrator

Owns: provenance

Hashes a reproducible proposal_id, replays the audit log, writes the 5 artifacts, and appends to the leaderboard. Writes only to research/proposals/.
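
A reproducible id usually means hashing a canonical encoding of the run inputs. The sketch below assumes SHA-256 over sorted-key JSON, truncated to 12 hex characters; the page does not state which hash, which fields, or which truncation the project uses, so all three are assumptions.

```python
import hashlib
import json


def proposal_id(inputs, length=12):
    """Hash a canonical JSON encoding of the run inputs.

    sort_keys plus fixed separators make the encoding independent of dict
    insertion order, so identical inputs always give the same id.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical input fields; the values echo the worked example's header.
run_a = {"provider": "rulebased", "code": "c2baf21", "data": "4def167"}
run_b = {"data": "4def167", "code": "c2baf21", "provider": "rulebased"}
same = proposal_id(run_a) == proposal_id(run_b)
```

Any change to the code or data revision produces a different id, so a proposal can be traced back to the exact inputs that generated it.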

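Invariant 1 · Human gate

The pipeline writes only to research/proposals/ and never creates or modifies anything under data/. A test asserts that `git status data/` is identical before and after a run. A human reviews the drafts and, only after accepting them, copies the weights into the live book by hand.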
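
The invariant can be checked mechanically. The project's test compares `git status data/`; the sketch below swaps in a git-free stand-in that hashes every file under data/ before and after a run, so it works outside a repository. `snapshot`, `run_and_check`, and `fake_pipeline` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map every file under `root` to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def run_and_check(repo, pipeline):
    """Run `pipeline(repo)` and fail if anything under data/ changed."""
    before = snapshot(Path(repo) / "data")
    pipeline(repo)
    after = snapshot(Path(repo) / "data")
    assert before == after, "pipeline touched data/"


def fake_pipeline(repo):
    """Stand-in for the real pipeline: writes only under research/proposals/."""
    out = Path(repo) / "research" / "proposals" / "demo"
    out.mkdir(parents=True, exist_ok=True)
    (out / "proposal.md").write_text("draft\n")


with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "data").mkdir()
    (Path(repo) / "data" / "regime.yaml").write_text("quadrant: Q4\n")
    run_and_check(repo, fake_pipeline)
```

A content hash catches created, deleted, and modified files alike, which matches the "never creates or modifies" wording of the invariant.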

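Invariant 2 · Offline

No network access is required. The default provider is a deterministic rule engine; the Claude provider imports the Anthropic SDK internally only when a key is configured. CI runs the deterministic path, so proposal_id is reproducible.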
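
Deferring the SDK import is what keeps the default path free of the dependency. A minimal sketch of that pattern follows; the class names and `make_provider` are hypothetical, and reading the key from `ANTHROPIC_API_KEY` is an assumption about how the project is configured.

```python
class RuleBasedProvider:
    """Deterministic default: no network access, no SDK import."""

    name = "rulebased"


class ClaudeProvider:
    """Imports the Anthropic SDK only when it is actually constructed."""

    name = "claude"

    def __init__(self, api_key):
        import anthropic  # deferred import: never runs on the default path

        self.client = anthropic.Anthropic(api_key=api_key)


def make_provider(env):
    """Pick a provider from an environment-like mapping."""
    key = env.get("ANTHROPIC_API_KEY")
    return ClaudeProvider(key) if key else RuleBasedProvider()


provider = make_provider({})  # no key configured, so the rule-based path is used
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the selection logic easy to test without touching real credentials.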

Worked example · the latest real proposal

proposal_id cc02400ae096 · provider rulebased · grid/seed · deterministic=true · code c2baf21 · data 4def167

Regime → hypothesis

Regime quadrant Q4 (growth momentum -0.519, inflation momentum +0.275). Overweight tilt orientation: commodities, rates. Underweight tilt orientation: equities. Coarse-class views are aggregated from the fine tactical_matrix (OW=+1/N=0/UW=-1, mean per coarse class); the sign sets the search bound orientation, the magnitude is searched.

commodities: OW · credit: N · equities: UW · rates: OW
Active signal factors: defensive, momentum

Selected strategy
  • base_allocator: 60_40
  • tilt_strength: 0.5
  • OOS Sharpe: 0.4870
  • Sample: 77 obs · 3 splits · 162 trials

Red-team verdict
  • Deflated Sharpe: 0.0000 (SR0 2.70)
  • Accepted: false
  • Stress-test method: finalist_asset_shock_resim
  • Stress flag: inf2022
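In this run the red team re-simulated all 5 historical scenarios with the finalist's own weights and flagged inf2022; the strict DSR gate returned accept=false. This is the system working as designed: on thin, single-regime data it is supposed to refuse to rubber-stamp. The drafts remain parked at the human gate.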
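
The Deflated Sharpe figure can be sanity-checked with a sketch of the published Deflated Sharpe Ratio formula (Bailey and López de Prado). Several things here are assumptions: that returns are normal (skew 0, kurtosis 3), that the OOS Sharpe and SR0 shown above are in the same per-observation units, and that the 77 observations are the relevant sample length. The page does not say whether the project computes it exactly this way.

```python
import math


def deflated_sharpe(sr, sr0, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the benchmark sr0.

    sr and sr0 must be in the same (per-observation) units; skew and kurt
    describe the return distribution (0 and 3 for normal returns).
    """
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - sr0) * math.sqrt(n_obs - 1) / denom
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Figures from the worked example; normal returns are an assumption.
dsr = deflated_sharpe(sr=0.4870, sr0=2.70, n_obs=77)
rounded = round(dsr, 4)  # 0.0, matching the 0.0000 shown above
```

Under these assumptions the observed Sharpe sits far below the SR0 hurdle, so the probability is effectively zero and a strict gate has nothing to accept.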

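Audit trail · what the orchestrator actually ran

Every provider call the orchestrator drove for this proposal, replayed from audit.jsonl in the order it happened. model is empty because this run used the deterministic rule engine; the Claude provider is imported only when a key is configured. RegimeAgent (pure read) and the Orchestrator (writes artifacts only) make no provider calls, so they are not listed.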

%%{init: {"theme":"base","themeVariables":{"fontFamily":"ui-sans-serif, system-ui, -apple-system, Segoe UI, sans-serif","fontSize":"16px","background":"#faf9f5","actorBkg":"#efede4","actorBorder":"#a39e8f","actorTextColor":"#141413","actorLineColor":"#c4c0b2","signalColor":"#8a8576","signalTextColor":"#3c382f","noteBkg":"#dbe8f4","noteBorderColor":"#6a9bcc","noteTextColor":"#234a68","activationBkgColor":"#f6ddd0","activationBorderColor":"#d97757","sequenceNumberColor":"#faf9f5","labelBoxBkgColor":"#e1e8d3","labelBoxBorderColor":"#788c5d","labelTextColor":"#3c4a28","loopTextColor":"#3c4a28"},"sequence":{"useMaxWidth":false,"actorMargin":60,"boxMargin":14,"noteMargin":12,"messageMargin":42,"mirrorActors":true}}}%%
sequenceDiagram
  autonumber
  participant O as Orchestrator
  participant R as RegimeAgent
  participant S as Signals
  participant H as HypothesisAgent
  participant G as Generator+Search
  participant C as CriticAgent
  participant U as CuratorAgent
  participant L as audit.jsonl
  participant Hum as Human
  Note over O,L: offline · deterministic by default · provider=rulebased · model=none
  O->>R: read regime to coarse views
  R-->>O: views (no provider call, not logged)
  O->>S: compute factor z-scores
  S-->>O: momentum / defensive
  O->>H: state hypothesis + falsification
  H-->>L: log regime_summary, falsification
  H-->>O: thesis + 4 to 6 falsifiers
  O->>G: bounded OOS walk-forward search
  G-->>L: log search_space
  G-->>O: finalist params
  O->>C: critique (DSR + stress re-sim)
  C-->>L: log critique → accept=false, flag inf2022
  C-->>O: verdict
  O->>U: compile drafts
  U-->>L: log rationale
  U-->>O: base / bull / bear drafts
  O->>O: hash proposal_id · write 5 artifacts
  O->>Hum: hand off drafts + verdict
  Note over Hum: machine NEVER auto-accepts
  alt human approves
    Hum->>Hum: manually copy weights → data/
  else human declines
    Hum->>Hum: archive · zero deploy
  end
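Figure B · Sequence trace of the same run (scroll horizontally to see all 9 lanes). Blue note = run mode; the orange activation bar marks the red-team critic; only the 4 agents that actually call a provider write to audit.jsonl; the run ends at the human gate (approve / decline).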

Per-call log

  1. hypothesis · regime_summary provider=rulebased model=none
    state the macro hypothesis from the regime view
  2. hypothesis · falsification provider=rulebased model=none
    falsification conditions for the hypothesis
  3. generator · search_space provider=rulebased model=none
    generate vN.2 search_spec for the current regime
  4. critic · critique provider=rulebased model=none
    critique the finalist from DSR + stress context
  5. curator · rationale provider=rulebased model=none
    decision rationale prose
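
An append-only JSON Lines file is enough to record and replay calls like the five above. The sketch below is an assumption about the mechanism: the record fields (`agent`, `task`, `provider`, `model`) and the helper names are hypothetical and may not match the real audit.jsonl schema.

```python
import json
import tempfile
from pathlib import Path


def log_call(path, agent, task, provider="rulebased", model=None):
    """Append one provider call to the audit log as a single JSON line."""
    record = {"agent": agent, "task": task, "provider": provider, "model": model}
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def replay(path):
    """Read the log back in the order the calls were written."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# The five calls listed above, written to a throwaway file.
CALLS = [
    ("hypothesis", "regime_summary"),
    ("hypothesis", "falsification"),
    ("generator", "search_space"),
    ("critic", "critique"),
    ("curator", "rationale"),
]
with tempfile.TemporaryDirectory() as tmp:
    audit = Path(tmp) / "audit.jsonl"
    for agent, task in CALLS:
        log_call(audit, agent, task)
    records = replay(audit)
```

Opening the file in append mode means earlier entries are never rewritten, so the replay order is the call order.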

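Honest disclosure

  • The committed price history is thin and single-regime (~120 trading days, one Q4 macro regime). There is no bull/bear switch in sample, so the regime tilt cannot be validated across regimes.
  • The stress shocks are range-level magnitudes estimated from per-scenario benchmarks, not measured per-ETF values; they serve only to validate the framework.
  • These proposals are illustrative and not robust. Do not deploy on this evidence alone, which is exactly why everything stops at the human gate.
Source: research/agents/* (orchestration + agents), research/engine/* (vN.1 engine + signals), research/search/* (vN.2 search). Each proposal produces 5 artifacts (proposal.md, rebalance_draft.yaml, decision_draft.yaml, audit.jsonl, config.yaml), all committed to the public repo with a reproducible proposal_id.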