Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents
June 1, 2026
Authors: Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita, Dmytro Kyrylenko, Sofiia Pidturkina, Julia Stadnyk
cs.AI
Abstract
Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is not just inconvenient. In tasks such as market analysis, copy-trading review, and trade preparation, forgotten context and stale memory can create latency, repeated errors, weak auditability, and unsafe decisions.
We propose the interaction-native knowledge harness (InKH), an architecture for financial LLM agents that absorbs complexity into the system. InKH converts user, market, portfolio, and tool events into structured operational knowledge. It uses passive knowledge injection to assemble a bounded working context buffer before the main model step, temporal graph memory for low-latency retrieval, a wiki audit surface for human-readable governance, and background extraction with maturity, decay, and write-time invalidation.
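The abstract names the mechanisms but does not specify data structures, so the following is only a minimal sketch of how write-time invalidation, decay-weighted maturity, and a bounded context buffer could fit together. Every name in it (`Fact`, `TemporalMemory`, `half_life`, `budget`, the whitespace token count) is a hypothetical chosen for illustration, not the authors' implementation, and a flat list stands in for the temporal graph.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fact:
    """One piece of operational knowledge about a (subject, attribute) pair."""
    subject: str
    attribute: str
    value: str
    valid_from: float                 # event timestamp in seconds
    valid_to: Optional[float] = None  # set when a newer fact supersedes this one
    maturity: float = 0.5             # hypothetical confidence weight in [0, 1]


@dataclass
class TemporalMemory:
    half_life: float = 3600.0         # hypothetical decay half-life in seconds
    facts: list = field(default_factory=list)

    def write(self, new: Fact) -> None:
        """Write-time invalidation: close any live fact with the same key."""
        for old in self.facts:
            same_key = (old.subject, old.attribute) == (new.subject, new.attribute)
            if same_key and old.valid_to is None:
                old.valid_to = new.valid_from
        self.facts.append(new)

    def score(self, fact: Fact, now: float) -> float:
        """Maturity weighted by exponential decay of the fact's age."""
        age = max(0.0, now - fact.valid_from)
        return fact.maturity * 0.5 ** (age / self.half_life)

    def build_context(self, now: float, budget: int) -> list:
        """Passive injection: top-scoring live facts that fit a token budget."""
        live = [f for f in self.facts if f.valid_to is None]
        live.sort(key=lambda f: self.score(f, now), reverse=True)
        lines, used = [], 0
        for f in live:
            line = f"{f.subject}.{f.attribute} = {f.value}"
            cost = len(line.split())  # crude whitespace token count
            if used + cost > budget:
                continue              # skip facts that would exceed the budget
            lines.append(line)
            used += cost
        return lines


# Usage: the newer risk preference invalidates the older one at write time,
# so only the current value can reach the context buffer.
memory = TemporalMemory()
memory.write(Fact("user", "risk_preference", "aggressive", valid_from=0.0, maturity=0.9))
memory.write(Fact("portfolio", "cash_ratio", "0.25", valid_from=50.0, maturity=0.6))
memory.write(Fact("user", "risk_preference", "conservative", valid_from=100.0, maturity=0.9))
context = memory.build_context(now=200.0, budget=6)
```

Invalidating at write time, rather than filtering at read time, keeps the retrieval path a simple scan over live facts, which is one plausible way to read the abstract's pairing of low-latency retrieval with reduced stale-knowledge usage. Superseded facts are closed rather than deleted, so they remain available for audit.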
We evaluate InKH on a reproducible controlled synthetic benchmark with 24 random seeds, 4 rounds, 80 episodes per round, and 6 baselines, producing 46,080 baseline-conditioned evaluations. InKH achieves mean task quality of 0.815 at 900 ms latency. Compared with agent-driven wiki-walk memory, it reduces latency by 82.95 percent, token cost by 82.29 percent, and stale-knowledge usage by 96.58 percent, while improving quality by 0.108 and traceability by 0.461. Compared with a temporal-graph system without invalidation, it improves quality by 0.050 and reduces stale-memory usage by 96.58 percent with comparable serving cost.
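The evaluation count follows directly from the stated benchmark dimensions, and the reported percentage implies an approximate latency for the wiki-walk baseline. The short check below works through both. It assumes the 82.95 percent reduction is measured relative to the baseline's mean latency; the abstract does not state the baseline figure itself, so the second number is an inference, not a reported result.

```python
# Benchmark dimensions as stated in the abstract.
seeds, rounds, episodes_per_round, baselines = 24, 4, 80, 6

total_evaluations = seeds * rounds * episodes_per_round * baselines

# Assumption: "reduces latency by 82.95 percent" is relative to the
# wiki-walk baseline, so baseline = inkh / (1 - reduction).
inkh_latency_ms = 900.0
latency_reduction = 0.8295
implied_wiki_walk_latency_ms = inkh_latency_ms / (1.0 - latency_reduction)

print(total_evaluations)                    # 46080
print(round(implied_wiki_walk_latency_ms))  # 5279
```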
The results support a design thesis for financial AI: adoption happens when complexity is absorbed by the system rather than transferred to the user. The benchmark validates architecture-level behavior, not live trading performance.