Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Abstract

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is more than an inconvenience. In tasks such as market analysis, copy-trading review, and trade preparation, forgotten context and stale memory can cause added latency, repeated errors, weak auditability, and unsafe decisions. We propose the interaction-native knowledge harness (InKH), an architecture for financial LLM agents that absorbs this complexity into the system. InKH converts user, market, portfolio, and tool events into structured operational knowledge. It combines four mechanisms: passive knowledge injection, which assembles a bounded working context buffer before the main model step; temporal graph memory for low-latency retrieval; a wiki audit surface for human-readable governance; and background extraction with maturity, decay, and write-time invalidation. We evaluate InKH on a reproducible, controlled synthetic benchmark with 24 random seeds, 4 rounds, 80 episodes per round, and 6 baselines, which yields 24 × 4 × 80 × 6 = 46,080 baseline-conditioned evaluations. InKH achieves a mean task quality of 0.815 at 900 ms latency. Compared with agent-driven wiki-walk memory, it reduces latency by 82.95 percent, token cost by 82.29 percent, and stale-knowledge usage by 96.58 percent, while improving quality by 0.108 and traceability by 0.461. Compared with a temporal-graph system without invalidation, it improves quality by 0.050 and reduces stale-memory usage by 96.58 percent at comparable serving cost. The results support a design thesis for financial AI: adoption happens when complexity is absorbed by the system rather than transferred to the user. The benchmark validates architecture-level behavior, not live trading performance.
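As an illustration only, the following minimal Python sketch shows how write-time invalidation and passive injection into a bounded context buffer could fit together. It is not the InKH implementation: the names `Fact`, `TemporalMemory`, `write`, and `assemble_context`, the exponential half-life decay, the confirmation-count maturity score, and the whitespace token count are all assumptions introduced here, and a flat keyed store stands in for the temporal graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Fact:
    """One piece of operational knowledge (hypothetical schema)."""
    subject: str
    predicate: str
    value: str
    written_at: float                       # seconds on any consistent clock
    confirmations: int = 1                  # crude stand-in for "maturity"
    invalidated_at: Optional[float] = None  # set when a newer fact replaces it


class TemporalMemory:
    """Toy keyed store; an assumption standing in for temporal graph memory."""

    def __init__(self, half_life_s: float = 86_400.0) -> None:
        self.half_life_s = half_life_s
        self.facts: List[Fact] = []                    # full history, for audit
        self._active: Dict[Tuple[str, str], Fact] = {}  # current fact per key

    def write(self, subject: str, predicate: str, value: str, now: float) -> Fact:
        key = (subject, predicate)
        current = self._active.get(key)
        if current is not None:
            if current.value == value:
                current.confirmations += 1   # same claim seen again: matures
                return current
            current.invalidated_at = now     # write-time invalidation
        fact = Fact(subject, predicate, value, written_at=now)
        self.facts.append(fact)
        self._active[key] = fact
        return fact

    def score(self, fact: Fact, now: float) -> float:
        # Assumed scoring: exponential decay with age times a maturity factor.
        age = max(0.0, now - fact.written_at)
        decay = 0.5 ** (age / self.half_life_s)
        maturity = 1.0 - 0.5 ** fact.confirmations
        return decay * maturity

    def assemble_context(self, subjects: Set[str], now: float,
                         token_budget: int) -> List[str]:
        """Build the bounded buffer before the main model step.

        Only active (non-invalidated) facts are considered, so stale values
        never reach the prompt.
        """
        candidates = [f for (s, _), f in self._active.items() if s in subjects]
        candidates.sort(key=lambda f: self.score(f, now), reverse=True)
        lines: List[str] = []
        used = 0
        for f in candidates:
            line = f"{f.subject} {f.predicate} {f.value}"
            cost = len(line.split())         # whitespace count approximates tokens
            if used + cost > token_budget:
                continue                     # skip facts that would overflow
            lines.append(line)
            used += cost
        return lines


# Usage: a changed risk preference invalidates the older one at write time.
mem = TemporalMemory()
mem.write("user", "risk_preference", "aggressive", now=0.0)
mem.write("user", "risk_preference", "conservative", now=3600.0)
mem.write("portfolio", "largest_position", "BTC", now=3600.0)
context = mem.assemble_context({"user", "portfolio"}, now=7200.0, token_budget=50)
tight = mem.assemble_context({"user", "portfolio"}, now=7200.0, token_budget=3)
```

In this sketch the superseded "aggressive" preference stays in `mem.facts` with `invalidated_at` set, which preserves an audit trail, while retrieval reads only the active index and therefore cannot return it. Invalidating at write time rather than filtering at read time keeps the retrieval path short, which is one plausible way to pursue the low-latency goal described above.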