Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Abstract

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is more than an inconvenience. In tasks such as market analysis, copy-trading review, and trade preparation, forgotten context and stale memory can cause added latency, repeated errors, weak auditability, and unsafe decisions. We propose the interaction-native knowledge harness (InKH), an architecture for financial LLM agents that absorbs complexity into the system. InKH converts user, market, portfolio, and tool events into structured operational knowledge. It combines four components: passive knowledge injection, which assembles a bounded working-context buffer before the main model step; temporal graph memory for low-latency retrieval; a wiki audit surface for human-readable governance; and background extraction with maturity, decay, and write-time invalidation. We evaluate InKH on a reproducible, controlled synthetic benchmark with 24 random seeds, 4 rounds, 80 episodes per round, and 6 baselines, which yields 24 × 4 × 80 × 6 = 46,080 baseline-conditioned evaluations. InKH achieves a mean task quality of 0.815 at 900 ms latency. Compared with agent-driven wiki-walk memory, it reduces latency by 82.95 percent, token cost by 82.29 percent, and stale-knowledge usage by 96.58 percent, while improving quality by 0.108 and traceability by 0.461. Compared with a temporal-graph system without invalidation, it improves quality by 0.050 and reduces stale-memory usage by 96.58 percent at comparable serving cost. These results support a design thesis for financial AI: adoption happens when complexity is absorbed by the system rather than transferred to the user. The benchmark validates architecture-level behavior, not live trading performance.
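
To make two of the mechanisms named above more concrete, the following is a minimal Python sketch of write-time invalidation and bounded context assembly. It is an illustration under stated assumptions, not the InKH implementation: the class and field names (`Fact`, `TemporalMemory`, `build_context`), the exponential half-life decay, the flat list standing in for the temporal graph, and the example keys are all hypothetical and do not come from the abstract. Maturity tracking and the wiki audit surface are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    """One piece of operational knowledge (hypothetical schema)."""
    key: str                      # e.g. "user.risk_preference"
    value: str
    written_at: float             # timestamp in seconds
    invalidated_at: Optional[float] = None  # set when a newer fact supersedes it


class TemporalMemory:
    """Flat stand-in for a temporal graph memory (assumption, not the paper's design)."""

    def __init__(self, half_life_s: float = 3600.0) -> None:
        self.half_life_s = half_life_s   # assumed decay parameter
        self.facts: List[Fact] = []

    def write(self, key: str, value: str, now: float) -> None:
        # Write-time invalidation: when a new value arrives for a key,
        # mark every still-live fact with that key as superseded.
        for fact in self.facts:
            if fact.key == key and fact.invalidated_at is None:
                fact.invalidated_at = now
        self.facts.append(Fact(key, value, now))

    def score(self, fact: Fact, now: float) -> float:
        # Exponential decay: the score halves every half_life_s seconds.
        age = max(0.0, now - fact.written_at)
        return 0.5 ** (age / self.half_life_s)

    def build_context(self, now: float, budget: int) -> List[str]:
        # Passive injection step: pick at most `budget` live facts,
        # freshest first, before the main model is called.
        live = [f for f in self.facts if f.invalidated_at is None]
        live.sort(key=lambda f: self.score(f, now), reverse=True)
        return [f"{f.key}={f.value}" for f in live[:budget]]


# Usage: the user changes risk preference; the old value is never injected.
mem = TemporalMemory(half_life_s=3600.0)
mem.write("user.risk_preference", "aggressive", now=0.0)
mem.write("portfolio.cash_pct", "20", now=100.0)
mem.write("user.risk_preference", "conservative", now=200.0)
context = mem.build_context(now=300.0, budget=2)
```

Because the superseded fact is closed at write time rather than filtered at read time, the read path stays a simple scan over live facts, and the invalidated entry remains stored with its timestamps for later audit. The `budget` parameter is what keeps the working context bounded.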