Absorbing Complexity: An Interaction-Native Knowledge Utilization Framework for Financial LLM Agents

Abstract

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is more than an inconvenience. In tasks such as market analysis, copy-trading review, and trade preparation, forgotten context and stale memory can cause latency, repeated errors, weak auditability, and unsafe decisions. We propose the interaction-native knowledge harness (InKH), an architecture for financial LLM agents that absorbs this complexity into the system. InKH converts user, market, portfolio, and tool events into structured operational knowledge. It combines four mechanisms: passive knowledge injection, which assembles a bounded working context buffer before the main model step; temporal graph memory for low-latency retrieval; a wiki audit surface for human-readable governance; and background extraction with maturity, decay, and write-time invalidation. We evaluate InKH on a reproducible, controlled synthetic benchmark with 24 random seeds, 4 rounds, 80 episodes per round, and 6 baselines, producing 46,080 baseline-conditioned evaluations. InKH achieves a mean task quality of 0.815 at 900 ms latency. Compared with agent-driven wiki-walk memory, it reduces latency by 82.95 percent, token cost by 82.29 percent, and stale-knowledge usage by 96.58 percent, while improving quality by 0.108 and traceability by 0.461. Compared with a temporal-graph system without invalidation, it improves quality by 0.050 and reduces stale-memory usage by 96.58 percent at comparable serving cost. The results support a design thesis for financial AI: adoption happens when the system absorbs complexity rather than transferring it to the user. The benchmark validates architecture-level behavior, not live trading performance.
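The memory mechanisms named in the abstract can be illustrated with a small sketch. These mechanisms are temporal graph memory, write-time invalidation, maturity, decay, and a bounded buffer assembled by passive injection before the model step. The sketch is not InKH's implementation. The following are all illustrative assumptions:

* the `Fact` and `TemporalMemory` names
* the exponential half-life decay
* the fixed +0.2 maturity increment
* bounding the buffer by item count rather than by tokens

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    """Hypothetical edge in a temporal knowledge graph: subject --predicate--> value."""
    subject: str
    predicate: str
    value: str
    valid_from: float                 # event time at which the fact was written (days)
    valid_to: Optional[float] = None  # None means the fact is still current
    maturity: float = 0.5             # assumed confidence in [0, 1], raised by repeated evidence

    def is_current(self) -> bool:
        return self.valid_to is None


class TemporalMemory:
    """Illustrative temporal graph memory with write-time invalidation (hypothetical, not InKH's code)."""

    def __init__(self, half_life: float = 7.0) -> None:
        self.facts: List[Fact] = []
        # Assumed exponential decay, parameterized by a half-life in days.
        self.decay_rate = math.log(2) / half_life

    def write(self, fact: Fact) -> None:
        for old in self.facts:
            if old.is_current() and (old.subject, old.predicate) == (fact.subject, fact.predicate):
                if old.value == fact.value:
                    # Repeated evidence for the same value: mature the existing fact.
                    old.maturity = min(1.0, old.maturity + 0.2)
                    return
                # Write-time invalidation: close the superseded fact so later
                # reads cannot return it, while keeping it for audit history.
                old.valid_to = fact.valid_from
        self.facts.append(fact)

    def score(self, fact: Fact, now: float) -> float:
        # Relevance = maturity discounted by decay since the fact was written.
        return fact.maturity * math.exp(-self.decay_rate * (now - fact.valid_from))

    def build_context(self, now: float, max_items: int) -> List[str]:
        # Passive injection: before the main model step, select current facts
        # by score and cap the buffer at max_items entries.
        current = [f for f in self.facts if f.is_current()]
        current.sort(key=lambda f: self.score(f, now), reverse=True)
        return [f"{f.subject}.{f.predicate} = {f.value}" for f in current[:max_items]]


mem = TemporalMemory(half_life=7.0)
mem.write(Fact("user", "risk_tolerance", "moderate", valid_from=0.0))
mem.write(Fact("portfolio", "max_position_pct", "10", valid_from=1.0))
mem.write(Fact("user", "risk_tolerance", "low", valid_from=5.0))  # supersedes "moderate"
context = mem.build_context(now=6.0, max_items=2)
print(context)  # ['user.risk_tolerance = low', 'portfolio.max_position_pct = 10']
```

Writing the `low` risk tolerance closes the earlier `moderate` fact at write time. As a result, the buffer never surfaces the superseded value, yet the closed fact remains available for audit. A production system would more plausibly bound the buffer by a token budget than by a fixed item count.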
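The evaluation count follows from the benchmark grid. The calculation below assumes that each episode in each round and seed is scored once per baseline condition. It also shows how the percentage figures are presumably computed: as relative reductions of a metric $m$ against a baseline. The quality and traceability gains, by contrast, read as absolute score differences.

$$
N_{\text{eval}} = \underbrace{24}_{\text{seeds}} \times \underbrace{4}_{\text{rounds}} \times \underbrace{80}_{\text{episodes/round}} \times \underbrace{6}_{\text{baselines}} = 7{,}680 \times 6 = 46{,}080,
\qquad
\Delta_{\%} = 100 \cdot \frac{m_{\text{baseline}} - m_{\text{InKH}}}{m_{\text{baseline}}}
$$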