AURA: Intent-Directed Probing for Uncovering Implicit Needs in Situated LLM Agents

Abstract

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene perception and tool use that produces an IntentFrame: a structured estimate of the implicit need with a scalar gap score that controls per-query probe budget and tool selection. On a 100-query four-scene implicit-intent benchmark, AURA improves implicit-need coverage over ReAct-style probing (Delta = +0.07, p < 10^-6); three of four scenes are individually significant, the gain reproduces on a second backbone, and a prompt ablation attributes the lift to gap calibration rather than answer memorisation. On factual lookup the controller trades raw accuracy for 82% fewer probes and zero forbidden-tool violations on a privacy-sensitive slice; scope conditions are detailed in Limitations. Code, simulator, and benchmark are released at https://github.com/innovation64/AURA.
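To make the control flow concrete, the following is a minimal sketch of how a scalar gap score could gate probe budget and tool selection. It is an illustration under stated assumptions, not the released implementation: the field names, the linear ceiling rule in `probe_budget`, the `max_probes` default, and the tool names are all hypothetical and may differ from the code in the repository.

```python
import math
from dataclasses import dataclass, field


@dataclass
class IntentFrame:
    """Hypothetical container; field names are assumptions, not the paper's schema."""
    literal_query: str
    implicit_needs: list[str] = field(default_factory=list)
    gap: float = 0.0  # assumed range: 0 = literal answer suffices, 1 = large unstated need


def probe_budget(frame: IntentFrame, max_probes: int = 4) -> int:
    """Map the gap score to a number of extra probes (assumed linear rule, rounded up)."""
    gap = min(max(frame.gap, 0.0), 1.0)  # clamp out-of-range scores
    return math.ceil(gap * max_probes)


def select_tools(frame: IntentFrame, candidates: list[str],
                 forbidden: set[str], max_probes: int = 4) -> list[str]:
    """Drop forbidden tools first, then keep as many as the budget allows."""
    allowed = [tool for tool in candidates if tool not in forbidden]
    return allowed[:probe_budget(frame, max_probes)]


frame = IntentFrame(
    literal_query="where is Lin Wei?",
    implicit_needs=["availability", "mood", "interruptibility"],
    gap=0.5,
)
tools = select_tools(
    frame,
    candidates=["calendar", "location", "camera", "mood_sensor"],
    forbidden={"camera"},
)
print(tools)
```

In this sketch a gap of zero yields no extra probes, so the agent would answer the literal question only, while the forbidden-tool filter is applied before the budget so that a privacy-sensitive tool is never selected regardless of the gap.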