AURA: Intent-Directed Probing to Reveal Implicit Needs in Situated LLM Agents

Abstract

A situated query such as "Where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting right now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene perception and tool use that produces an IntentFrame: a structured estimate of the implicit need, paired with a scalar gap score that controls the per-query probe budget and tool selection. On a 100-query, four-scene implicit-intent benchmark, AURA improves implicit-need coverage over ReAct-style probing (Δ = +0.07, p < 10⁻⁶). Three of the four scenes are individually significant, the gain reproduces on a second backbone, and a prompt ablation attributes the lift to gap calibration rather than answer memorisation. On factual lookup, the controller trades raw accuracy for 82% fewer probes and zero forbidden-tool violations on a privacy-sensitive slice; scope conditions are detailed in Limitations. Code, simulator, and benchmark are released at https://github.com/innovation64/AURA.
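
To make the control flow described above concrete, the following is a minimal sketch of how a gap score might gate probing. It is not taken from the released code: the `IntentFrame` fields, the `[0, 1]` gap range, the linear budget rule, and the tool names are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntentFrame:
    """Hypothetical structured estimate of a user's implicit need."""
    literal_query: str
    implicit_needs: list[str] = field(default_factory=list)
    gap_score: float = 0.0  # assumed range [0, 1]; 0 means the literal answer suffices


def probe_budget(frame: IntentFrame, max_probes: int = 5) -> int:
    """Map the gap score to a probe count (assumed linear rule, clamped to [0, 1])."""
    gap = min(max(frame.gap_score, 0.0), 1.0)
    return round(gap * max_probes)


def select_tools(
    frame: IntentFrame,
    tool_for_need: dict[str, str],
    forbidden: set[str],
    max_probes: int = 5,
) -> list[str]:
    """Choose tools for implicit needs in order, within budget, skipping forbidden tools."""
    budget = probe_budget(frame, max_probes)
    chosen: list[str] = []
    for need in frame.implicit_needs:
        if len(chosen) >= budget:
            break
        tool = tool_for_need.get(need)
        if tool is not None and tool not in forbidden and tool not in chosen:
            chosen.append(tool)
    return chosen


# Illustrative values only; tool names are hypothetical.
frame = IntentFrame(
    literal_query="Where is Lin Wei?",
    implicit_needs=["availability", "mood", "interruptibility"],
    gap_score=0.4,
)
tools = {
    "availability": "calendar_lookup",
    "mood": "camera_emotion",
    "interruptibility": "presence_sensor",
}
print(select_tools(frame, tools, forbidden={"camera_emotion"}))
# prints ['calendar_lookup', 'presence_sensor']
```

In this sketch a gap score of 0 yields no probes at all, which is one simple way a controller could reduce probing on purely factual lookups, and the forbidden set is checked before any tool is selected.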