
System 2 Attention (is something you might need too)

November 20, 2023
Authors: Jason Weston, Sainbayar Sukhbaatar
cs.AI

Abstract

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information: QA, math word problems, and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.