

SIFT: Grounding LLM Reasoning in Contexts via Stickers

February 19, 2025
Authors: Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng
cs.AI

Abstract

This paper identifies that misinterpretation of the context can be a significant issue during the reasoning process of large language models, from smaller models such as Llama3.2-3B-Instruct to cutting-edge ones such as DeepSeek-R1. For example, in the phrase "10 dollars per kilo," LLMs might not recognize that "per" means "for each," leading to calculation errors. To tackle this, we introduce a novel post-training approach called **Stick to the Facts (SIFT)**. SIFT leverages increasing inference-time compute to ground LLM reasoning in contexts. At the core of SIFT lies the *Sticker*, which is generated by the model itself to explicitly emphasize the key information within the context. Given the curated Sticker, SIFT generates two predictions: one from the original query and one from the query augmented with the Sticker. If they differ, the Sticker is sequentially refined via *forward* optimization (to better align the extracted facts with the query) and *inverse* generation (to conform with the model's inherent tendencies) for more faithful reasoning outcomes. Studies across diverse models (from 3B to 100B+) and benchmarks (e.g., GSM8K, MATH-500) reveal consistent performance improvements. Notably, SIFT improves the pass@1 accuracy of DeepSeek-R1 on AIME2024 from 78.33% to **85.67%**, establishing a new state of the art in the open-source community. The code is available at https://github.com/zhijie-group/SIFT.
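The control flow described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the authors' implementation: the function names, the callable signatures, the prompt layout in `augment`, the `max_rounds` cap, and the choice to fall back to the Sticker-augmented prediction when the two predictions never agree are all assumptions. The four callables stand in for LLM calls that the caller must supply.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins for LLM calls; none of these names come from the SIFT code.
GenerateSticker = Callable[[str], str]       # query -> Sticker
Predict = Callable[[str], str]               # prompt -> answer
ForwardOptimize = Callable[[str, str], str]  # (query, Sticker) -> refined Sticker
InverseGenerate = Callable[[str, str], str]  # (query, prediction) -> Sticker


def augment(query: str, sticker: str) -> str:
    """Attach the Sticker to the query (this prompt layout is an assumption)."""
    return f"{query}\n\nKey facts:\n{sticker}"


def sift_sketch(
    query: str,
    generate_sticker: GenerateSticker,
    predict: Predict,
    forward_optimize: ForwardOptimize,
    inverse_generate: InverseGenerate,
    max_rounds: int = 2,  # assumed cap; the abstract does not give one
) -> Tuple[str, str, bool]:
    """Return (answer, final Sticker, whether the two predictions agreed)."""
    plain = predict(query)                       # prediction from the query alone
    sticker = generate_sticker(query)            # model-written key facts
    grounded = predict(augment(query, sticker))  # prediction from query + Sticker

    for _ in range(max_rounds):
        if plain == grounded:
            break
        # Forward step: realign the extracted facts with the query.
        sticker = forward_optimize(query, sticker)
        grounded = predict(augment(query, sticker))
        if plain == grounded:
            break
        # Inverse step: re-derive a Sticker from the model's own prediction.
        sticker = inverse_generate(query, grounded)
        grounded = predict(augment(query, sticker))

    # Assumed fallback: keep the Sticker-augmented prediction if no agreement.
    return grounded, sticker, plain == grounded
```

Passing the model calls in as plain functions keeps the sketch independent of any particular inference library, so the agreement check and the refinement order can be exercised with simple stubs before wiring in a real model.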
