The Atomic Instruction Gap: Instruction-Tuned LLMs Struggle with Simple, Self-Contained Directives

October 20, 2025
Authors: Henry Lim, Kwan Hui Lim
cs.AI

Abstract

Instruction-tuned large language models (IT-LLMs) exhibit strong zero-shot reasoning, yet their ability to execute simple, self-contained instructions remains underexplored, even though it is foundational to complex instruction following. We evaluate 20 IT-LLMs on modified MMLU and MMLU-Pro benchmarks, systematically varying the format of option labels (alphabetic, numeric, Roman) while keeping their meaning identical, under four paradigms: (1) With explicit instructions, label changes cause large performance shifts (e.g., -30.45% for Roman vs. numeric), revealing instruction-format bias. (2) Without instructions, performance drops further (up to -10.84%) and label sensitivity intensifies, underscoring the role of explicit guidance. (3) When option contents are removed, models fail to beat random-choice baselines except with numeric labels, suggesting weak adherence to atomic directives. (4) Three-shot exemplars yield no significant gains in robustness or fidelity, and generation analyses show persistent label errors, especially for non-numeric formats. Across model sizes, larger LLMs achieve higher accuracy but remain inconsistent in instruction adherence. These results expose the insufficiencies of current instruction-tuning paradigms and highlight the need for evaluation methods and training strategies that explicitly target atomic instruction following.
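
To make the label-variation setup concrete, here is a minimal Python sketch of how one multiple-choice item might be rendered under the three label formats, with or without an explicit instruction and with or without option contents. The function names (`build_prompt`, `random_baseline`), the instruction wording, and the exact label strings are illustrative assumptions; the abstract does not specify the authors' prompt templates.

```python
from typing import List

# Assumed label schemes (hypothetical strings; ten labels cover up to ten options).
LABELS = {
    "alphabetic": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "numeric": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"],
    "roman": ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"],
}


def build_prompt(
    question: str,
    options: List[str],
    scheme: str,
    with_instruction: bool = True,
    with_contents: bool = True,
) -> str:
    """Render one multiple-choice item under a given label scheme.

    Only the label format changes between schemes; the question and the
    option texts stay identical, so the meaning of the item is preserved.
    """
    labels = LABELS[scheme][: len(options)]
    lines = [question]
    for label, text in zip(labels, options):
        # With contents removed, only the bare labels are shown (assumed layout).
        lines.append(f"{label}. {text}" if with_contents else f"{label}.")
    if with_instruction:
        # Hypothetical instruction wording, not taken from the paper.
        lines.append(
            "Answer with only the label of the correct option, chosen from: "
            + ", ".join(labels)
            + "."
        )
    return "\n".join(lines)


def random_baseline(n_options: int) -> float:
    """Expected accuracy of uniform random guessing over n_options labels."""
    return 1.0 / n_options
```

Calling `build_prompt` once per scheme on the same question yields three prompts that differ only in their labels, which is the kind of controlled comparison the abstract describes; `random_baseline(len(options))` gives the chance-level accuracy against which the contents-removed condition can be compared.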