TART: A plug-and-play Transformer module for task-agnostic reasoning

Abstract

Large language models (LLMs) exhibit in-context learning abilities that enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying model for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our analysis reveals that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to the LLM's inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and propose TART, which generically improves an LLM's reasoning abilities using a synthetically trained Transformer-based reasoning module. TART trains this reasoning module in a task-agnostic manner using only synthetic logistic regression tasks and composes it with an arbitrary real-world pre-trained model without any additional training. With a single inference module, TART improves performance across different model families (GPT-Neo, Pythia, BLOOM), model sizes (100M to 6B), tasks (14 NLP binary classification tasks), and even across different modalities (audio and vision). Additionally, on the RAFT Benchmark, TART improves the performance of GPT-Neo (125M) such that it outperforms BLOOM (176B) and is within 4% of GPT-3 (175B). Our code and models are available at https://github.com/HazyResearch/TART.
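
To make the phrase "synthetic logistic regression tasks" concrete, the following is a minimal sketch of how one such task could be sampled and laid out as an in-context sequence. The Gaussian distributions, the `scale` parameter, the function names, and the interleaved (x, y) layout are all assumptions made for illustration; they are not stated in the abstract and are not taken from the TART repository.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_logistic_task(num_examples, dim, seed=None, scale=1.0):
    """Sample one synthetic binary classification task (hypothetical helper).

    Assumption: a hidden weight vector w and every input x are drawn from a
    standard Gaussian, and each label is 1 with probability sigmoid(scale * w.x).
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs, ys = [], []
    for _ in range(num_examples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        logit = scale * sum(wi * xi for wi, xi in zip(w, x))
        ys.append(1 if rng.random() < sigmoid(logit) else 0)
        xs.append(x)
    return w, xs, ys


def as_sequence(xs, ys):
    """Interleave inputs and labels as (x_1, y_1, ..., x_k, y_k).

    Assumption: this mirrors how in-context examples are usually presented;
    the abstract does not specify the sequence format.
    """
    seq = []
    for x, y in zip(xs, ys):
        seq.append(("x", x))
        seq.append(("y", y))
    return seq


if __name__ == "__main__":
    w, xs, ys = sample_logistic_task(num_examples=8, dim=4, seed=0)
    print(len(as_sequence(xs, ys)), "tokens in the sampled sequence")
```

A reasoning module trained on many such sequences, each with a fresh hidden `w`, never sees a real task, which is one way to read "task-agnostic" in the abstract.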
