Introspective Tips: Large Language Model for In-Context Decision Making

Abstract

The emergence of large language models (LLMs) has substantially influenced natural language processing, with strong results across a variety of tasks. In this study, we employ "Introspective Tips" to help LLMs self-optimize their decision-making. By introspectively examining trajectories, the LLM refines its policy by generating succinct and valuable tips. Our method enhances the agent's performance in both few-shot and zero-shot learning situations by considering three essential scenarios: learning from the agent's past experiences, integrating expert demonstrations, and generalizing across diverse games. Importantly, we accomplish these improvements without fine-tuning the LLM parameters; rather, we adjust the prompt to generalize insights from the three aforementioned situations. Our framework not only supports but also emphasizes the advantage of employing LLMs in in-context decision-making. Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.
