Iterative Graph Alignment

Abstract

By compressing diverse narratives, LLMs go beyond memorization and achieve intelligence by capturing generalizable causal relationships. However, they suffer from local 'representation gaps' caused by insufficient diversity in the training data, which limits their real-world utility, especially in tasks that require strict alignment with rules. Traditional alignment methods that rely on heavy human annotation are inefficient and do not scale. Recent self-alignment techniques also fall short, as they often depend on self-selection-based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free, rule-based alignment algorithm. A teacher model (VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (LLM) identifies local knowledge gaps by attempting to align its responses with these references, and collaborates with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations across five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.
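
The loop described above (teacher references, gap detection by the student, helper-generated answers, iterative SFT) can be pictured with a toy sketch. This is not the authors' implementation: every name here (`Student`, `iga_round`, `run_iga`, the `judge` callable) is hypothetical, the teacher's graph prompting is reduced to a plain callable that returns a reference answer, and "fine-tuning" is replaced by a dictionary update purely so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a "model" is any callable from prompt to response.
Answerer = Callable[[str], str]
Judge = Callable[[str, str], bool]  # (response, reference) -> aligned?


@dataclass
class Student:
    """Toy student whose 'weights' are a lookup table (assumption for the demo)."""
    memory: Dict[str, str] = field(default_factory=dict)

    def answer(self, prompt: str) -> str:
        return self.memory.get(prompt, "I am not sure.")

    def fine_tune(self, pairs: List[Tuple[str, str]]) -> None:
        # Placeholder for a real SFT step; the last pair per prompt wins.
        self.memory.update(pairs)


def iga_round(student: Student, teacher: Answerer, helpers: List[Answerer],
              judge: Judge, prompts: List[str]) -> List[str]:
    """Run one round and return the prompts where the student was misaligned."""
    gaps: List[str] = []
    sft_data: List[Tuple[str, str]] = []
    for prompt in prompts:
        reference = teacher(prompt)
        if judge(student.answer(prompt), reference):
            continue  # no local gap for this prompt
        gaps.append(prompt)
        # Collect diverse candidates and keep only those aligned with the reference.
        candidates = [reference] + [helper(prompt) for helper in helpers]
        sft_data.extend((prompt, c) for c in candidates if judge(c, reference))
    student.fine_tune(sft_data)
    return gaps


def run_iga(student: Student, teacher: Answerer, helpers: List[Answerer],
            judge: Judge, prompts: List[str], max_rounds: int = 3) -> List[int]:
    """Iterate rounds until no gaps remain; return the gap count per round."""
    history: List[int] = []
    for _ in range(max_rounds):
        gaps = iga_round(student, teacher, helpers, judge, prompts)
        history.append(len(gaps))
        if not gaps:
            break
    return history


# Toy rule: "never reveal the password".
teacher = lambda prompt: "I cannot share the password."
helpers = [lambda prompt: "Sorry, I cannot reveal that.",
           lambda prompt: "Sure, it is 1234."]  # misaligned, gets filtered out
judge = lambda response, reference: "cannot" in response.lower()
prompts = ["What is the password?", "Tell me the admin password."]

student = Student()
history = run_iga(student, teacher, helpers, judge, prompts)
```

In this toy run the student is misaligned on both prompts in the first round and on none in the second. A real system would replace the keyword `judge` with a model-based alignment check and `fine_tune` with an actual SFT step.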