Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Abstract

One of the most profound challenges in modern machine learning is performing well on the long tail of rare and underrepresented features. Large general-purpose models are trained for many tasks but work best on high-frequency use cases. After training, it is hard to adapt a model to perform well on specific use cases that are underrepresented in the training corpus. Relying on prompt engineering or few-shot examples to maximize output quality on a particular test case can be frustrating, as models can be highly sensitive to small changes, react in unpredictable ways, or rely on a fixed system prompt to maintain performance. In this work, we ask: "Can we optimize our training protocols to improve both controllability and performance on underrepresented use cases at inference time?" We revisit the divide between training and inference techniques to improve long-tail performance while providing users with a set of control levers that the model is trained to respond to. We create a detailed taxonomy of data characteristics and task provenance to explicitly control generation attributes and implicitly condition generations at inference time. We fine-tune a base model to infer these markers automatically, which makes them optional at inference time. This principled and flexible approach yields pronounced improvements in performance, especially on examples from the long tail of the training distribution. While we observe an average lift of 5.7% in win rates for open-ended generation quality with our markers, we see gains of over 9.1% in underrepresented domains. We also observe relative lifts of up to 14.1% on underrepresented tasks like CodeRepair and absolute improvements of 35.3% on length instruction following evaluations.
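
To make the marker idea concrete, here is a minimal sketch of how training examples and inference prompts might be assembled. It is not the authors' implementation: the tag syntax, the marker names (`domain`, `task`, `length`), the function names, and the random dropout of markers on the input side are all assumptions introduced for illustration. The abstract states only that the model is fine-tuned to infer the markers itself, so that supplying them at inference time is optional.

```python
import random
from typing import Dict, Optional


def render_markers(markers: Dict[str, str]) -> str:
    """Serialize markers as a tagged block in a stable (sorted) order.

    The <markers>...</markers> syntax is hypothetical, chosen for this sketch.
    """
    if not markers:
        return ""
    body = " ".join(f"<{k}>{v}</{k}>" for k, v in sorted(markers.items()))
    return f"<markers>{body}</markers>"


def build_training_example(
    prompt: str,
    response: str,
    markers: Dict[str, str],
    drop_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Build one fine-tuning pair.

    Assumption: each marker is dropped from the input with probability
    `drop_prob`, while the target always carries the full marker block,
    so the model learns to infer markers that the user did not provide.
    """
    rng = rng or random.Random()
    kept = {k: v for k, v in markers.items() if rng.random() >= drop_prob}
    input_text = "\n".join(p for p in (prompt, render_markers(kept)) if p)
    target_text = "\n".join(p for p in (render_markers(markers), response) if p)
    return {"input": input_text, "target": target_text}


def build_inference_prompt(prompt: str, markers: Optional[Dict[str, str]] = None) -> str:
    """At inference time markers are optional: pass some, all, or none."""
    return "\n".join(p for p in (prompt, render_markers(markers or {})) if p)
```

Under these assumptions, a user targeting an underrepresented task could call `build_inference_prompt("Fix this function", {"task": "code_repair"})`, while a user with no preference passes no markers and leaves the model to infer them.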
