OmniPred: Language Models as Universal Regressors

Abstract

Across the broad landscape of experimental design, regression has been a powerful tool for accurately predicting the outcome metrics of a system or model given a set of parameters, but it has traditionally been restricted to methods that apply only to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over (x, y) evaluation data from diverse real-world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that, using only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression and, if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.
