Performance Prediction for Large Systems via Text-to-Text Regression

Abstract

In many industries, predicting metric outcomes of large systems is a fundamental problem that has relied largely on traditional tabular regression. However, such methods struggle on complex systems data in the wild, such as configuration files or system logs, where feature engineering is often infeasible. We propose text-to-text regression as a general, scalable alternative. For predicting resource efficiency on Borg, Google's massive compute cluster scheduling system, a 60M-parameter encoder-decoder trained from random initialization achieves up to a near-perfect 0.99 (0.9 average) rank correlation across the entire fleet, and 100x lower MSE than tabular approaches. The model also adapts easily to new tasks with only 500 few-shot examples and captures the densities of complex outcome distributions. Ablation studies highlight the importance of using encoders, increasing sequence length, and the model's inherent uncertainty quantification. These findings pave the way for universal simulators of real-world outcomes.
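
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how a rank correlation and a mean squared error can be computed for a set of predictions. It assumes the rank correlation in question is Spearman's, which the abstract does not state; the function names (`ranks`, `pearson`, `spearman`, `mse`) and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumption: the abstract's "rank correlation" is taken to be Spearman's.
    """
    return pearson(ranks(y_true), ranks(y_pred))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical efficiency values, for illustration only.
    targets = [0.10, 0.35, 0.50, 0.80]
    predictions = [0.12, 0.30, 0.55, 0.70]
    print("rank correlation:", spearman(targets, predictions))
    print("MSE:", mse(targets, predictions))
```

A rank correlation rewards getting the ordering of outcomes right regardless of scale, while MSE penalizes the magnitude of each error, which is why the two are reported side by side.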
