Model Immunization from a Condition Number Perspective

Abstract

Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, the key understanding of when immunization is possible and a precise definition of an immunized model remain unclear. In this work, we propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.
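As a rough illustration of why the condition number of a Hessian is a natural lens for fine-tuning difficulty, the sketch below uses a standard fact about least-squares regression: the Hessian of the loss is X^T X / n, and gradient descent slows down as that matrix's condition number grows. This is not the authors' algorithm. The synthetic data, the feature scaling, the step size 1/L, and the function names are all assumptions made for the example.

```python
import numpy as np

def hessian_condition_number(X):
    """Condition number of the least-squares Hessian H = X^T X / n."""
    H = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(H)  # ascending order; H is symmetric PSD
    return eigvals[-1] / eigvals[0]

def gd_steps_to_tolerance(X, y, tol=1e-6, max_steps=100_000):
    """Gradient descent on 0.5 * mean((Xw - y)^2) with step size 1/L.

    Returns the number of steps taken before the loss drops below tol.
    Targets are assumed noiseless, so the minimum loss is 0.
    """
    n = X.shape[0]
    L = np.linalg.eigvalsh(X.T @ X / n)[-1]  # largest Hessian eigenvalue
    w = np.zeros(X.shape[1])
    for step in range(max_steps):
        residual = X @ w - y
        if 0.5 * np.mean(residual ** 2) < tol:
            return step
        w -= (X.T @ residual / n) / L
    return max_steps

rng = np.random.default_rng(0)
n, d = 200, 5
Z = rng.standard_normal((n, d))
w_true = np.ones(d)

# Hypothetical "easy" task: features on comparable scales.
X_good = Z
# Hypothetical "hard" task: same features, rescaled so the Hessian is ill-conditioned.
X_bad = Z * np.array([1.0, 0.5, 0.2, 0.1, 0.05])

kappa_good = hessian_condition_number(X_good)
kappa_bad = hessian_condition_number(X_bad)
steps_good = gd_steps_to_tolerance(X_good, X_good @ w_true)
steps_bad = gd_steps_to_tolerance(X_bad, X_bad @ w_true)

print(f"condition number: good={kappa_good:.1f}, bad={kappa_bad:.1f}")
print(f"GD steps to tol:  good={steps_good}, bad={steps_bad}")
```

Under these assumptions, the ill-conditioned task needs many more gradient steps to reach the same loss. One plausible reading of the abstract is that immunization aims for this kind of conditioning on the harmful task while keeping other tasks well-conditioned.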

