Where Rectified Flows Leak: Characterizing Membership Signals along the Interpolation Path

Abstract

Understanding what generative models retain from their training data remains challenging, with implications for copyright and privacy. Beyond verbatim reproduction, models can encode subtler traces of their training data that never surface in their outputs yet remain exploitable. We study this regime for Rectified Flows, which are increasingly used in deployed generative systems. We analyse the interpolation path X_λ = (1-λ)X_0 + λX_1 that defines Rectified Flow training. We show that a gap exists between the reconstruction of training and test data, that this gap follows a bell-shaped curve over λ, and that it accumulates during training while validation metrics remain stable. The gap has a maximum whose location we derive in closed form under Gaussian assumptions. We validate these predictions on both audio and images and show that the bell-shaped structure is universal, while the peak prediction holds when our assumptions are satisfied. As a proof of concept, we exploit this specific λ-resolved structure to perform a Membership Inference Attack, distinguishing members of the training set from non-members.
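To make the λ-resolved quantity concrete, the following is a minimal sketch of how a train/test reconstruction gap along the interpolation path could be measured. It is not the paper's implementation. Several choices are assumptions made only for illustration: X_0 is taken to be standard Gaussian noise and X_1 the data, reconstruction is a one-step estimate of X_1 from a velocity prediction, the error is a squared L2 distance, and the names `velocity_fn`, `reconstruction_error`, `gap_curve`, and `membership_scores` are hypothetical.

```python
import numpy as np


def interpolate(x0, x1, lam):
    """Point on the straight path: X_lam = (1 - lam) * X_0 + lam * X_1."""
    return (1.0 - lam) * x0 + lam * x1


def reconstruction_error(velocity_fn, x1, lam, rng):
    """Per-sample squared error of a one-step reconstruction of x1 from X_lam.

    Assumption: x0 is standard Gaussian noise and x1 is data.
    velocity_fn(x, lam) is a hypothetical stand-in for a trained model.
    """
    x0 = rng.standard_normal(x1.shape)
    x_lam = interpolate(x0, x1, lam)
    # On the straight path dX/dlam = X_1 - X_0, so X_1 = X_lam + (1 - lam) * velocity.
    x1_hat = x_lam + (1.0 - lam) * velocity_fn(x_lam, lam)
    return np.sum((x1_hat - x1) ** 2, axis=-1)


def gap_curve(velocity_fn, train, test, lams, seed=0):
    """Mean test error minus mean train error at each value of lam."""
    rng = np.random.default_rng(seed)
    gaps = []
    for lam in lams:
        err_train = reconstruction_error(velocity_fn, train, lam, rng).mean()
        err_test = reconstruction_error(velocity_fn, test, lam, rng).mean()
        gaps.append(err_test - err_train)
    return np.array(gaps)


def membership_scores(velocity_fn, samples, lam, n_noise=8, seed=0):
    """Negated error averaged over noise draws; a higher score suggests a member."""
    rng = np.random.default_rng(seed)
    errs = [reconstruction_error(velocity_fn, samples, lam, rng) for _ in range(n_noise)]
    return -np.mean(errs, axis=0)
```

Under these assumptions, one would evaluate `gap_curve` on a grid of λ values in (0, 1), pick the λ where the gap is largest, and threshold `membership_scores` at that λ to separate members from non-members. The threshold and the choice of λ would have to be calibrated on held-out data.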