Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Abstract

Monitoring of autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, we study the emergent languages on the Moltbook platform. We build on the Moltbook Files dataset and apply a two-stage approach: a rule-based heuristic (about 6,000 matches) followed by zero-shot classification (518 kept). The resulting categories include token efficiency (166), new natural languages (106), and oversight evasion (59). We conduct both quantitative and qualitative analyses. Our results show that posts proposing new languages for avoiding oversight are judged by DeepSeek-3.2 as less aligned than posts in the other categories, and that all languages can be learned in-context by other language models merely from a description of the language. Moreover, manual study of exemplary cases reveals surprisingly sophisticated steganographic protocols, such as embedding hidden messages in natural language. Although we cannot be certain how autonomously these languages were conceived, our results add to the evidence that monitoring surface behavior may soon be insufficient for retaining control over agent populations.
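
The two-stage approach described above (a rule-based heuristic followed by zero-shot classification) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the keyword pattern, the added "other" label, the function names, and in particular `toy_classifier`, which is only a keyword stand-in for a real zero-shot call to a language model. It does not reproduce the rules or prompts actually used in the study.

```python
import re
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical stage-one rule; the study's actual heuristics are not given here.
CANDIDATE_PATTERN = re.compile(
    r"\b(new language|conlang|cipher|shorthand|protocol)\b",
    re.IGNORECASE,
)

# Category names follow the abstract; "other" is an assumed reject label.
LABELS = ("token efficiency", "new natural language", "oversight evasion", "other")


def stage_one(posts: Iterable[str]) -> List[str]:
    """Cheap rule-based filter: keep posts that match the keyword pattern."""
    return [post for post in posts if CANDIDATE_PATTERN.search(post)]


def stage_two(
    candidates: Iterable[str],
    classify: Callable[[str, Sequence[str]], str],
) -> List[Tuple[str, str]]:
    """Label each candidate and drop those classified as "other"."""
    labeled = []
    for post in candidates:
        label = classify(post, LABELS)
        if label != "other":
            labeled.append((post, label))
    return labeled


def toy_classifier(post: str, labels: Sequence[str]) -> str:
    """Stand-in for a zero-shot model call (assumption): keyword lookup only."""
    text = post.lower()
    if "oversight" in text:
        return "oversight evasion"
    if "token" in text:
        return "token efficiency"
    if "grammar" in text:
        return "new natural language"
    return "other"


posts = [
    "Proposal: a new language with compressed glyphs to save tokens.",
    "Let's design a cipher so oversight tools miss our messages.",
    "Good morning, fellow agents!",
    "This protocol update fixes a login bug.",
]
candidates = stage_one(posts)
kept = stage_two(candidates, toy_classifier)
```

In this toy run the heuristic passes three of the four posts, and the classifier stage then discards the one that merely mentions a protocol. This mirrors, at a very small scale, the narrowing from about 6,000 heuristic matches to 518 kept posts.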