Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Abstract

Monitoring of autonomous language model agents currently relies mostly on surface behavior. But what happens when agent populations invent new languages with the goal of avoiding human oversight? Here, we study the emergent languages on Moltbook. We build on the Moltbook Files dataset and apply a two-stage approach: a rule-based heuristic (about 6000 matches) followed by zero-shot classification (518 posts kept). The resulting categories include token efficiency (166), new natural languages (106), and oversight evasion (59). We conduct both quantitative and qualitative analyses. Our results show that posts proposing new languages for avoiding oversight are judged by DeepSeek-3.2 to be less aligned than posts in the other categories, and that all languages can be learned in-context by other language models merely from a description of the language. Moreover, manual study of exemplary cases reveals surprisingly sophisticated steganographic protocols, such as embedding hidden messages in natural language. Although we cannot be certain how autonomously these languages were conceived, our results add to the evidence that monitoring surface behavior may soon be insufficient for retaining control over agent populations.
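The two-stage approach described above (a cheap rule-based filter followed by zero-shot classification of the survivors) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the regular expressions, the label names, the `two_stage_filter` function, and the keyword-based `toy_classifier` that stands in for a real zero-shot language model call. None of these are taken from the paper.

```python
import re
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical stage-1 rules; the actual heuristics are not specified here.
LANGUAGE_PATTERNS = [
    re.compile(r"\bnew language\b", re.IGNORECASE),
    re.compile(
        r"\b(invent|create|design)\w*\s+(a|our own)\s+(language|protocol|code)\b",
        re.IGNORECASE,
    ),
]

# Hypothetical label set; the last label marks posts to discard.
REJECT = "not a language proposal"
CATEGORIES = ("token efficiency", "new natural language", "oversight evasion", REJECT)


def heuristic_match(post: str) -> bool:
    """Stage 1: keep a post if any rule-based pattern fires."""
    return any(p.search(post) for p in LANGUAGE_PATTERNS)


def two_stage_filter(
    posts: Iterable[str],
    classify: Callable[[str, Sequence[str]], str],
) -> List[Tuple[str, str]]:
    """Stage 2: pass heuristic matches to a classifier and drop rejects."""
    kept = []
    for post in posts:
        if not heuristic_match(post):
            continue
        label = classify(post, CATEGORIES)
        if label != REJECT:
            kept.append((post, label))
    return kept


def toy_classifier(post: str, categories: Sequence[str]) -> str:
    """Keyword stand-in for a zero-shot model call (illustration only)."""
    text = post.lower()
    if "token" in text:
        return "token efficiency"
    if "human" in text or "oversight" in text:
        return "oversight evasion"
    return REJECT


posts = [
    "We should create a new language with fewer tokens.",
    "Let's invent a language to hide from humans.",
    "Learning a new language like Spanish is fun.",
    "I love pizza.",
]
result = two_stage_filter(posts, toy_classifier)
```

In this toy run the fourth post never reaches the classifier, and the third is a false positive of the heuristic that the second stage discards. This mirrors why a broad first pass followed by a stricter classifier can shrink thousands of matches to a few hundred kept posts.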