The Generative AI Paradox: "What It Can Create, It May Not Understand"
October 31, 2023
Authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi
cs.AI
Abstract
The recent wave of generative AI has sparked unprecedented global attention,
with both excitement and concern over potentially superhuman levels of
artificial intelligence: models now take only seconds to produce outputs that
would challenge or exceed the capabilities even of expert humans. At the same
time, models still show basic errors in understanding that would not be
expected even in non-expert humans. This presents us with an apparent paradox:
how do we reconcile seemingly superhuman capabilities with the persistence of
errors that few humans would make? In this work, we posit that this tension
reflects a divergence in the configuration of intelligence in today's
generative models relative to intelligence in humans. Specifically, we propose
and test the Generative AI Paradox hypothesis: generative models, having been
trained directly to reproduce expert-like outputs, acquire generative
capabilities that are not contingent upon -- and can therefore exceed -- their
ability to understand those same types of outputs. This contrasts with humans,
for whom basic understanding almost always precedes the ability to generate
expert-level outputs. We test this hypothesis through controlled experiments
analyzing generation vs. understanding in generative models, across both
language and image modalities. Our results show that although models can
outperform humans in generation, they consistently fall short of human
capabilities in measures of understanding; they also show a weaker
correlation between their generation and understanding performance, and
greater brittleness to adversarial inputs. Our findings support the
hypothesis that models' generative
capability may not be contingent upon understanding capability, and call for
caution in interpreting artificial intelligence by analogy to human
intelligence.
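
The experimental contrast the abstract describes can be made concrete with a small harness that scores the same model on paired generation and discrimination versions of each task. Below is a minimal Python sketch, not the paper's actual evaluation code: `model.generate`, `model.choose`, and `judge.score` are hypothetical placeholder interfaces, and the scoring is deliberately simplified to illustrate the three quantities the results refer to (generation performance, understanding performance, and the correlation between them).

```python
# Sketch of a generation-vs-understanding comparison in the spirit of the
# paper's experiments. `model` and `judge` are hypothetical placeholders;
# the actual prompts, raters, and task suites are not reproduced here.

from dataclasses import dataclass
from statistics import correlation  # Pearson's r; requires Python 3.10+


@dataclass
class Task:
    prompt: str            # generation instruction, e.g. "Write a headline for ..."
    candidates: list[str]  # answer options for the discrimination variant
    gold_index: int        # index of the human-preferred candidate


def generation_score(model, task, judge) -> float:
    """Generate an output and have a judge (e.g. human raters) score it in [0, 1]."""
    output = model.generate(task.prompt)
    return judge.score(task.prompt, output)


def understanding_score(model, task) -> float:
    """Discrimination-style probe: can the model pick the best candidate
    among options for the same kind of task it generates for?"""
    picked = model.choose(task.prompt, task.candidates)
    return float(picked == task.gold_index)


def generation_understanding_gap(model, tasks, judge):
    gen = [generation_score(model, t, judge) for t in tasks]
    und = [understanding_score(model, t) for t in tasks]
    # The paradox predicts a high generation mean, a lower understanding
    # mean, and only a weak correlation between the two across tasks.
    return sum(gen) / len(gen), sum(und) / len(und), correlation(gen, und)
```

Under the paper's hypothesis, a human-like profile would show understanding at least keeping pace with generation and a strong positive correlation across tasks; the reported findings correspond to the opposite pattern in this sketch's three return values.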