Combinatorial Creativity: A New Frontier in Generalization Abilities

Abstract

Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff remains persistent even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, bridging the gap between human and machine intelligence.

