Combinatorial Creativity: A New Frontier in Generalization Abilities

Abstract

Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks such as scientific idea generation, which constitutes a form of generalization from training data that existing conceptual frameworks do not address. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Evaluating for accuracy or correctness against fixed targets would contradict the open-ended nature of CC, so we instead propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. Building on this framework, we make several empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff persists even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, and help bridge the gap between human and machine intelligence.
