GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
March 27, 2026
Authors: Nicolas von Lützow, Barbara Rössle, Katharina Schmid, Matthias Nießner
cs.AI
Abstract
Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled using a causal transformer with 3D rotary positional embedding, enabling sequential generation of spatial structure and appearance. Unlike diffusion-based methods that refine scenes holistically, our formulation constructs scenes step-by-step, naturally supporting completion, outpainting, controllable sampling via temperature, and flexible generation horizons. This formulation leverages the compositional inductive biases and scalability of autoregressive modeling while operating on explicit representations compatible with modern neural rendering pipelines, positioning autoregressive transformers as a complementary paradigm for controllable and context-aware 3D generation.
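The first stage of the pipeline compresses Gaussian primitives into discrete tokens via a vector-quantized autoencoder. The core of that step is the codebook lookup: each continuous latent vector is replaced by the index of its nearest codebook entry. As a minimal generic sketch of that lookup (toy codebook and latents chosen for illustration, not the paper's actual model), using squared Euclidean distance:

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry
    (squared Euclidean distance) -- the lookup at the heart of VQ autoencoders."""
    # Broadcast (N, 1, D) - (1, K, D) -> (N, K) pairwise squared distances.
    d = np.sum((latents[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    indices = np.argmin(d, axis=-1)           # discrete token ids, shape (N,)
    return indices, codebook[indices]         # ids and their quantized vectors

# Hypothetical toy codebook (3 codes, 2-D) and two latent vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.1]])
ids, quantized = quantize(latents, codebook)  # -> ids [0, 1]
```

The resulting integer ids are what a causal transformer can then model with next-token prediction; in the paper the latents live on a sparse 3D grid, so each id also carries a voxel coordinate that the 3D rotary positional embedding encodes.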