Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

Abstract

Industrial visual sim-to-real is often described as transfer from synthetic images to real images, but industrial deployment usually involves a broader mismatch between the evidence available and the decisions required. A system may be built from CAD renderings, simulated RGB-D observations, normal reference images, synthetic defects, pretrained feature spaces, or language prompts, yet be deployed under different sensors, lighting, materials, fixtures, calibration, production variation, and rare defect modes. This review reframes industrial visual sim-to-real as a domain-gap problem organized by prior availability. We distinguish three settings: CAD-available settings, where explicit object geometry can support rendering, calibration, pose estimation, segmentation, and test-time geometric verification; CAD-unavailable settings, where geometry is replaced by normal-reference appearance, feature distributions, teacher-student residuals, synthetic anomaly assumptions, foundation features, or vision-language priors; and boundary-prior settings, where approximate models, templates, reference views, or semantic correspondences preserve only part of the CAD role. This framing connects the CAD-based detection and 6D pose-estimation literature with the industrial anomaly and surface-inspection literature, which are usually reviewed separately. To make the taxonomy concrete, we use empirical anchors on T-LESS/BOP, MVTec AD, and VisA. The anchors show that the number of CAD renders alone is not enough to close the transfer gap; source-distribution design, detector capacity, and a small amount of real calibration data can matter more. They also show that CAD at test time creates an independent verification channel through mask, pose, and depth consistency, whereas CAD-unavailable inspection relies on calibrated normality and feature deviation. The review therefore argues against a single cross-task leaderboard and instead asks which prior grounds each deployment decision.