Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection

📄 arXiv: 2603.03939v1 📥 PDF

Authors: Radia Daci, Vito Renò, Cosimo Patruno, Angelo Cardellicchio, Abdelmalik Taleb-Ahmed, Marco Leo, Cosimo Distante

Categories: cs.CV, cs.AI

Published: 2026-03-04

🔗 Code/Project: https://github.com/ECGAI-Research/CMDR-IAD/


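💡 One-Sentence Summary

Proposes CMDR-IAD, a lightweight unsupervised framework for robust 2D+3D multimodal industrial anomaly detection.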

🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture; Pillar 9: Embodied Foundation Models

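Keywords: multimodal anomaly detection, unsupervised learning, cross-modal mapping, dual-branch reconstruction, industrial visual inspection, robustness, deep learning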

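📋 Key Points

  1. Existing unsupervised multimodal anomaly detection methods degrade under noisy depth and missing modalities, which limits their robustness.
  2. CMDR-IAD combines bidirectional 2D↔3D cross-modal mapping with dual-branch reconstruction that captures texture and geometry independently, making anomaly detection more reliable.
  3. On the MVTec 3D-AD benchmark, CMDR-IAD reaches 97.3% image-level AUROC, state-of-the-art performance achieved without memory banks.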

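📝 Abstract (Summary)

Multimodal industrial anomaly detection benefits from integrating RGB appearance with 3D surface geometry, but existing unsupervised methods commonly rely on memory banks, teacher-student architectures, or fragile fusion schemes, which limits their robustness under noisy depth, weak texture, or missing modalities. This paper proposes CMDR-IAD, a lightweight and modality-flexible unsupervised framework that supports reliable anomaly detection in 2D+3D multimodal as well as single-modality (2D-only or 3D-only) settings. CMDR-IAD combines bidirectional 2D↔3D cross-modal mapping, which models appearance-geometry consistency, with dual-branch reconstruction, which independently captures normal texture and geometric structure. A two-part fusion strategy integrates these cues, yielding stable and precise anomaly localization even in depth-sparse or low-texture regions.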

🔬 Method Details

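Problem definition: The paper targets the limited robustness of multimodal industrial anomaly detection. Existing methods often degrade under noisy depth, weak texture, or missing modalities.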

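Core idea: CMDR-IAD uses bidirectional 2D↔3D cross-modal mapping together with dual-branch reconstruction to capture appearance and geometry separately, improving the stability and accuracy of anomaly detection.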

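Technical framework: The framework has two main modules, a cross-modal mapping module and a dual-branch reconstruction module. The cross-modal mapping module models the consistency between appearance and geometry, while the dual-branch reconstruction module independently captures normal texture and geometric structure.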
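As a rough illustration of how these two modules could each yield a per-patch anomaly signal, the sketch below works on already-extracted features. It is a minimal sketch under stated assumptions, not the authors' implementation: the random linear maps stand in for learned networks, and every name (`W_2d_to_3d`, `W_rec_2d`, `cosine_distance`, and so on) as well as the choice of cosine and L2 distances is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_distance(a, b, eps=1e-8):
    """Per-row cosine distance between two (N, D) feature arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return 1.0 - num / den

# Hypothetical per-patch features: N patches, 2D (RGB) and 3D (geometry) dims.
N, D2, D3 = 16, 8, 6
f2d = rng.normal(size=(N, D2))
f3d = rng.normal(size=(N, D3))

# Stand-ins for learned networks (assumption: plain linear maps).
W_2d_to_3d = rng.normal(size=(D2, D3))
W_3d_to_2d = rng.normal(size=(D3, D2))
W_rec_2d = np.eye(D2)          # identity: perfect reconstruction of 2D features
W_rec_3d = 0.9 * np.eye(D3)    # slightly imperfect 3D reconstruction

# Cross-modal mapping: predict each modality from the other and compare.
map_err_2d_to_3d = cosine_distance(f2d @ W_2d_to_3d, f3d)
map_err_3d_to_2d = cosine_distance(f3d @ W_3d_to_2d, f2d)

# Dual-branch reconstruction: each branch reconstructs its own modality.
rec_err_2d = np.linalg.norm(f2d @ W_rec_2d - f2d, axis=1)
rec_err_3d = np.linalg.norm(f3d @ W_rec_3d - f3d, axis=1)
```

Because each branch depends only on its own modality, the reconstruction errors remain available when one modality is absent, which is one plausible reading of the single-modality settings described in the abstract.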

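Key innovation: CMDR-IAD operates without memory banks and relies on reliability-gated mapping and adaptively weighted reconstruction to improve the robustness and precision of anomaly detection.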

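Key design: Loss functions are chosen to balance appearance and geometric deviations, and the fusion strategy is adjusted through adaptive weights so that it remains effective across modalities. The specific network architecture and parameter settings were tuned experimentally.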
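The abstract describes the fusion as two parts, a reliability-gated mapping anomaly and a confidence-weighted reconstruction anomaly, but gives no formulas. The sketch below shows one way such a fusion could look. Everything in it is an assumption made for illustration: the function name `fuse_anomaly_maps`, the use of a depth-validity mask as the reliability gate, the product of the two mapping errors, and the per-pixel normalized confidence weights.

```python
import numpy as np

def fuse_anomaly_maps(map_2d_to_3d, map_3d_to_2d, rec_2d, rec_3d,
                      depth_valid, conf_2d, conf_3d, eps=1e-8):
    """Combine four (H, W) error maps into one anomaly map (illustrative only)."""
    # Reliability gate (assumption): trust mapping errors only where depth exists.
    gate = depth_valid.astype(float)
    # The product is large only where both mapping directions disagree.
    mapping_anomaly = gate * map_2d_to_3d * map_3d_to_2d

    # Confidence weights (assumption): normalized per pixel; the 3D weight
    # drops to zero where depth is missing, so the 2D branch takes over.
    w2 = conf_2d
    w3 = conf_3d * gate
    recon_anomaly = (w2 * rec_2d + w3 * rec_3d) / (w2 + w3 + eps)

    return mapping_anomaly + recon_anomaly

# Toy usage with constant maps and one missing-depth pixel.
H, W = 4, 4
ones = np.ones((H, W))
depth_valid = np.ones((H, W), dtype=bool)
depth_valid[0, 0] = False
score = fuse_anomaly_maps(0.2 * ones, 0.5 * ones, 0.1 * ones, 0.3 * ones,
                          depth_valid, ones, ones)
```

In this toy setup the pixel without depth falls back to the 2D reconstruction error alone, while pixels with valid depth combine both cues. This mirrors the stated goal of stable localization in depth-sparse regions without claiming to match the paper's actual weighting.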

📊 Experimental Highlights

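On the MVTec 3D-AD benchmark, CMDR-IAD achieves 97.3% image-level AUROC (I-AUROC), 99.6% pixel-level AUROC (P-AUROC), and 97.6% AUPRO, surpassing existing state-of-the-art methods. On a real-world polyurethane cutting dataset, the 3D-only variant attains 92.6% I-AUROC and 92.5% P-AUROC, demonstrating strong effectiveness under practical industrial conditions.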
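For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison. The labels and scores are made-up toy values, not results from the paper, and the function name `auroc` is chosen here for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (invented for illustration): 1 = anomalous, 0 = normal.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
image_auroc = auroc(labels, scores)
```

For image-level AUROC each sample is one image with a single anomaly score. For pixel-level AUROC the same computation is applied with every pixel treated as a sample, in which case a sorting-based implementation is preferable to this quadratic loop.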

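🎯 Application Scenarios

This work has broad potential in industrial visual inspection, particularly in settings that demand high robustness and accuracy, such as quality control and defect detection in manufacturing. The modality flexibility of CMDR-IAD lets it adapt to different industrial environments, and it may help advance smart manufacturing.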

📄 Abstract (Original)

Multimodal industrial anomaly detection benefits from integrating RGB appearance with 3D surface geometry, yet existing unsupervised approaches commonly rely on memory banks, teacher-student architectures, or fragile fusion schemes, limiting robustness under noisy depth, weak texture, or missing modalities. This paper introduces CMDR-IAD, a lightweight and modality-flexible unsupervised framework for reliable anomaly detection in 2D+3D multimodal as well as single-modality (2D-only or 3D-only) settings. CMDR-IAD combines bidirectional 2D↔3D cross-modal mapping to model appearance-geometry consistency with dual-branch reconstruction that independently captures normal texture and geometric structure. A two-part fusion strategy integrates these cues: a reliability-gated mapping anomaly highlights spatially consistent texture-geometry discrepancies, while a confidence-weighted reconstruction anomaly adaptively balances appearance and geometric deviations, yielding stable and precise anomaly localization even in depth-sparse or low-texture regions. On the MVTec 3D-AD benchmark, CMDR-IAD achieves state-of-the-art performance while operating without memory banks, reaching 97.3% image-level AUROC (I-AUROC), 99.6% pixel-level AUROC (P-AUROC), and 97.6% AUPRO. On a real-world polyurethane cutting dataset, the 3D-only variant attains 92.6% I-AUROC and 92.5% P-AUROC, demonstrating strong effectiveness under practical industrial conditions. These results highlight the framework's robustness, modality flexibility, and the effectiveness of the proposed fusion strategies for industrial visual inspection. Our source code is available at https://github.com/ECGAI-Research/CMDR-IAD/