cs.CV (2026-04-10)

📊 12 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 2: RL & Architecture (4) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

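| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Large-Scale Universal Defect Generation: Foundation Models and Datasets | Proposes UniDG, a large-scale universal defect generation model that addresses the scarcity of defect generation data. | foundation model, multimodal | |
| 2 | Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection | Proposes ImageProtector, which uses visual prompt injection to prevent multimodal large language models from analyzing images. | large language model | |
| 3 | Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization | Mosaic: multi-view ensemble optimization that strengthens multimodal jailbreak attacks against closed-source VLMs. | multimodal | |
| 4 | PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos | PinpointQA: a dataset and benchmark for spatial understanding of small objects in indoor videos. | large language model, multimodal | |
| 5 | Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts | Vision-language models suffer from arbitration failure rather than perceptual blindness: a study of how visual-linguistic conflicts are resolved. | multimodal, visual grounding | |
| 6 | SiMing-Bench: Evaluating Procedural Correctness from Continuous Interactions in Clinical Skill Videos | SiMing-Bench: evaluates procedural correctness from continuous interactions in clinical skill videos. | large language model, multimodal | |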

🔬 Pillar 2: RL & Architecture (4 papers)

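| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | Learning Vision-Language-Action World Models for Autonomous Driving | Proposes the VLA-World model, which combines predictive imagination with reflective reasoning to improve foresight and safety in autonomous driving. | reinforcement learning, world model, world models | |
| 8 | Visually-Guided Policy Optimization for Multimodal Reasoning | Proposes VGPO, which strengthens visually guided multimodal reasoning and addresses the underuse of visual information. | reinforcement learning, multimodal | |
| 9 | PhysInOne: Visual Physics Learning and Reasoning in One Suite | PhysInOne: builds a large-scale physical-scene dataset to advance the physical reasoning ability of AI systems. | world model, world models, embodied AI | |
| 10 | VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning | VL-Calibration: decoupled confidence calibration for reasoning in large vision-language models. | reinforcement learning, multimodal, visual grounding | |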

🔬 Pillar 5: Interaction & Reaction (1 paper)

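| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing | Proposes HM-Bench for evaluating how well multimodal large language models understand hyperspectral remote sensing images. | HSI, large language model, multimodal | |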

🔬 Pillar 7: Motion Retargeting (1 paper)

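| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Envisioning the Future, One Step at a Time | Proposes an open-scene future prediction method based on a sparse trajectory diffusion model, enabling efficient and realistic long-horizon simulation. | motion prediction | |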
