cs.CV (2025-06-23)
📊 6 papers total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 2: RL & Architecture (1 🔗1)
Pillar 1: Robot Control (1)
Pillar 3: Perception & Semantics (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | OmniGen2: Exploration to Advanced Multimodal Generation | Proposes OmniGen2 to unify multimodal generation tasks | multimodal | ✅ | |
| 2 | Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations | Proposes a multimodal framework that unifies visual understanding and generation | large language model, multimodal | | |
| 3 | CaughtCheating: Is Your MLLM a Good Cheating Detective? Exploring the Boundary of Visual Perception and Reasoning | Proposes CaughtCheating to address visual reasoning challenges for multimodal large language models | large language model | | |
🔬 Pillar 2: RL & Architecture (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | MCN-SLAM: Multi-Agent Collaborative Neural SLAM with Hybrid Implicit Neural Scene Representation | Proposes MCN-SLAM to address communication bandwidth constraints in multi-agent collaborative SLAM | distillation, visual SLAM, NeRF | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning | Proposes Drive-R1 to address reasoning and planning in vision-language models for autonomous driving | motion planning, reinforcement learning | | |
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions | Surveys multimodal methods to advance talking head generation | NeRF, neural radiance field | ✅ | |