cs.CV (2025-06-18)
📊 21 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 3: Perception & Semantics (5 🔗2)
Pillar 2: RL & Architecture (4 🔗2)
Pillar 5: Interaction & Reaction (2)
Pillar 4: Generative Motion (1)
Pillar 6: Video Extraction (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Proposes RA-NeRF to address camera pose estimation under complex trajectories | 3D gaussian splatting 3DGS gaussian splatting | | |
| 9 | BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion | Proposes a reconstruction-free online framework for real-time 3D object detection | open-vocabulary open vocabulary embodied AI | | |
| 10 | RaCalNet: Radar Calibration Network for Sparse-Supervised Metric Depth Estimation | Proposes RaCalNet for depth estimation under sparse supervision | depth estimation monocular depth metric depth | ✅ | |
| 11 | MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning | Proposes MapFM for high-definition map generation | semantic map foundation model | ✅ | |
| 12 | Implicit 3D scene reconstruction using deep learning towards efficient collision understanding in autonomous driving | Proposes a deep-learning-based implicit 3D scene reconstruction method to improve collision understanding in autonomous driving | scene reconstruction | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Show-o2: Improved Native Unified Multimodal Models | Proposes Show-o2 to improve multimodal understanding and generation | flow matching multimodal | ✅ | |
| 14 | Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation | Proposes weakly-supervised partial contrastive learning to address dynamic viewpoints in vision-language navigation | contrastive learning embodied AI VLN | | |
| 15 | video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models | Proposes video-SALMONN 2 for video captioning and question answering | DPO large language model | ✅ | |
| 16 | FedWSIDD: Federated Whole Slide Image Classification via Dataset Distillation | Proposes FedWSIDD to address privacy and resource heterogeneity in WSI classification | predictive model distillation | | |
🔬 Pillar 5: Interaction & Reaction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization | Proposes HOIDiNi to address realism and physical accuracy in human-object interaction generation | human-object interaction HOI | | |
| 18 | Privacy-Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference | Proposes a homomorphically encrypted neural inference framework to protect the privacy of chest X-ray images | OMOMO | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects | Proposes GenHOI to address generalization to unseen objects in 4D human-object interaction synthesis | motion synthesis contact-aware human-object interaction | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Unsupervised Pelage Pattern Unwrapping for Animal Re-identification | Proposes geometry-aware texture mapping to address pelage pattern distortion in animal re-identification | feature matching geometric consistency | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | FindingDory: A Benchmark to Evaluate Memory in Embodied Agents | Proposes the FindingDory benchmark to evaluate memory in embodied agents | manipulation | | |