cs.CV (2025-09-09)
📊 22 papers in total | 🔗 6 with code
🎯 Topic Navigation
Pillar 9: Embodied Foundation Models (8 🔗2)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 2: RL & Architecture (5 🔗1)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 1: Robot Control (1 🔗1)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting | HairGS: a hair-strand reconstruction method based on 3D Gaussian Splatting | 3D Gaussian splatting, 3DGS, Gaussian splatting | ✅ | |
| 10 | SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting | SplatFill: proposes a depth-guided Gaussian Splatting method for 3D scene inpainting | 3D Gaussian splatting, 3DGS, Gaussian splatting | | |
| 11 | Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning | DiGS: accurate and complete surface reconstruction from 3D Gaussians via direct SDF learning | 3D Gaussian splatting, 3DGS, Gaussian splatting | | |
| 12 | MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery | MCTED: a machine-learning dataset designed for digital elevation model (DEM) generation from Mars imagery | depth estimation, monocular depth, Depth Anything | | |
| 13 | Dynamic Scene 3D Reconstruction of an Uncooperative Resident Space Object | Dynamic-scene 3D reconstruction of uncooperative space objects; evaluates and improves the performance of existing algorithms | scene reconstruction | | |
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Tracing and Mitigating Hallucinations in Multimodal LLMs via Dynamic Attention Localization | Proposes D-LEAF to address hallucinations in multimodal LLMs | DPO, large language model, multimodal | | |
| 15 | Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation | Proposes ToothMCL, multimodal contrastive pretraining on CBCT and IOS data, to improve tooth segmentation accuracy | contrastive learning, multimodal | | |
| 16 | SurgLaVi: Large-Scale Hierarchical Dataset for Surgical Vision-Language Representation Learning | SurgLaVi: a large-scale hierarchical surgical vision-language dataset for surgical vision-language representation learning | representation learning, foundation model | | |
| 17 | Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search | Mini-o3: improves visual search by scaling up reasoning patterns and interaction turns | reinforcement learning, multimodal | | |
| 18 | MVAT: Multi-View Aware Teacher for Weakly Supervised 3D Object Detection | MVAT: a multi-view-aware teacher network for weakly supervised 3D object detection | teacher-student distillation | ✅ | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | EHWGesture -- A dataset for multimodal understanding of clinical gestures | EHWGesture: a dataset for multimodal understanding of clinical gestures | spatiotemporal, multimodal | | |
| 20 | APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction | Proposes an adaptive probabilistic matching loss for robust 3D point cloud reconstruction | spatiotemporal | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation | OnePoseViaGen: one-shot 6D pose estimation that combines single-image 3D generation with generative domain randomization | manipulation, domain randomization, 6D pose estimation | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion | ScoreHOI: a physically plausible human-object interaction reconstruction method based on score-guided diffusion | physically plausible, human-object interaction | | |