cs.CV (2025-09-09)

📊 22 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 8: Physics-based Animation (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

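| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Visual Representation Alignment for Multimodal Large Language Models | Proposes VIRAL, which improves multimodal large language models on vision tasks through visual representation alignment. | large language model, foundation model, multimodal | |
| 2 | Two Stage Context Learning with Large Language Models for Multimodal Stance Detection on Climate Change | Proposes a two-stage context learning framework based on large language models for multimodal stance detection on climate change. | large language model, multimodal | |
| 3 | GLEAM: Learning to Match and Explain in Cross-View Geo-Localization | GLEAM: a cross-view geo-localization framework that combines matching with explainable reasoning. | large language model, multimodal | |
| 4 | CAViAR: Critic-Augmented Video Agentic Reasoning | CAViAR: critic-augmented video agentic reasoning that improves complex video understanding. | large language model | |
| 5 | Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images | Proposes Visual-TableQA for evaluating and improving the reasoning of vision-language models over table images. | multimodal | |
| 6 | Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model | Proposes Point Linguist Model, which segments any object by bridging a large 3D-language model. | large language model | |
| 7 | XSRD-Net: EXplainable Stroke Relapse Detection | XSRD-Net: explainable stroke relapse detection to support early treatment planning. | multimodal | |
| 8 | Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation | Reveals the spurious-feature problem in gender bias benchmarks and proposes a more reliable evaluation approach. | foundation model | |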

🔬 Pillar 3: Perception & Semantics (5 papers)

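| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting | HairGS: a hair strand reconstruction method based on 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting | SplatFill: a depth-guided Gaussian Splatting method for 3D scene inpainting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 11 | Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning | DiGS: accurate and complete surface reconstruction from 3D Gaussians via direct SDF learning. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 12 | MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery | MCTED: a machine-learning-ready dataset for digital elevation model generation from Mars imagery. | depth estimation, monocular depth, Depth Anything | |
| 13 | Dynamic Scene 3D Reconstruction of an Uncooperative Resident Space Object | Dynamic-scene 3D reconstruction of an uncooperative space object, evaluating and optimizing the performance of existing algorithms. | scene reconstruction | |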

🔬 Pillar 2: RL & Architecture (5 papers)

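| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Tracing and Mitigating Hallucinations in Multimodal LLMs via Dynamic Attention Localization | Proposes D-LEAF to address hallucinations in multimodal LLMs. | DPO, large language model, multimodal | |
| 15 | Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation | Proposes ToothMCL, multimodal contrastive pretraining on CBCT and IOS that improves tooth segmentation accuracy. | contrastive learning, multimodal | |
| 16 | SurgLaVi: Large-Scale Hierarchical Dataset for Surgical Vision-Language Representation Learning | SurgLaVi: a large-scale hierarchical surgical vision-language dataset for surgical vision-language representation learning. | representation learning, foundation model | |
| 17 | Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search | Mini-o3: improves visual search performance by scaling up reasoning patterns and interaction turns. | reinforcement learning, multimodal | |
| 18 | MVAT: Multi-View Aware Teacher for Weakly Supervised 3D Object Detection | MVAT: a multi-view aware teacher network for weakly supervised 3D object detection. | teacher-student distillation | |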

🔬 Pillar 8: Physics-based Animation (2 papers)

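| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | EHWGesture -- A dataset for multimodal understanding of clinical gestures | EHWGesture: a dataset for multimodal understanding of clinical gestures. | spatiotemporal, multimodal | |
| 20 | APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction | Proposes an adaptive probabilistic matching loss for 3D point cloud reconstruction. | spatiotemporal | |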

🔬 Pillar 1: Robot Control (1 paper)

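| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation | OnePoseViaGen: one-shot 6D pose estimation that combines single-image 3D generation with generative domain randomization. | manipulation, domain randomization, 6D pose estimation | |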

🔬 Pillar 4: Generative Motion (1 paper)

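| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion | ScoreHOI: a score-guided diffusion method for physically plausible reconstruction of human-object interaction. | physically plausible, human-object interaction | |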
