cs.CV (2025-09-09)

📊 22 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 8: Physics-based Animation (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Visual Representation Alignment for Multimodal Large Language Models	Proposes VIRAL, which improves multimodal large language models on vision tasks through visual representation alignment	large language model foundation model multimodal
2	Two Stage Context Learning with Large Language Models for Multimodal Stance Detection on Climate Change	Proposes a two-stage, LLM-based context learning framework for multimodal stance detection on climate change	large language model multimodal
3	GLEAM: Learning to Match and Explain in Cross-View Geo-Localization	GLEAM: a cross-view geo-localization framework that combines matching with explainable reasoning	large language model multimodal	✅
4	CAViAR: Critic-Augmented Video Agentic Reasoning	CAViAR: critic-augmented video agentic reasoning that improves complex video understanding	large language model
5	Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images	Proposes Visual-TableQA for evaluating and improving vision-language models' reasoning over table images	multimodal	✅
6	Point Linguist Model: Segment Any Object via Bridged Large 3D-Language Model	Proposes the Point Linguist Model, which segments any object through a bridged large 3D-language model	large language model
7	XSRD-Net: EXplainable Stroke Relapse Detection	XSRD-Net: explainable stroke relapse detection to support early treatment planning	multimodal
8	Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation	Reveals how spurious features distort gender bias benchmarks and proposes more reliable evaluation methods	foundation model

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
9	HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting	HairGS: a hair strand reconstruction method based on 3D Gaussian Splatting	3D gaussian splatting 3DGS gaussian splatting	✅
10	SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting	SplatFill: a depth-guided Gaussian Splatting method for 3D scene inpainting	3D gaussian splatting 3DGS gaussian splatting
11	Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning	DiGS: accurate and complete surface reconstruction from 3D Gaussians via direct SDF learning	3D gaussian splatting 3DGS gaussian splatting
12	MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery	MCTED: a machine-learning-ready dataset for generating digital elevation models from Mars imagery	depth estimation monocular depth Depth Anything
13	Dynamic Scene 3D Reconstruction of an Uncooperative Resident Space Object	Dynamic-scene 3D reconstruction of an uncooperative resident space object, evaluating and optimizing existing algorithms	scene reconstruction

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
14	Tracing and Mitigating Hallucinations in Multimodal LLMs via Dynamic Attention Localization	Proposes D-LEAF to mitigate hallucinations in multimodal LLMs	DPO large language model multimodal
15	Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation	Proposes ToothMCL, a multimodal contrastive pretraining approach for CBCT and IOS that improves tooth segmentation accuracy	contrastive learning multimodal
16	SurgLaVi: Large-Scale Hierarchical Dataset for Surgical Vision-Language Representation Learning	SurgLaVi: a large-scale hierarchical surgical vision-language dataset for surgical vision-language representation learning	representation learning foundation model
17	Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search	Mini-o3: improves visual search by scaling up reasoning patterns and interaction turns	reinforcement learning multimodal
18	MVAT: Multi-View Aware Teacher for Weakly Supervised 3D Object Detection	MVAT: a multi-view aware teacher network for weakly supervised 3D object detection	teacher-student distillation	✅

🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	EHWGesture -- A dataset for multimodal understanding of clinical gestures	EHWGesture: a dataset for multimodal understanding of clinical gestures	spatiotemporal multimodal
20	APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction	Proposes an adaptive probabilistic matching loss (APML) for robust 3D point cloud reconstruction	spatiotemporal	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
21	One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation	OnePoseViaGen: one-shot 6D pose estimation that combines single-image 3D generation with generative domain randomization	manipulation domain randomization 6D pose estimation	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion	ScoreHOI: a score-guided diffusion method for physically plausible reconstruction of human-object interactions	physically plausible human-object interaction
