cs.CV (2025-08-18)

📊 26 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗1) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 1: Robot Control (5 🔗1) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 2: RL & Architecture (8 papers)

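| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation | Proposes a gaze-supervised multimodal learning framework to improve chest X-ray diagnosis and report generation | contrastive learning, multimodal | |
| 2 | Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination | Proposes ARMed to address reward collapse in medical reasoning | reinforcement learning, large language model, multimodal | |
| 3 | Creative4U: MLLMs-based Advertising Creative Image Selector with Comparative Reasoning | Proposes Creative4U to address interpretability in advertising creative image selection | reinforcement learning, large language model, multimodal | |
| 4 | D2-Mamba: Dual-Scale Fusion and Dual-Path Scanning with SSMs for Shadow Removal | Proposes D2-Mamba for shadow removal | Mamba, SSM | |
| 5 | Matrix-game 2.0: An open-source real-time and streaming interactive world model | Proposes Matrix-Game 2.0 for real-time interactive world modeling | world model, distillation | |
| 6 | CLoE: Curriculum Learning on Endoscopic Images for Robust MES Classification | Proposes the CLoE framework to address label noise in MES classification of endoscopic images | curriculum learning | |
| 7 | Multi-Level Knowledge Distillation and Dynamic Self-Supervised Learning for Continual Learning | Proposes multi-level knowledge distillation and dynamic self-supervised learning for continual learning | distillation | |
| 8 | Point upsampling networks for single-photon sensing | Proposes point upsampling networks to address sparse point clouds in single-photon sensing | Mamba, state space model | |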

🔬 Pillar 9: Embodied Foundation Models (7 papers)

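| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 9 | Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models | Proposes a multimodal chain of continuous thought for multimodal reasoning | multimodal, chain-of-thought | |
| 10 | Holistic Evaluation of Multimodal LLMs on Spatial Intelligence | Proposes EASI to holistically evaluate multimodal LLMs on spatial intelligence | multimodal | |
| 11 | Omni Survey for Multimodality Analysis in Visual Object Tracking | Presents a comprehensive survey of multimodal visual object tracking to address data integration | multimodal | |
| 12 | Multi-source Multimodal Progressive Domain Adaption for Audio-Visual Deception Detection | Proposes a multi-source multimodal progressive domain adaptation framework for audio-visual deception detection | multimodal | |
| 13 | ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images | Proposes ViDA-UGC to address shortcomings in quality assessment of UGC images | large language model, multimodal, chain-of-thought | |
| 14 | Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection | Proposes DriftBench to address the challenges that GenAI-driven news diversity poses for misinformation detection | multimodal | |
| 15 | DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model | Proposes DianJin-OCR-R1 to address hallucination in OCR tasks | large language model | |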

🔬 Pillar 1: Robot Control (5 papers)

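| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 16 | Foundation Model for Skeleton-Based Human Action Understanding | Proposes a unified skeleton foundation model for human action understanding | humanoid, humanoid robot, representation learning | |
| 17 | Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score | Proposes Dual Contrastive Denoising Score for text-to-image editing | manipulation, contrastive learning, structure preservation | |
| 18 | Precise Action-to-Video Generation Through Visual Action Prompts | Proposes visual action prompts to address precision and generality in action-to-video generation | manipulation, human-object interaction, HOI | |
| 19 | IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion | Proposes IGFuse to address occlusion and coverage issues in 3D scene reconstruction | manipulation, scene reconstruction | |
| 20 | Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping | Proposes Odo to address shape preservation in human body editing | manipulation, SMPL | |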

🔬 Pillar 3: Perception & Semantics (4 papers)

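| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 21 | InnerGS: Internal Scenes Rendering via Factorized 3D Gaussian Splatting | Proposes InnerGS to reconstruct internal scenes, addressing the limitations of conventional methods | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 22 | Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting | Proposes new strategies to alleviate co-adaptation in sparse-view 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 23 | DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation | Proposes DMS to address disparity ambiguity in self-supervised depth estimation | depth estimation, monocular depth | |
| 24 | IntelliCap: Intelligent Guidance for Consistent View Sampling | Proposes IntelliCap to address guidance during image capture | 3D gaussian splatting, gaussian splatting, splatting | |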

🔬 Pillar 4: Generative Motion (1 paper)

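| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 25 | EgoTwin: Dreaming Body and View in First Person | Proposes EgoTwin for first-person video generation and human motion modeling | motion generation, egocentric, first-person view | |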

🔬 Pillar 8: Physics-based Animation (1 paper)

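| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 26 | Compact Attention: Exploiting Structured Spatio-Temporal Sparsity for Fast Video Generation | Proposes a compact attention mechanism to accelerate video generation | spatiotemporal | |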
